Re: SAX and parallel processing
Right. In order to process a SAX stream in parallel you have to copy the data in the stream; you can't just "forward" the events. You also have to instantiate a context for each event, including at least the namespaces in scope and the Location info. I didn't mean to imply this would be excessively expensive, just not as lightweight as serially processed SAX.

Bob

Alan Gutierrez wrote:
> * Bob Foster <bob@o...> [2004-12-31 18:03]:
>>I have a question, though. What is the guaranteed lifetime of an object
>>appearing in a SAX event, like an Attributes object, and any objects
>>used to implement it? If, for example, Attributes were implemented as a
>>collection of lightweight Attribute objects that were re-used for
>>subsequent events, the event data could not be passed directly to
>>parallel threads without copying it. (Or by joining at the end of every
>>event, which would rather limit the parallelism.)
>
> Xerces recycles Attributes structures for each call to
> startElement.
>
> In my library, I keep a stack of attribute structures. The
> attribute structures on the stack are recycled for each element
> depth, not actually popped and reallocated.
>
> I copy over the values in SAX Attributes to an attributes
> structure on this stack, but SAX Attributes are all Strings and
> in Java Strings are immutable, so this is really a bunch of
> pointer assignments (and the adjustment of an array length parameter).
>
> Not too expensive to keep that stack around.
>
> (Because of this, I've come to see streaming problems as SAX
> connected stacks of elements. If I need to transform a
> document, I chain SAX Strategy Handlers. This, rather
> than allow a Strategy to fiddle with its stack within
> the handler.)
>
> The characters event is interesting, because it is an index into
> the parse buffer (in theory, and on Xerces indeed), but a
> characters event is only ever at the top of the stack. I only
> ever need one.
>
> In SAX Strategy, all of the lexemes in the events have a
> getImmutable() method that will return an immutable copy (or
> return itself if it is immutable) for when a series of events
> needs to be recorded.
>
> (Not yet implemented, but if one was buffering and releasing
> nodes, they could use the mutable lexemes and events to
> implement a cache.)
>
> I need to look harder, but I suspect that the handful of
> workhorse SAX ContentHandlers I use, that I get from outside my
> library, are probably self contained. Things like DOM4J's
> ContentBuilder, and the SAXTransformers of Saxon, via TRAX.
>
> --
> Alan Gutierrez - alan@e...
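
For what it's worth, the copying discussed in this thread might look roughly like the sketch below. It is only an illustration, not code from Alan's library or from Xerces: CopyingHandler and Event are made-up names. It assumes nothing beyond the standard SAX API, using the AttributesImpl copy constructor to detach the attributes from the parser's recycled structure, and new String(ch, start, length) to copy character data out of the parse buffer. The namespaces in scope and the Locator position would need the same snapshot treatment and are left out here for brevity.

```java
import org.xml.sax.Attributes;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

import java.util.ArrayList;
import java.util.List;

/** Hypothetical handler: snapshots each event so it can outlive the callback. */
class CopyingHandler extends DefaultHandler {

    /** Hypothetical record of one event; holds only private copies. */
    static final class Event {
        final String kind;      // "start", "end" or "chars"
        final String qName;     // null for "chars"
        final Attributes atts;  // private copy; null unless kind is "start"
        final String text;      // private copy; null unless kind is "chars"

        Event(String kind, String qName, Attributes atts, String text) {
            this.kind = kind;
            this.qName = qName;
            this.atts = atts;
            this.text = text;
        }
    }

    private final List<Event> events = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // The parser may reuse 'atts' on the next call, so take a copy now.
        // The values are immutable Strings, so this copies references only.
        events.add(new Event("start", qName, new AttributesImpl(atts), null));
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        events.add(new Event("end", qName, null, null));
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // 'ch' is typically a window onto the parse buffer; copy the slice out.
        events.add(new Event("chars", null, null, new String(ch, start, length)));
    }

    /** Events recorded so far, in document order. */
    List<Event> events() {
        return events;
    }
}
```

Once an Event holds only its own copies it can be handed to another thread, for example through a queue, without joining at the end of every callback. That is the extra cost over plain serial SAX mentioned above.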