Re: parser models
John Cowan <jcowan@r...> wrote:

| During a startElement callback, Shemp's client can ask it to start
| creating a tree; all SAX events are then consumed until (but not
| including) the matching endElement callback, at which time the tree
| is available.

Could Shemp support an "ignore subtree" option, i.e. consume and throw
away all events until the matching endElement callback? This would
allow the client to skip over uninteresting subtrees without hassling
with the details.

Which brings me to my long-standing beef with SAX: state maintenance.
A lot of the time, what we call state dependence is more accurately
context dependence. In SAX, if you split up the event handling in the
app among various classes, you have to mess with setHandler() calls
explicitly and track the stack at the same time to get this right.
This is a pain.

In search of a more "natural" idiom, I've been experimenting with a
pure push API which supports stack-based delivery of events. It's
built around two mutually dependent interfaces, both implemented by
the consumer, which look like this (many details omitted):

    interface Element {
        void gi( String name ) ;   // the element type
        AttList attlist( ) ;       // an interface to push attribute info
        Content content( ) ;
    }

    interface Content {
        Element newChild( ) ;
        Content endChild( Content child ) ;
        void text( char[] buf, int len, int off, boolean pcData ) ;
        void endContent( ) ;
    }

The parser maintains a stack of deferred Content instances and a
"current Content instance", tracking the open element hierarchy.
Normal operation goes like this:

1. Parser has a start-tag:
   - calls currContent.newChild() to get an Element instance (which
     is basically a context-sensitive, factory-like constructor).
   - pumps start-tag info to this Element instance to let it do its
     thing (typically, build some application-defined object).
   - calls content() on the Element instance to get a new Content
     instance for the new child.
   - stacks the current Content instance and makes the new one
     current.

2. Parser has data events:
   - delivers them to the current Content instance.

3. Parser has an end-tag:
   - calls endContent() on the current Content instance for its
     cleanup.
   - pops the parent Content instance off the stack.
   - calls endChild(child) on the parent, with the completed child
     Content instance as an argument, to allow parent-child
     communication and synchronization. The return value allows the
     parent to replace itself if needed; whatever is returned becomes
     the current Content instance.

It sounds more complicated than it actually is.

The SAX ContentHandler has been split into two pieces, separating the
"constructor" information from the content handling information. This
allows you to combine app-specific content classes with more generic
constructor packages without interference.

More importantly, there is the critical endChild(child) call, which
is missing entirely from the SAX interface. This is where all the
state management can take place in a contextually local fashion (as
it almost always is in practice). So you also get to split up the
state machine into appropriate classes/objects.

Cheesy example:

    public class HtmlTable implements Element, Content {
        ...
        public Element newChild( ) {
            return new HtmlTr( ) ;   // sexier constructors possible
        }
        public Content endChild( Content child ) {
            if ( child instanceof HtmlTr ) {
                // etc
            }
            return this ;
        }
    }

    public class HtmlTr implements Element, Content {
        public void gi( String name ) {
            if ( ! "tr".equals( name ) )
                throw new ScreamAndDieException( ) ;
        }
        public Content content( ) {
            return this ;
        }
        // etc
    }

A lot of this is boilerplate code that can also be "hoisted".

For deep or deeply recursive structures in the XML, I've found this
works very well.
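
The three parser steps above can be sketched from the parser's side roughly as follows. This is only a minimal illustration under stated assumptions, not the actual implementation: StackDriver, Node, and Demo are hypothetical names, attlist() is left out for brevity, and the tag events are fed in by hand rather than by a real parser.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// The two consumer-side interfaces from the post (attlist() omitted here).
interface Element {
    void gi(String name);
    Content content();
}

interface Content {
    Element newChild();
    Content endChild(Content child);
    void text(char[] buf, int len, int off, boolean pcData);
    void endContent();
}

// Hypothetical parser-side driver: holds the stack of deferred Content
// instances plus the current one, and performs steps 1-3.
final class StackDriver {
    private final Deque<Content> deferred = new ArrayDeque<>();
    private Content current;

    StackDriver(Content root) { current = root; }

    void startTag(String name) {            // step 1
        Element e = current.newChild();     // context-sensitive factory
        e.gi(name);                         // pump start-tag info
        Content child = e.content();
        deferred.push(current);             // stack the current Content
        current = child;
    }

    void data(String s) {                   // step 2
        char[] buf = s.toCharArray();
        current.text(buf, buf.length, 0, true);
    }

    void endTag() {                         // step 3
        Content child = current;
        child.endContent();
        Content parent = deferred.pop();
        current = parent.endChild(child);   // parent may replace itself
    }
}

// Hypothetical consumer: one object plays both roles, as HtmlTr does.
final class Node implements Element, Content {
    private String name = "";
    private final StringBuilder body = new StringBuilder();

    public void gi(String name) { this.name = name; }
    public Content content() { return this; }
    public Element newChild() { return new Node(); }
    public Content endChild(Content child) {
        body.append(child);                 // parent sees the finished child
        return this;
    }
    public void text(char[] buf, int len, int off, boolean pcData) {
        body.append(buf, off, len);
    }
    public void endContent() { }
    @Override public String toString() { return name + "(" + body + ")"; }
}

final class Demo {
    // Feeds <table><tr>x</tr></table> into a root node named "doc".
    static String run() {
        Node root = new Node();
        root.gi("doc");
        StackDriver p = new StackDriver(root);
        p.startTag("table");
        p.startTag("tr");
        p.data("x");
        p.endTag();
        p.endTag();
        return root.toString();
    }

    public static void main(String[] args) {
        System.out.println(run());          // prints doc(table(tr(x)))
    }
}
```

Note that the consumer never touches a stack or swaps handlers: each Node only learns about its own children, and only when endChild() hands them over.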