SAX and Pull options: was: Penance for misspent attributes
Hi Dennis,

Thanks for the article pointer, good read.

> -----Original Message-----
> From: Dennis Sosnoski [mailto:dms@s...]
> Sent: 20 May 2002 02:45
> To: Bill de hÓra
> Cc: xml-dev@l...
> Subject: Re: Penance for misspent attributes
>
> SAX is great for generic XML handling - it's easy to hook up a
> handler for building a document representation using DOM or some
> other model, for instance. It's very awkward for direct processing
> by an application, though, and I think autogenerating state
> machines just add another layer of complexity.

I like to think that autogeneration, done well, encapsulates complexity.

> The only real problem with using pull parsers right now is
> limited availability.

I'd cite two other problems (maybe just nits) and two other processing options.

First problem: event-based architectures are likely to become a basis for building application servers, particularly as we stumble into an era of machine-to-machine XML processing. Apache Axis is a baby step in that direction, possibly a leap if and when it moves to the non-blocking IO available in the 1.4 JDK. The problem with placing pull-based /parsing/ on top of event-oriented servers is that, after working so hard to increase server throughput, you then reinsert the processing bottleneck, because the parsing is the equivalent of a blocking request. The point isn't made against a pull-oriented /API/ per se; but if the processing must block, let it block as late as possible, that is, just below the application.

One way to deal with this mismatch is to insert queues or buffers between the event-generation layer and the application layer. This pattern is sometimes known as Half-Sync/Half-Async, and it is common enough in operating systems, where application-level services encapsulate asynchronous interrupts inside processes (which also manage the state).
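A minimal sketch of Half-Sync/Half-Async over SAX, assuming a modern JDK (java.util.concurrent plus the built-in javax.xml.parsers SAX parser; nothing here is Axis- or XmlPull-specific). The class name HalfSyncHalfAsync, the string-encoded events and the EOF sentinel are illustrative inventions. The parser thread pushes events into a bounded queue asynchronously; the application thread blocks only when it takes from the queue, i.e. just below the application.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class HalfSyncHalfAsync {
    // Hypothetical sentinel marking the end of the event stream.
    private static final String EOF = "\u0000EOF";

    // Async half: parse on a background thread, pushing each SAX event into the queue.
    static Thread startProducer(String xml, BlockingQueue<String> queue) {
        Thread producer = new Thread(() -> {
            try {
                SAXParserFactory.newInstance().newSAXParser().parse(
                        new InputSource(new StringReader(xml)),
                        new DefaultHandler() {
                            @Override
                            public void startElement(String uri, String localName, String qName, Attributes atts) {
                                enqueue(queue, "START " + qName);
                            }
                            @Override
                            public void endElement(String uri, String localName, String qName) {
                                enqueue(queue, "END " + qName);
                            }
                            @Override
                            public void characters(char[] ch, int start, int len) {
                                enqueue(queue, "TEXT " + new String(ch, start, len));
                            }
                        });
            } catch (Exception e) {
                enqueue(queue, "ERROR " + e.getMessage());
            } finally {
                enqueue(queue, EOF); // always release the consumer
            }
        });
        producer.start();
        return producer;
    }

    // Sync half: the application pulls events and blocks only here.
    public static List<String> collect(String xml) throws InterruptedException {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(16);
        Thread producer = startProducer(xml, queue);
        List<String> events = new ArrayList<>();
        for (String e = queue.take(); !e.equals(EOF); e = queue.take()) {
            events.add(e);
        }
        producer.join();
        return events;
    }

    private static void enqueue(BlockingQueue<String> queue, String event) {
        try {
            queue.put(event); // blocks the parser thread if the application falls behind
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        for (String e : collect("<option-trade><symbol>SUNW</symbol></option-trade>")) {
            System.out.println(e);
        }
    }
}
```

The bounded ArrayBlockingQueue also provides back-pressure: a slow application stalls the parser thread on put() instead of letting the whole document pile up in memory.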
The application client is provided with a higher-level view of the data, and the developer no longer needs to manage state in custom data structures. To some degree, part 2 of your series implements this pattern over SAX events.

Second problem: exposing conditional logic as switch blocks instead of visiting is a lost opportunity. I have this static binding and external iteration:

    public void processDocument(XmlPullParser xpp)
            throws XmlPullParserException, IOException {
        int eventType = xpp.getEventType();
        while (true) {
            if (eventType == XmlPullParser.START_DOCUMENT) {
                System.out.println("Start document");
            } else if (eventType == XmlPullParser.END_DOCUMENT) {
                System.out.println("End document");
                break; // END_DOCUMENT is the last event
            } else if (eventType == XmlPullParser.START_TAG) {
                processStartElement(xpp);
            } else if (eventType == XmlPullParser.END_TAG) {
                processEndElement(xpp);
            } else if (eventType == XmlPullParser.TEXT) {
                processText(xpp);
            }
            eventType = xpp.next();
        }
    }

when I could have had runtime binding based on the types of both visitor and visitee, via double dispatch, together with internal iteration (presumably the parser is best placed to know the token type). Essentially, the control code inside the while(true) blocks in PullWrapper is in the wrong place; it results from an implementation detail (typecodes). It's notable that while SAX is not quite a visitor, it doesn't require typecodes to identify events. Also worth mentioning is that XmlPull defines eleven event types, not the five handled above (the extra six are reported by nextToken() rather than next()), i.e.
a full coverage of typecodes will look more like this:

    public void processDocument(XmlPullParser xpp)
            throws XmlPullParserException, IOException {
        int eventType = xpp.getEventType();
        while (true) {
            if (eventType == XmlPullParser.START_DOCUMENT) {
                System.out.println("Start document");
            } else if (eventType == XmlPullParser.END_DOCUMENT) {
                System.out.println("End document");
                break;
            } else if (eventType == XmlPullParser.START_TAG) {
                processStartElement(xpp);
            } else if (eventType == XmlPullParser.END_TAG) {
                processEndElement(xpp);
            } else if (eventType == XmlPullParser.TEXT) {
                processText(xpp);
            } else if (eventType == XmlPullParser.CDSECT) {
                processOther(xpp);
            } else if (eventType == XmlPullParser.ENTITY_REF) {
                processOther(xpp);
            } else if (eventType == XmlPullParser.IGNORABLE_WHITESPACE) {
                processOther(xpp);
            } else if (eventType == XmlPullParser.PROCESSING_INSTRUCTION) {
                processOther(xpp);
            } else if (eventType == XmlPullParser.COMMENT) {
                processOther(xpp);
            } else if (eventType == XmlPullParser.DOCDECL) {
                processOther(xpp);
            }
            eventType = xpp.nextToken(); // next() would never report CDSECT, COMMENT, etc.
        }
    }

Most of the XmlPull examples I've seen today don't have a default else {} block. Yet it's not hard to imagine futures for the API with typecode creep, such as Infoset/PSVI extensions. There's potential to use polymorphism in a pull-based API, rather than type codes as seen in the DOM (no slight intended toward the DOM designers, who did not have a clean slate to work from, to say the least). If XmlPull were a bespoke framework instead of a proposed public API, Replace Type Code with Polymorphism is one of the first refactorings that would come to mind.
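Replace Type Code with Polymorphism over pull events can be sketched in runnable form. This swaps in the JDK's StAX reader (javax.xml.stream) as a stand-in for XmlPull, since it ships with Java; the PullEvent, PullVisitor, StartTag, EndTag, Text and Other types and the accept helper are hypothetical. The helper owns the iteration and the one remaining switch on type codes; each event object then calls back the visitor method for its own type (double dispatch), so application code never sees a type code.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical event hierarchy: each event object knows its own type.
interface PullEvent {
    void accept(PullVisitor v);
}

interface PullVisitor {
    void visit(StartTag s);
    void visit(EndTag e);
    void visit(Text t);
    // Catch-all for events a visitor ignores (comments, PIs, END_DOCUMENT, ...).
    default void visit(Other o) {}
}

final class StartTag implements PullEvent {
    final String name;
    StartTag(String name) { this.name = name; }
    // Second dispatch: virtual on the visitor; the overload is picked by this class.
    public void accept(PullVisitor v) { v.visit(this); }
}

final class EndTag implements PullEvent {
    final String name;
    EndTag(String name) { this.name = name; }
    public void accept(PullVisitor v) { v.visit(this); }
}

final class Text implements PullEvent {
    final String text;
    Text(String text) { this.text = text; }
    public void accept(PullVisitor v) { v.visit(this); }
}

final class Other implements PullEvent {
    final int code; // raw StAX type code, kept only for debugging
    Other(int code) { this.code = code; }
    public void accept(PullVisitor v) { v.visit(this); }
}

public class VisitorDemo {
    // Internal iteration: the only switch on type codes lives here, not in application code.
    static void accept(XMLStreamReader r, PullVisitor v) throws XMLStreamException {
        while (r.hasNext()) {
            int code = r.next();
            PullEvent event;
            switch (code) {
                case XMLStreamConstants.START_ELEMENT: event = new StartTag(r.getLocalName()); break;
                case XMLStreamConstants.END_ELEMENT:   event = new EndTag(r.getLocalName()); break;
                case XMLStreamConstants.CHARACTERS:    event = new Text(r.getText()); break;
                default:                               event = new Other(code); break;
            }
            event.accept(v); // first dispatch: on the runtime type of the event
        }
    }

    public static List<String> trace(String xml) throws XMLStreamException {
        List<String> out = new ArrayList<>();
        XMLStreamReader r = XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        accept(r, new PullVisitor() {
            public void visit(StartTag s) { out.add("START_TAG:" + s.name); }
            public void visit(EndTag e)   { out.add("END_TAG:" + e.name); }
            public void visit(Text t)     { out.add("TEXT:" + t.text); }
        });
        return out;
    }

    public static void main(String[] args) throws XMLStreamException {
        trace("<trade><symbol>SUNW</symbol></trade>").forEach(System.out::println);
    }
}
```

With this shape, a new event type means one new class and one new default visit method; existing visitors keep compiling and simply ignore it.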
So you could have something like this:

    XmlPullParser xpp = factory.newPullParser();
    xpp.setInput(reader);
    xpp.accept(new XMLPullVisitorImpl());

where:

    class XMLPullVisitorImpl implements XMLPullVisitor {
        public void visit(StartTag s) { System.out.println("START_TAG:" + s.getName()); }
        public void visit(EndTag e)   { System.out.println("END_TAG:" + e.getName()); }
        public void visit(Text t)     { System.out.println("TEXT:" + t.getText()); }
    }

instead of this:

    XmlPullParser xpp = factory.newPullParser();
    xpp.setInput(reader);
    int eventType;
    while ((eventType = xpp.next()) != XmlPullParser.END_DOCUMENT) {
        if (eventType == XmlPullParser.START_TAG) {
            System.out.println("START_TAG " + xpp.getName());
        } else if (eventType == XmlPullParser.END_TAG) {
            System.out.println("END_TAG " + xpp.getName());
        } else if (eventType == XmlPullParser.TEXT) {
            System.out.println("TEXT " + xpp.getText());
        }
    }

Standardizing on switch blocks just doesn't seem like a good idea when you've got objects available.

First processing option: there are other ways to make SAX tractable, without pull or visitation and in a lightweight manner. I'm not altogether convinced that event-oriented programming is such an unintuitive style for developers, though I do acknowledge that pull-based might have more traction (the control flow is highly visible; however, that might be a warning signal in an OO program, cf. the aforementioned switch blocks). The problem lies in the associated bookkeeping and the cognitive overhead of state management. Essentially, application input is chunked, as you say:

[[[
However, the framework doesn't eliminate event-driven programming's complexity. That complexity springs from SAX2's divided control-parsing approach: your application passes control to the parser, which then calls handler methods within your code. The handler methods usually need to accumulate data piece-by-piece before they can finally do anything with the data.
]]]

Here's a simple outline that I think provides most of what part 2 of your article requires in terms of SAX scaffolding:

    abstract class TagManager {
        // No-op defaults: each manager overrides only the hooks it needs.
        public void startTag(Stack s) {}
        public void endTag(Stack s) {}
        public void textFrag(String text, Stack s) {}
        ...
    }

    class OptionTradeManager extends TagManager {
        public void startTag(Stack s) { s.push(new OptionTrade()); }
    }

    class SymbolManager extends TagManager {
        public void textFrag(String text, Stack s) {
            OptionTrade opt = (OptionTrade) s.peek();
            opt.setSymbol(append(text));
        }
        public String append(String text) {...}
    }

    class TrackingManager extends TagManager {
        public void startTag(Stack s) { s.push(new Tracking()); }
    }

    class TradeHandler extends org.xml.sax.helpers.DefaultHandler {
        Map m = new HashMap();
        Stack s = new Stack();
        TagManager tm;

        public Map initManager() {
            m.put("option-trade", new OptionTradeManager());
            m.put("symbol", new SymbolManager());
            m.put("tracking", new TrackingManager());
            ...
            return m;
        }
        public TagManager getManager() { return tm; }
        public void setManager(String name) { tm = (TagManager) m.get(name); }

        public void startElement( ... ) {
            setManager( ... );
            if (getManager() != null) getManager().startTag(s); // unmapped elements are ignored
        }
        public void characters(char ch[], int start, int len) {
            if (getManager() != null) getManager().textFrag(new String(ch, start, len), s);
        }
        ...
    }

It's not difficult to see how TradeHandler could be factored into a more generally useful object.

Second processing option: you could just bind the XML to objects:

    TradeHistory data = new TradeHistory(new FileReader("stockdata.xml"));
    OptionTrade[] opt = data.getAll("option-trade");
    // or data.getOptionTrades();

I imagine this approach might become popular in web services programming environments and IDEs.

The interesting thing, nonetheless, is that in an event-based framework for XML processing, the state management for an application can be declared and then generated, using a rules-based format or simple mappings from symbols to behaviour (condition-action pairs, in logic programming speak).
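A runnable variant of the TagManager outline, assuming Java 8+ and the JDK's SAX parser. The element-to-manager table built in the constructor is the mapping from symbols to behaviour: it is plain data, which is what would make generating it from a declarative rules file plausible. RuleDrivenHandler, OptionTrade and the element names are illustrative; ArrayDeque replaces Stack, and character data is buffered until the end tag because SAX may deliver one text node across several characters() calls.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class RuleDrivenHandler extends DefaultHandler {
    // Hypothetical domain object.
    public static class OptionTrade {
        public String symbol = "";
    }

    // Per-element behaviour; each hook defaults to doing nothing, like TagManager.
    public interface TagManager {
        default void startTag(Deque<Object> stack) {}
        default void endTag(Deque<Object> stack, String text) {}
    }

    private final Map<String, TagManager> rules = new HashMap<>();
    private final Deque<Object> stack = new ArrayDeque<>();
    private final StringBuilder text = new StringBuilder(); // buffers fragmented character data
    private final List<OptionTrade> trades = new ArrayList<>();

    public RuleDrivenHandler() {
        // Condition (element name) -> action pairs: the part that could be generated.
        rules.put("option-trade", new TagManager() {
            public void startTag(Deque<Object> s) { s.push(new OptionTrade()); }
            public void endTag(Deque<Object> s, String t) { trades.add((OptionTrade) s.pop()); }
        });
        rules.put("symbol", new TagManager() {
            public void endTag(Deque<Object> s, String t) { ((OptionTrade) s.peek()).symbol = t.trim(); }
        });
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        text.setLength(0);
        TagManager m = rules.get(qName);
        if (m != null) m.startTag(stack); // unmapped elements are simply ignored
    }

    @Override
    public void characters(char[] ch, int start, int len) {
        text.append(ch, start, len);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        TagManager m = rules.get(qName);
        if (m != null) m.endTag(stack, text.toString());
        text.setLength(0);
    }

    public static List<OptionTrade> parse(String xml) throws Exception {
        RuleDrivenHandler h = new RuleDrivenHandler();
        SAXParserFactory.newInstance().newSAXParser().parse(new InputSource(new StringReader(xml)), h);
        return h.trades;
    }

    public static void main(String[] args) throws Exception {
        for (OptionTrade t : parse("<trades><option-trade><symbol>SUNW</symbol></option-trade></trades>")) {
            System.out.println(t.symbol);
        }
    }
}
```

Adding a new element means adding one entry to the table; the SAX callbacks themselves never change.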
That means we can generate an application's SAX handlers /or/ its standardized pull logic.

Bill de hÓra