Re: SAX and Pull options: was: Penance for misspent attributes
Bill de hÓra wrote:

>>The only real problem with using pull parsers right now is
>>limited availability.
>>
>
>I cite two other problems (maybe just nits) and two other
>processing options.
>
>
>First problem: event based architectures are likely to become a
>basis for building application servers, particularly as we stumble
>into an era of machine to machine XML processing. Apache Axis is a
>babystep in that direction, possibly a leap if and when it moves to
>the non-blocking IO available in the 1.4 JDK. The problem with
>placing pull based /parsing/ on top of event oriented servers is
>that after working so hard to increase server throughput, you then
>re-insert the processing bottleneck by virtue of the parsing being
>the equivalent of blocking requests. The point isn't made against a
>pull oriented /API/ per se, but if the processing must block, let
>it block as late as possible, that is, just below the application.
>

There are a couple of points I'll comment on in this.

The first is that SAX doesn't really function as an event based architecture component, because it relies on the application to give it control in the first place - the application thread is what executes all the parsing, as well as the call-backs to the handler. This is implicit in the SAX specification, since it does not address any of the synchronization issues that would be needed if different threads could be used.

The second is that the servlet architecture that forms the basis of most application servers is not really extensible to non-blocking IO. The servlet model ties up a thread until all processing of a request is completed, so you may as well have the thread just wait for input if needed.

>Second problem: exposing conditional logic based on switch blocks
>instead of visiting is a lost opportunity. I have this static
>binding and external iteration:
>
> ...
>
>when I could have had a runtime binding based on the types of the
>visitor and visitee and internal iteration (presumably the parser
>is best placed to know the token type) via double dispatch.
>

I think this kind of misses the point of using a pull parser. Here's the actual main loop I gave in the article for working with a structure consisting of a couple of different types of elements in the document, each with several child elements:

    // Main pull parsing loop
    byte type;
    while ((type = m_parser.next()) != XmlPullParser.END_DOCUMENT) {

        // Ignore everything other than a start tag
        if (type == XmlPullParser.START_TAG) {

            // Process the start tags we're interested in
            m_parser.readStartTag(m_startTag);
            String lname = m_startTag.getLocalName();
            if (lname.equals("stock-trade")) {
                parseStockTrade();
            } else if (lname.equals("option-trade")) {
                parseOptionTrade();
            }
        }
    }

If I wanted to process other types of document components in this loop I easily could. In this case my document only used two different types of child elements of the root, and I wasn't concerned about other types of components in the document, so I just look specifically for those two child elements.
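To illustrate the point about adding other component types, here is a minimal sketch of the same kind of loop with one extra branch for comments. It is written against the JDK's StAX `XMLStreamReader` rather than the XPP parser used in the loop above, so the event constants and accessor calls differ from the article's code. The class name `PullLoopSketch`, the method `scan`, and the result strings are all hypothetical names chosen for the illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical illustration, not the article's code: a pull loop that
// looks for two element names and also notes any comments it passes.
class PullLoopSketch {

    static List<String> scan(String xml) {
        List<String> seen = new ArrayList<>();
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));

            // The application drives the parser: each next() call
            // advances to one more event in document order.
            while (reader.hasNext()) {
                int type = reader.next();
                if (type == XMLStreamConstants.START_ELEMENT) {
                    // Same name-based dispatch as the loop in the post
                    String lname = reader.getLocalName();
                    if (lname.equals("stock-trade")
                            || lname.equals("option-trade")) {
                        seen.add("trade:" + lname);
                    }
                } else if (type == XMLStreamConstants.COMMENT) {
                    // The extra branch: one more case in the same loop
                    seen.add("comment:" + reader.getText().trim());
                }
                // All other event types are ignored
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("could not parse document", e);
        }
        return seen;
    }
}
```

Handling another component type amounts to one more `else if` on the event type; the rest of the loop is unchanged.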
To process the stock-trade element, which looks like this:

    <stock-trade>
      <symbol>SUNW</symbol>
      <tracking id="7499345">
        <time>08:45:19</time>
        <seller ident="CCC" type="agent"/>
        <buyer ident="ABT" type="agent"/>
        <exchange>XA</exchange>
      </tracking>
      <price>86.24</price>
      <quantity>500</quantity>
    </stock-trade>

I have the following code:

    protected void parseStockTrade()
        throws IOException, XmlPullParserException {
        String symbol = parseElementContent("symbol");
        TrackingData tracking = parseTracking();
        double price = Double.parseDouble(parseElementContent("price"));
        int shares = Integer.parseInt(parseElementContent("quantity"));
        StockTrack.recordTrade(symbol, tracking.m_time, price, shares);
    }

    protected TrackingData parseTracking()
        throws IOException, XmlPullParserException {

        // Read id attribute from root element start tag
        TrackingData data = new TrackingData();
        parseStartTag("tracking");
        data.m_id = attributeValue("id");

        // Read time as content of its own element
        data.m_time = parseElementContent("time");

        // Read seller agent information
        parseStartTag("seller");
        data.m_seller = attributeValue("ident");
        data.m_isDirectSeller = "direct".equals(attributeValue("type"));
        parseEndTag("seller");

        // Read buyer agent information
        parseStartTag("buyer");
        data.m_buyer = attributeValue("ident");
        data.m_isDirectBuyer = "direct".equals(attributeValue("type"));
        parseEndTag("buyer");

        // Read exchange identifier as content of its own element
        data.m_exchange = parseElementContent("exchange");

        // Finish with closing tag for root element
        parseEndTag("tracking");
        return data;
    }

Using some simple utility methods I can parse the data content of the document very easily with direct inline code, rather than having to use a state machine. I think this is a much more natural style of programming for most developers - a top-down structure in the code that reflects the structure of the document.
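The utility methods themselves are not shown in this message. As a rough sketch of what helpers with these names might look like, the version below uses the JDK's StAX `XMLStreamReader` instead of the XPP parser, so it is an assumption about their shape and not the article's implementation. The class `PullHelpersSketch`, the method `describeTrade`, and the summary string format are hypothetical and exist only to make the example self-contained.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper class: the method names mirror the calls in the
// post, but the bodies are guesses written against the StAX API.
class PullHelpersSketch {
    private final XMLStreamReader reader;

    PullHelpersSketch(String xml) throws XMLStreamException {
        reader = XMLInputFactory.newInstance()
            .createXMLStreamReader(new StringReader(xml));
    }

    // Move to the next tag and insist that it is the start tag <name>.
    void parseStartTag(String name) throws XMLStreamException {
        reader.nextTag();
        reader.require(XMLStreamConstants.START_ELEMENT, null, name);
    }

    // Move to the next tag and insist that it is the end tag </name>.
    void parseEndTag(String name) throws XMLStreamException {
        reader.nextTag();
        reader.require(XMLStreamConstants.END_ELEMENT, null, name);
    }

    // Attribute of the start tag the reader is currently positioned on.
    String attributeValue(String name) {
        return reader.getAttributeValue(null, name);
    }

    // Read <name>text</name>, returning the text; ends on </name>.
    String parseElementContent(String name) throws XMLStreamException {
        parseStartTag(name);
        return reader.getElementText();
    }

    // Top-down walk over one stock-trade element, in document order.
    static String describeTrade(String xml) {
        try {
            PullHelpersSketch p = new PullHelpersSketch(xml);
            p.parseStartTag("stock-trade");
            String symbol = p.parseElementContent("symbol");
            p.parseStartTag("tracking");
            String id = p.attributeValue("id");
            String time = p.parseElementContent("time");
            p.parseStartTag("seller");
            String seller = p.attributeValue("ident");
            p.parseEndTag("seller");
            p.parseStartTag("buyer");
            String buyer = p.attributeValue("ident");
            p.parseEndTag("buyer");
            p.parseElementContent("exchange");
            p.parseEndTag("tracking");
            double price = Double.parseDouble(p.parseElementContent("price"));
            int shares = Integer.parseInt(p.parseElementContent("quantity"));
            return symbol + " #" + id + " " + time + " " + seller + "->"
                + buyer + " " + shares + " @ " + price;
        } catch (XMLStreamException e) {
            // An unexpected tag or malformed input surfaces here
            throw new IllegalArgumentException("unexpected document structure", e);
        }
    }
}
```

Calling `PullHelpersSketch.describeTrade(xml)` on the stock-trade sample walks the elements in document order. Each helper fails as soon as the next tag is not the expected one, so the expected structure is enforced by the order of the calls instead of by an explicit state variable.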
I could wrap a pull parser in handlers to give the same effect as a SAX parser interface - in fact, Alek Slominski has actually implemented a prototype SAX2 push layer on top of a pull parser (http://www.extreme.indiana.edu/xgws/xsoap/xpp/). Trying to turn a push interface into a pull interface is much more difficult, basically requiring a separate thread and associated threading overhead.

If you really want an object-oriented approach to processing XML I think data binding is the best alternative. SAX is great when you want to use a visitor-style approach, but IMHO is awkward in many situations because the data is delivered to the application one piece at a time and needs to be assembled before use - that's basically the point of the handler generator programs mentioned in this thread, as well as the handler examples you provide. Pull parsers let applications handle the assembly directly, making use of information about the document structure.

  - Dennis