Re: A new XML parser
--- peastman@d... wrote:
> I'm working on developing a new style of XML parser.

Actually, based on what you describe, this has been suggested a few times, and some PoCs exist. The last person I remember suggesting it (or, as he put it, "obsoleting the need for StAX") was Raf Schietekat. So while it's not exactly new, I guess there is some merit to the idea, since it keeps getting brought up.

Some notes/comments, though:

...

> My idea is to create a single high level, DOM-like
> API which is suitable
> for both streaming and in-memory parsers. I believe

I am not convinced this is a good idea. It tends to converge either to deferred node construction (which Xerces already does, although its benefits have been debated a lot), or to just doing things the way they'd be done in streaming. That is, I see it as a Swiss pocketknife made of two very different tools. To me it's much more natural to layer things, so that the tree builder sits strictly on top of the streaming parser. That's how most current systems do it (XOM, JDOM/dom4j, even Xerces SAX+DOM).

> this design has
> several advantages over existing parsers:
>
> - It's much easier to use than other streaming
> parsers like SAX or StAX,
> since you get to work with a high level, object
> oriented representation of

Possibly, but if you use it in a convenient way, you tend to lose the potential performance benefits, converging towards tree models. And to get those benefits, you must restrict yourself strictly to a subset of operations, a restriction that your API does not (and cannot) enforce.

> the XML content. It's very similar to existing DOM
> APIs. The only
> restriction is that, if you're using a streaming
> parser, you're required
> to access the nodes in the order they appear in the
> file.

To me, though, this is the main problem: you pretty much MUST build the tree, even if the calling code _seems to_ access things in order.
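Here is a rough illustration, sketched with Python's standard-library ElementTree rather than anything from the proposal. The helper names `process_in_order` and `build_everything` are made up for this example. `iterparse` keeps memory bounded only when the calling code explicitly discards each element after handling it. The parser itself has no way of knowing that the caller won't come back to an earlier node.

```python
import io
import xml.etree.ElementTree as ET


def process_in_order(xml_bytes, tag, handle):
    """Hypothetical helper: call handle() on each complete <tag> element.

    The caller promises never to revisit an element after handle() returns,
    so finished children of the root can be discarded as parsing proceeds.
    """
    events = ET.iterparse(io.BytesIO(xml_bytes), events=("start", "end"))
    _, root = next(events)  # first event: "start" of the root element
    for event, elem in events:
        if event == "end" and elem.tag == tag:
            handle(elem)
            # Explicit discard: without it, every element built so far stays
            # attached to root, so the whole tree ends up in memory anyway.
            root.clear()
    return root


def build_everything(xml_bytes):
    """Hypothetical helper for the tree path: no promise, full tree retained."""
    return ET.parse(io.BytesIO(xml_bytes)).getroot()


if __name__ == "__main__":
    doc = b"<log>" + b"".join(b"<entry id='%d'/>" % i for i in range(1000)) + b"</log>"
    seen = []
    root = process_in_order(doc, "entry", lambda e: seen.append(e.get("id")))
    print(len(seen), len(root))          # 1000 0
    print(len(build_everything(doc)))    # 1000
```

If you drop the `root.clear()` call, the streaming path quietly retains all 1000 entries as well. The "promise" lives in the caller's code, not in the API.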
Unless you force that code to indicate something like "I promise to process them in order, all the time", there's nothing you can do to avoid buffering all the data. And that means eager/deferred node construction. On the other hand, if you do require some kind of hint, it's not exactly a single API any more. It's a dualistic API with two very different operational modes, and it's questionable whether it's any easier than two clearly separate APIs.

Another concern is mutability: tree models generally allow modifying the tree, and that's one of the things that complicates full-blown tree models (it adds some overhead, prevents some optimizations, etc.). Streaming models allow very limited mutability: in SAX you can easily modify the current event; in StAX you essentially have separate components (parser, serializer). In both cases you modify the stream serially. You could make the API read-only, but then it would be much more limited than the existing options.

> - Switching from an in-memory parser to a streaming
> parser (or vice versa)
> is much easier than it would be with any other two
> parsers, because both
> of them use exactly the same API. You can even

Note, though, that using the same API and using the API the same way (usage patterns) are not the same thing.

...

> - Many utilities can be written once, then used with
> either parser.

Maybe you have examples of such use cases in mind?

Having said all of the above, good luck with your proposal; it can be fun developing new ways to deal with old problems. ;-)

-+ Tatu +-