Re: XML Performance in a Transacation
--- Rick Jelliffe <rjelliffe@a...> wrote:
...
> By rights, it seems that there should be some market for a highly
> optimized XML parser. You need high performance, you seek high
> performance libraries; if there are none, you get them made
> internally or externally. But I don't recall ever having seen any
> requests on XML-DEV for high speed parsers: certainly none with any
> dollars behind them.

I guess that lack of demand (not just of money, but of interest) has something to do with it. But I think that of the two main approaches (improving the general case; focusing on a specific subset, whether by domain or by feature set), the general route would be mostly fruitless. However, the 'specific solution' path is a route less travelled (at least in public); and as you point out, there are lots of options one can try.

...
> Hyper-efficient design is not an optimization that can be tacked on
> after, it has to be the core of the

Very true. And:

> design; you cannot expect a general-purpose, cross-platform parser
> to be optimal. (For example, one trick that goes as far

This is also exactly right; and perhaps it does suggest domain-specific (or at least feature-set-specific) parsers.

One problem I have seen is that there are no publicly accepted subsets. Although SOAP (and others with less understanding; like the silliness of XMPP) went ahead and limited the subset of XML they accept, there's lots of resistance to using ad hoc subsets; yet very little effort at coming up with 'standard' ones. The distinction between validating and non-validating parsers seems like the only acceptable division: but for practical purposes this is not good enough. Many earlier pull parsers obviously just went ahead and chose some subset that made sense to them.

Also: XML is by its foundation a hierarchical textual format. So much effort is spent (wasted?)
on adding type systems (like W3C Schema, formerly thought of as a validation system), typing, constraints, that it should not be surprising that binding non-textual data (numbers, dates etc.) is inefficient. At least when going the generic parsing route. For tighter type binding, well, one can devise specific parsers: but one problem is that the mechanisms for feeding type info are themselves sources of major overhead. Who cares if you can get some speedup on accessing that int value, when just using a W3C Schema instance halves your processing speed? At least DTD processing only adds 50% to the processing time (when the DTD instance is cached).

Another route is of course to forget the textual background and use a binary encoding (Fast Infoset, Bnux). This will result in faster operation, at least in the context of message processing where there is a significant amount of processing by middle-men. But is the Infoset really an optimal representation for (object) data? It still has all the impedance of a hierarchical data model, compared to object or relational data models, even if primitives can be typed. Plus they still need Schema... which is not supported/integrated with these binary encoding efforts (i.e. there's still schema overhead at one or both end points).

> back as OmniMark's predecessor in the late 80s (I believe) was for
> parsers to have two parsers: one optimized for the most common case
> and encoding--in XML this would be for an entity-less document--,
> and another to handle all the other cases.)

Yes. If you can make use of the fact that there will not be nested input streams, you can optimize many things differently. It would be good to see how much improvement this could yield.

Of course, at the end of the day, one could also consider whether it is all that important to handle both the traditional text markup use case (for which XML was designed, and where it is a reasonably good choice), and the later data binding use case (where XML just stinks, even after tons of lipstick).
Why not solve these using different serializations and data models? For data binding, why not use something more natural to object binding, like, say, JSON? Primitives, arrays/lists, Maps/Objects... what more do you need? There is little use for mixed content; no need for obscure macro expansion (entities) beyond encoding purposes... and due to native support for native types (i.e. the parser knows the primitive types without need for external information), it's very simple to avoid a purely textual approach. In fact, JSON is so trivially simple to parse and output that it's even weirder that no money is behind it.

But what the hey; a hammer is a hammer, and maybe them dang weird spiraled nails just need a bigger hammer! ;-)

-+ Tatu +-
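
A minimal sketch of the native-types point above, in Python (the language, the record and its field names are all made up for illustration; this is not taken from any parser discussed in the thread). The JSON parser hands back numbers, booleans and lists directly, while the XML side delivers text and needs type knowledge from outside the document, here simply hard-coded where a schema would otherwise sit:

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical record, serialized both ways (field names are illustrative only).
JSON_DOC = '{"id": 42, "price": 9.5, "active": true, "tags": ["a", "b"]}'
XML_DOC = (
    "<item><id>42</id><price>9.5</price><active>true</active>"
    "<tags><tag>a</tag><tag>b</tag></tags></item>"
)


def from_json(doc: str) -> dict:
    # The parser itself produces int, float, bool and list values.
    return json.loads(doc)


def from_xml(doc: str) -> dict:
    # Every value arrives as text; the conversions below encode type
    # knowledge that is not in the document (an assumption standing in
    # for a schema or a hand-written binding).
    root = ET.fromstring(doc)
    return {
        "id": int(root.findtext("id")),
        "price": float(root.findtext("price")),
        "active": root.findtext("active") == "true",
        "tags": [tag.text for tag in root.iter("tag")],
    }


if __name__ == "__main__":
    print(from_json(JSON_DOC) == from_xml(XML_DOC))
```

Both functions end up with the same dictionary; the difference is only in where the type information comes from, which is the overhead being argued about.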