Re: XML Performance in a Transaction
Michael Champion said:

> See http://lists.w3.org/Archives/Public/www-ws/2004Oct/att-0032/MNicola_CIKM_2003_1_.pdf
> "XML Parsing - A Threat to Database Performance." Be forewarned that the
> conclusion may be unpalatable:

By rights, it seems that there should be some market for a highly optimized XML parser. You need high performance, you seek high performance libraries; if there are none, you get them made internally or externally. But I don't recall ever having seen any requests on XML-DEV for high speed parsers: certainly none with any dollars behind them.

If some companies got together and said "We will pay $$$ for a higher performance XML parser", they would get one. A $10,000 first prize and a $5,000 second prize for the winning parser on specified data, schema and platform would be enough to stimulate a lot of hackers and researchers, not to mention prompting people with in-house, private parsers to open source them.

When you move to an Open Source software economy, the issue for business becomes "How do we stimulate development in areas that help us?" Only this week I was listening to people from a client airline who had to write their own XML parser in PL/I for optimized access to mainframe DB2. The lack of such a parser suggests to me that organizations using mainframe/transaction/high-volume databases need to adopt a new, proactive stance in getting high performance, open source XML software written. Passivity in this area will ensure they only have unsuitable implementations.

If you look at, say, Apache Xerces and Xalan, you can see that hyper-efficiency plays little part in the game. The same is true, by and large, for the other open source software. Hyper-efficient design is not an optimization that can be tacked on afterwards; it has to be the core of the design. You cannot expect a general-purpose, cross-platform parser to be optimal.
(For example, one trick that goes as far back as OmniMark's predecessor in the late 80s (I believe) was to have two parsers: one optimized for the most common case and encoding (in XML this would be an entity-less document), and another to handle all the other cases.)

My expectation is that XML parsing can be significantly sped up by better use of SSE intrinsics*; by integrating parsing and transcoding; by doing validation and type assignment with streaming path-matching rather than automata (i.e. transforming horizontal grammars into vertical paths); and by parsing directly to native data types, for numbers for example. I am sure many other people have a shopping list of good ideas, but AFAIK no parser implements any of these things at the moment. Parser innovation has stalled, and it surely should be an issue of serious concern (and by serious concern I mean $$$) to high-volume companies to get it restarted.

The other aspect is that there is no "type-aware SAX" API. Without this, Open Source parsers, or even proprietary versus public parsers, are not interchangeable. Obviously this applies to Java most, but the principle is the same: we need agreements at the interfaces (a.k.a. standards).

Cheers
Rick Jelliffe

* See http://www.oreillynet.com/digitalmedia/blog/2005/11/ and search for "Intrinsics". The O'Reilly blog site is being altered and is a complete mess at the moment, so sorry about the odd format for this archive.
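
To make the two-parser trick mentioned above concrete, here is a minimal sketch in Python. It works only at the level of a single run of character data, not a whole document. Everything in it is an assumption for illustration: the function names (decode_text, decode_text_general) are hypothetical, the input is assumed to be UTF-8, and only the five predefined XML entities plus numeric character references are handled. It is not how OmniMark or any real parser does it.

```python
import re

# The five entities predefined by XML 1.0; a real parser would also
# consult entities declared in the DTD (not handled in this sketch).
_PREDEFINED = {"amp": "&", "lt": "<", "gt": ">", "quot": '"', "apos": "'"}

# Matches &#xHEX; or &#DEC; or &name;
_REFERENCE = re.compile(r"&(#x[0-9A-Fa-f]+|#[0-9]+|[A-Za-z_][\w.\-]*);")


def _resolve(match):
    """Turn one matched reference into the character(s) it stands for."""
    name = match.group(1)
    if name.startswith("#x"):
        return chr(int(name[2:], 16))
    if name.startswith("#"):
        return chr(int(name[1:]))
    try:
        return _PREDEFINED[name]
    except KeyError:
        raise ValueError(f"undeclared entity: &{name};") from None


def decode_text_general(raw: bytes) -> str:
    """General path: transcode from UTF-8, then resolve every reference.

    Malformed references (a bare '&') are left untouched in this sketch.
    """
    return _REFERENCE.sub(_resolve, raw.decode("utf-8"))


def decode_text(raw: bytes) -> str:
    """Dispatch between the fast path and the general path.

    Fast path: pure ASCII with no '&', so there is nothing to transcode
    beyond a byte-for-byte copy and nothing to resolve.
    """
    if raw.isascii() and b"&" not in raw:
        return raw.decode("ascii")
    return decode_text_general(raw)
```

For example, decode_text(b"plain text") never touches the regular expression, while decode_text(b"a &lt; b") falls through to the general path. The point of the design is that the cheap check is paid once per chunk, and the common case then skips the expensive machinery entirely.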