Re: VTD-XML an open-source, high-performance and non-extractiv
On 10/18/05, Elliotte Harold <elharo@m...> wrote:
>
> On VTD-XML itself, I read on the web site that "Currently it only
> supports built-in entity references(&quot; &amp; &apos; &gt; &lt;)."
> That means it's not an XML parser. Given this, the comparisons you make
> to other parsers are unfair and misleading. I've seen many products that
> outperform real XML parsers by subsetting XML and cutting out the hard
> parts. It's often the last 10% that kills the performance. :-(

Well, they do say right up front: "VTD-XML is a non-validating, 'non-extractive' XML processing software API implementing Virtual Token Descriptor. Currently it only supports built-in entity references(&quot; &amp; &apos; &gt; &lt;)."

Arguably an XML processing API doesn't have to be a real XML parser *if* the subset it supports is clearly stated. I would have to agree that in principle "XML" should be used to refer only to the full spec, but that battle was lost years ago -- SOAP implicitly subsets XML, RSS is often not well-formed (and thus not "XML"), but this distinction is lost on the vast majority of XML technology users who do not subscribe to xml-dev.

As with most things in life, people need to just pick their poison. Given the efficiency issues, is it better to subset XML and process something that looks a lot like real XML efficiently with tools such as VTD-XML, is it better to build a more fully conformant Efficient XML Interchange (the sanitized term for what we used to call "binary XML"), is it better to lower customer expectations about performance/bandwidth consumption, or what? None of them are palatable, but people have to choose which is least toxic to their own scenario.

>
> The other question I have for anything claiming these speed gains is
> whether it correctly implements well-formedness testing, including the
> internal DTD subset. Will VTD-XML correctly report all malformed
> documents as malformed?
>
> Finally, even if everything works out once the holes are plugged, this
> seems like it would be slower than SAX/StAX for streaming use cases.
> VTD, like DOM, needs to read the entire document before it can work on
> any of it.

I think the point is that the process that creates the XML can confirm that it is well-formed / valid, and produce a VTD associated with a document/message, then downstream processes that understand VTD can exploit it. Those that do not understand VTD can simply use the XML text. Yes, this requires a level of trust in the producer that pure XML text processing does not require. I've always seen this as hitting a sweet spot (for *some* use cases!) between text XML and binary XML where the designers of an application decide that the cost of verifying that the producer got the XML right outweighs the benefits of catching the errors. We can argue about how common those scenarios are, of course, but at any point in the processing chain, a specific component can ignore the VTD and parse the XML to verify whatever needs to be verified.

Obviously VTD doesn't reduce the size of the XML transmitted, so it doesn't meet the use cases that the W3C XBC / EXI folks are focused on. On the other hand, it sounds promising for messaging scenarios with multiple intermediaries that do routing, filtering, DSig verification, and perhaps encryption -- raw XML parsing is quite expensive, but could be accelerated by using the VTD to quickly find the offsets in the message that a particular intermediary knows/cares about. Obviously that doesn't work at all for infinite streams of XML.

Overall, my concern is that we as an industry neither look for magic fixes that solve all known efficiency problems (which arguably the W3C is about to futilely attempt to do) nor reject approaches, e.g. VTD, that pluck some low-hanging fruit but don't handle all use cases.
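
To make the "find the offsets in the message" idea concrete, here is a rough sketch in Python. It is not the VTD-XML API and does not use VTD's record format; `build_index` and `slice_element` are hypothetical names invented for illustration. The assumption is that a trusted producer parses the message once with a real parser (here the standard-library expat binding) and ships a list of (name, offset, length) records alongside the unchanged XML text, so an intermediary can slice out the bytes it cares about without re-parsing.

```python
import xml.parsers.expat


def build_index(message: bytes):
    """Producer side (hypothetical helper): parse once and record a
    (name, offset, length) triple for every element in the message."""
    index = []
    open_elements = []  # stack of (name, offset of the start tag)
    parser = xml.parsers.expat.ParserCreate()

    def on_start(name, attrs):
        # CurrentByteIndex is the byte offset of the event being reported.
        open_elements.append((name, parser.CurrentByteIndex))

    def on_end(name):
        _, start = open_elements.pop()
        # Scan forward to the '>' that closes the end tag. This simple
        # scan assumes explicit end tags, as in the usage example below.
        end = message.index(b">", parser.CurrentByteIndex) + 1
        index.append((name, start, end - start))

    parser.StartElementHandler = on_start
    parser.EndElementHandler = on_end
    parser.Parse(message, True)  # raises ExpatError on malformed input
    return index


def slice_element(message: bytes, index, name: str):
    """Intermediary side (hypothetical helper): no parsing, just a lookup
    in the index followed by a slice of the original bytes."""
    for element_name, offset, length in index:
        if element_name == name:
            return message[offset:offset + length]
    return None


if __name__ == "__main__":
    msg = b"<msg><to>hub-7</to><body>hello</body></msg>"
    idx = build_index(msg)
    print(slice_element(msg, idx, "to"))
```

The message text is never copied or modified, which is the "non-extractive" part; a component that does not trust the index can ignore it and parse `msg` itself. As noted above, the whole message has to be available before the index exists, so this does nothing for unbounded streams.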