
Re: Fast text output from SAX?


At 10:00 AM -0400 4/14/04, Stephen D. Williams wrote:


>The fact is that creating, populating, and manipulating a data model 
>has costs.  This is true of DOM, SAX (where the data model is 
>managed by the application), esXML (where the data model is also the 
>'serialized' format so all costs are manipulation), and all other 
>applications that involve internal and external data (Corba, DCOM, 
>ONC-RPC, ASN.1/xER, etc.).  It's not fair to ignore part of the 
>processing cycle for a format (esXML) that trades some manipulation 
>overhead for all parsing/serialization/object creation/object 
>population overhead.
>

I consider creating and populating the data model to be part of 
parsing if it's done from an event stream. For instance, the time to 
build a DOM document object is significant. Sorry if that wasn't 
clear. My point is that once the object exists in memory the 
manipulations from that point until you start serializing are 
irrelevant. In my tests with my model, parsing/object creation is 
about 2/3 of the time, serialization is about 1/3, and manipulation 
is unmeasurable. Various optimizations adjust the absolute numbers, 
but the 2-1-0 ratio seems pretty consistent. Possibly other formats 
have different ratios. However, given that real-world programs read 
data from input streams and write it to output streams, rather than 
to and from byte arrays as benchmarks do, it doesn't seem credible that 
in-memory XML operations like add and remove are worth optimizing.
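
To make the parse/manipulate/serialize split concrete, here is a minimal sketch of how one might time the three phases separately. It uses Python's standard xml.dom.minidom purely as a stand-in; it is not the model or the benchmark described above, and the names build_document and time_phases are hypothetical. The absolute numbers and ratios it reports will depend on the parser, the document, and the machine.

```python
import time
from xml.dom import minidom


def build_document(n):
    """Return a synthetic XML string with n <item> children (test data only)."""
    items = "".join(f'<item id="{i}">value {i}</item>' for i in range(n))
    return f"<root>{items}</root>"


def time_phases(xml_text, edits=5):
    """Time parsing, in-memory manipulation, and serialization separately.

    Returns a dict of elapsed seconds per phase plus the serialized output.
    """
    t0 = time.perf_counter()
    doc = minidom.parseString(xml_text)        # parsing + object creation
    t1 = time.perf_counter()

    # Manipulate only the first few children of the root element.
    for node in doc.documentElement.childNodes[:edits]:
        node.setAttribute("touched", "yes")
    t2 = time.perf_counter()

    out = doc.toxml()                          # serialization
    t3 = time.perf_counter()

    return {
        "parse": t1 - t0,
        "manipulate": t2 - t1,
        "serialize": t3 - t2,
        "output": out,
    }


if __name__ == "__main__":
    result = time_phases(build_document(3000))
    for phase in ("parse", "manipulate", "serialize"):
        print(f"{phase:>10}: {result[phase]:.6f} s")
```

Note that this still works on an in-memory string, so it leaves out the stream I/O that a real program would pay for on both ends.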

>Additionally, the whole parsing etc. stream for XML must be 
>completely performed, in DOM cases and many SAX cases, for every 
>element of a document/object.  With esXML, if a 3000 element 
>document/object were read in and 5 elements manipulated, you only 
>spend 5*element-manipulation-overhead.

I flat out don't believe this. I think there's an underlying 
assumption here (and in some of the other binary formats) which once 
again demonstrates that they are not as much like XML as they claim. 
The only way you can limit this is by assuming the data in your 
stream is well-formed. In XML, we don't assume that. One of the 3000 
nodes you don't process may be malformed. You're assuming that's not 
the case, and therefore avoiding a lot of overhead in checking for 
it. A large chunk of any speed gain such a format achieves over real 
XML comes from cutting corners on well-formedness checking.
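
The point can be illustrated with a small sketch, assuming Python's standard xml.sax module (backed by expat) as the conforming parser. The handler and function names are hypothetical, and this says nothing about how esXML itself behaves. A 3000-element document with a single mismatched end tag near the end is rejected only because the parser examines every element, including the ones the application never touches.

```python
import xml.sax


class CountingHandler(xml.sax.ContentHandler):
    """Counts start tags so we can see how far the parser got."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def startElement(self, name, attrs):
        self.count += 1


def check_well_formed(xml_bytes):
    """Return (is_well_formed, start_tags_seen) for the given document."""
    handler = CountingHandler()
    try:
        xml.sax.parseString(xml_bytes, handler)
        return True, handler.count
    except xml.sax.SAXParseException:
        # A well-formedness error is fatal; the parser stops here.
        return False, handler.count


def make_document(n, bad_index=None):
    """Build n <item> elements; optionally give one a mismatched end tag."""
    parts = ["<root>"]
    for i in range(n):
        parts.append("<item>x</itm>" if i == bad_index else "<item>x</item>")
    parts.append("</root>")
    return "".join(parts).encode("utf-8")


if __name__ == "__main__":
    print(check_well_formed(make_document(3000)))
    print(check_well_formed(make_document(3000, bad_index=2990)))
```

An application that reads only the first five items and skips the rest without checking them would accept the second document, which a conforming XML parser must reject.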

If this is not the case for esXML and indeed it does make all 
mandated well-formedness checks, then please correct my error. 
However, I'd be very surprised if, in that case, one could 
indeed limit parsing overhead to the raw I/O.
-- 

   Elliotte Rusty Harold
   elharo@m...
   Effective XML (Addison-Wesley, 2003)
   http://www.cafeconleche.org/books/effectivexml
   http://www.amazon.com/exec/obidos/ISBN%3D0321150406/ref%3Dnosim/cafeaulaitA
