XML-DEV Mailing List Archive

Re: Fast text output from SAX?

At 10:00 AM -0400 4/14/04, Stephen D. Williams wrote:

>The fact is that creating, populating, and manipulating a data model
>has costs. This is true of DOM, SAX (where the data model is
>managed by the application), esXML (where the data model is also the
>'serialized' format so all costs are manipulation), and all other
>applications that involve internal and external data (Corba, DCOM,
>ONC-RPC, ASN.1/xER, etc.). It's not fair to ignore part of the
>processing cycle for a format (esXML) that trades some manipulation
>overhead for all parsing/serialization/object creation/object
>population overhead.
>

I consider creating and populating the data model to be part of
parsing if it's done from an event stream. For instance, the time to
build a DOM document object is significant. Sorry if that wasn't
clear. My point is that once the object exists in memory the
manipulations from that point until you start serializing are
irrelevant.

In my tests with my model, parsing/object creation is about 2/3 of
the time, serialization is about 1/3, and manipulation is
unmeasurable. Various optimizations adjust the absolute numbers, but
the 2-1-0 ratio seems pretty consistent.

Possibly other formats have different ratios. However, given that
real world programs read data from input streams and write them to
output streams rather than byte arrays like benchmarks do, it doesn't
seem credible that in-memory XML operations like add and remove are
worth optimizing.

>Additionally, the whole parsing etc. stream for XML must be
>completely performed, in DOM cases and many SAX cases, for every
>element of a document/object. With esXML, if a 3000 element
>document/object were read in and 5 elements manipulated, you only
>spend 5*element-manipulation-overhead.

I flat out don't believe this. I think there's an underlying
assumption here (and in some of the other binary formats) which once
again demonstrates that they are not as much like XML as they claim.
The only way you can limit this is by assuming the data in your
stream is well-formed. In XML, we don't assume that. One of the 3000
nodes you don't process may be malformed. You're assuming that's not
the case, and therefore avoiding a lot of overhead in checking for it.
A large chunk of any speed gain such a format achieves over real XML
comes from cutting corners on well-formedness checking.

If this is not the case for esXML, and it does indeed make all
mandated well-formedness checks, then please correct my error.
However, I'd be very surprised if, in that case, one could indeed
limit parsing overhead to the raw I/O.

--
Elliotte Rusty Harold
elharo@m...
Effective XML (Addison-Wesley, 2003)
http://www.cafeconleche.org/books/effectivexml
http://www.amazon.com/exec/obidos/ISBN%3D0321150406/ref%3Dnosim/cafeaulaitA
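
The well-formedness point in the message above can be illustrated with a small sketch. It is not taken from the thread: it uses Python's standard xml.etree.ElementTree parser as a stand-in for a conforming XML parser, and the helper names (build_document, first_five_ids) and the 3000-item document layout are assumptions made up for the illustration. The application only looks at the first 5 elements, yet an error in the last one still causes the whole document to be rejected, which is why the parser cannot skip the other 2995.

```python
import xml.etree.ElementTree as ET


def build_document(n, bad_index=None):
    """Build a document with n <item> children (hypothetical layout).

    If bad_index is given, that item is left unclosed, which makes the
    whole document not well-formed.
    """
    items = []
    for i in range(n):
        if i == bad_index:
            items.append(f"<item id='{i}'>unclosed")
        else:
            items.append(f"<item id='{i}'>value</item>")
    return "<root>" + "".join(items) + "</root>"


def first_five_ids(xml_text):
    """Parse the full document, then read only the first five items."""
    root = ET.fromstring(xml_text)  # raises ET.ParseError if malformed
    return [item.get("id") for item in list(root)[:5]]


good = build_document(3000)
bad = build_document(3000, bad_index=2999)  # error in the very last item

print(first_five_ids(good))

try:
    first_five_ids(bad)
except ET.ParseError as err:
    # The five items we care about are fine, but the document is still
    # rejected because an element we never touch is malformed.
    print("rejected:", err)
```

A format that lets the reader jump straight to the 5 elements of interest would, under the same input, either have to validate the remaining elements some other way or accept the malformed document silently.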