Re: performance comparison
7/11/2002 7:32:19 PM, zhengyu@a... wrote:

> 2. Most people mention SAX can handle files larger than memory, but I am
> thinking, is this really the case, because files are read into the kernel
> buffer, so large files still have to be read into the memory, just not in
> user space. Am I right?

DOM builders generally load the entire document into a tree structure. SAX operates at parse time: it can call a user-defined function for each element, attribute list, entity reference, etc. The application can then either process the XML data and throw it away (so memory usage is independent of the total size of the document) or build another data structure, store the data in a DBMS, or whatever. As for the kernel buffer: the parser reads its input a block at a time, so neither the kernel nor the application ever needs to hold the whole file at once. This gives the usual tradeoff: more work for the application programmer, but more control over resource usage.

> 3. DOM is memory-thirsty, according to most articles I read. So DOM's
> performance lags, does anyone run any type of profiling, and I am
> interested in why it is memory hungry, and poor in terms of performance.

It is quite true that if one simply defines classes that directly implement all the DOM interfaces, each Node will be fairly large, because it has to carry the state behind all the properties defined on the basic Node interface. The DOM exposes several different models of an XML document: a tree with parents, children, and siblings; lists of nodes containing lists of nodes; a more OO conception of Document, Element, Attribute, etc. objects; and a more abstract model where the document is traversed via iterators.

Still, this is an implementation issue, not something intrinsic to the DOM API. Some DOM implementations are "lazy": they build actual objects implementing the DOM interfaces only when a specific part of the document is accessed. There are also persistent DOMs, where the parser essentially loads a database that is then navigated and queried on demand.
Both techniques would be less memory-hungry than a straightforward implementation of the spec.

> 4. What do people think of pull type parsers and DOM SAX hybrids? Are
> these popular and stable?

A lot has been written on this, but you'll probably have to sort it out for yourself. A Google search for "xml pull parser performance" turns up quite a number of articles. It's probably worth considering if you have lots of data, relatively constrained processors, and a well-defined application. In general, I'd say that the more flexibility you need, the more you need a DOM-like API; the more precisely you can constrain what the application will do with each bit of markup, the more you can exploit a streaming approach.

> 5. Is it possible for SAX to support XSLT?

Well, several (most?) XSLT implementations can use a SAX parser to build the tree for the transformation. Strictly speaking, however, you don't get the unbounded-document-size and efficiency advantages of SAX, because a conformant XSLT implementation must keep the entire document around: the stylesheet can refer to arbitrary pieces of it. I believe there are extensions to SAXON that use memory more efficiently by having the user tell the XSLT engine which sections of the document to look at; see the <saxon:preview> extension element? There are also occasional discussions of "streaming XSLT" processors (I don't know whether any actually exist in a stable, available form), but they would have to operate on a subset of XSLT.

I should probably shut up and let someone who knows what they're talking about explain the situation :~)
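To make the push/pull distinction from answers 2 and 4 concrete, here is a minimal sketch, assuming a Java runtime with the standard JAXP SAX API and the StAX API (javax.xml.stream, standardized after this message was written and used here only as a stand-in for the pull parsers the question asks about). The class and method names (PushVsPull, countWithSax, countWithStax) are hypothetical. Both methods count elements with a given name while keeping no tree in memory: with SAX the parser drives the loop and calls the handler; with StAX the application drives the loop and asks for each event.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class: the same task done push-style and pull-style.
public class PushVsPull {

    // Push (SAX): the parser calls startElement for each start tag.
    // Each tag is examined once and then discarded, so memory use does
    // not grow with the size of the document.
    public static long countWithSax(String xml, String name) throws Exception {
        final long[] count = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attrs) {
                // The default factory is not namespace-aware, so qName holds the tag name.
                if (qName.equals(name)) {
                    count[0]++;
                }
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return count[0];
    }

    // Pull (StAX): the application asks for the next event, so its state
    // lives in ordinary local variables instead of handler fields.
    public static long countWithStax(String xml, String name) throws Exception {
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        long count = 0;
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && reader.getLocalName().equals(name)) {
                    count++;
                }
            }
        } finally {
            reader.close();
        }
        return count;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<orders><order id='1'/><order id='2'/><note/></orders>";
        System.out.println(countWithSax(xml, "order"));  // prints 2
        System.out.println(countWithStax(xml, "order")); // prints 2
    }
}
```

For a file larger than memory, the same code would read from a stream (for example, an InputSource wrapping a FileInputStream, or createXMLStreamReader(InputStream)) rather than a String; the String input here only keeps the sketch self-contained.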
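Answer 5 notes that XSLT processors accept SAX input but still keep the whole document. The sketch below, assuming only the standard JAXP transformation API (javax.xml.transform), feeds a SAXSource to a stylesheet that evaluates count(//item). Because an expression like //item can reach any part of the input, the processor typically buffers the incoming SAX events into its own internal tree before running the transformation. The class name SaxToXslt and the method countItems are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.InputSource;

// Hypothetical example class: SAX input to an XSLT transformation.
public class SaxToXslt {

    // The stylesheet needs random access to the whole input (//item),
    // so the SAX stream cannot simply be processed and discarded.
    static final String XSL =
            "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
          + "<xsl:output method='text'/>"
          + "<xsl:template match='/'><xsl:value-of select='count(//item)'/></xsl:template>"
          + "</xsl:stylesheet>";

    public static String countItems(String xml) throws Exception {
        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)));
        StringWriter out = new StringWriter();
        // Input arrives as SAX events; the processor builds its tree from them.
        t.transform(new SAXSource(new InputSource(new StringReader(xml))),
                    new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(countItems("<list><item/><item/><item/></list>"));
    }
}
```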