Re: XML tools and big documents
David Megginson wrote:
> Don Park writes:
>
> > > As for the memory issue, I have thought about some sort of LZW
> > > compression of all of the text in a document tree. This would
> > > save a lot of memory, but may slow down building the DOM tree a
> > > bit. Any ideas on this?
> >
> > Your performance will suffer and memory problem still remains.
>
> Agreed. The overhead comes from the node objects, not from the text.
> The biggest hogs can be attributes, especially in the standard SGML
> DTDs which often include dozens of defaulted attributes for each
> document type. If you can optimise those (allocating nodes only on
> demand and then freeing them as soon as they're not needed), you're
> half-way there.
>
> The second biggest hogs are leaf elements which contain only text. If
> you can treat those as special cases and allocate only one object for
> each one instead of three (element node, node list, text node), then
> you're another quarter of the way there.

Very true. However, in Java at least, you can avoid allocating a new
object for the node list by having your Node implementation also
implement the NodeList interface, and allocating a buffer to store the
children only when it is needed. You can do the same thing with the
Element node with regard to attributes. This saves a lot of memory and
heap-based object allocation that you would otherwise have to do.
Nevertheless, in Java allocating raw Objects is a memory hog to begin
with.

> PIs, doctype declarations, notations, etc. are rare enough that you
> don't gain much by optimising them. Your mileage on comments, entity
> references and CDATA sections may vary, but you're probably best
> skipping them or replacing them with their contents when you build the
> tree, unless your application has very specialised requirements.

This is very true.
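
A rough sketch of the node-doubling-as-its-own-node-list idea, in case it
helps make it concrete. This is an illustration under stated assumptions,
not code from any real DOM implementation: SimpleNodeList is a
hypothetical, cut-down stand-in for the DOM NodeList interface, and
CompactNode is a made-up class name.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the DOM NodeList interface, kept minimal for the sketch.
interface SimpleNodeList {
    int getLength();
    CompactNode item(int index);
}

// One object serves as both the node and its own child list.
class CompactNode implements SimpleNodeList {
    private final String name;
    // Allocated only when the first child is appended; leaf nodes keep null.
    private List<CompactNode> children;

    CompactNode(String name) {
        this.name = name;
    }

    String getName() {
        return name;
    }

    // No separate node-list object is created: the node returns itself.
    SimpleNodeList getChildNodes() {
        return this;
    }

    CompactNode appendChild(CompactNode child) {
        if (children == null) {
            children = new ArrayList<>(2); // small initial buffer, created on demand
        }
        children.add(child);
        return child;
    }

    @Override
    public int getLength() {
        return children == null ? 0 : children.size();
    }

    @Override
    public CompactNode item(int index) {
        // Mirrors the DOM convention of returning null for an out-of-range index.
        if (children == null || index < 0 || index >= children.size()) {
            return null;
        }
        return children.get(index);
    }
}
```

A leaf node here costs one object plus a null reference; the child buffer
only exists for nodes that actually have children. The same trick would
apply to an element acting as its own attribute collection.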
For large documents, whether heavily document-oriented or
transaction-oriented, I still think that compressing all of the text in
the document tree may have some promise. I guess before spending any
more time talking about it, I should spend the necessary hours to just
do it.

Tyler

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@i...
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo@i... the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo@i... the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@i...)