
Re: XML tools and big documents (was: Re: Is there a size limitation on

  • From: Tyler Baker <tyler@i...>
  • Date: Tue, 01 Sep 1998 21:39:24 -0400

David Megginson wrote:

> Ingo Macherius writes:
>
>  > My afterall impression is that most available tools do well with
>  > toy examples, but any input being in the MB range easily blasts
>  > them. At least that's true for what came from MS so far.
>
> I don't think that that's true in general.  Most of the Java-based XML
> parsers I've tried seem to be able to handle Jon Bosak's XML Old
> Testament (nearly 4MB) just fine, if somewhat slowly -- I used ot.xml
> for routine testing and profiling while developing AElfred, and
> AElfred barely kicked up a sweat.
>
> The problem comes if the parser tries to build a tree rather than
> simply reporting an event stream.  Depending on the implementation,
> document trees tend to be very large.  With a naive tree
> implementation, a 10MB document might use 100MB or more of virtual
> memory for the document tree -- that'll bring most current desktop
> systems to a screeching halt.

This is especially true for Java, which is very memory hungry.  Most of the memory
problems with objects can be significantly reduced if your nodes only allocate
memory for sub-arrays as needed (I would assume most implementations use an
array rather than a Vector to store children).  Also, if there is only one child,
do not create an array just to store that one child.

In other words, you have something like this:

class Node {
  Node child;
  Node[] children;
  int nodeLength;
}

If both child and children are null, the node has no children.  The first child
added is stored in child.  If the size ever goes above 1, copy child into
children[0], put the node being added into children[1], and set child to null.

Then when you look up a child by index or name, you first test to see if child is
null.  If it is not, return the child if the index requested is 0; otherwise
the index is out of bounds.  If child is null, test to see if children is null.
If children is not null, then just look up the node by index.  If children is
null, then there are no children (nothing has been added, or everything has been
deleted).
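
As a rough sketch of the scheme described above (not code from my DOM
implementation or anyone else's), the add and lookup logic could look something
like this.  The names CompactNode, appendChild, item, and getLength are made up
for illustration, as is the starting array capacity of 4.

```java
import java.util.Arrays;

// Hypothetical illustration only; not taken from any real DOM package.
class CompactNode {
    final String name;
    private CompactNode child;       // holds the child while there is exactly one
    private CompactNode[] children;  // allocated only when a second child arrives
    private int length;              // number of children currently stored

    CompactNode(String name) {
        this.name = name;
    }

    int getLength() {
        return length;
    }

    void appendChild(CompactNode node) {
        if (length == 0) {
            child = node;                    // just a reference assignment, no array
        } else if (length == 1) {
            children = new CompactNode[4];   // assumed initial capacity
            children[0] = child;
            children[1] = node;
            child = null;
        } else {
            if (length == children.length) { // array is full, so double it
                children = Arrays.copyOf(children, length * 2);
            }
            children[length] = node;
        }
        length++;
    }

    CompactNode item(int index) {
        if (index < 0 || index >= length) {
            throw new IndexOutOfBoundsException("index " + index + ", length " + length);
        }
        // With a single child there is no array to index into.
        return (length == 1) ? child : children[index];
    }
}
```

A node with zero or one child never allocates an array here; the array only
shows up on the second appendChild call.  Removal is left out to keep the sketch
short.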

For a lot of trees where it is somewhat common for nodes to only have one child,
this can save you a lot of memory.  It can also speed up your tree traversals a
bit since you do not have to look up the children nodes by index in the case
where there is only one child.  Also, for building the tree, you will likely
speed your app up a lot since you will now only have to create a new array object
when a node gets a second child.  Otherwise it is just a reference
assignment, which is about as fast as an integer assignment.

I have not had a lot of problems building trees.  For the DOM implementation, in
conjunction with the parser I have, I build a DOM tree off of Jon Bosak's ot.xml
in about 12 seconds running a JIT with JDK 1.2 b4, on an old P-120 with 64 megs
of RAM running Windows NT 4.0.  I have not been able to do any reliable memory
benchmarks because the GC seems to be invoked very frequently with Sun's JDK 1.2
VM.

I would suspect that the DOM package provided by Don Park has similar
performance and memory consumption.  Your best bet would probably be to look at
an XSL package which takes a DOM tree of your XML data, and a DOM tree of an XSL
stylesheet and spits out the content.  That way you are not stuck with an MS,
IBM, Oracle, or whatever implementation that you are not happy with.

Tyler



