Subject: RE: Processing large XML Documents [> 50MB]
From: "Michael Kay" <mike@xxxxxxxxxxxx>
Date: Tue, 23 Feb 2010 08:28:12 -0000
> We have a need to process XML Documents that could be over 50
> megs in size.
>
> Due to the huge size of the document, XSLT is getting tough
> in the environment we are running in.
Actually, 50Mb isn't really that big nowadays. Some people are transforming
1Gb or more.
>
> Basically, the nature of the data processing is
>
> a) assemble around 30-40 XML documents [each with a common
> header and its own lines] into one single XML document, with
> the common header and all the lines
> b) Update the assembled document in specific locations
> c) generate multiple XML document fragments from the huge XML
> document based on query criteria. Each XML fragment is created
> by mapping specific fields in the big document. Each document
> is created for a specific key element value in the huge document.
>
> Am puzzled how to handle this one efficiently.
> Any comments are welcome.
>
It's not entirely clear why you are creating the one big document: it's
perfectly possible to work directly with the 30-40 small ones. Perhaps the
main advantage of building the big document is that you can then use a key
to search across all the data. But if you use a processor like Saxon-EE that
optimizes searches by means of implicit indexing, this might not be
necessary.
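As a sketch of working directly with the small documents, an XSLT 2.0 stylesheet can combine collection() with a key to search across all the pieces without assembling them first. The file path, and the "line" and "@id" names, are invented for illustration:

```xml
<!-- Hypothetical sketch: look up entries by key across many small
     documents instead of building one big merged file first.
     Element names, attribute names, and the collection URI are
     assumptions, not taken from the original post. -->
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Index every line element by its key value -->
  <xsl:key name="line-by-id" match="line" use="@id"/>

  <xsl:template match="/">
    <!-- key() searches within the document of the context node,
         so iterating over the collection searches each small
         document in turn -->
    <xsl:for-each select="collection('file:///data/docs?select=*.xml')">
      <xsl:copy-of select="key('line-by-id', 'TARGET-KEY')"/>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

With a processor such as Saxon-EE, calls on key() are backed by an index, so repeated lookups avoid rescanning the data.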
Is the 50Mb the size of the combined document, or the size of the individual
pieces?
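For step (c), producing one output document per key value, the usual XSLT 2.0 pattern is xsl:for-each-group combined with xsl:result-document. The element and attribute names below are illustrative, not from the original post:

```xml
<!-- Hypothetical sketch: emit one result file per distinct key
     value. 'line', '@customer', and the output path are assumed
     names for illustration. -->
<xsl:template match="/">
  <xsl:for-each-group select="//line" group-by="@customer">
    <xsl:result-document href="out/{current-grouping-key()}.xml">
      <lines key="{current-grouping-key()}">
        <xsl:copy-of select="current-group()"/>
      </lines>
    </xsl:result-document>
  </xsl:for-each-group>
</xsl:template>
```

This way the "huge document" is read once, and each fragment is written as a separate result tree rather than being extracted in multiple passes.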