RE: use XSLT or XQuery in Saxon?
> I have an extremely large (over 300 MB) XML file and tens
> of thousands of small xml files generated after
> applying various XSLT on the one big XML file.

You're right, 300Mb *is* large (I had someone recently ask how to process a large file and it turned out to be 300Kb). You have a choice between spending money on lots of memory (say 2Gb, but it depends on the actual structure) and doing more development work to split the task up. This applies equally whether you are using XSLT or XQuery - in Saxon these are really just different surface syntaxes for the same processing engine.

>
> I am using Saxon for XSLT and will be using it also
> for XQuery.
>
> Is XQuery or XSLT the better solution for this problem?
> Query each text node in the big xml file and verify
> that this content is present in one of the results xml
> files.

Clearly this requires a better algorithm than searching all the small files once for each text node in the large file.

One solution is to aggregate the small files into a single document and index it using a key. This would require XSLT, because keys are not available in XQuery. Some XQuery implementations might do an indexed join automatically, but Saxon doesn't (yet). Of course, aggregating the small files means even more memory.

Another solution, again dependent on XSLT, is to use grouping. This doesn't require the small documents to be aggregated into a single document. If you take the union of the text nodes in the large document and the values in the small documents, and then do grouping, a group of size 1 indicates a value that is present in one file and not the other.

However, if performance is really important (you don't actually say), I think I would be inclined to write this "by hand" as a SAX application. It will probably be an order of magnitude faster that way.

In the past it was taken for granted that to handle 300Mb of data you needed a database.
I wouldn't rule this option out: it largely depends on where the data comes from and what its lifecycle looks like. Databases are designed specifically for this kind of job.

Michael Kay
http://www.saxonica.com/

> Based on this information generate a report
> that shows which content is present and in which file
> and in a separate section which content was not found
> in result xml files and also show this content's parent
> element or other markup to indicate its position in
> the big xml file.
>
> All the small xml files are stored as flat files in
> various directories on Windows file system although
> most files are in one directory. The big XML file is
> fairly complex with multiple levels of nesting
> elements.
>
> Any comments or suggestions?
> Thank you
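For illustration only: a minimal sketch, in Python rather than as a Saxon stylesheet, of the "by hand" SAX approach suggested above. It uses an in-memory dictionary in the role an xsl:key would play. The small files are parsed once to build the index, and then the big file is streamed once. Each text node is reported either as found (with the files that contain it) or as not found (with its chain of ancestor elements). Several assumptions apply. Matching is exact after whitespace normalization. The index of small-file values must fit in memory. The tab-separated report format is arbitrary. The names TextHandler, build_index and report are hypothetical.

```python
import io
import xml.sax
from collections import defaultdict


class TextHandler(xml.sax.ContentHandler):
    """Calls on_text(text, path) for every non-blank text node.

    path is the chain of ancestor element names, e.g. "/book/chapter/title",
    so a report can show where the text sits in the document.
    """

    def __init__(self, on_text):
        super().__init__()
        self.on_text = on_text
        self.path = []    # stack of currently open element names
        self.buffer = []  # SAX may split one text node over several calls

    def _flush(self):
        # Emit the text gathered since the last tag boundary, with
        # whitespace collapsed (an assumption about how values should match).
        text = " ".join("".join(self.buffer).split())
        self.buffer = []
        if text:
            self.on_text(text, "/" + "/".join(self.path))

    def startElement(self, name, attrs):
        self._flush()
        self.path.append(name)

    def endElement(self, name):
        self._flush()  # flush before popping so the path includes this element
        self.path.pop()

    def characters(self, content):
        self.buffer.append(content)


def build_index(small_files):
    """Map each text value to the set of small-file labels containing it.

    small_files maps a label (e.g. a file path) to anything xml.sax.parse
    accepts: a path string or a binary file object.
    """
    index = defaultdict(set)
    for label, source in small_files.items():
        def add(text, path, label=label):
            index[text].add(label)
        xml.sax.parse(source, TextHandler(add))
    return index


def report(big_source, index, found_out, missing_out):
    """Stream the big document once, writing tab-separated report lines."""
    def check(text, path):
        labels = index.get(text)  # .get does not insert into the defaultdict
        if labels:
            found_out.write(f"{text}\t{', '.join(sorted(labels))}\n")
        else:
            missing_out.write(f"{text}\t{path}\n")
    xml.sax.parse(big_source, TextHandler(check))


if __name__ == "__main__":
    big = io.BytesIO(b"<doc><sec><p>alpha</p><p>beta</p></sec></doc>")
    small = {"out1.xml": io.BytesIO(b"<r><v>alpha</v></r>")}
    found, missing = io.StringIO(), io.StringIO()
    report(big, build_index(small), found, missing)
    print("FOUND:\n" + found.getvalue() + "NOT FOUND:\n" + missing.getvalue())
```

The two report sections go to separate streams, so the big file only has to be read once. For files spread across several directories, the mapping could be built with pathlib. An example is {str(p): str(p) for p in Path(root).rglob("*.xml")}, where root is a placeholder for each results directory.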
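The grouping alternative described above can also be sketched in Python, with itertools.groupby standing in for XSLT 2.0's xsl:for-each-group. This is a sketch under assumptions. The text values are assumed to have already been extracted from both sides, for example by a SAX handler. The name classify_by_grouping is hypothetical. Rather than testing for a group of size 1, it checks which sides a group's members came from. That way a value repeated within the big document but absent from the results is still reported as missing.

```python
from itertools import groupby
from operator import itemgetter


def classify_by_grouping(big_values, small_values):
    """Group the union of values by value and classify each group.

    Returns (missing, extra): values found only in the big document,
    and values found only in the small result files.
    """
    # Tag every value with its origin: True = big document, False = small files.
    tagged = [(v, True) for v in big_values] + [(v, False) for v in small_values]
    # itertools.groupby only merges adjacent items, so sort by value first.
    tagged.sort(key=itemgetter(0))

    missing, extra = [], []
    for value, group in groupby(tagged, key=itemgetter(0)):
        origins = {from_big for _, from_big in group}
        if origins == {True}:
            missing.append(value)   # present in the big document only
        elif origins == {False}:
            extra.append(value)     # present in the small files only
    return missing, extra


if __name__ == "__main__":
    print(classify_by_grouping(["a", "b", "b", "c"], ["a", "c", "d"]))
    # → (['b'], ['d'])
```

As in the XSLT version, the small files never need to be combined into one document; only their values take part in the grouping.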