Re: combining XMLEvent lists
On 09/28/2010 02:42 PM, Michael Kay wrote:
> Clearly a job for XSLT 2.0, where the processing will be vastly easier than in Java.

Hi Michael,

it's the reduce functionality in a Hadoop MapReduce task. I'm going to split up the huge Wikipedia article revision file (several thousand articles/pages) into

  <page>
    ...
    <revision><timestamp>...</timestamp></revision>
    <revision><timestamp>...</timestamp></revision>
    ...
  </page>

chunks (each page represents one article with several revisions, which are not entirely sorted). Each page is then split concurrently into a key (the timestamp) and values (page fragments, one per revision):

  <page>
    <revision><timestamp>...</timestamp></revision>
  </page>
  <page>
    <revision><timestamp>...</timestamp></revision>
  </page>

and then I let Hadoop group and sort according to the timestamp key.

With XSLT, though, I would need to serialize the lists temporarily, process the file with, for example, Saxon, and then serialize the results once again. It would be great if it were possible to build a tree representation, or even a streaming variant, out of XMLEvents, but I think that's neither possible with Saxon nor planned!? :-/

I agree that it's for sure simpler with XSLT, even though I would need some assistance in writing the stylesheet :-)

regards,
Johannes
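For illustration, the splitting step described above (one <page> chunk in, one timestamp/page-fragment pair per revision out) might look roughly like the following in plain StAX. This is only a sketch under assumptions: the class name RevisionSplitter and its method names are hypothetical, the Hadoop mapper wiring is left out, and anything in the page outside the <revision> elements (the title, for example) is simply dropped.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

// Hypothetical helper, not part of Hadoop or Saxon.
final class RevisionSplitter {

    /** Splits one <page> chunk into (timestamp, single-revision page fragment) pairs. */
    static List<Map.Entry<String, String>> split(String pageXml) throws XMLStreamException {
        XMLEventReader reader = XMLInputFactory.newInstance()
                .createXMLEventReader(new StringReader(pageXml));
        List<Map.Entry<String, String>> pairs = new ArrayList<>();
        while (reader.hasNext()) {
            XMLEvent event = reader.nextEvent();
            if (isStart(event, "revision")) {
                pairs.add(readRevision(event, reader));
            }
        }
        reader.close();
        return pairs;
    }

    /** Collects the events of one <revision> and picks up its timestamp text on the way. */
    private static Map.Entry<String, String> readRevision(XMLEvent start, XMLEventReader reader)
            throws XMLStreamException {
        List<XMLEvent> events = new ArrayList<>();
        events.add(start);
        StringBuilder timestamp = new StringBuilder();
        boolean inTimestamp = false;
        while (reader.hasNext()) {
            XMLEvent event = reader.nextEvent();
            events.add(event);
            if (isStart(event, "timestamp")) {
                inTimestamp = true;
            } else if (isEnd(event, "timestamp")) {
                inTimestamp = false;
            } else if (inTimestamp && event.isCharacters()) {
                timestamp.append(event.asCharacters().getData());
            } else if (isEnd(event, "revision")) {
                break; // assumes <revision> elements are never nested
            }
        }
        return new SimpleEntry<>(timestamp.toString().trim(), serialize(events));
    }

    /** Writes the collected events back out, wrapped in a bare <page> element. */
    private static String serialize(List<XMLEvent> events) throws XMLStreamException {
        StringWriter out = new StringWriter();
        XMLEventWriter writer = XMLOutputFactory.newInstance().createXMLEventWriter(out);
        for (XMLEvent event : events) {
            writer.add(event);
        }
        writer.flush();
        writer.close();
        return "<page>" + out + "</page>";
    }

    private static boolean isStart(XMLEvent event, String name) {
        return event.isStartElement()
                && event.asStartElement().getName().getLocalPart().equals(name);
    }

    private static boolean isEnd(XMLEvent event, String name) {
        return event.isEndElement()
                && event.asEndElement().getName().getLocalPart().equals(name);
    }
}
```

In a real job, each returned pair would presumably be emitted from the mapper as key and value, so that Hadoop's shuffle does the grouping and sorting by timestamp.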