Re: combining XMLEvent lists
On 28/09/2010 6:24 PM, David wrote:
> My guess would be "XMLEvent" is referring to StAX Events.
>
> http://woodstox.codehaus.org/javadoc/stax-api/1.0/javax/xml/stream/events/XMLEvent.html

Ah yes, you're probably right. I forgot that's what they were called...

If that's the case, it looks fairly easy to present a List<XMLEvent> via an
XMLEventReader, which can be wrapped in a StAXSource and supplied to any
Saxon interface that expects a Source, for example a DocumentBuilder.

Michael Kay
Saxonica

>
> which is a parsed XML event (startDocument, startElement, characters
> ...)
>
>
> David A. Lee
> dlee@calldei.com
> http://www.xmlsh.org
>
>
> On 9/28/2010 1:17 PM, Michael Kay wrote:
>>
>> On 28/09/2010 4:13 PM, Johannes.Lichtenberger wrote:
>>> On 09/28/2010 04:33 PM, Michael Kay wrote:
>>>> Sounds fascinating, and I wish I had time to get involved. It would
>>>> certainly be elegant if you could have both the productivity of
>>>> writing this declaratively in XSLT and the performance of running it
>>>> on Hadoop MapReduce. Intrinsically, the two seem to fit together hand
>>>> in glove, but I suspect some engineering effort is needed to make it
>>>> work.
>>> Hello Michael,
>>>
>>> I think it would be too complicated to achieve the desired grouping
>>> with Java. Do you think it makes sense to first serialize the results
>>> and then use Saxon's XSLT 2.0 processor to achieve the results? Or do
>>> you have any direct input from a List of XMLEvents to Saxon's XSLT
>>> processor? I assume it reads XML data from an InputSource or some kind
>>> of a stream.
>>
>> I'm not sure whether "XMLEvent" is something I'm expected to know
>> about. You said earlier:
>>
>> I've got an Iterator with Lists (Java) out of XMLEvents, which are
>> serialized fragments
>>
>> so I assume they are just strings containing unparsed XML.
>> That's not going to be a particularly efficient representation for
>> processing, so the first step will be to parse each one to a tree (for
>> example, a Saxon TinyTree).
>>
>> You then said,
>>
>> I want to find and combine Lists which have the same page id and the
>> same revision timestamp
>>
>> but you left out the critical information as to whether this would
>> always combine elements that were adjacent in the list. If the groups
>> are adjacent, then you could potentially devise a strategy that avoids
>> holding all the trees in memory at the same time.
>>
>> Supplying a sequence of trees as input to Saxon grouping is not a
>> problem. Using the s9api interface, you can use a DocumentBuilder to
>> build each tree as an XdmNode; a sequence can then be constructed
>> using the constructor public XdmValue(Iterable<XdmItem> items), and
>> this XdmValue can be passed as a parameter to an XsltTransformer,
>> where a reference to the parameter can be used in
>> <xsl:for-each-group select="$param">. Using this approach the whole
>> structure will be held in memory, but there are ways of avoiding that
>> by going to lower-level interfaces.
>>
>> Michael Kay
>> Saxonica
>>
>>
>>> It's a special case, where two or more revisions of one article are
>>> made at the same time (in the same second). I would have to look
>>> through the XML file with BaseX or Saxon, but I'm pretty sure such
>>> cases exist somewhere in the huge file (as of now I've only extracted
>>> a small subset of articles and replaced WikiText inside text elements
>>> with XML).
>>>
>>> The whole task is to sort the revisions in order to shred them into
>>> our XML data storage system (the deltas of the revisions), which has
>>> the capability to store and retrieve revisions compactly and
>>> efficiently. In parallel I'm currently writing the import of a sorted
>>> XML file.
>>>
>>> My main task (master project and thesis) is or will be the
>>> visualization of temporal tree-structured data, to gain insights into
>>> the evolution of the data that would otherwise be very difficult to
>>> obtain.
>>>
>>> regards,
>>> Johannes
>>>
>>
>>
>> _______________________________________________________________________
>>
>> XML-DEV is a publicly archived, unmoderated list hosted by OASIS
>> to support XML implementation and development. To minimize
>> spam in the archives, you must subscribe before posting.
>>
>> [Un]Subscribe/change address: http://www.oasis-open.org/mlmanage/
>> Or unsubscribe: xml-dev-unsubscribe@lists.xml.org
>> subscribe: xml-dev-subscribe@lists.xml.org
>> List archive: http://lists.xml.org/archives/xml-dev/
>> List Guidelines: http://www.oasis-open.org/maillists/guidelines.php
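A minimal sketch of the first suggestion in the thread: presenting a List<XMLEvent> through the javax.xml.stream.XMLEventReader interface so it can be wrapped in a JAXP javax.xml.transform.stax.StAXSource. The class name ListEventReader and the helper toXml are hypothetical. The sketch assumes each list is a complete, well-formed event sequence that begins with a StartDocument (or StartElement) event, because StAXSource rejects a reader positioned anywhere else. To stay within the standard library, toXml feeds the source to the JDK's identity transformer rather than to Saxon.

```java
import java.io.StringWriter;
import java.util.List;
import java.util.NoSuchElementException;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stax.StAXSource;
import javax.xml.transform.stream.StreamResult;

/** Hypothetical adapter: presents an in-memory {@code List<XMLEvent>} as an XMLEventReader. */
class ListEventReader implements XMLEventReader {
    private final List<XMLEvent> events;
    private int pos = 0; // index of the next event to hand out

    ListEventReader(List<XMLEvent> events) {
        this.events = events;
    }

    @Override
    public boolean hasNext() {
        return pos < events.size();
    }

    @Override
    public XMLEvent nextEvent() {
        if (!hasNext()) throw new NoSuchElementException();
        return events.get(pos++);
    }

    @Override
    public Object next() {
        return nextEvent();
    }

    @Override
    public XMLEvent peek() {
        return hasNext() ? events.get(pos) : null; // null once the list is exhausted
    }

    @Override
    public String getElementText() throws XMLStreamException {
        // Call after a START_ELEMENT has been consumed; returns its text up to the END_ELEMENT.
        StringBuilder text = new StringBuilder();
        while (hasNext()) {
            XMLEvent e = nextEvent();
            if (e.isEndElement()) return text.toString();
            if (e.isStartElement()) throw new XMLStreamException("element content where text was expected");
            if (e.isCharacters()) text.append(e.asCharacters().getData()); // comments and PIs are skipped
        }
        throw new XMLStreamException("events ended before the END_ELEMENT");
    }

    @Override
    public XMLEvent nextTag() throws XMLStreamException {
        // Returns the next START_ELEMENT or END_ELEMENT, skipping whitespace and non-element events.
        while (hasNext()) {
            XMLEvent e = nextEvent();
            if (e.isStartElement() || e.isEndElement()) return e;
            if (e.isCharacters() && !e.asCharacters().isWhiteSpace()) {
                throw new XMLStreamException("non-whitespace text where a tag was expected");
            }
        }
        throw new XMLStreamException("events ended before another tag");
    }

    @Override
    public Object getProperty(String name) {
        return null; // this adapter exposes no implementation-specific properties
    }

    @Override
    public void close() {
        // nothing to release: the events are already in memory
    }

    /** Hypothetical demo helper: wraps the events in a StAXSource and serializes them with the JDK identity transformer. */
    static String toXml(List<XMLEvent> events) throws XMLStreamException, TransformerException {
        // StAXSource requires the reader to be positioned at START_DOCUMENT or START_ELEMENT.
        StAXSource source = new StAXSource(new ListEventReader(events));
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        StringWriter out = new StringWriter();
        identity.transform(source, new StreamResult(out));
        return out.toString();
    }
}
```

With Saxon on the classpath, the same StAXSource would instead be handed to a DocumentBuilder (for example processor.newDocumentBuilder().build(source)) to obtain an XdmNode, as the reply suggests. This sketch does not exercise that path.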
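The thread also notes that if the lists to be combined are always adjacent, a strategy that avoids holding everything in memory is possible. The following is a minimal sketch of that idea, assuming adjacency holds. The class AdjacentGrouper is hypothetical: it lazily yields runs of consecutive items that share a key, so only the current run is held at any time. In the real data, the key would be the page id combined with the revision timestamp, extracted from each List<XMLEvent>. That extraction is not shown, and the string items in the test are placeholders.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.Objects;
import java.util.function.Function;

/**
 * Hypothetical helper: groups consecutive items with equal keys in a single pass.
 * Only the current group is kept in memory; items with the same key that are NOT
 * adjacent end up in separate groups.
 */
final class AdjacentGrouper<T, K> implements Iterator<List<T>> {
    private final Iterator<T> source;
    private final Function<? super T, ? extends K> keyOf; // e.g. page id + revision timestamp
    private T pending;          // first item of the next group, already read from source
    private boolean hasPending; // false once source is exhausted

    AdjacentGrouper(Iterator<T> source, Function<? super T, ? extends K> keyOf) {
        this.source = source;
        this.keyOf = keyOf;
        advance();
    }

    // Reads one item ahead so we can tell where the current group ends.
    private void advance() {
        hasPending = source.hasNext();
        pending = hasPending ? source.next() : null;
    }

    @Override
    public boolean hasNext() {
        return hasPending;
    }

    @Override
    public List<T> next() {
        if (!hasPending) throw new NoSuchElementException();
        K key = keyOf.apply(pending);
        List<T> group = new ArrayList<>();
        // Keep consuming while the look-ahead item has the same key as the group's first item.
        while (hasPending && Objects.equals(keyOf.apply(pending), key)) {
            group.add(pending);
            advance();
        }
        return group;
    }
}
```

If equal keys can be separated by other items, this single pass is not sufficient, and the grouping has to see the whole sequence, for example with xsl:for-each-group over an in-memory XdmValue as described in the thread.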