Re: combining XMLEvent lists
My guess would be that "XMLEvent" is referring to StAX events:
http://woodstox.codehaus.org/javadoc/stax-api/1.0/javax/xml/stream/events/XMLEvent.html
which is a parsed XML event (startDocument, startElement, characters, ...).

David A. Lee
dlee@calldei.com
http://www.xmlsh.org

On 9/28/2010 1:17 PM, Michael Kay wrote:
>
> On 28/09/2010 4:13 PM, Johannes.Lichtenberger wrote:
>> On 09/28/2010 04:33 PM, Michael Kay wrote:
>>> Sounds fascinating, and I wish I had time to get involved. It would
>>> certainly be elegant if you could have both the productivity of writing
>>> this declaratively in XSLT and the performance of running it on Hadoop
>>> MapReduce. Intrinsically, the two seem to fit together hand in glove,
>>> but I suspect some engineering effort is needed to make it work.
>> Hello Michael,
>>
>> I think it would be too complicated to achieve the desired grouping with
>> Java. Do you think it makes sense to first serialize the results and
>> then use Saxon's XSLT 2.0 processor to achieve the results? Or do you
>> have any direct input from a List of XMLEvents to Saxon's XSLT
>> processor? I assume it reads XML-data from an InputSource or some kind
>> of a stream.
>
> I'm not sure whether "XMLEvent" is something I'm expected to know
> about: you said earlier "
>
> I've got an Iterator with Lists (Java) out of XMLEvents, which are
> serialized fragments
>
> so I assume they are just strings containing unparsed XML. That's not
> going to be a particularly efficient representation for processing, so
> the first step will be to parse each one to a tree (for example, a
> Saxon TinyTree).
>
> You then said,
>
> I want to find combine Lists which have the same page id and the same
> revision timestamp
>
> but you left out the critical information as to whether this would
> always combine elements that were adjacent in the list. If the groups
> are adjacent then you could potentially devise a strategy that avoids
> holding all the trees in memory at the same time.
>
> Supplying a sequence of trees as input to Saxon grouping is not a
> problem. Using the s9api interface, you can use a DocumentBuilder to
> build each tree as an XdmNode, then a sequence can be constructed using
> the constructor public XdmValue(Iterable<XdmItem> items), and then this
> XdmValue can be passed as a parameter to an XsltTransformer, and a
> reference to the parameter can be used in
> <xsl:for-each-group select="$param">.
> Using this approach the whole structure will be held in memory, but
> there are ways of avoiding that by going to lower-level interfaces.
>
> Michael Kay
> Saxonica
>
>
>> It's a special case, where two or more revisions of one article are made
>> at the same time (in the same second). I would have to look through the
>> XML file with BaseX or Saxon, but I'm pretty sure such cases exist
>> somewhere in the huge file (as of now I've only extracted a small subset
>> of articles and replaced WikiText inside text-elements with XML).
>>
>> The whole task is to sort the revisions to shredder it into our XML
>> datastorage system (the deltas of the revisions), which has the
>> capability to store and retrieve revisions compactly and efficiently. In
>> parallel I'm currently writing the import of a sorted XML file.
>>
>> My main task (master project and thesis) is or will be the visualization
>> of temporal tree structured data to gain further insights into the
>> evolution of the data, which are otherwise very difficult to realize.
>>
>> regards,
>> Johannes
>>
>
>
> _______________________________________________________________________
>
> XML-DEV is a publicly archived, unmoderated list hosted by OASIS
> to support XML implementation and development. To minimize
> spam in the archives, you must subscribe before posting.
>
> [Un]Subscribe/change address: http://www.oasis-open.org/mlmanage/
> Or unsubscribe: xml-dev-unsubscribe@lists.xml.org
> subscribe: xml-dev-subscribe@lists.xml.org
> List archive: http://lists.xml.org/archives/xml-dev/
> List Guidelines: http://www.oasis-open.org/maillists/guidelines.php
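
For illustration, here is a minimal sketch of two steps discussed in this thread, assuming the XMLEvents really are StAX events as guessed above. It shows (1) serializing a List<XMLEvent> back to XML text, which could then be handed to a parser or tree builder, and (2) combining adjacent lists that share a key while keeping only one group open at a time. The class name XmlEventLists, its helper methods, the element names "id" and "timestamp", and the reading of "combine" as simple concatenation are all assumptions made for the example. It uses only the javax.xml.stream API that ships with the JDK, not Saxon.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

// Hypothetical helper class; none of these names come from the thread.
final class XmlEventLists {

    /** Demo helper: parse a fragment into StAX events, dropping document start/end. */
    static List<XMLEvent> parse(String xml) {
        try {
            XMLEventReader reader =
                    XMLInputFactory.newInstance().createXMLEventReader(new StringReader(xml));
            List<XMLEvent> events = new ArrayList<>();
            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();
                if (!event.isStartDocument() && !event.isEndDocument()) {
                    events.add(event);
                }
            }
            reader.close();
            return events;
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Write a list of events back out as XML text. */
    static String serialize(List<XMLEvent> events) {
        try {
            StringWriter out = new StringWriter();
            XMLEventWriter writer = XMLOutputFactory.newInstance().createXMLEventWriter(out);
            for (XMLEvent event : events) {
                writer.add(event);
            }
            writer.flush();
            writer.close();
            return out.toString();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Text directly following the first start tag with this local name, or "" if none. */
    static String textOf(List<XMLEvent> events, String localName) {
        for (int i = 0; i + 1 < events.size(); i++) {
            XMLEvent event = events.get(i);
            if (event.isStartElement()
                    && event.asStartElement().getName().getLocalPart().equals(localName)
                    && events.get(i + 1).isCharacters()) {
                return events.get(i + 1).asCharacters().getData();
            }
        }
        return "";
    }

    /** Grouping key: page id plus revision timestamp (element names are assumed). */
    static String key(List<XMLEvent> events) {
        return textOf(events, "id") + "|" + textOf(events, "timestamp");
    }

    /** Concatenate adjacent lists with equal keys; each finished group goes to the sink. */
    static void combineAdjacent(Iterator<List<XMLEvent>> lists, Consumer<List<XMLEvent>> sink) {
        List<XMLEvent> group = null;
        String groupKey = null;
        while (lists.hasNext()) {
            List<XMLEvent> next = lists.next();
            String nextKey = key(next);
            if (group != null && nextKey.equals(groupKey)) {
                group.addAll(next);
            } else {
                if (group != null) {
                    sink.accept(group);
                }
                group = new ArrayList<>(next);
                groupKey = nextKey;
            }
        }
        if (group != null) {
            sink.accept(group);
        }
    }

    public static void main(String[] args) {
        List<List<XMLEvent>> input = new ArrayList<>();
        input.add(parse("<page><id>12</id><timestamp>t1</timestamp><text>a</text></page>"));
        input.add(parse("<page><id>12</id><timestamp>t1</timestamp><text>b</text></page>"));
        input.add(parse("<page><id>13</id><timestamp>t2</timestamp><text>c</text></page>"));
        // Print each combined group as serialized XML text.
        combineAdjacent(input.iterator(), group -> System.out.println(serialize(group)));
    }
}
```

This only works if lists with the same key are adjacent, which is the open question raised in the thread. If they are not, the input would have to be sorted first or all groups held in memory. A combined group is a sequence of sibling fragments, so it would need a wrapper element before being parsed as a single document.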