
Re: combining XMLEvent lists

  • From: Michael Kay <mike@saxonica.com>
  • To: xml-dev@lists.xml.org
  • Date: Tue, 28 Sep 2010 18:46:15 +0100

  On 28/09/2010 6:24 PM, David wrote:
>  My guess would be "XMLEvent" is referring to StAX Events.
>
> http://woodstox.codehaus.org/javadoc/stax-api/1.0/javax/xml/stream/events/XMLEvent.html

Ah yes, you're probably right. I forgot that's what they were called...

If that's the case it looks fairly easy to present a List<XMLEvent> via 
an XMLEventReader, which can be wrapped in a StaxSource and supplied to 
any Saxon interface that expects a Source, for example a DocumentBuilder.
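
A minimal sketch of the adapter described above, assuming the events sit in a java.util.List and that "StaxSource" refers to the JDK's javax.xml.transform.stax.StAXSource. The class name ListEventReader and the helpers toSource and parse are hypothetical names used only for illustration; they are not part of StAX or Saxon. Handing the resulting Source to a Saxon DocumentBuilder is not shown.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.NoSuchElementException;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;
import javax.xml.transform.stax.StAXSource;

/** Hypothetical adapter: replays an in-memory List<XMLEvent> as an XMLEventReader. */
public class ListEventReader implements XMLEventReader {
    private final List<XMLEvent> events;
    private int pos = 0; // index of the next event to hand out

    public ListEventReader(List<XMLEvent> events) { this.events = events; }

    @Override public boolean hasNext() { return pos < events.size(); }

    @Override public XMLEvent nextEvent() {
        if (!hasNext()) throw new NoSuchElementException("no more events");
        return events.get(pos++);
    }

    @Override public XMLEvent peek() { return hasNext() ? events.get(pos) : null; }
    @Override public Object next() { return nextEvent(); }
    @Override public void remove() { throw new UnsupportedOperationException(); }
    @Override public void close() { /* nothing to release */ }

    @Override public Object getProperty(String name) {
        throw new IllegalArgumentException("unsupported property: " + name);
    }

    /** Concatenates character data up to the end tag of a text-only element. */
    @Override public String getElementText() throws XMLStreamException {
        StringBuilder text = new StringBuilder();
        while (hasNext()) {
            XMLEvent e = nextEvent();
            if (e.isEndElement()) return text.toString();
            if (e.isStartElement()) throw new XMLStreamException("element is not text-only");
            if (e.isCharacters()) text.append(e.asCharacters().getData());
        }
        throw new XMLStreamException("unexpected end of event list");
    }

    /** Skips whitespace, comments and processing instructions up to the next tag. */
    @Override public XMLEvent nextTag() throws XMLStreamException {
        while (hasNext()) {
            XMLEvent e = nextEvent();
            if (e.isStartElement() || e.isEndElement()) return e;
            boolean skippable = e.getEventType() == XMLEvent.COMMENT
                    || e.isProcessingInstruction()
                    || (e.isCharacters() && e.asCharacters().isWhiteSpace());
            if (!skippable) throw new XMLStreamException("expected a start or end tag");
        }
        throw new XMLStreamException("unexpected end of event list");
    }

    /**
     * Wraps a non-empty list in a JAXP StAXSource. The first event must be
     * START_DOCUMENT or START_ELEMENT, otherwise StAXSource rejects it.
     */
    public static StAXSource toSource(List<XMLEvent> events) {
        try {
            return new StAXSource(new ListEventReader(events));
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Demo helper (assumption): builds an event list with the JDK's StAX parser. */
    public static List<XMLEvent> parse(String xml) {
        try {
            XMLEventReader in =
                    XMLInputFactory.newInstance().createXMLEventReader(new StringReader(xml));
            List<XMLEvent> out = new ArrayList<>();
            while (in.hasNext()) out.add(in.nextEvent());
            return out;
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        List<XMLEvent> events = parse("<page><id>1</id></page>");
        StAXSource source = toSource(events);
        System.out.println(events.size() + " events wrapped as " + source.getClass().getName());
    }
}
```

If the list holds only a fragment, it needs to begin with a start-element event (or be bracketed by start/end-document events) before it is wrapped this way.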

Michael Kay
Saxonica

>
> which is a parsed XML event (startDocument, startElement, characters,
> ... )
>
>
> David A. Lee
> dlee@calldei.com
> http://www.xmlsh.org
>
>
> On 9/28/2010 1:17 PM, Michael Kay wrote:
>>
>>  On 28/09/2010 4:13 PM, Johannes.Lichtenberger wrote:
>>> On 09/28/2010 04:33 PM, Michael Kay wrote:
>>>> Sounds fascinating, and I wish I had time to get involved. It would
>>>> certainly be elegant if you could have both the productivity of
>>>> writing this declaratively in XSLT and the performance of running it
>>>> on Hadoop MapReduce. Intrinsically, the two seem to fit together hand
>>>> in glove, but I suspect some engineering effort is needed to make it
>>>> work.
>>> Hello Michael,
>>>
>>> I think it would be too complicated to achieve the desired grouping
>>> with Java. Do you think it makes sense to first serialize the results
>>> and then use Saxon's XSLT 2.0 processor to achieve the results? Or do
>>> you have any direct input from a List of XMLEvents to Saxon's XSLT
>>> processor? I assume it reads XML data from an InputSource or some kind
>>> of stream.
>>
>> I'm not sure whether "XMLEvent" is something I'm expected to know
>> about. You said earlier:
>>
>>   "I've got an Iterator with Lists (Java) out of XMLEvents, which are
>>   serialized fragments"
>>
>> so I assume they are just strings containing unparsed XML. That's not
>> going to be a particularly efficient representation for processing,
>> so the first step will be to parse each one to a tree (for example, a
>> Saxon TinyTree).
>>
>> You then said,
>>
>>   "I want to combine Lists which have the same page id and the same
>>   revision timestamp"
>>
>> but you left out the critical information as to whether this would
>> always combine elements that were adjacent in the list. If the groups
>> are adjacent then you could potentially devise a strategy that avoids
>> holding all the trees in memory at the same time.
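
The adjacent case can be sketched in plain Java, independent of Saxon: walk the input once and hand each run of consecutive items with the same key to a consumer, so only one group is held at a time. The class AdjacentGrouper, the method groupAdjacent and the String[] record layout (page id, revision timestamp) are assumptions made for illustration, as are the sample values; in practice the items would be the parsed fragments.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Objects;
import java.util.function.BiConsumer;
import java.util.function.Function;

/** Hypothetical helper: groups consecutive items that share a key. */
public class AdjacentGrouper {

    /**
     * Reads items one at a time and passes each run of equal-keyed neighbours
     * to the sink. Only the current run is kept in memory.
     */
    public static <T, K> void groupAdjacent(Iterator<T> items,
                                            Function<T, K> keyOf,
                                            BiConsumer<K, List<T>> sink) {
        List<T> current = new ArrayList<>();
        K currentKey = null;
        while (items.hasNext()) {
            T item = items.next();
            K key = keyOf.apply(item);
            // A key change closes the current run and starts a new one.
            if (!current.isEmpty() && !Objects.equals(key, currentKey)) {
                sink.accept(currentKey, current);
                current = new ArrayList<>();
            }
            currentKey = key;
            current.add(item);
        }
        if (!current.isEmpty()) sink.accept(currentKey, current); // flush the last run
    }

    public static void main(String[] args) {
        // Illustrative records: {page id, revision timestamp}.
        List<String[]> revisions = Arrays.asList(
                new String[] {"12", "2010-09-28T16:13:00Z"},
                new String[] {"12", "2010-09-28T16:13:00Z"},
                new String[] {"13", "2010-09-28T16:13:00Z"});
        groupAdjacent(revisions.iterator(),
                r -> r[0] + "|" + r[1],
                (key, group) -> System.out.println(key + " -> " + group.size() + " revision(s)"));
    }
}
```

If equal keys can appear far apart in the input, this approach splits them into separate groups, which is why the adjacency question matters.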
>>
>> Supplying a sequence of trees as input to Saxon grouping is not a
>> problem. Using the s9api interface, you can use a DocumentBuilder to
>> build each tree as an XdmNode, then a sequence can be constructed
>> using the constructor public XdmValue(Iterable<XdmItem> items), and
>> then this XdmValue can be passed as a parameter to an XsltTransformer,
>> and a reference to the parameter can be used in
>> <xsl:for-each-group select="$param">. Using this approach the whole
>> structure will be held in memory, but there are ways of avoiding that
>> by going to lower-level interfaces.
>>
>> Michael Kay
>> Saxonica
>>
>>
>>> It's a special case, where two or more revisions of one article are
>>> made at the same time (in the same second). I would have to look
>>> through the XML file with BaseX or Saxon, but I'm pretty sure such
>>> cases exist somewhere in the huge file (as of now I've only extracted
>>> a small subset of articles and replaced WikiText inside text-elements
>>> with XML).
>>>
>>> The whole task is to sort the revisions so they can be shredded into
>>> our XML data storage system (as deltas of the revisions), which has
>>> the capability to store and retrieve revisions compactly and
>>> efficiently. In parallel I'm currently writing the import of a sorted
>>> XML file.
>>>
>>> My main task (master project and thesis) is, or will be, the
>>> visualization of temporal tree-structured data to gain further
>>> insights into the evolution of the data, which are otherwise very
>>> difficult to obtain.
>>>
>>> regards,
>>> Johannes
>>>
>>
>>
>> _______________________________________________________________________
>>
>> XML-DEV is a publicly archived, unmoderated list hosted by OASIS
>> to support XML implementation and development. To minimize
>> spam in the archives, you must subscribe before posting.
>>
>> [Un]Subscribe/change address: http://www.oasis-open.org/mlmanage/
>> Or unsubscribe: xml-dev-unsubscribe@lists.xml.org
>> subscribe: xml-dev-subscribe@lists.xml.org
>> List archive: http://lists.xml.org/archives/xml-dev/
>> List Guidelines: http://www.oasis-open.org/maillists/guidelines.php


