
Re: JITTs and DOM

On Saturday 12 October 2002 07:56 am, Patrick Durusau wrote:
> Gavin Thomas Nicol wrote:
> >Part of the value of ARA is that it was explicitly designed to support
> > parallel parsing of documents. I'm not sure that JITT can be used in
> > quite the same way... or at least it'd be more complex because the
> > implicit assumption is that you are operating in the context of a tree.
>
> I am not sure what measure you are using for "complex" 

In this case, the cost of manipulating a tree in parallel (adding nodes, 
etc.). Purely from an implementation perspective, that complicates things 
considerably because you need to deal with synchronization and the like. In 
ARA, the output is a stream of discrete ranges, so synchronization isn't a 
major problem. In other words, it's not the model that is complex but (based 
on a set of possibly faulty assumptions!) the implementation.
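To illustrate why a stream of discrete ranges sidesteps synchronization, here is a minimal sketch (not ARA itself): each worker scans its own chunk and returns a private list of `(start, end)` offsets, and the lists are simply concatenated. Assumptions: the document is well-formed and has no comments, CDATA sections, or processing instructions (which may contain `<`); chunk boundaries are nudged forward to the next `<` so no tag is split; and names use an ASCII-only approximation of XML's NameStartChar/NameChar. The helper names are hypothetical. Threads keep the example short; in CPython, real CPU parallelism would need processes.

```python
import re
from concurrent.futures import ThreadPoolExecutor

# ASCII-only stand-in for an XML start tag (assumption: real XML names allow more characters).
START_TAG = re.compile(r"<[A-Za-z_:][A-Za-z0-9._:-]*(?:\s[^>]*)?/?>")


def split_points(text: str, parts: int) -> list[int]:
    """Choose chunk boundaries, nudging each forward to the next '<' so no tag is cut in two."""
    points = [0]
    for i in range(1, parts):
        target = max(points[-1] + 1, len(text) * i // parts)
        pos = text.find("<", target)
        if pos == -1:
            break
        points.append(pos)
    points.append(len(text))
    return points


def scan_chunk(text: str, start: int, end: int) -> list[tuple[int, int]]:
    """Each worker builds its own list of (start, end) offsets: nothing shared, nothing to lock."""
    return [(m.start(), m.end()) for m in START_TAG.finditer(text, start, end)]


def parallel_start_tags(text: str, workers: int = 4) -> list[tuple[int, int]]:
    bounds = split_points(text, workers)
    chunks = list(zip(bounds, bounds[1:]))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        per_chunk = pool.map(lambda c: scan_chunk(text, *c), chunks)
        # map() yields results in submission order, so concatenation keeps offsets sorted.
        return [r for ranges in per_chunk for r in ranges]
```

For example, `parallel_start_tags('<doc><p id="1">a</p><q/><p>b</p></doc>', workers=3)` returns `[(0, 5), (5, 15), (20, 24), (24, 27)]`, the same result as a single sequential scan. The only coordination point is the final merge, which contrasts with inserting nodes into a shared tree from several threads.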

> In our investigations of overlapping texts, it appears that most overlap
> is what we characterized as "localized" and hence, one need only parse a
> fragment in the alternate hierarchy to compare the alternative
> hierarchies.

This is an interesting observation... and I think a fairly important insight 
into markup. Perhaps there's some "proximity" factor in markup, where 
long-range overlapping markup structures are uncommon because most people 
cannot track them? It might be similar to the depth of XML trees...
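The "localized" overlap observation can be checked mechanically. The sketch below is hypothetical: it represents each hierarchy as a list of half-open `[start, end)` character offsets, finds pairs that overlap without nesting, and reports the smallest span that would have to be reparsed to compare them. If those spans are short relative to the document, the overlap is localized in the sense described above. The pairwise loop is quadratic for clarity; a sweep over sorted ranges would scale better.

```python
Range = tuple[int, int]  # half-open [start, end) character offsets


def overlaps(a: Range, b: Range) -> bool:
    """True when a and b share characters but neither nests inside the other."""
    share = a[0] < b[1] and b[0] < a[1]
    nested = (a[0] <= b[0] and b[1] <= a[1]) or (b[0] <= a[0] and a[1] <= b[1])
    return share and not nested


def local_fragments(primary: list[Range], alternate: list[Range]) -> list[Range]:
    """For every overlapping pair, the smallest span that must be reparsed to compare them."""
    return [
        (min(a[0], b[0]), max(a[1], b[1]))
        for a in primary
        for b in alternate
        if overlaps(a, b)
    ]
```

With verse lines `[(0, 10), (10, 20), (20, 30), (30, 40)]` as the primary hierarchy and sentences `[(0, 10), (10, 25), (25, 40)]` as the alternate one, `local_fragments` returns `[(10, 30), (20, 40)]`: only those two windows need reparsing, not the whole text.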

> ARA parallel parses the entire document in order to build its internal
> representation of the ranges in the document.

If you use the term "parse" in the sense "examine every character", that is 
true... but unless JITT has an external addressing mechanism, it will need to 
do that too. You do not, however, have to construct all ranges at once 
(though that is what I do in my work so far). For example, the regular 
expression:

  "<"{NameStart}{NameChar}*({S}[^>]*)?">"

could be used to discover/parse all the start-tag ranges, but not attribute 
or attribute-value ranges. Those can be discovered in parallel, or lazily 
later. In terms of tree construction, my current approach is to use something 
akin to feature logic for "range stop lists", so that certain ranges are 
suppressed... though this could just as easily be based on forest regular 
expressions or XPath (that's one part of my work I still need to do: typing 
range constructors to forest regular expressions/schemas).
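A minimal Python sketch of lazy start-tag discovery with a stop list. Assumptions: names are an ASCII-only approximation of XML's NameStartChar/NameChar; the name is repeated (`*`) and the tail uses `[^>]*` rather than `.*` so a greedy match cannot run past the end of the tag; and the stop list is a hypothetical plain set of element names, far simpler than the feature-logic stop lists described above. Because it is a generator, ranges are only constructed as the caller asks for them, and attribute or attribute-value ranges are not built at all.

```python
import re
from typing import Iterator

# ASCII-only stand-in for an XML start tag; group 1 captures the element name (assumption).
START_TAG = re.compile(r"<([A-Za-z_:][A-Za-z0-9._:-]*)(?:\s[^>]*)?/?>")


def start_tag_ranges(
    text: str, stop_list: frozenset[str] = frozenset()
) -> Iterator[tuple[str, int, int]]:
    """Lazily yield (name, start, end) for each start tag whose name is not on stop_list."""
    for m in START_TAG.finditer(text):
        name = m.group(1)
        if name not in stop_list:  # suppressed ranges are never constructed
            yield name, m.start(), m.end()
```

For example, `list(start_tag_ranges('<doc><note>x</note><p id="1">a</p></doc>', frozenset({"note"})))` gives `[("doc", 0, 5), ("p", 19, 29)]`; wrapping the generator in `itertools.islice` would stop scanning as soon as enough ranges have been produced.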
 
> In some sense that is not complex, but it certainly poses a certain
> overhead to using the ARA approach. Once the entire document has been
> processed, I would expect querying of the ranges to be quite fast. That
> would not be a drawback with largely static documents and versions of
> documents, but could pose problems with documents and sets of documents
> that are not fairly stable. 

Right. My current work is with very large sets of mostly static documents, 
and with large documents (> 100 MB) that are essentially static. This is not 
a limitation of ARA so much as a constraint of my problem domain. For single 
documents that change frequently, ARA would operate much like SAX, though 
with filtering capabilities like JITT's.
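For the frequently-changing case, "SAX with filtering" can be sketched with Python's standard `xml.sax` module. This is only an analogy, not ARA or JITT: the hypothetical `FilteringHandler` records `(name, depth)` for each start tag and suppresses any element named on a stop list together with its whole subtree, so nothing is retained beyond the events the caller wants.

```python
import xml.sax


class FilteringHandler(xml.sax.ContentHandler):
    """Hypothetical filter: records (name, depth) for start tags, skipping stop-listed subtrees."""

    def __init__(self, stop_list):
        super().__init__()
        self.stop_list = set(stop_list)
        self.depth = 0
        self.skip_depth = None  # depth at which a suppressed subtree began, if any
        self.events = []

    def startElement(self, name, attrs):
        if self.skip_depth is None and name in self.stop_list:
            self.skip_depth = self.depth  # start suppressing this element and its descendants
        if self.skip_depth is None:
            self.events.append((name, self.depth))
        self.depth += 1

    def endElement(self, name):
        self.depth -= 1
        if self.skip_depth == self.depth:
            self.skip_depth = None  # the suppressed subtree has closed
```

Usage: after `h = FilteringHandler({"note"})` and `xml.sax.parseString(b"<doc><note><b>x</b></note><p>a</p></doc>", h)`, `h.events` is `[("doc", 0), ("p", 1)]`; the `note` element and its `b` child never reach the consumer.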

I should have my papers online today or tomorrow.