Re: JITTs and DOM
Gavin,

Thanks for the great analysis while I was away at the TEI meeting! One point of difference though:

Gavin Thomas Nicol wrote:

>Part of the value of ARA is that it was explicitly designed to support parallel
>parsing of documents. I'm not sure that JITT can be used in quite the
>same way... or at least it'd be more complex because the implicit assumption
>is that you are operating in the context of a tree.
>

I am not sure what measure you are using for "complex", but let me describe how JITTs would operate for parallel processing as compared to my understanding of ARA. (Although Gavin and I disagree on some points, both major and minor, ARA is a very interesting bit of work and I am looking forward to his next paper on this topic.)

JITTs and parallel parsing:

The usual case is a user seeking one view of a document, so I don't see JITTs being more complex than usual document parsing in that regard. Performance gains are seen when the user wants to build partial DOM trees and fully parse fragments of that tree for presentation or other uses. In our investigations of overlapping texts, it appears that most overlap is what we characterized as "localized", and hence one need only parse a fragment in the alternate hierarchy to compare the alternative hierarchies.

(Localized is a term we need to define more precisely, but the idea is that a phrase may be a member of a sentence element and a line element that cross (in traditional terminology) and yet share the same text/div/p hierarchy further up the tree. The notion is not original with us but is found in Earley's treatment of this topic in the early 70's. It would be helpful if a measure of localized overlap could be developed to assist in assessment of parallel document parsing.)
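To make "localized" a little more concrete, here is a rough Python sketch. It assumes, purely for illustration, that the two hierarchies are available as two separate well-formed serializations of the same text; the sample markup and the helper names `path_to_phrase` and `shared_ancestors` are invented for this example and are not part of JITTs or ARA. It locates a phrase in each view and reports how much of the ancestor path the two views have in common.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the same text marked up once by sentence, once by line.
# The second sentence crosses the boundary between the two lines.
SENTENCE_VIEW = "<text><div><p><s>Porky waved.</s> <s>That's all folks!</s></p></div></text>"
LINE_VIEW = "<text><div><p><l>Porky waved. That's</l> <l>all folks!</l></p></div></text>"


def path_to_phrase(elem, phrase):
    """Return the tag path from elem down to the deepest element whose
    direct text contains phrase, or None if the phrase is not found."""
    for child in elem:
        sub = path_to_phrase(child, phrase)
        if sub is not None:
            return [elem.tag] + sub
    # Direct text of elem: its leading text plus the tails of its children.
    direct = [elem.text or ""] + [c.tail or "" for c in elem]
    if any(phrase in chunk for chunk in direct):
        return [elem.tag]
    return None


def shared_ancestors(path_a, path_b):
    """Return the common leading part of two tag paths.
    Comparing tag names only is a simplification for this sketch."""
    shared = []
    for a, b in zip(path_a, path_b):
        if a != b:
            break
        shared.append(a)
    return shared


sentence_path = path_to_phrase(ET.fromstring(SENTENCE_VIEW), "all folks")
line_path = path_to_phrase(ET.fromstring(LINE_VIEW), "all folks")
shared = shared_ancestors(sentence_path, line_path)

print(sentence_path)
print(line_path)
print(shared)
```

In this toy case the two paths differ only in their last step (`s` versus `l`), which is the sense in which the overlap is local: everything from `text` down to `p` is common to both views, so only the fragment below `p` would need to be parsed in the alternate hierarchy.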
Thus, assuming that I am using a string search for "That's all folks" and find it in the first hierarchy, all I need do is search for the same phrase in the alternative hierarchy and obtain the node and its parent node where the string occurs. The situation becomes more complex if the string appears in a different document order, as in versioning where a <p> element has changed its physical location, but I am working on some thoughts about how to handle that in the JITTs model.

ARA and parallel parsing:

(These are largely assumptions based upon my reading of Gavin's paper from Extreme, so they are possibly inaccurate or incorrect.)

ARA parallel parses the entire document in order to build its internal representation of the ranges in the document. In some sense that is not complex, but it certainly imposes some overhead on using the ARA approach. Once the entire document has been processed, I would expect querying of the ranges to be quite fast. That overhead would not be a drawback with largely static documents and versions of documents, but it could pose problems with documents and sets of documents that are not fairly stable. This is not a complexity issue per se, but one of system overhead.

(Note that HyTime proposes the same solution, building multiple parses of a single document, to implement traditional CONCUR; see Note 533 of ISO 10744. I am not sure how ARA differs from that approach other than in the data format for the information.)

Hope the weekend is going well! More meetings today! ;-) Well, they are TEI meetings and so tend to be more lively than your typical academic department meeting, although less lively than a beach party. Somewhere in between. ;-)

Patrick

--
Patrick Durusau
Director of Research and Development
Society of Biblical Literature
pdurusau@e...