Re: Does SAX make sense?


Jimmy,

zhengyu wrote:

>I have a weird question in mind that I would like to toss out.
>
>Suppose there is a way to offer DOM type interface with SAX kind of
>efficiency.
>
Matthew O'Donnell and I have made a series of presentations on this
particular issue. Our latest proposal is known as JITTs
(Just-In-Time-Trees). You can find presentations and papers at the
JITTs homepage, http://www.jitts.org, or visit our homepage on
overlapping markup at http://www.sbl-site2.org/Overlap/.

The basic idea is that markup (and hence trees) is recognized as part
of the processing of a file, and has no meaning for a parser until the
parser has been told to recognize that particular markup token.

What would be required is to change the order of processing used by most
(if not all) XML parsers: process the DTD/Schema first, and use the
resulting tree as the basis for recognition of markup events by SAX.
(The SAX module then recognizes only the markup tokens in that tree.) The
only problem with that approach that has been suggested to us involves
directly nested elements, such as <div>blah, blah<div>blah,
blah</div>blah, blah</div>, but the incidence of such markup is unknown.
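
To make the recognition step concrete, here is a rough sketch in Python.
It is not JITTs code: it only simulates the idea on top of the standard
xml.sax module, which still tokenizes every tag itself. The class name
RecognizingHandler, the helper recognize(), and the "recognized" set
(standing in for the element names a DTD/Schema would declare) are all
made up for illustration.

```python
import xml.sax
from xml.sax.saxutils import escape, quoteattr


class RecognizingHandler(xml.sax.ContentHandler):
    """Report events only for element names in `recognized`.

    Any other markup is written back out as text, so it is retained
    but not treated as structure. (Hypothetical illustration only.)
    """

    def __init__(self, recognized):
        super().__init__()
        self.recognized = set(recognized)
        self.events = []

    def _text(self, s):
        # Merge adjacent text so chunking by the parser does not matter.
        if self.events and self.events[-1][0] == "text":
            self.events[-1] = ("text", self.events[-1][1] + s)
        else:
            self.events.append(("text", s))

    def startElement(self, name, attrs):
        if name in self.recognized:
            self.events.append(("start", name))
        else:
            attr_text = "".join(
                f" {k}={quoteattr(v)}" for k, v in attrs.items()
            )
            self._text(f"<{name}{attr_text}>")

    def endElement(self, name):
        if name in self.recognized:
            self.events.append(("end", name))
        else:
            self._text(f"</{name}>")

    def characters(self, content):
        self._text(escape(content))


def recognize(xml_text, recognized):
    """Parse xml_text and return events for recognized names only."""
    handler = RecognizingHandler(recognized)
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.events


if __name__ == "__main__":
    for event in recognize("<doc><p>Hi <b>there</b></p></doc>", {"doc", "p"}):
        print(event)
```

With only "doc" and "p" recognized, the <b> element comes through as
part of the text of <p> rather than as start/end events of its own.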

The advantage to our approach is that a DomLite tree could be
constructed that retains the unrecognized markup (unlike a SAX filter)
and, upon retrieval of the container (recognized markup), the previously
unrecognized markup could be processed for presentation to the user.
Simulated tests of this type of processing indicate substantial gains
in processing speed over traditional construction of full DOM trees.
Another advantage is that it operates with standard XML syntax, unlike
some proposals, such as LMNL, which has its own (non-XML) format.
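
The deferred processing of a container can be sketched in the same
spirit. Again, this is an assumption-laden illustration and not the
DomLite design itself: LazyNode and its attributes are hypothetical
names, and xml.etree.ElementTree is used only as a convenient way to
parse the retained markup when the container is first retrieved.

```python
import xml.etree.ElementTree as ET


class LazyNode:
    """A recognized container that keeps its inner markup as raw text.

    The inner markup is parsed only when children() is first called,
    and the result is cached. (Hypothetical illustration only.)
    """

    def __init__(self, name, raw_content):
        self.name = name
        self.raw_content = raw_content  # unrecognized markup, kept as-is
        self._children = None           # filled in on first retrieval
        self.parse_count = 0            # how many times we actually parsed

    def children(self):
        if self._children is None:
            # Re-wrap the raw content so it parses as one element.
            wrapper = ET.fromstring(
                f"<{self.name}>{self.raw_content}</{self.name}>"
            )
            self._children = list(wrapper)
            self.parse_count += 1
        return self._children


if __name__ == "__main__":
    node = LazyNode("p", "Hi <b>there</b> and <i>more</i>")
    print([child.tag for child in node.children()])
```

Until children() is called, no work is spent on the content of the
container. Note that in this sketch a well-formedness error inside
the retained markup would surface only at retrieval time.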

>How long would it take for the new processing model to become really
>popular?
>  
>
Well, it has not become popular (yet!), but the rise of partial-parsing
XML parsers and the like indicates the need for something more
efficient than current processing models for XML. JITTs has been
criticized because it makes well-formedness a question that is answered 
at the time of processing. Personally, I don't find well-formedness 
apart from recognition at the time of processing by a parser all that 
compelling (or even meaningful). There are substantial advantages to 
meeting the requirements of well-formedness as part of processing.

I think the first successful JITTs parser that can be applied to large
documents (the usual posts to this list: "I have a 10 MB document and
need to build a DOM tree...") will force a change in the current "markup
recognition first, useful document processing later" approach. The whole
point of markup was to enable the processing of documents, not to create
artificial limitations that prevent it.

Patrick

>Jimmy
>----- Original Message -----
>From: "Karl Waclawek" <karl@w...>
>To: <xml-dev@l...>
>Sent: Sunday, May 25, 2003 7:00 PM
>Subject: Re:  Does SAX make sense?
>
>
>  
>
>>>There are several implementations, but I don't know of any standard
>>>interface. I have been thinking that having a standard interface just
>>>for passing XPath expressions to an event parser would be great. Anyone
>>>know of a standard being worked, implementations, or interested in
>>>starting a working group? If so, I'm in.
>>>      
>>>
>>I am working on something similar, but much simpler right now.
>>My XPaths are just straight paths, or in other words,  element types.
>>
>>My initial plan was to build a DTD (or other schema) validator
>>(on top of SAX) which has callback hooks for custom validation
>>or processing. The callbacks are registered by the application
>>based on a path - but rather a path based on the schema object
>>model and not the document object model. Every node in the SOM
>>corresponds to a separate set of callbacks.
>>
>>So far I was not thinking of anything more complex, as I think
>>this would be quite an effort.
>>
>>Karl
>>
>>-----------------------------------------------------------------
>>The xml-dev list is sponsored by XML.org <http://www.xml.org>, an
>>initiative of OASIS <http://www.oasis-open.org>
>>
>>The list archives are at http://lists.xml.org/archives/xml-dev/
>>
>>To subscribe or unsubscribe from this list use the subscription
>>manager: <http://lists.xml.org/ob/adm.pl>
>>
>>    
>>
>
>

-- 
Patrick Durusau
Director of Research and Development
Society of Biblical Literature
Patrick.Durusau@s...
Co-Editor, ISO 13250, Topic Maps -- Reference Model




