
Re: parser models


Arjun Ray writes:
> A push API shouldn't be too difficult.  By in-memory do you mean some
> analogue of DOM, where all the tokens are held in a structure of some 
> sort (like a parse tree)? 

Right now I'm thinking of something more like a list of tokens from
start to finish than a parse tree.  For example, given the document:

<hello test="zip">tart<zork /></hello>

I might want to have a list of nodes like:

elementStart:hello
attributeStart:test
text:zip
attributeEnd:test
text:tart
elementStart:zork
elementEnd:zork
elementEnd:hello

With a list like that (modulo some issues on whether I want to represent
starts and ends of tags), I can do things like search for all text nodes
containing "tart" quite easily and then build a tree out of the list
components if I feel it appropriate.   There's no need for tree-walking
or the many issues that it creates, though there may well be a need to
combine adjacent nodes according to a relatively simple set of rules.
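
A rough sketch of the idea, in Python rather than any particular parser
discussed here.  It leans on the standard xml.sax module only as a
convenient source of events; the (kind, value) tuple shape and the names
TokenListBuilder, tokenize and merge_adjacent_text are assumptions made
for illustration, not an existing API.

```python
import xml.sax


class TokenListBuilder(xml.sax.ContentHandler):
    """Flattens SAX callbacks into one list of (kind, value) tuples (hypothetical shape)."""

    def __init__(self):
        super().__init__()
        self.tokens = []

    def startElement(self, name, attrs):
        self.tokens.append(("elementStart", name))
        # Each attribute becomes start / text / end, as in the list above.
        for attr_name in attrs.getNames():
            self.tokens.append(("attributeStart", attr_name))
            self.tokens.append(("text", attrs.getValue(attr_name)))
            self.tokens.append(("attributeEnd", attr_name))

    def endElement(self, name):
        self.tokens.append(("elementEnd", name))

    def characters(self, content):
        self.tokens.append(("text", content))


def tokenize(document):
    """Parse a document string and return its flat token list."""
    handler = TokenListBuilder()
    xml.sax.parseString(document.encode("utf-8"), handler)
    return handler.tokens


def merge_adjacent_text(tokens):
    """One possible 'simple rule': join text tokens that sit next to each other."""
    merged = []
    for kind, value in tokens:
        if kind == "text" and merged and merged[-1][0] == "text":
            merged[-1] = ("text", merged[-1][1] + value)
        else:
            merged.append((kind, value))
    return merged


tokens = tokenize('<hello test="zip">tart<zork /></hello>')

# Searching is a plain scan over the list; no tree-walking involved.
matches = [i for i, (kind, value) in enumerate(tokens)
           if kind == "text" and "tart" in value]
```

Because the list is flat, the search is an ordinary list comprehension,
and the position of each hit is enough to look at its neighbours.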

SAX events, because they are reported as sets of strings (or sets of
strings with attribute structures attached) or characters, aren't easily
stored in such alternative structures.  They're deliberately fleeting
creatures, passing by rapidly with no easy means of storage - except
insofar as we do things like convert their information into DOMs or
other objects.

(MOE is one effort to create tangible events that can be kept around for
longer, possibly but not necessarily as trees, and which can be broken
down into somewhat finer granules than SAX provides, and I'll see what I
can do to support these kinds of options.)

I'd like to be able to play with those events using other styles of
processing.  The list approach above looks promising for some kinds of
problems, especially for querying on content, and it's conveniently
tolerant of things like well-formedness failures.  Moving flexibly from
document to events to tree or list to tree or list to events again to
document again sounds interesting.
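
To make the round trip a little more concrete, here is a minimal Python
sketch that takes a token list like the one earlier in this message and
turns it either back into a document or into a simple tree.  The tuple
layout, the dict-based tree, and the function names tokens_to_document
and tokens_to_tree are all assumptions for illustration only.

```python
from xml.sax.saxutils import escape, quoteattr

# The example list from this message, as (kind, value) tuples (assumed shape).
TOKENS = [
    ("elementStart", "hello"),
    ("attributeStart", "test"),
    ("text", "zip"),
    ("attributeEnd", "test"),
    ("text", "tart"),
    ("elementStart", "zork"),
    ("elementEnd", "zork"),
    ("elementEnd", "hello"),
]


def tokens_to_document(tokens):
    """List to document: write the tokens back out as markup."""
    out = []
    tag_open = False    # True while a start tag still needs its closing ">"
    attr_value = None   # collects text while inside an attribute
    for kind, value in tokens:
        if kind == "attributeStart":
            attr_value = []
            out.append(" " + value + "=")
        elif kind == "attributeEnd":
            out.append(quoteattr("".join(attr_value)))
            attr_value = None
        elif kind == "text" and attr_value is not None:
            attr_value.append(value)
        else:
            if tag_open:
                out.append(">")
                tag_open = False
            if kind == "elementStart":
                out.append("<" + value)
                tag_open = True
            elif kind == "elementEnd":
                out.append("</" + value + ">")
            elif kind == "text":
                out.append(escape(value))
    if tag_open:
        out.append(">")
    return "".join(out)


def tokens_to_tree(tokens):
    """List to tree: nest the tokens into dicts, only when a tree is wanted."""
    root = {"name": None, "attributes": {}, "children": []}
    stack = [root]
    attr_name = None
    for kind, value in tokens:
        node = stack[-1]
        if kind == "elementStart":
            child = {"name": value, "attributes": {}, "children": []}
            node["children"].append(child)
            stack.append(child)
        elif kind == "elementEnd":
            if len(stack) > 1:   # tolerate a stray end token
                stack.pop()
        elif kind == "attributeStart":
            attr_name = value
            node["attributes"][value] = ""
        elif kind == "attributeEnd":
            attr_name = None
        elif kind == "text":
            if attr_name is not None:
                node["attributes"][attr_name] += value
            else:
                node["children"].append(value)
    return root["children"]


document = tokens_to_document(TOKENS)
tree = tokens_to_tree(TOKENS)
```

The serialized result is equivalent to the original but not identical
character for character: the empty-element tag comes back as a start and
end pair, since this list does not record which form was used.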

-------------
Simon St.Laurent - SSL is my TLA
http://simonstl.com may be my URI
http://monasticxml.org may be my ascetic URI
urn:oid:1.3.6.1.4.1.6320 is another possibility altogether
