Re: parser models
Arjun Ray writes:
> A push API shouldn't be too difficult. By in-memory do you mean some
> analogue of DOM, where all the tokens are held in a structure of some
> sort (like a parse tree)?

Right now I'm thinking of something more like a list of tokens from start to finish than a parse tree. For example, given the document:

    <hello test="zip">tart<zork /></hello>

I might want to have a list of nodes like:

    elementStart:hello
    attributeStart:test
    text:zip
    attributeEnd:test
    text:tart
    elementStart:zork
    elementEnd:zork
    elementEnd:hello

With a list like that (modulo some issues on whether I want to represent starts and ends of tags), I can do things like search for all text nodes containing "tart" quite easily, and then build a tree out of the list components if I feel it appropriate. There's no need for tree-walking or the many issues that it creates, though there may well be a need to combine adjacent nodes according to a relatively simple set of rules.

SAX events, because they are reported as sets of strings (or sets of strings with attribute structures attached) or characters, aren't easily stored in such alternative structures. They're deliberately fleeting creatures, passing by rapidly with no easy means of storage - except insofar as we do things like convert their information into DOMs or other objects.

(MOE is one effort to create tangible events that can be kept around for longer, possibly but not necessarily as trees, and which can be broken down into somewhat finer granules than SAX provides, and I'll see what I can do to support these kinds of options.)

I'd like to be able to play with those events using other styles of processing. The list approach above looks promising for some kinds of problems, especially for querying on content, and it's conveniently tolerant of things like well-formedness failures.

Moving flexibly from document to events to tree, or list to tree, or list to events again to document again, sounds interesting.
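
To make the token-list idea concrete, here is a minimal sketch in Python, assuming the standard-library xml.sax module as the event source. The names TokenListHandler, tokenize, find_text and build_tree are hypothetical, invented for this illustration, and the token kinds simply mirror the list in the example. One caveat: xml.sax refuses input that is not well-formed, so the sketch shows only the storage, query and tree-building side of the idea, not the tolerance of well-formedness failures.

```python
import xml.sax


class TokenListHandler(xml.sax.ContentHandler):
    """Collects fleeting SAX events into a flat, storable list of tokens."""

    def __init__(self):
        super().__init__()
        self.tokens = []  # list of (kind, value) tuples

    def startElement(self, name, attrs):
        self.tokens.append(("elementStart", name))
        # SAX hands attributes over attached to the element; break them
        # out into finer-grained tokens, as in the example list.
        for attr_name in attrs.getNames():
            self.tokens.append(("attributeStart", attr_name))
            self.tokens.append(("text", attrs.getValue(attr_name)))
            self.tokens.append(("attributeEnd", attr_name))

    def endElement(self, name):
        self.tokens.append(("elementEnd", name))

    def characters(self, content):
        # One simple combining rule: merge adjacent text tokens, since a
        # parser may report a single run of text in several chunks.
        if self.tokens and self.tokens[-1][0] == "text":
            self.tokens[-1] = ("text", self.tokens[-1][1] + content)
        else:
            self.tokens.append(("text", content))


def tokenize(xml_text):
    """Parse a document and return its events as a flat token list."""
    handler = TokenListHandler()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.tokens


def find_text(tokens, needle):
    """Query on content without any tree-walking."""
    return [t for t in tokens if t[0] == "text" and needle in t[1]]


def build_tree(tokens):
    """Fold the flat list into nested dicts (one possible tree shape)."""
    root = {"name": None, "attrs": {}, "children": []}
    stack = [root]
    attr_name = None  # set while between attributeStart and attributeEnd
    for kind, value in tokens:
        if kind == "elementStart":
            node = {"name": value, "attrs": {}, "children": []}
            stack[-1]["children"].append(node)
            stack.append(node)
        elif kind == "elementEnd":
            stack.pop()
        elif kind == "attributeStart":
            attr_name = value
            stack[-1]["attrs"][attr_name] = ""
        elif kind == "attributeEnd":
            attr_name = None
        elif kind == "text":
            if attr_name is not None:
                stack[-1]["attrs"][attr_name] += value
            else:
                stack[-1]["children"].append(value)
    return root["children"]


if __name__ == "__main__":
    tokens = tokenize('<hello test="zip">tart<zork /></hello>')
    for kind, value in tokens:
        print(f"{kind}:{value}")
    print(find_text(tokens, "tart"))
```

Run against the example document, this should reproduce the eight-token list given above. Keeping the tokens as plain tuples in a list means they can be stored, searched, or replayed as events later, and a tree is built only when one is actually wanted.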
-------------
Simon St.Laurent - SSL is my TLA
http://simonstl.com may be my URI
http://monasticxml.org may be my ascetic URI
urn:oid:1.3.6.1.4.1.6320 is another possibility altogether