Re: parser models
Aleksander Slominski <aslom@c...> wrote:
| Arjun Ray wrote:

| i am not sure how many functions are needed when processing XML?
| what comes to mind is tokenize XML, produce XML events and
| process them doing _something_ ...

What "XML events" are to be produced, though? ;-)

The granddaddy of all "parser event models" in this line of work is ESIS. You can adopt it, elaborate it, or simplify it. That's taking the view of "what can we get out of an XML document?". OTOH, for applications, the view is "what do we want from an XML document (assuming it can be had)?" That's where frameworks come in.

The mistake is to try to make the parser event model directly "useful" to applications. It really need not be so.

|> http://pobox.com/~oleg/ftp/papers/XML-parsing.ps.gz
|>
|> Passing "seeds" up and down a tree is similar to the patterns I'm trying
|> to develop.
|
| i remember this paper. it has a questionable comparison of expat
| that uses reading input char-by-char (instead of buffered stream)

Expat doesn't read input, so a buffered stream is irrelevant. Expat gets its input pushed to it (i.e. the app repeatedly calls Expat with chunks of input). Oleg's modification was to pass Expat its input in chunks of one character each, to simulate a similar input system in SSAX. If you say that's all really artificial, I agree (the real question would be why SSAX can't accept larger input chunks!), but it was pretty clear that he was trying to avoid a nonsensical benchmark. All he got was an irrelevant one. ;-)

http://okmij.org/ftp/Scheme/SSAX-benchmark-1.html

| one thing i did not get: isn't "seed" global variable that is shared by
| all handlers in SSAX:make-parser/foldts?

No. Scheme and Haskell are lexically scoped, and a global would be silly anyway. If you're thinking of the example that has

   (let ((result
          ((SSAX:make-parser
             NEW-LEVEL-SEED
             (lambda (elem-gi attributes namespaces expected-content seed)
               seed)
             ...

That was only an example. The paper has two more examples: the lambda expression is supposed to be provided by the particular application.

| also how handling of dispatching decisions is done, for example if
| <table> may contain both <th> and <tr> in any order ...

That's what the seed functions are all about.

[Note, btw: I'm not *endorsing* SSAX, I'm just saying that it has some interesting ideas behind it. The thread *is* about parser models, right?]

| so i think i will need to wait and see an example where Element/Content
| framework works to see its full potential ...

Fairing out the project hasn't reached top-of-stack status yet. ;-)
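To make the two points above concrete (Expat is fed by the application, and the seed is threaded through the handlers rather than kept in a global), here is a rough sketch in Python using its standard xml.parsers.expat binding. It is not SSAX and not the code from the paper: fold_xml and the three handler names are hypothetical, only loosely modeled on the SSAX handler roles, and the tuple-based tree is just one choice of seed an application might make.

```python
import xml.parsers.expat


def fold_xml(chunks, seed, new_level_seed, finish_element, char_data):
    """Push chunks into Expat and thread a seed through the handlers.

    All handler names here are hypothetical; the application supplies them.
    """
    parser = xml.parsers.expat.ParserCreate()
    open_elements = []  # (name, attrs, parent_seed) for each open element

    def on_start(name, attrs):
        nonlocal seed
        open_elements.append((name, attrs, seed))   # remember parent's seed
        seed = new_level_seed(name, attrs, seed)    # seed for the new level

    def on_end(_name):
        nonlocal seed
        name, attrs, parent_seed = open_elements.pop()
        seed = finish_element(name, attrs, parent_seed, seed)

    def on_text(text):
        nonlocal seed
        seed = char_data(text, seed)

    parser.StartElementHandler = on_start
    parser.EndElementHandler = on_end
    parser.CharacterDataHandler = on_text

    # The parser never reads anything itself: the application pushes input.
    for chunk in chunks:
        parser.Parse(chunk, False)
    parser.Parse("", True)  # signal end of input
    return seed


# Application-side handlers: here the seed is the list of children so far.
def new_level_seed(name, attrs, parent_seed):
    return []


def finish_element(name, attrs, parent_seed, seed):
    return parent_seed + [(name, attrs, seed)]


def char_data(text, seed):
    # Merge adjacent text so the result does not depend on chunk boundaries.
    if seed and isinstance(seed[-1], str):
        return seed[:-1] + [seed[-1] + text]
    return seed + [text]


DOC = "<table><th>h</th><tr>r</tr></table>"

whole = fold_xml([DOC], [], new_level_seed, finish_element, char_data)
by_char = fold_xml(list(DOC), [], new_level_seed, finish_element, char_data)
```

Feeding the document in one piece or one character at a time should give the same tree; only the number of calls into the parser changes. The <th>/<tr> ordering question is answered the same way: finish_element sees the element name and the parent's seed, so whatever dispatching the application wants is done there.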