
Re: parser models


Aleksander Slominski <aslom@c...> wrote:
| Arjun Ray wrote:

| i am not sure how many functions are needed when processing XML?
| what comes to mind is tokenize XML, produce XML events and
| process them doing _something_ ...

What "XML events" are to be produced, though? ;-)

The granddaddy of all "parser event models" in this line of work is ESIS.
You can adopt it, elaborate it, or simplify it.  That's taking the view of
"what can we get out of an XML document?".  OTOH, for applications, the
view is "what do we want from an XML document (assuming it can be had)?"
That's where frameworks come in.  The mistake is to try to make the parser
event model directly "useful" to applications.  It really need not be so.

|>  http://pobox.com/~oleg/ftp/papers/XML-parsing.ps.gz
|>
|> Passing "seeds" up and down a tree is similar to the patterns I'm trying
|> to develop.
| 
| i remember this paper. it has a questionable comparison with expat
| that reads its input char-by-char (instead of from a buffered stream)

Expat doesn't read input, so a buffered stream is irrelevant.  Expat gets
its input pushed to it (i.e. the app repeatedly calls Expat with chunks of
input).  Oleg's modification was to feed Expat chunks of one character
each, to simulate a similar input system in SSAX.  If you say that's all
really artificial, I agree (the real question would be why SSAX can't
accept larger input chunks!), but it was pretty clear that he was trying
to avoid a nonsensical benchmark.  All he got was an irrelevant one. ;-)

 http://okmij.org/ftp/Scheme/SSAX-benchmark-1.html
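
To make the push model concrete, here is a minimal sketch using Python's
xml.parsers.expat binding.  The choice of Python, the parse_in_chunks
helper and the sample document are assumptions for illustration; none of
this comes from Oleg's benchmark.  The application slices the input and
calls Parse repeatedly, so the chunk size only changes how often Expat is
called, not which events come out.  Adjacent text events are merged by
hand, since Expat may report character data in several pieces.

```python
import xml.parsers.expat


def parse_in_chunks(document: bytes, chunk_size: int) -> list:
    """Push `document` to Expat `chunk_size` bytes at a time (hypothetical helper)."""
    events = []

    def on_text(data):
        # Merge with the previous event if it was also text, so that the
        # result does not depend on where the chunk boundaries fell.
        if events and events[-1][0] == "text":
            events[-1] = ("text", events[-1][1] + data)
        else:
            events.append(("text", data))

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = lambda name, attrs: events.append(("start", name))
    parser.EndElementHandler = lambda name: events.append(("end", name))
    parser.CharacterDataHandler = on_text

    # The application, not Expat, decides how the input is sliced.
    for i in range(0, len(document), chunk_size):
        parser.Parse(document[i:i + chunk_size], False)
    parser.Parse(b"", True)  # tell Expat that the input is complete
    return events


DOC = b"<table><tr>row</tr><th>head</th></table>"
one_at_a_time = parse_in_chunks(DOC, 1)        # one byte per call
all_at_once = parse_in_chunks(DOC, len(DOC))   # a single call
print(one_at_a_time == all_at_once)            # prints True
```

The one-byte variant does the same parsing work with many more calls,
which is the overhead the modified benchmark ends up measuring.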

| one thing i did not get: isn't "seed" a global variable that is shared
| by all handlers in SSAX:make-parser/foldts?

No.  Scheme and Haskell are lexically scoped, and a global would be silly
anyway.  If you're thinking of the example that has

 (let ((result
         ((SSAX:make-parser
            NEW-LEVEL-SEED
            (lambda (elem-gi attributes namespaces expected-content seed)
               seed)
   ...

That was only an example.  The paper has two more; in each case the lambda
expression is supposed to be provided by the particular application.
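
A rough Python analogue may help show why no global is involved.  This is
a sketch under assumptions, not the SSAX API: make_parser, finish_element
and char_data are hypothetical names with simplified signatures (only
NEW-LEVEL-SEED appears in the excerpt above), and Expat is used merely as
a convenient event source.  The seed is an argument to every handler and
its return value becomes the next seed; the parser keeps it in a local
variable for the duration of one parse.

```python
import xml.parsers.expat


def make_parser(new_level_seed, finish_element, char_data):
    """Return a parse(document, seed) function built from three handlers.

    All names and signatures here are illustrative assumptions.
    """
    def parse(document, seed):
        parents = []  # seeds of the enclosing elements, local to this parse

        def on_start(name, attrs):
            nonlocal seed
            parents.append(seed)                  # remember the parent's seed
            seed = new_level_seed(name, attrs, seed)

        def on_end(name):
            nonlocal seed
            seed = finish_element(name, parents.pop(), seed)

        def on_text(data):
            nonlocal seed
            seed = char_data(data, seed)

        p = xml.parsers.expat.ParserCreate()
        p.buffer_text = True  # deliver each text run in one piece
        p.StartElementHandler = on_start
        p.EndElementHandler = on_end
        p.CharacterDataHandler = on_text
        p.Parse(document, True)
        return seed

    return parse


DOC = b"<table><tr>row</tr><th>head</th></table>"

# Application 1: count elements.  The seed is a running integer.
count_elements = make_parser(
    lambda name, attrs, seed: seed + 1,
    lambda name, parent_seed, seed: seed,
    lambda data, seed: seed,
)
print(count_elements(DOC, 0))  # prints 3

# Application 2: build a nested tree.  The seed is the list of children so far.
build_tree = make_parser(
    lambda name, attrs, seed: [],
    lambda name, parent_seed, seed: parent_seed + [(name, seed)],
    lambda data, seed: seed + [data],
)
print(build_tree(DOC, []))
# prints [('table', [('tr', ['row']), ('th', ['head'])])]
```

The same parser serves both applications; only the three functions and
the initial seed differ.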


| also how is the handling of dispatching decisions done, for example if
| <table> may contain both <th> and <tr> in any order ...

That's what the seed functions are all about.
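
As an illustration of dispatch through the seed functions, here is a
small Python sketch of a foldts-style tree fold.  The tree encoding
(tuples with the element name first, strings as text leaves), the FDOWN
and FUP tables and the sample data are all assumptions made for the
example.  The "up" function looks at the element name and decides how
the child's result is merged into the parent's seed, so <th> and <tr>
can arrive in any order.

```python
def foldts(fdown, fup, fhere, seed, tree):
    """Fold over a tree, threading a seed down into and up out of each node."""
    if isinstance(tree, str):
        return fhere(seed, tree)          # text leaf
    _name, *kids = tree
    kid_seed = fdown(seed, tree)          # seed for this node's children
    for kid in kids:
        kid_seed = foldts(fdown, fup, fhere, kid_seed, kid)
    return fup(seed, kid_seed, tree)      # combine parent seed and child result


# Going down: <table> passes its seed through; other elements collect text.
FDOWN = {"table": lambda seed, node: seed}

# Going up: each element name decides where its collected text is stored.
FUP = {
    "table": lambda parent, kid, node: kid,
    "th": lambda parent, kid, node: {**parent, "headers": parent["headers"] + (kid,)},
    "tr": lambda parent, kid, node: {**parent, "rows": parent["rows"] + (kid,)},
}


def fdown(seed, node):
    return FDOWN.get(node[0], lambda s, n: "")(seed, node)


def fup(parent, kid, node):
    # Unknown elements are ignored: the parent's seed is returned unchanged.
    return FUP.get(node[0], lambda p, k, n: p)(parent, kid, node)


def fhere(seed, text):
    # Text is accumulated only while the seed is a text collector.
    return seed + text if isinstance(seed, str) else seed


TABLE = ("table", ("tr", "row 1"), ("th", "Name"), ("tr", "row 2"))
result = foldts(fdown, fup, fhere, {"headers": (), "rows": ()}, TABLE)
print(result)  # prints {'headers': ('Name',), 'rows': ('row 1', 'row 2')}
```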

[Note, btw: I'm not *endorsing* SSAX, I'm just saying that it has some
interesting ideas behind it.  The thread *is* about parser models, right?]
 
| so i think i will need to wait and see an example where Element/Content
| framework works to see its full potential ...

Fairing out the project hasn't reached top-of-stack status yet. ;-) 
