
RE: ANN: Amara XML Toolkit 0.9.0


Interesting. We seem to be rediscovering co-routines, plus a lot of other
machinery from Jackson structured programming. It's a powerful solution to
the push-pull dilemma, but it does need support at the programming language
level (because the process has multiple stacks). I tried to do something
similar in a very early version of Saxon, but it relied on Java threads and
became very unwieldy.

Of course if you move to a higher level of programming (say XSLT or XQuery)
then the push-pull decisions, and the mechanisms used to handle push-pull
conflicts, get hidden under the covers and programmers don't need to worry
about them.

Michael Kay
http://www.saxonica.com/ 

> -----Original Message-----
> From: Uche Ogbuji [mailto:uche.ogbuji@f...] 
> Sent: 23 December 2004 08:45
> To: John Cowan
> Cc: xml-dev@l...
> Subject: Re:  ANN: Amara XML Toolkit 0.9.0
> 
> On Thu, 2004-12-23 at 00:53 -0500, John Cowan wrote:
> > Uche Ogbuji scripsit:
> > 
> > > Tenorsax (amara.saxtools.tenorsax) is a framework for "linearizing"
> > > SAX logic so that it flows more naturally, and needs a lot less state
> > > machine wizardry.
> > 
> > This sounds *very* interesting.  Is there a more detailed writeup
> > somewhere?
> 
> Heh.  I should have known.  My focus in documentation was the Bindery
> (data binding) stuff (which I think is very well documented) because I
> figured the initial audience for Amara would be the typical Python
> programmer who grimaces any time he has to deal with anything that smells
> too XMLish (SAX and DOM are contemptible Java-isms to many Pythoneers, and
> don't even get them started on that bloated XSLT thingy).
> 
> Anyway, in focusing on documenting the ultra-Python-friendly Bindery I
> did end up neglecting the other parts a bit.  I plan to catch up, and in
> fact, I plan to treat Tenorsax as a main topic in my upcoming O'Reilly
> article [1], which will cover Amara.
> 
> Just to give an idea of the technique, however, I'll post a few methods
> of a sample Tenorsax handler.
> 
> First a trivial case, just to set the scene:
> 
>     def handle_meta(self, end_condition):
>         name = self.params.get((None, 'name'))
>         content = self.params.get((None, 'content'))
>         print "Meta name:", name, " content:"
>         print content
>         yield None
>         raise StopIteration
> 
> This method handles XHTML meta tags: it looks only at the attributes and
> ignores content.
> 
> end_condition is Tenorsax plumbing.  More on it in a bit.  The first 4
> function body lines just grab attribute values and print them to the
> console.  self.params within a Tenorsax handler always holds the
> parameters of the current SAX event.  Of course, the key to Tenorsax
> linearization is that you actually see multiple SAX events within a
> single method call [2].  Even in this simple handler you see 2 events.
> The start meta tag comes, and then the "yield None" hands control back to
> Tenorsax, and then upon the end meta tag, the code immediately after that
> line resumes, with all the local state intact.  This means that a lot of
> variables you would have usually had to manage across methods in plain
> old SAX become local variables in Tenorsax.  The "raise StopIteration"
> basically signals back to the framework "we're done here".
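
The suspend-and-resume behaviour described above can be tried out with a plain Python generator. What follows is a minimal sketch in current Python 3, not Tenorsax itself: the drive function, the event tuples, and the log list are hypothetical stand-ins for the framework. A bare return replaces "raise StopIteration", because since Python 3.7 a StopIteration raised inside a generator body is turned into a RuntimeError.

```python
def handle_meta(attrs, log):
    """Handle one element: runs on the start event, resumes on the end event."""
    # First event (start tag): grab attribute values into local variables.
    name = attrs.get("name")
    content = attrs.get("content")
    log.append(("start", name, content))
    yield  # hand control back to the driver until the next event
    # Second event (end tag): the locals above are still intact.
    log.append(("end", name))
    return  # modern equivalent of "raise StopIteration" in a generator


def drive(events, log):
    """Hypothetical stand-in for the framework: feed events to one handler."""
    gen = None
    for kind, attrs in events:
        if kind == "start":
            gen = handle_meta(attrs, log)
            next(gen)          # run up to the first yield
        elif kind == "end" and gen is not None:
            try:
                next(gen)      # resume right after the yield
            except StopIteration:
                gen = None     # the handler signalled "we're done here"


log = []
drive([("start", {"name": "author", "content": "Uche"}), ("end", {})], log)
```

Note that name is set while handling the start event and read again while handling the end event, without being stored on the handler object in between.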
> 
> On to a more interesting handler:
> 
>     def handle_p(self, end_condition):
>         yield None
>         content = u''
>         while not self.event == end_condition:
>             if self.event[0] == saxtools.CHARACTER_DATA:
>                 content += self.params
>             yield None
>         #Element closed.  Wrap up
>         print "Document content para:", content
>         raise StopIteration
> 
> This time it's a p element, and it has content, so we get to see
> multiple interesting events in one handler.
> 
> The start tag isn't interesting, so we immediately pass control back to
> Tenorsax ("yield None").  Then content is a local variable that will
> aggregate the text content of the p, which could come in multiple text
> events.  end_condition now comes into play: it's Tenorsax's way of
> letting each handler method know what event signals the end of its scope
> (e.g. the event for close p tag in this case) [3].  Each child text
> event results in another iteration of the loop, and once the end tag is
> seen, we print the accumulated content.
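
The accumulate-until-end_condition loop can be exercised outside the framework. This is a rough Python 3 sketch under stated assumptions: the Handler class, the run driver, the event-type constants, and the shape of the event tuples are all invented for illustration and are not Amara's API. Only the body of handle_p follows the method quoted above, with the result stored rather than printed.

```python
CHARACTERS = "characters"      # hypothetical event-type constants
START_ELEMENT = "start_element"
END_ELEMENT = "end_element"


class Handler:
    def __init__(self):
        self.event = None      # current event, set by the driver
        self.params = None     # payload of the current event
        self.paragraphs = []

    def handle_p(self, end_condition):
        yield                  # the start tag is not interesting
        content = ""
        while self.event != end_condition:
            if self.event[0] == CHARACTERS:
                content += self.params   # text may arrive in several pieces
            yield
        # Element closed.  Wrap up.
        self.paragraphs.append(content)


def run(handler, events, end_condition):
    """Hypothetical driver: publish each event, then resume the handler."""
    gen = handler.handle_p(end_condition)
    for event, params in events:
        handler.event, handler.params = event, params
        try:
            next(gen)
        except StopIteration:
            break              # the handler has seen its end condition


handler = Handler()
run(
    handler,
    [
        ((START_ELEMENT, "p"), None),
        ((CHARACTERS,), "Hello, "),
        ((CHARACTERS,), "world"),
        ((END_ELEMENT, "p"), None),
    ],
    end_condition=(END_ELEMENT, "p"),
)
```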
> 
> Finally, to show more of how handlers are invoked, here's the html:html
> handler:
> 
>     def handle_html(self, end_condition):
>         dispatcher = {
>             (pulldom.START_ELEMENT, XHTML_NS, u'head'):
>             self.handle_head,
>             (pulldom.START_ELEMENT, XHTML_NS, u'body'):
>             self.handle_body,
>             }
>         #Initial call corresponds to the start html element
>         curr_gen = None
>         yield None
>         while not self.event == end_condition:
>             curr_gen = tenorsax.standard_body(dispatcher, curr_gen,
>                                               self.event)
>             yield None
>         #Element closed.  Wrap up
>         raise StopIteration
> 
> dispatcher is a Python dictionary which maps events to handlers.  In
> this case, head start tags get delegated to the self.handle_head method
> and body start tags to the self.handle_body method.  The curr_gen stuff
> is an unfortunate bit of boilerplate I have not yet been able to refine
> away (working on it).  Every now and then I wish Python had macros.
> They would help a lot here.  tenorsax.standard_body automatically checks
> the current event to see if there's a match for delegating to one of the
> methods indicated in dispatcher.
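
For readers wondering what the delegation step might look like, here is one guess, written as a self-contained Python 3 sketch. The standard_body below is not Amara's implementation. It is an assumption about the behaviour described above: resume the active child generator if there is one, otherwise start a handler when the event matches a dispatcher key. The event tuples, constants, Handler class, and run driver are likewise hypothetical.

```python
START, END = "start", "end"    # hypothetical event-type constants


def standard_body(dispatcher, curr_gen, event):
    """Illustrative guess at the delegation step, not Amara's actual code."""
    if curr_gen is not None:
        try:
            next(curr_gen)     # a child handler is active: let it see the event
            return curr_gen
        except StopIteration:
            return None        # child finished; the parent takes over again
    handler = dispatcher.get(event)
    if handler is None:
        return None            # no handler registered for this event
    gen = handler((END,) + event[1:])   # child ends at the matching end event
    try:
        next(gen)              # let the child consume its own start event
    except StopIteration:
        return None
    return gen


class Handler:
    def __init__(self):
        self.event = None
        self.seen = []

    def handle_head(self, end_condition):
        self.seen.append("head")
        yield
        while self.event != end_condition:
            yield

    def handle_body(self, end_condition):
        self.seen.append("body")
        yield
        while self.event != end_condition:
            yield

    def handle_html(self, end_condition):
        dispatcher = {
            (START, "head"): self.handle_head,
            (START, "body"): self.handle_body,
        }
        curr_gen = None
        yield                  # initial call corresponds to the start html tag
        while self.event != end_condition:
            curr_gen = standard_body(dispatcher, curr_gen, self.event)
            yield


def run(events):
    handler = Handler()
    top = handler.handle_html((END, "html"))
    for event in events:
        handler.event = event
        try:
            next(top)
        except StopIteration:
            break
    return handler.seen


seen = run([
    (START, "html"), (START, "head"), (START, "title"), (END, "title"),
    (END, "head"), (START, "body"), (END, "body"), (END, "html"),
])
```

In this sketch the title events go to the head handler because it is still the active child, which is why the html handler never needs a state flag of its own.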
> 
> I'd like to tidy things up a tad bit more, but as it is, I have found
> Tenorsax to be a huge help in writing SAX programs quickly.  The
> Scimitar code that translates Schematron to Python code is implemented
> in only about 400 lines of Python code (excluding comments, spacing,
> etc.), and this includes all the Python skeleton code for emitted
> validator scripts.  I tried implementing it in plain SAX at first.  It
> was running to 2-3 times the code length and my brain was on the verge
> of explosion from the state machine logic.
> 
> Anyway, thanks for asking, and thus helping me seed the documentation.
> More on Tenorsax to come, for sure, because I do think many will find it
> very useful.
> 
> [1] http://www.xml.com/pub/au/84
> 
> [2] For those who care about the nuts and bolts, the trick here is
> basically a semi-co-routine arrangement between the Tenorsax framework
> and each handler method in turn.  This is made possible by Python
> generators.  Full co-routines are not really in the cards with Python at
> present, but I'm not convinced they'd make more than a cosmetic
> difference.
> 
> [3] This is a simplified case that doesn't handle nested p tags.
> Supporting nesting is a pretty simple matter.
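
One possible way to support the nesting mentioned in [3] is to count open elements in a local variable. This is only a sketch of that idea in Python 3 and may not be what the author had in mind. The constants, the event tuples, the Handler class, and the run driver are hypothetical, and the derivation of the start condition from end_condition is an assumption about the event layout.

```python
START, END, CHARACTERS = "start", "end", "characters"   # hypothetical constants


class Handler:
    def __init__(self):
        self.event = None
        self.params = None
        self.result = None

    def handle_p(self, end_condition):
        # Assumed layout: the start event differs from the end event only
        # in its first field.
        start_condition = (START,) + end_condition[1:]
        depth = 0              # number of nested p elements currently open
        content = ""
        yield                  # outer start tag
        while True:
            if self.event == start_condition:
                depth += 1
            elif self.event == end_condition:
                if depth == 0:
                    break      # this is our own end tag
                depth -= 1
            elif self.event[0] == CHARACTERS:
                content += self.params
            yield
        self.result = content


def run(events):
    """Hypothetical driver: publish each event, then resume the handler."""
    handler = Handler()
    gen = handler.handle_p((END, "p"))
    for event, params in events:
        handler.event, handler.params = event, params
        try:
            next(gen)
        except StopIteration:
            break
    return handler.result


text = run([
    ((START, "p"), None), ((CHARACTERS,), "a"),
    ((START, "p"), None), ((CHARACTERS,), "b"), ((END, "p"), None),
    ((CHARACTERS,), "c"), ((END, "p"), None),
])
```

The depth counter is an ordinary local variable, which is the same benefit the main text describes: state that plain SAX would keep on the handler object lives inside the generator.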
> 
> -- 
> Uche Ogbuji                                    Fourthought, Inc.
> http://uche.ogbuji.net    http://4Suite.org    http://fourthought.com
> Use CSS to display XML - 
> http://www.ibm.com/developerworks/edu/x-dw-x-xmlcss-i.html
> Full XML Indexes with Gnosis - 
> http://www.xml.com/pub/a/2004/12/08/py-xml.html
> Be humble, not imperial (in design) - 
> http://www.adtmag.com/article.asp?id=10286
> UBL 1.0 - 
> http://www-106.ibm.com/developerworks/xml/library/x-think28.html
> Use Universal Feed Parser to tame RSS - 
> http://www.ibm.com/developerworks/xml/library/x-tipufp.html
> Default and error handling in XSLT lookup tables - 
> http://www.ibm.com/developerworks/xml/library/x-tiplook.html
> A survey of XML standards - 
> http://www-106.ibm.com/developerworks/xml/library/x-stand4/
> The State of Python-XML in 2004 - 
> http://www.xml.com/pub/a/2004/10/13/py-xml.html
> 
> 

