Re: ANN: Amara XML Toolkit 0.9.0
On Thu, 2004-12-23 at 00:53 -0500, John Cowan wrote:
> Uche Ogbuji scripsit:
>
> > Tenorsax (amara.saxtools.tenorsax) is a framework for "linearizing"
> > SAX logic so that it flows more naturally, and needs a lot less state
> > machine wizardry.
>
> This sounds *very* interesting. Is there a more detailed writeup somewhere?

Heh. I should have known. My focus in documentation was the Bindery (data binding) stuff (which I think is very well documented) because I figured the initial audience for Amara would be the typical Python programmer who grimaces any time he has to deal with anything that smells too XMLish (SAX and DOM are contemptible Java-isms to many Pythoneers, and don't even get them started on that bloated XSLT thingy). Anyway, in focusing on documenting the ultra-Python-friendly Bindery I did end up neglecting the other parts a bit. I plan to catch up, and in fact I plan to treat Tenorsax as a main topic in my upcoming O'Reilly article [1], which will cover Amara.

Just to give an idea of the technique, however, I'll post a few methods of a sample Tenorsax handler.

First a trivial case, just to set the scene:

    def handle_meta(self, end_condition):
        name = self.params.get((None, 'name'))
        content = self.params.get((None, 'content'))
        print "Meta name:", name, " content:"
        print content
        yield None
        raise StopIteration

This method handles XHTML meta tags: it worries only about attributes and ignores content. end_condition is Tenorsax plumbing. More on it in a bit. The first 4 function body lines just grab attribute values and print them to the console. self.params within a Tenorsax handler always holds the data for the current SAX event (here, the attributes of the start tag). Of course, the key to Tenorsax linearization is that you actually see multiple SAX events within a single method call [2]. Even in this simple handler you see 2 events.
The start meta tag comes, then the "yield None" hands control back to Tenorsax, and upon the end meta tag the code immediately after that line resumes, with all the local state intact. This means that a lot of variables you would usually have had to manage across methods in plain old SAX become local variables in Tenorsax. The "raise StopIteration" basically signals back to the framework "we're done here".

On to a more interesting handler:

    def handle_p(self, end_condition):
        yield None
        content = u''
        while not self.event == end_condition:
            if self.event[0] == saxtools.CHARACTER_DATA:
                content += self.params
            yield None
        #Element closed. Wrap up
        print "Document content para:", content
        raise StopIteration

This time it's a p element, and it has content, so we get to see multiple interesting events in one handler. The start tag isn't interesting, so we immediately pass control back to Tenorsax ("yield None"). Then content is a local variable that will aggregate the text content of the p, which could come in multiple text events. end_condition now comes into play: it's Tenorsax's way of letting each handler method know what event signals the end of its scope (e.g. the event for the close p tag in this case) [3]. Each child text event results in another iteration of the loop, and once the end tag is seen, we print the accumulated content.

Finally, to show more of how handlers are invoked, here's the html:html handler:

    def handle_html(self, end_condition):
        dispatcher = {
            (pulldom.START_ELEMENT, XHTML_NS, u'head'): self.handle_head,
            (pulldom.START_ELEMENT, XHTML_NS, u'body'): self.handle_body,
        }
        #Initial call corresponds to the start html element
        curr_gen = None
        yield None
        while not self.event == end_condition:
            curr_gen = tenorsax.standard_body(dispatcher, curr_gen, self.event)
            yield None
        #Element closed. Wrap up
        raise StopIteration

dispatcher is a Python dictionary which maps events to handlers.
In this case, head start tags get delegated to the self.handle_head method and body start tags to the self.handle_body method. The curr_gen stuff is an unfortunate bit of boilerplate I have not yet been able to refine away (working on it). Every now and then I wish Python had macros. They would help a lot here. tenorsax.standard_body automatically checks the current event to see if there's a match for delegating to one of the methods indicated in dispatcher.

I'd like to tidy things up a bit more, but as it is, I have found Tenorsax to be a huge help in writing SAX programs quickly. The Scimitar code that translates Schematron to Python is implemented in only about 400 lines of Python (excluding comments, spacing, etc.), and this includes all the Python skeleton code for emitted validator scripts. I tried implementing it in plain SAX at first. It was running to 2-3 times the code length and my brain was on the verge of explosion from the state machine logic.

Anyway, thanks for asking, and thus helping me seed the documentation. More on Tenorsax to come, for sure, because I do think many will find it very useful.

[1] http://www.xml.com/pub/au/84

[2] For those who care about the nuts and bolts, the trick here is basically a semi-co-routine arrangement between the Tenorsax framework and each handler method in turn. This is made possible by Python generators. Full co-routines are not really in the cards with Python at present, but I'm not convinced they'd make more than a cosmetic difference.

[3] This is a simplified case that doesn't handle nested p tags. Supporting nesting is a pretty simple matter.

--
Uche Ogbuji                                    Fourthought, Inc.
http://uche.ogbuji.net    http://4Suite.org    http://fourthought.com
Use CSS to display XML - http://www.ibm.com/developerworks/edu/x-dw-x-xmlcss-i.html
Full XML Indexes with Gnosis - http://www.xml.com/pub/a/2004/12/08/py-xml.html
Be humble, not imperial (in design) - http://www.adtmag.com/article.asp?id=10286
UBL 1.0 - http://www-106.ibm.com/developerworks/xml/library/x-think28.html
Use Universal Feed Parser to tame RSS - http://www.ibm.com/developerworks/xml/library/x-tipufp.html
Default and error handling in XSLT lookup tables - http://www.ibm.com/developerworks/xml/library/x-tiplook.html
A survey of XML standards - http://www-106.ibm.com/developerworks/xml/library/x-stand4/
The State of Python-XML in 2004 - http://www.xml.com/pub/a/2004/10/13/py-xml.html
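
The semi-co-routine arrangement described in footnote [2] can be tried out without Amara installed. What follows is a minimal Python 3 sketch, not Tenorsax code: the ToyHandler class, the run driver, and the (kind, payload) event tuples are all hypothetical stand-ins invented for illustration. It uses a plain return where the handlers in the message use "raise StopIteration", because since Python 3.7 a StopIteration raised inside a generator body is turned into a RuntimeError. The depth counter is just one assumed way of handling the nested p case from footnote [3]; the message does not say how Tenorsax itself would do it.

```python
# Toy model of generator-based SAX "linearization".
# All names here are hypothetical; this is NOT the Tenorsax API.

START, END, TEXT = "start", "end", "text"   # made-up event kinds


class ToyHandler:
    """Stand-in for a Tenorsax-style handler object."""

    def __init__(self):
        self.event = None      # the driver stores the current event here
        self.paragraphs = []   # text collected by handle_p

    def handle_p(self, end_condition):
        yield None             # start tag seen; wait for the next event
        content = ""           # local state survives across yields
        depth = 0              # nested p elements still open (assumed approach)
        while depth > 0 or self.event != end_condition:
            kind, payload = self.event
            if kind == TEXT:
                content += payload
            elif self.event == (START, "p"):
                depth += 1
            elif self.event == end_condition:
                depth -= 1
            yield None
        # Outermost element closed. Wrap up.
        self.paragraphs.append(content)
        return                 # ends the generator (Python 3 idiom)


def run(handler, events):
    """Feed events one at a time, resuming the active handler generator."""
    gen = None
    for event in events:
        handler.event = event
        if gen is None:
            if event == (START, "p"):
                gen = handler.handle_p((END, "p"))
                next(gen)      # run the handler up to its first yield
            continue
        try:
            next(gen)          # resume right after the last yield
        except StopIteration:
            gen = None         # handler is done with this element


events = [
    (START, "p"), (TEXT, "Hello, "), (START, "p"), (TEXT, "nested"),
    (END, "p"), (TEXT, " world"), (END, "p"),
]
handler = ToyHandler()
run(handler, events)
print(handler.paragraphs)      # ['Hello, nested world']
```

The driver never inspects the handler's locals: content and depth live entirely inside the suspended generator, which is the state that a plain SAX handler would have to keep in instance attributes.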