RE: ANN: Amara XML Toolkit 0.9.0
Interesting. We seem to be rediscovering co-routines, plus a lot of other machinery from Jackson structured programming. It's a powerful solution to the push-pull dilemma, but it does need support at the programming language level (because the process has multiple stacks). I tried to do something similar in a very early version of Saxon, but it relied on Java threads and became very unwieldy.

Of course, if you move to a higher level of programming (say XSLT or XQuery), then the push-pull decisions, and the mechanisms used to handle push-pull conflicts, get hidden under the covers and programmers don't need to worry about them.

Michael Kay
http://www.saxonica.com/

> -----Original Message-----
> From: Uche Ogbuji [mailto:uche.ogbuji@f...]
> Sent: 23 December 2004 08:45
> To: John Cowan
> Cc: xml-dev@l...
> Subject: Re: ANN: Amara XML Toolkit 0.9.0
>
> On Thu, 2004-12-23 at 00:53 -0500, John Cowan wrote:
> > Uche Ogbuji scripsit:
> >
> > > Tenorsax (amara.saxtools.tenorsax) is a framework for "linearizing"
> > > SAX logic so that it flows more naturally, and needs a lot less state
> > > machine wizardry.
> >
> > This sounds *very* interesting. Is there a more detailed writeup somewhere?
>
> Heh. I should have known. My focus in documentation was the Bindery
> (data binding) stuff (which I think is very well documented) because I
> figured the initial audience for Amara would be the typical Python
> programmer who grimaces any time he has to deal with anything that
> smells too XMLish (SAX and DOM are contemptible Java-isms to many
> Pythoneers, and don't even get them started on that bloated XSLT
> thingy).
>
> Anyway, in focusing on documenting the ultra-Python-friendly Bindery I
> did end up neglecting the other parts a bit. I plan to catch up, and in
> fact, I plan to treat Tenorsax as a main topic in my upcoming O'Reilly
> article [1], which will cover Amara.
>
> Just to give an idea of the technique, however, I'll post a few methods
> of a sample Tenorsax handler.
>
> First a trivial case, just to set the scene:
>
> def handle_meta(self, end_condition):
>     name = self.params.get((None, 'name'))
>     content = self.params.get((None, 'content'))
>     print "Meta name:", name, " content:"
>     print content
>     yield None
>     raise StopIteration
>
> This method handles XHTML meta tags: it worries only about attributes
> and ignores content.
>
> end_condition is Tenorsax plumbing. More on it in a bit. The first 4
> function body lines just grab attribute values and print them to the
> console. self.params within a Tenorsax handler always holds the current
> SAX event. Of course, the key to Tenorsax linearization is that you
> actually see multiple SAX events within a single method call [2]. Even
> in this simple handler you see 2 events. The start meta tag comes, and
> then the "yield None" hands control back to Tenorsax, and then upon the
> end meta tag, the code immediately after that line resumes, with all the
> local state intact. This means that a lot of variables you would usually
> have had to manage across methods in plain old SAX become local
> variables in Tenorsax. The "raise StopIteration" basically signals back
> to the framework "we're done here".
>
> On to a more interesting handler:
>
> def handle_p(self, end_condition):
>     yield None
>     content = u''
>     while not self.event == end_condition:
>         if self.event[0] == saxtools.CHARACTER_DATA:
>             content += self.params
>         yield None
>     #Element closed. Wrap up
>     print "Document content para:", content
>     raise StopIteration
>
> This time it's a p element, and it has content, so we get to see
> multiple interesting events in one handler.
>
> The start tag isn't interesting, so we immediately pass control back to
> Tenorsax ("yield None"). Then content is a local variable that will
> aggregate the text content of the p, which could come in multiple text
> events.
> end_condition now comes into play: it's Tenorsax's way of letting each
> handler method know what event signals the end of its scope (e.g. the
> event for the closing p tag in this case) [3]. Each child text event
> results in another iteration of the loop, and once the end tag is seen,
> we print the accumulated content.
>
> Finally, to show more of how handlers are invoked, here's the html:html
> handler:
>
> def handle_html(self, end_condition):
>     dispatcher = {
>         (pulldom.START_ELEMENT, XHTML_NS, u'head'):
>             self.handle_head,
>         (pulldom.START_ELEMENT, XHTML_NS, u'body'):
>             self.handle_body,
>         }
>     #Initial call corresponds to the start html element
>     curr_gen = None
>     yield None
>     while not self.event == end_condition:
>         curr_gen = tenorsax.standard_body(dispatcher, curr_gen,
>                                           self.event)
>         yield None
>     #Element closed. Wrap up
>     raise StopIteration
>
> dispatcher is a Python dictionary which maps events to handlers. In
> this case, head start tags get delegated to the self.handle_head method
> and body start tags to the self.handle_body method. The curr_gen stuff
> is an unfortunate bit of boilerplate I have not yet been able to refine
> away (working on it). Every now and then I wish Python had macros.
> They would help a lot here. tenorsax.standard_body automatically checks
> the current event to see if there's a match for delegating to one of the
> methods indicated in dispatcher.
>
> I'd like to tidy things up a tad more, but as it is, I have found
> Tenorsax to be a huge help in writing SAX programs quickly. The
> Scimitar code that translates Schematron to Python code is implemented
> in only about 400 lines of Python code (excluding comments, spacing,
> etc.), and this includes all the Python skeleton code for emitted
> validator scripts. I tried implementing it in plain SAX at first. It
> was running to 2-3 times the code length and my brain was on the verge
> of explosion from the state machine logic.
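For readers without Amara installed, here is a minimal, self-contained sketch of the generator technique described in the message. It is not Tenorsax's code: it uses only the standard library's xml.sax, ignores namespaces, and collects results in a list instead of printing. The event constants, the LinearizingHandler class and the delegate helper (a rough analogue of tenorsax.standard_body) are hypothetical. Each handler ends with a plain return rather than "raise StopIteration", because since Python 3.7 a StopIteration raised inside a generator is turned into a RuntimeError (PEP 479).

```python
import xml.sax

# Hypothetical event kinds for this sketch (Tenorsax defines its own).
START, END, TEXT = "start", "end", "text"


class LinearizingHandler(xml.sax.ContentHandler):
    """Feeds plain SAX callbacks, one at a time, to generator handlers."""

    def __init__(self):
        super().__init__()
        self.event = None     # (kind, element name) of the current event
        self.params = None    # attribute dict for START, text for TEXT
        self.results = []     # what the handlers found
        self.root_gen = None  # generator for the document element
        self.root_dispatcher = {(START, "html"): self.handle_html}

    # Plumbing: every SAX callback becomes one step of the generators.
    def startElement(self, name, attrs):
        self._step((START, name), dict(attrs))

    def endElement(self, name):
        self._step((END, name), None)

    def characters(self, content):
        self._step((TEXT, None), content)

    def _step(self, event, params):
        self.event, self.params = event, params
        self.root_gen = self.delegate(self.root_dispatcher, self.root_gen)

    def delegate(self, dispatcher, curr_gen):
        """Resume the active child generator, or start the handler whose
        dispatcher key matches the current event (hypothetical helper)."""
        if curr_gen is not None:
            try:
                next(curr_gen)      # resume the child after its last yield
                return curr_gen
            except StopIteration:   # child reached its end_condition
                return None
        method = dispatcher.get(self.event)
        if method is None:
            return None
        gen = method((END, self.event[1]))  # end_condition: matching end tag
        next(gen)                           # run up to its first yield
        return gen

    # Handlers: each one reads as straight-line code with local state.
    def handle_html(self, end_condition):
        dispatcher = {
            (START, "meta"): self.handle_meta,
            (START, "p"): self.handle_p,
        }
        curr_gen = None
        yield                               # the start html tag
        while self.event != end_condition:
            curr_gen = self.delegate(dispatcher, curr_gen)
            yield

    def handle_meta(self, end_condition):
        self.results.append(
            ("meta", self.params.get("name"), self.params.get("content")))
        yield                               # wait for the end meta tag

    def handle_p(self, end_condition):
        yield                               # nothing to do at the start tag
        content = ""
        while self.event != end_condition:
            if self.event[0] == TEXT:
                content += self.params
            yield
        self.results.append(("p", content))


DOC = b"""<html><head><meta name="author" content="Uche"/></head>
<body><p>Hello, <b>Tenorsax</b> world</p></body></html>"""

handler = LinearizingHandler()
xml.sax.parseString(DOC, handler)
print(handler.results)
# prints [('meta', 'author', 'Uche'), ('p', 'Hello, Tenorsax world')]
```

The p handler keeps its partial text in a local variable across several SAX callbacks, which is the point made about handle_p. Like that handler, it stops at the first closing p tag, so nested p elements would end it early.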
>
> Anyway, thanks for asking, and thus helping me seed the documentation.
> More on Tenorsax to come, for sure, because I do think many will find it
> very useful.
>
> [1] http://www.xml.com/pub/au/84
>
> [2] For those who care about the nuts and bolts, the trick here is
> basically a semi-co-routine arrangement between the Tenorsax framework
> and each handler method in turn. This is made possible by Python
> generators. Full co-routines are not really in the cards with Python at
> present, but I'm not convinced they'd make more than a cosmetic
> difference.
>
> [3] This is a simplified case that doesn't handle nested p tags.
> Supporting nesting is a pretty simple matter.
>
> --
> Uche Ogbuji                               Fourthought, Inc.
> http://uche.ogbuji.net    http://4Suite.org    http://fourthought.com
> Use CSS to display XML -
> http://www.ibm.com/developerworks/edu/x-dw-x-xmlcss-i.html
> Full XML Indexes with Gnosis -
> http://www.xml.com/pub/a/2004/12/08/py-xml.html
> Be humble, not imperial (in design) -
> http://www.adtmag.com/article.asp?id=10286
> UBL 1.0 -
> http://www-106.ibm.com/developerworks/xml/library/x-think28.html
> Use Universal Feed Parser to tame RSS -
> http://www.ibm.com/developerworks/xml/library/x-tipufp.html
> Default and error handling in XSLT lookup tables -
> http://www.ibm.com/developerworks/xml/library/x-tiplook.html
> A survey of XML standards -
> http://www-106.ibm.com/developerworks/xml/library/x-stand4/
> The State of Python-XML in 2004 -
> http://www.xml.com/pub/a/2004/10/13/py-xml.html
>
> -----------------------------------------------------------------
> The xml-dev list is sponsored by XML.org <http://www.xml.org>, an
> initiative of OASIS <http://www.oasis-open.org>
>
> The list archives are at http://lists.xml.org/archives/xml-dev/
>
> To subscribe or unsubscribe from this list use the subscription
> manager: <http://www.oasis-open.org/mlmanage/index.php>
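Footnote [3] says nesting is a simple matter. One way to support it, sketched here with the standard library's xml.sax and hypothetical names (handle_p as a free function, a Driver class; this is not Tenorsax's API), is to replace the single end_condition comparison with a depth counter, so that a p handler finishes only on the closing tag that matches its own start tag.

```python
import xml.sax

# Hypothetical event kinds for this sketch (Tenorsax defines its own).
START, END, TEXT = "start", "end", "text"


def handle_p(driver):
    """Collect all text under a p element, tolerating nested p elements."""
    depth = 1          # the start tag that created this handler
    content = ""
    yield
    while True:
        kind, name = driver.event
        if kind == START and name == "p":
            depth += 1                 # a nested p opens
        elif kind == END and name == "p":
            depth -= 1
            if depth == 0:             # the end tag matching our own start
                break
        elif kind == TEXT:
            content += driver.params
        yield
    driver.results.append(content)


class Driver(xml.sax.ContentHandler):
    """Starts handle_p on an outermost p and feeds it every later event."""

    def __init__(self):
        super().__init__()
        self.event = self.params = self.gen = None
        self.results = []

    def _step(self, event, params):
        self.event, self.params = event, params
        if self.gen is not None:
            try:
                next(self.gen)
            except StopIteration:      # handler saw its matching end tag
                self.gen = None
        elif event == (START, "p"):
            self.gen = handle_p(self)
            next(self.gen)             # run up to the first yield

    def startElement(self, name, attrs):
        self._step((START, name), dict(attrs))

    def endElement(self, name):
        self._step((END, name), None)

    def characters(self, content):
        self._step((TEXT, None), content)


NESTED_DOC = b"<doc><p>outer <p>inner</p> tail</p><p>second</p></doc>"

driver = Driver()
xml.sax.parseString(NESTED_DOC, driver)
print(driver.results)
# prints ['outer inner tail', 'second']
```

The depth count lives in a local variable of the generator, so nesting costs no extra state in the driver, which is in keeping with the linearization idea in the message.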