More on taming SAX (was Re: ANN: Amara XML Toolkit 0.9.0)
On Thu, 2004-12-23 at 01:45 -0700, Uche Ogbuji wrote:
> On Thu, 2004-12-23 at 00:53 -0500, John Cowan wrote:
> > Uche Ogbuji scripsit:
> >
> > > Tenorsax (amara.saxtools.tenorsax) is a framework for "linearizing"
> > > SAX logic so that it flows more naturally, and needs a lot less state
> > > machine wizardry.
> >
> > This sounds *very* interesting. Is there a more detailed writeup somewhere?

While on the topic of SAX-taming features in Amara, there is also amara.saxtools.xpattern_sax_state_machine, which I didn't even bother mentioning in the announcement (too much to cram in). This module takes an XPattern (e.g. "/xbel/folder/bookmark") and generates a state machine which can be plugged into any regular SAX handler. In this way, you can automatically look for the XPatterns that mark the interesting bits for your code to process, and ignore the rest. This is sort of the opposite of Tenorsax: embrace the state machine, but automate it, rather than sweeping it into a fancy framework.

amara.domtools.pushdom uses this state machine generator to provide a function where you specify a set of XPatterns and get back a series of DOM chunks from the SAX parse. It's like a pulldom, but a *lot* simpler (and more declarative). So the following three lines are *complete* code for printing all links in an XBEL file:

    from amara.domtools import pushdom
    for docfrag in pushdom("bookmark", xbel_file):
        print docfrag.firstChild.getAttributeNS(None, 'href')

And what's more, no more than the amount of DOM needed to represent each bookmark node is in memory at any given time (i.e. similar, friendly memory usage to SAX). If you had a terabyte XBEL file, this code would still only take up a few KB of RAM.

-- 
Uche Ogbuji
Fourthought, Inc.
http://uche.ogbuji.net    http://4Suite.org    http://fourthought.com
Use CSS to display XML - http://www.ibm.com/developerworks/edu/x-dw-x-xmlcss-i.html
Full XML Indexes with Gnosis - http://www.xml.com/pub/a/2004/12/08/py-xml.html
Be humble, not imperial (in design) - http://www.adtmag.com/article.asp?id=10286
UBL 1.0 - http://www-106.ibm.com/developerworks/xml/library/x-think28.html
Use Universal Feed Parser to tame RSS - http://www.ibm.com/developerworks/xml/library/x-tipufp.html
Default and error handling in XSLT lookup tables - http://www.ibm.com/developerworks/xml/library/x-tiplook.html
A survey of XML standards - http://www-106.ibm.com/developerworks/xml/library/x-stand4/
The State of Python-XML in 2004 - http://www.xml.com/pub/a/2004/10/13/py-xml.html
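
A rough illustration of the state-machine idea described in the message, using only the Python standard library (xml.sax) rather than Amara itself. This is not Amara's implementation or API: the class name PathMatcher, its callback signature, and the sample document are all hypothetical, and the sketch handles only simple absolute paths of element names such as "/xbel/folder/bookmark", not full XPattern syntax. It is written for Python 3.

```python
import xml.sax


class PathMatcher(xml.sax.ContentHandler):
    """Hypothetical sketch: call `callback` for each element whose
    ancestor chain matches an absolute path like /xbel/folder/bookmark."""

    def __init__(self, pattern, callback):
        super().__init__()
        self.steps = pattern.strip("/").split("/")
        self.callback = callback
        self.depth = 0  # number of currently open elements
        self.state = 0  # how many leading steps the open elements match

    def startElement(self, name, attrs):
        # Advance only if every open ancestor matched (state == depth)
        # and this element matches the next step of the pattern.
        if (self.state == self.depth
                and self.state < len(self.steps)
                and name == self.steps[self.state]):
            self.state += 1
            if self.state == len(self.steps):
                self.callback(name, attrs)
        self.depth += 1

    def endElement(self, name):
        self.depth -= 1
        # Leaving a matched element rolls the machine back one state.
        if self.state > self.depth:
            self.state = self.depth


# Made-up sample data: only the first bookmark sits at /xbel/folder/bookmark.
SAMPLE = b"""<xbel>
  <folder>
    <bookmark href="http://example.org/a"/>
    <folder><bookmark href="http://example.org/b"/></folder>
  </folder>
  <bookmark href="http://example.org/c"/>
</xbel>"""

hrefs = []
handler = PathMatcher("/xbel/folder/bookmark",
                      lambda name, attrs: hrefs.append(attrs.get("href")))
xml.sax.parseString(SAMPLE, handler)
print(hrefs)
```

The handler stores two integers and the split pattern, so its memory use does not grow with document size, which is the property the message attributes to the SAX-based approach.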