
More on taming SAX (was Re: ANN: Amara XML Toolkit 0.9.0)


On Thu, 2004-12-23 at 01:45 -0700, Uche Ogbuji wrote:
> On Thu, 2004-12-23 at 00:53 -0500, John Cowan wrote:
> > Uche Ogbuji scripsit:
> > 
> > > Tenorsax (amara.saxtools.tenorsax) is a framework for "linearizing"
> > > SAX logic so that it flows more naturally, and needs a lot less state
> > > machine wizardry.
> > 
> > This sounds *very* interesting.  Is there a more detailed writeup somewhere?

While on the topic of SAX taming features in Amara, there is also
amara.saxtools.xpattern_sax_state_machine, which I didn't even bother
mentioning in the announcement (too much to cram in).

This module takes an XPattern (e.g. "/xbel/folder/bookmark") and
generates a state machine which can be plugged into any regular SAX
handler.  In this way, you can automatically look for certain XPatterns
which have interesting bits of code for you to process, and ignore the
rest.  This is sort of the opposite of Tenorsax: embrace the state
machine, but automate it, rather than sweeping it into a fancy
framework.
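To make the idea concrete, here is a minimal sketch of compiling a path into states that are driven by ordinary SAX events. It assumes only absolute patterns made of plain element names on the child axis (no predicates, wildcards or namespaces). The class PathPatternHandler and its on_match callback are hypothetical names for illustration; this is not the state machine Amara actually generates.

```python
import xml.sax


class PathPatternHandler(xml.sax.ContentHandler):
    """Hypothetical sketch: call on_match for each element whose absolute
    path matches a simple pattern such as "/xbel/folder/bookmark"."""

    def __init__(self, pattern, on_match):
        super().__init__()
        # "Compile" the pattern into steps; the state is how many steps the
        # chain of currently open elements has matched so far.
        self.steps = [step for step in pattern.split("/") if step]
        self.on_match = on_match
        self.state = 0
        self.depth = 0
        self.saved = []  # state to restore when each element closes

    def startElement(self, name, attrs):
        self.saved.append(self.state)
        # Advance only if every open ancestor matched (state == depth)
        # and this element is the next step in the pattern.
        if (self.state == self.depth
                and self.state < len(self.steps)
                and self.steps[self.state] == name):
            self.state += 1
            if self.state == len(self.steps):
                self.on_match(name, attrs)
        self.depth += 1

    def endElement(self, name):
        self.depth -= 1
        self.state = self.saved.pop()


hrefs = []
doc = b"""<xbel>
  <folder>
    <bookmark href="http://a.example/"/>
    <folder><bookmark href="http://nested.example/"/></folder>
  </folder>
  <bookmark href="http://top.example/"/>
</xbel>"""
xml.sax.parseString(doc, PathPatternHandler(
    "/xbel/folder/bookmark",
    lambda name, attrs: hrefs.append(attrs.get("href"))))
print(hrefs)  # ['http://a.example/']
```

Only the bookmark directly under /xbel/folder fires; the nested and top-level bookmarks are ignored, because the saved-state stack rolls the machine back as each element closes.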

amara.domtools.pushdom uses this state machine generator to provide a
function where you specify a set of XPatterns and get back a sequence of
DOM chunks, one per match, as the SAX parse proceeds.  It's like a
pulldom, but a *lot* simpler (and more declarative).  So the following
three lines are *complete* code for printing all links in an XBEL file:

from amara.domtools import pushdom
for docfrag in pushdom("bookmark", xbel_file):
    # each docfrag is a small DOM fragment holding one bookmark element
    print(docfrag.firstChild.getAttributeNS(None, 'href'))
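For comparison, here is roughly the same task written against the standard library's xml.dom.pulldom (Python 3), where the caller has to watch the event stream and call expandNode itself to get a DOM subtree. The function name bookmark_hrefs and the sample data are made up for illustration.

```python
import io
from xml.dom import pulldom


def bookmark_hrefs(source):
    """Yield the href of every <bookmark>, expanding one bookmark at a time.
    `source` may be a filename or a binary file-like object."""
    events = pulldom.parse(source)
    for event, node in events:
        if event == pulldom.START_ELEMENT and node.tagName == "bookmark":
            # Pull the rest of this element's events into a DOM subtree.
            events.expandNode(node)
            yield node.getAttribute("href")


data = b"""<xbel><folder><bookmark href="http://a.example/"/></folder>
<bookmark href="http://b.example/"/></xbel>"""
for href in bookmark_hrefs(io.BytesIO(data)):
    print(href)
```

The event filtering and the explicit expandNode call are the bookkeeping that a declarative pattern argument hides.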

And what's more, no more than the amount of DOM needed to represent a
single bookmark node is in memory at any given time (i.e. memory usage
as friendly as plain SAX).  If you had a terabyte XBEL file, this code
would still only take up a few KB of RAM.
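A similar bounded-memory pattern can be sketched with the standard library's xml.etree.ElementTree.iterparse. This assumes that detaching each finished subtree from its parent is enough for it to be garbage collected; iter_fragments is a hypothetical helper, not part of Amara, and it keeps only the current fragment plus the chain of open ancestors.

```python
import io
import xml.etree.ElementTree as ET


def iter_fragments(source, tag):
    """Hypothetical helper: yield each complete `tag` element, then detach
    it (and every other finished element) from its parent."""
    stack = []   # currently open elements, outermost first
    inside = 0   # number of open `tag` elements enclosing the current point
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            stack.append(elem)
            if elem.tag == tag:
                inside += 1
            continue
        stack.pop()
        if elem.tag == tag:
            inside -= 1
        if inside:
            continue  # part of a fragment still being built; keep it
        if elem.tag == tag:
            yield elem
        if stack:
            # Element.remove compares by identity, so this drops exactly
            # the subtree that just finished.
            stack[-1].remove(elem)


data = (b'<xbel><folder><title>t</title>'
        b'<bookmark href="http://a.example/"><title>A</title></bookmark>'
        b'</folder><bookmark href="http://b.example/"/></xbel>')
hrefs = [(b.get("href"), b.findtext("title"))
         for b in iter_fragments(io.BytesIO(data), "bookmark")]
print(hrefs)  # [('http://a.example/', 'A'), ('http://b.example/', None)]
```

Children of a bookmark are kept until the bookmark itself closes, so each yielded element is a complete fragment; everything else is discarded as soon as it ends.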


-- 
Uche Ogbuji                                    Fourthought, Inc.
http://uche.ogbuji.net    http://4Suite.org    http://fourthought.com
Use CSS to display XML - http://www.ibm.com/developerworks/edu/x-dw-x-xmlcss-i.html
Full XML Indexes with Gnosis - http://www.xml.com/pub/a/2004/12/08/py-xml.html
Be humble, not imperial (in design) - http://www.adtmag.com/article.asp?id=10286
UBL 1.0 - http://www-106.ibm.com/developerworks/xml/library/x-think28.html
Use Universal Feed Parser to tame RSS - http://www.ibm.com/developerworks/xml/library/x-tipufp.html
Default and error handling in XSLT lookup tables - http://www.ibm.com/developerworks/xml/library/x-tiplook.html
A survey of XML standards - http://www-106.ibm.com/developerworks/xml/library/x-stand4/
The State of Python-XML in 2004 - http://www.xml.com/pub/a/2004/10/13/py-xml.html

