
Re: Schema-type-aware SAX processing

Christopher R. Maden wrote:

>Surely I am not the first person to try doing this, but I can't seem to
>find any prior art nor any straightforward way to do this.
>I have data that may be arbitrarily large and may conform to arbitrary
>XSDL schemata.  Because of the size, I want to process the document as an
>event stream (hence SAX), and I want to make different processing
>decisions based on the declared types from the schema and based on the
>ultimate base types, if there's any type inheritance.

Here's an outline of one way to proceed using Xerces (I've only used 
Xerces-J; I don't know if what follows applies to Xerces-P):

It's unclear from your post whether you have all the schemas available 
in advance.  However, it suffices to have parsed the XSD grammars 
relevant to a particular document (into a grammar pool) before doing 
what follows.  This might involve looking at the namespace of the root 
element and any xsi:schemaLocation attribute on that element and/or 
using some custom entity resolver and fetching the relevant grammar and 
anything it imports or includes.

Having found all the grammars, you retrieve the grammar for the root 
element's namespace from the pool, and convert it to an XSModel (from 
the XML Schema API as specified on the World Wide Web Consortium web 
site).  Given the root element's qualified name, you can get its 
XSElementDeclaration from the XSModel, from there its type declaration, 
and from there the base types.  You might also need to look at any 
xsi:type attribute on the root element in case the content is specified 
by a derived type of the declared type.  If so, you can examine that 
derived type declaration also from the information in the XSModel.  This 
can all be done in handling startElement() for the root element.

The problem is harder if you want to handle elements deeper down in the 
document whose association with components in the schema depends upon the 
details of the grammar.  The easiest way to handle these would be to 
turn on validation and PSVI annotation in your parser, and get the 
XSElementDeclaration for any element from the PSVI information.  
Probably you would have to access the PSVI from endElement().
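
As a rough illustration of the validating approach, here is a minimal 
sketch.  Note that it does not use the Xerces-native PSVI interfaces 
described above; it swaps in the standard JAXP javax.xml.validation API 
(ValidatorHandler plus TypeInfoProvider), which exposes the schema type 
of the current element to a SAX ContentHandler.  The class name 
TypeAwareSax and the method elementTypes are made up for the example, 
and the schema and document are passed as strings only to keep it 
self-contained.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.TypeInfoProvider;
import javax.xml.validation.ValidatorHandler;
import org.w3c.dom.TypeInfo;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: names are illustrative, not part of any library.
public class TypeAwareSax {

    /** Streams the document through a validator and records
     *  "localName:typeName" for every start tag. */
    public static List<String> elementTypes(String xsd, String xml) throws Exception {
        // Compile the schema once; a real application would cache this.
        SchemaFactory sf = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = sf.newSchema(new StreamSource(new StringReader(xsd)));

        // The ValidatorHandler sits between the parser and our own handler.
        ValidatorHandler validator = schema.newValidatorHandler();
        TypeInfoProvider types = validator.getTypeInfoProvider();
        List<String> seen = new ArrayList<>();

        validator.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                // Type information is only queryable from inside
                // startElement/endElement callbacks.
                TypeInfo t = types.getElementTypeInfo();
                seen.add(local + ":" + (t == null ? "?" : t.getTypeName()));
            }
        });

        // Schema validation needs a namespace-aware parser.
        SAXParserFactory pf = SAXParserFactory.newInstance();
        pf.setNamespaceAware(true);
        XMLReader reader = pf.newSAXParser().getXMLReader();
        reader.setContentHandler(validator);
        reader.parse(new InputSource(new StringReader(xml)));
        return seen;
    }
}
```

Because the validator honours xsi:type, the type it reports should be 
the one actually used to validate the element, not just the declared 
one.  For questions about base types, org.w3c.dom.TypeInfo offers 
isDerivedFrom(namespace, name, derivationMethod); if you need the full 
XSElementDeclaration and type hierarchy, you would still have to go to 
the Xerces PSVI and XSModel interfaces.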