Re: Schema-type-aware SAX processing
Christopher R. Maden wrote:
> Surely I am not the first person to try doing this, but I can't seem to
> find any prior art nor any straightforward way to do this.
>
> I have data that may be arbitrarily large and may conform to arbitrary
> XSDL schemata. Because of the size, I want to process the document as an
> event stream (hence SAX), and I want to make different processing
> decisions based on the declared types from the schema and based on the
> ultimate base types, if there's any type inheritance.

Here's an outline of one way to proceed using Xerces (I've only used Xerces-J; I don't know whether what follows applies to Xerces-P):

It's unclear from your post whether you have all the schemas available in advance. However, it suffices to have parsed the XSD grammars relevant to a particular document into a grammar pool before doing what follows. This might involve looking at the namespace of the root element and any xsi:schemaLocation attribute on that element, and/or using a custom entity resolver to fetch the relevant grammar and anything it imports or includes.

Having found all the grammars, you retrieve the grammar for the root element's namespace from the pool and convert it to an XSModel (from the XML Schema API, as specified on the World Wide Web Consortium web site). Given the root element's qualified name, you can get its XSElementDeclaration from the XSModel, from there its type definition, and from there the base types. You might also need to look at any xsi:type attribute on the root element, in case the content is specified by a type derived from the declared type. If so, you can examine that derived type definition from the information in the XSModel as well. All of this can be done while handling startElement() for the root element.

The problem is harder if you want to handle elements deeper in the document, whose association with components in the schema depends on the details of the grammar.
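The discovery step for the root element can be sketched with plain JAXP SAX. This is a minimal sketch, not Xerces-specific code: the class `RootInfoHandler` and its fields are hypothetical names, and it only collects the root's namespace, local name, raw xsi:type text and the xsi:schemaLocation pairs. Loading those locations into a grammar pool and then calling the Xerces-J `XSModel.getElementDeclaration(name, namespace)` is not shown. Because the rest of a possibly huge document isn't needed for this step, the handler deliberately aborts the parse after the root start tag.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: records what the root start tag says about its schema.
public class RootInfoHandler extends DefaultHandler {
    static final String XSI = "http://www.w3.org/2001/XMLSchema-instance";

    public String rootNamespace;   // "" when the root element is in no namespace
    public String rootLocalName;
    public String xsiType;         // raw QName text of xsi:type, or null; prefix is NOT resolved here
    public final Map<String, String> schemaLocations = new LinkedHashMap<>(); // namespace -> location hint
    private boolean seenRoot = false;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        seenRoot = true;
        rootNamespace = uri;
        rootLocalName = localName;
        xsiType = atts.getValue(XSI, "type");
        String hints = atts.getValue(XSI, "schemaLocation");
        if (hints != null) {
            // xsi:schemaLocation holds whitespace-separated (namespace, location) pairs
            String[] tokens = hints.trim().split("\\s+");
            for (int i = 0; i + 1 < tokens.length; i += 2) {
                schemaLocations.put(tokens[i], tokens[i + 1]);
            }
        }
        // Nothing after the root start tag is needed for this step, so stop parsing.
        throw new SAXException("root element inspected; stopping");
    }

    public static RootInfoHandler inspect(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // needed for uri/localName and xsi:* lookups
        RootInfoHandler handler = new RootInfoHandler();
        try {
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        } catch (SAXException e) {
            if (!handler.seenRoot) throw e; // a genuine error before the root start tag
        }
        return handler;
    }
}
```

Resolving the prefix in the xsi:type value would additionally require tracking startPrefixMapping() calls; that is left out to keep the sketch short.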
The easiest way to handle these would be to turn on validation and PSVI augmentation in your parser, and then get the XSElementDeclaration for any element from its PSVI. You would probably have to access the PSVI from endElement().

Jeff
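A related route that stays within the standard JAXP API is `javax.xml.validation.ValidatorHandler` plus its `TypeInfoProvider`. This is a swapped-in technique, not the Xerces PSVI classes Jeff describes, although the JDK's implementation is itself derived from Xerces. `TypeInfo` exposes only the governing type's name and namespace and an `isDerivedFrom()` test, not a full `XSTypeDefinition`, so walking arbitrary base-type chains still needs the Xerces XSModel/PSVI approach. The JAXP documentation allows `getElementTypeInfo()` to be called from `startElement()` as well as `endElement()`; for union types, only `endElement()` reports the actual member type. The class name `TypeAwareSax`, the namespace `urn:example:t` and the type names used in the usage example are all made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.TypeInfoProvider;
import javax.xml.validation.ValidatorHandler;
import org.w3c.dom.TypeInfo;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: validates a SAX stream and reports each element's schema type.
public class TypeAwareSax {

    /** Returns one "localName -> typeName" entry per element, flagging extension of base. */
    public static List<String> collectTypes(String xsd, String xml, String baseNs, String baseName)
            throws Exception {
        SchemaFactory sf = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = sf.newSchema(new StreamSource(new StringReader(xsd)));
        ValidatorHandler validator = schema.newValidatorHandler();
        TypeInfoProvider types = validator.getTypeInfoProvider();
        List<String> out = new ArrayList<>();

        // The application's handler sits downstream of the validator.
        validator.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                TypeInfo ti = types.getElementTypeInfo(); // governing type, honours xsi:type
                if (ti == null) {
                    out.add(localName + " -> (no type)");
                    return;
                }
                boolean extendsBase = ti.isDerivedFrom(baseNs, baseName, TypeInfo.DERIVATION_EXTENSION);
                out.add(localName + " -> " + ti.getTypeName()
                        + (extendsBase ? " (extends " + baseName + ")" : ""));
            }
        });

        // Ordinary namespace-aware SAX parser feeding the validator.
        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true);
        XMLReader reader = spf.newSAXParser().getXMLReader();
        reader.setContentHandler(validator);
        reader.parse(new InputSource(new StringReader(xml))); // invalid input throws SAXException
        return out;
    }
}
```

With a schema where `DerivedType` extends `BaseType`, an element declared as `BaseType` but carrying `xsi:type="t:DerivedType"` is reported as `DerivedType`, and the extension check against `BaseType` succeeds for it.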