
Re: [Summary] Eager and Just-in-Time loading of XML Schema doc

  • From: Michael Kay <mike@saxonica.com>
  • To: xml-dev@lists.xml.org
  • Date: Sat, 07 Aug 2010 18:41:07 +0100

> But if you're loading the same schema over and over again on each
> validation episode it can be very expensive, and I have seen many scenarios
> (particularly industry standards) where the set of schema documents is
> several orders of magnitude larger than the typical instance documents
> being validated.

Yes, that is certainly true of FpML, to take one example. Most instance 
documents use a tiny subset of the declarations defined in the schema, 
because they cover one kind of financial transaction when the schema 
allows for hundreds of different kinds.

But that's not the only reason. Loading a schema involves a lot more 
than just parsing the source XML documents that define the schema. It's 
necessary to check that the schema meets all the constraints defined 
in the spec, some of which (like the Unique Particle Attribution (UPA) 
rule and the rules for A being a valid restriction of B) are highly 
complex. It's also typically necessary to generate and determinize a 
finite state automaton for each complex type defined in the schema; in 
the worst case, the memory and processing requirements of the textbook 
algorithms for doing this can be very high.
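
To make the link between automaton construction and UPA checking concrete, here is a minimal sketch in Python. It assumes a toy content model with only element particles, sequence, choice, optional and zero-or-more (no wildcards, no numeric occurrence bounds, no substitution groups), and it uses the textbook position (Glushkov) construction. The helper names (elem, seq, choice, opt, star, glushkov, upa_conflicts) are made up for this illustration; this is not Saxon's algorithm. A conflict is reported when two different particles with the same element name compete at the same point, which is what would make the automaton non-deterministic.

```python
from itertools import count

# Hypothetical constructors for a toy content model (illustration only).
def elem(name): return ("elem", name)
def seq(a, b): return ("seq", a, b)
def choice(a, b): return ("choice", a, b)
def opt(a): return ("opt", a)
def star(a): return ("star", a)

def glushkov(model):
    """Return (first, follow, names) for the model's position automaton."""
    ids = count()
    names = {}   # position id -> element name
    follow = {}  # position id -> positions that may come next

    def walk(node):
        """Return (nullable, first, last) for a sub-expression."""
        kind = node[0]
        if kind == "elem":
            p = next(ids)
            names[p] = node[1]
            follow[p] = set()
            return False, {p}, {p}
        if kind in ("opt", "star"):
            _, first, last = walk(node[1])
            if kind == "star":
                # After the last particle of one iteration, another may start.
                for p in last:
                    follow[p] |= first
            return True, first, last
        n1, f1, l1 = walk(node[1])
        n2, f2, l2 = walk(node[2])
        if kind == "choice":
            return n1 or n2, f1 | f2, l1 | l2
        # Sequence: the end of the left part is followed by the start of the right.
        for p in l1:
            follow[p] |= f2
        first = (f1 | f2) if n1 else f1
        last = (l1 | l2) if n2 else l2
        return n1 and n2, first, last

    _, first, _ = walk(model)
    return first, follow, names

def upa_conflicts(model):
    """Element names that two distinct particles compete for at some point."""
    first, follow, names = glushkov(model)
    conflicts = set()
    for targets in [first, *follow.values()]:
        seen = set()
        for p in targets:
            if names[p] in seen:
                conflicts.add(names[p])
            seen.add(names[p])
    return sorted(conflicts)

# (a?, a): on seeing <a> it is unclear which particle it belongs to.
print(upa_conflicts(seq(opt(elem("a")), elem("a"))))  # ['a']
# (a, b?): every step is unambiguous.
print(upa_conflicts(seq(elem("a"), opt(elem("b")))))  # []
```

Even in this reduced form the check falls out of the same first/follow sets that are needed to build the automaton, which is why doing the two jobs together is attractive.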

(Saxon actually creates the FSA for each complex type in the schema 
eagerly, rather than waiting until an instance of that type needs to be 
validated. That's because some errors in the schema, for example a UPA 
violation, are detected as a spin-off of the algorithm for FSA 
generation; and I don't like the idea of detecting and reporting schema 
errors during instance validation, especially while validating the 100th 
instance document when 99 others have already been successfully 
validated. This might be a case where a user switch could help: if the 
user is prepared to assert that the schema is already known to be valid, 
Saxon could organize the processing in a way that gains performance at 
the cost of poorer error diagnostics.)
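
The trade-off behind such a switch can be sketched in a few lines of Python. Everything here is hypothetical: the class CompiledSchema, the assume_valid flag and the compile_type stand-in are invented for the illustration and do not correspond to any Saxon option or API. With the flag off, every complex type is compiled when the schema is loaded, so schema errors surface immediately; with it on, a type is compiled only when first needed, so a schema error can surface in the middle of instance validation.

```python
class SchemaError(Exception):
    """Raised when a (toy) complex type cannot be compiled."""

def compile_type(name, content_model):
    # Stand-in for FSA construction. The marker string "AMBIGUOUS" is an
    # assumption of this sketch, used to simulate a UPA violation.
    if content_model == "AMBIGUOUS":
        raise SchemaError(f"UPA violation in type {name!r}")
    return f"fsa({name})"

class CompiledSchema:
    def __init__(self, types, assume_valid=False):
        self._types = dict(types)  # type name -> content model
        self._automata = {}        # type name -> compiled automaton
        if not assume_valid:
            # Eager mode: compile everything now so errors are reported at load time.
            for name in self._types:
                self.automaton(name)

    def automaton(self, name):
        # Compile on first use and cache the result for later validations.
        if name not in self._automata:
            self._automata[name] = compile_type(name, self._types[name])
        return self._automata[name]

    def compiled_count(self):
        return len(self._automata)

types = {"Trade": "(a, b?)", "Broken": "AMBIGUOUS"}

try:
    CompiledSchema(types)  # eager: fails while loading
except SchemaError as err:
    print("load-time error:", err)

lazy = CompiledSchema(types, assume_valid=True)  # loads without complaint
lazy.automaton("Trade")                          # compiles only what is used
print(lazy.compiled_count())                     # 1
```

In the lazy case the error in the "Broken" type stays hidden until some instance document happens to use that type, which is exactly the behaviour I would rather avoid by default.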

Michael Kay
Saxonica


