[Summary] Eager and Just-in-Time loading of XML Schema documents, compiled documents, enhancing performance, streaming

From: "Costello, Roger L." <costello@mitre.org>
To: "xml-dev@lists.xml.org" <xml-dev@lists.xml.org>
Date: Sat, 7 Aug 2010 09:46:30 -0400


Hi Folks,

Here is a summary of the recent discussions. Please notify me of any errors.  /Roger

--------------------------------------------------------------------------------

The following XML document references two XML Schemas: Library.xsd and Book.xsd

<?xml version="1.0"?>
<Library xmlns="http://www.library.org"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation=
                    "http://www.library.org
                     Library.xsd">
    <Books>
        <Book xmlns="http://www.book.org"
              xsi:schemaLocation=
                           "http://www.book.org
                            Book.xsd">
                <Title>My Life and Times</Title>
                <Author>Paul McCartney</Author>
                <Date>1998</Date>
                <ISBN>1-56592-235-2</ISBN>
                <Publisher>Macmillan Publishing</Publisher>
        </Book>
        ... 
    </Books>
</Library>

When does an XML Schema validator load (into memory) the XML Schema documents? When will Library.xsd and Book.xsd be loaded?

Answer: It depends on whether they are coupled or independent. 

-------------------------
        CASE #1
-------------------------
Suppose Library.xsd and Book.xsd are coupled, i.e., Library.xsd imports Book.xsd.

Here's a snippet of Library.xsd:

<xs:import namespace="http://www.book.org" schemaLocation="Book.xsd"/>

<xs:complexType name="BooksType">
   <xs:sequence>
       <xs:element xmlns:bk="http://www.book.org" ref="bk:Book"/>
   </xs:sequence>
</xs:complexType>


Both schemas will be loaded at the same time, when the validator hits the <Library> element.

This is called eager loading. The validator loads the schema referenced by xsi:schemaLocation, plus (recursively) all the schemas that schema imports and includes.
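For the import to work, Book.xsd must declare Book as a global element in the http://www.book.org namespace, and it needs elementFormDefault="qualified" because the instance places Title, Author, etc. in that same (default) namespace. Here is a minimal sketch of what Book.xsd might look like; the post does not show this file, so the child element types below are assumptions.

```xml
<?xml version="1.0"?>
<!-- Hypothetical Book.xsd: declares the global Book element that
     Library.xsd references with ref="bk:Book". -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://www.book.org"
           xmlns="http://www.book.org"
           elementFormDefault="qualified">

    <xs:element name="Book">
        <xs:complexType>
            <xs:sequence>
                <!-- Types are assumed; the original post does not specify them -->
                <xs:element name="Title" type="xs:string"/>
                <xs:element name="Author" type="xs:string"/>
                <xs:element name="Date" type="xs:gYear"/>
                <xs:element name="ISBN" type="xs:string"/>
                <xs:element name="Publisher" type="xs:string"/>
            </xs:sequence>
        </xs:complexType>
    </xs:element>

</xs:schema>
```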


-------------------------
        CASE #2
-------------------------
Suppose Library.xsd and Book.xsd are independent. 

Here's a snippet of Library.xsd:

<xs:complexType name="BooksType">
   <xs:sequence>
       <xs:any namespace="http://www.book.org"/>
   </xs:sequence>
</xs:complexType>


Library.xsd will be loaded when the validator hits the <Library> element. Book.xsd won't be loaded until the validator hits the <Book> element.

This is called just-in-time loading. The validator loads the schema only when it's needed. 
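Here is a minimal sketch of a complete Library.xsd for this case. The Library and Books declarations and the occurrence constraints on the wildcard are assumptions added to make the schema self-contained. The key point is that there is no xs:import, so nothing points the validator at Book.xsd until it reads the xsi:schemaLocation hint on <Book>.

```xml
<?xml version="1.0"?>
<!-- Hypothetical complete Library.xsd for CASE #2 (no xs:import of Book.xsd) -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://www.library.org"
           xmlns="http://www.library.org"
           elementFormDefault="qualified">

    <xs:element name="Library">
        <xs:complexType>
            <xs:sequence>
                <xs:element name="Books" type="BooksType"/>
            </xs:sequence>
        </xs:complexType>
    </xs:element>

    <xs:complexType name="BooksType">
        <xs:sequence>
            <!-- processContents="strict" is the default: the validator must find
                 a declaration for each Book, which it obtains from the
                 xsi:schemaLocation hint on the <Book> element itself. -->
            <xs:any namespace="http://www.book.org"
                    processContents="strict"
                    minOccurs="0" maxOccurs="unbounded"/>
        </xs:sequence>
    </xs:complexType>

</xs:schema>
```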


-------------------------
        CASE #3
-------------------------
Suppose Library.xsd imports and includes some XML Schemas (but not Book.xsd). 

Here's a snippet of Library.xsd:

<xs:import namespace="http://www.example.org" schemaLocation="Example.xsd"/>

<xs:include schemaLocation="Author.xsd"/>

<xs:include schemaLocation="Title.xsd"/>

<xs:include schemaLocation="Date.xsd"/>


When Library.xsd is loaded, the schemas it imports and includes will also be loaded (eager loading). Book.xsd is not loaded until the validator hits the <Book> element (just-in-time loading). Thus, here we see a combination of eager and just-in-time loading.
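To make the "recursively" part of eager loading concrete, here is a hypothetical Author.xsd. Because it is pulled in with xs:include, it must have the same targetNamespace as Library.xsd (or no targetNamespace). It in turn includes a hypothetical Name.xsd, which is therefore also loaded eagerly, at the same moment as Library.xsd. Name.xsd and NameType are illustrative names only, not part of the original example.

```xml
<?xml version="1.0"?>
<!-- Hypothetical Author.xsd: an included schema shares the including
     schema's targetNamespace (or has none). -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://www.library.org"
           xmlns="http://www.library.org"
           elementFormDefault="qualified">

    <!-- Hypothetical nested include: Name.xsd (which would define NameType)
         is loaded as soon as Author.xsd is, i.e., when Library.xsd is loaded. -->
    <xs:include schemaLocation="Name.xsd"/>

    <xs:element name="Author" type="NameType"/>

</xs:schema>
```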



I have confirmed that the following XML Schema validators have the eager and just-in-time loading behavior described above: 

    SAXON (Java and .NET) and Xerces-J

I have no information on these validators: 

    Xerces-C++, Xerces-Perl, Libxml, MSXML, or XSV.


EXPLOITING JUST-IN-TIME LOADING TO ENHANCE PERFORMANCE

Consider this scenario: 

1. Your XML document is very large.

2. The XML Schemas that will be used to validate the XML document are independent (or, the XML Schemas can be partitioned into independent sets).

One way to design your XML document is to specify all the XML Schemas upfront:

<?xml version="1.0"?>
<Document xmlns="http://www.s1.org"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:schemaLocation=
                    "http://www.s1.org
                     S1.xsd
                     http://www.s2.org
                     S2.xsd 
                     ...
                     http://www.sn.org
                     Sn.xsd"> 

The disadvantage of this approach is that all the schemas will be loaded at once (eager loading). If there are many schemas, this could be slow.


A second approach is to specify a schema at the point where it's first needed. This will enable you to exploit the just-in-time loading capability of schema validators. This is illustrated here:

<?xml version="1.0"?>
<Document xmlns="http://www.s1.org"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:schemaLocation=
                    "http://www.s1.org
                     S1.xsd">

    <Element-A>...</Element-A>
    <Element-B>...</Element-B>
    <Element-C xmlns="http://www.s2.org"
               xsi:schemaLocation=
                    "http://www.s2.org
                     S2.xsd">
        <Element-D>...</Element-D>
        <Element-E>...</Element-E>
        ...
    </Element-C>
    ...
</Document>

S1.xsd will be loaded when the validator hits the <Document> element. S2.xsd won't be loaded until the validator hits the <Element-C> element. And so forth. This approach exploits just-in-time loading of XML Schema documents.
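For this to work, S1.xsd must not import S2.xsd; as in CASE #2, it should accept Element-C through a wildcard so the two schemas stay independent. Here is a minimal sketch of S1.xsd, assuming <Document>, <Element-A> and <Element-B> are in the http://www.s1.org namespace and <Element-C> is in the http://www.s2.org namespace. The simple string types are placeholders for whatever those elements really contain.

```xml
<?xml version="1.0"?>
<!-- Hypothetical S1.xsd. It does NOT import S2.xsd: Element-C is matched by
     a wildcard, so S2.xsd is loaded only when the validator reaches the
     xsi:schemaLocation hint on <Element-C>. -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://www.s1.org"
           xmlns="http://www.s1.org"
           elementFormDefault="qualified">

    <xs:element name="Document">
        <xs:complexType>
            <xs:sequence>
                <!-- Placeholder types (assumed) -->
                <xs:element name="Element-A" type="xs:string"/>
                <xs:element name="Element-B" type="xs:string"/>
                <!-- Element-C and any later sections from the s2 namespace -->
                <xs:any namespace="http://www.s2.org" maxOccurs="unbounded"/>
            </xs:sequence>
        </xs:complexType>
    </xs:element>

</xs:schema>
```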

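If the XML document is streamed, this approach may yield significant performance savings: the validator can begin validating as soon as S1.xsd is loaded, and a schema whose xsi:schemaLocation hint never appears in the document is never loaded at all.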


USING COMPILED XML SCHEMAS TO ENHANCE PERFORMANCE

Another technique that may be used to enhance performance is to compile the XML Schema documents and save the compiled version. Then, when you want to validate the XML document, you use the compiled file (rather than loading the XML Schema documents, compiling them, and then validating).

SAXON can compile schemas and save the result. Michael Kay writes:

    With Saxon, for example, I would advise you to save a 
    .SCM file representing the compiled schema; reloading the schema from a 
    .SCM file should be significantly faster than rebuilding it from source 
    schema documents. 

Rich Salz reports that the DataPower products also compile their files first:

    The DataPower products work this way.  XSLT, XSD, WSDL, XACML, etc., files 
    are compiled to object code the first time they're used (or you can 
    pre-load the object cache). Then when actually "used" the object code is 
    executed directly by the CPU(s).

I do not know if the other schema validators provide the option to compile XML Schemas.

Follow-Ups:
- Re: [Summary] Eager and Just-in-Time loading of XML Schema documents, compiled documents, enhancing performance, streaming
  - From: Mukul Gandhi <gandhi.mukul@gmail.com>
