Creating a single XML vocabulary that is appropriately customized to dif

  • From: "Costello, Roger L." <costello@m...>
  • To: <xml-dev@l...>
  • Date: Wed, 9 Jul 2008 10:48:11 -0400

Hi Folks,

I frequently encounter the situation of a community wanting to create a
single XML vocabulary, but within the community are sub-groups that
have different perspectives on what data is relevant and needed. Below
is a discussion on how to deal with this situation.  I am interested in
hearing your thoughts on this.  /Roger


ISSUE

How do you create a single XML vocabulary, and validate that XML
vocabulary, for a community that has sub-groups that have overlapping
but different data needs?


EXAMPLE

Consider the book community.  It is comprised of:

   - book sellers
   - book distributors
   - book printers

They have overlapping, but different data needs.

For example, the data needed by a book seller is:

   - the title of the book
   - the author of the book
   - the date of publication
   - the ISBN
   - the publisher

The book distributor has many of the same data needs, but also some
differences:

   - the title of the book
   - the author of the book
   - the size of the book
   - the weight of the book
   - the mailing cost

And the book printer has overlapping but different needs:

   - the size of the book
   - the number of pages

How does the book community deal with such differing needs?


APPROACH #1 - MAKE EVERYTHING OPTIONAL

One approach is to define a schema where everything is optional, e.g.

---------------------------------------------------------
book.rng
---------------------------------------------------------
<?xml version="1.0" encoding="UTF-8"?>
<element name="Book" xmlns="http://relaxng.org/ns/structure/1.0"
         ns="http://www.books.org">
      <optional><element name="Title"><text/></element></optional>
      <optional><element name="Author"><text/></element></optional>
      <optional><element name="Date"><text/></element></optional>
      <optional><element name="ISBN"><text/></element></optional>
      <optional><element name="Publisher"><text/></element></optional>
      <optional><element name="Size"><text/></element></optional>
      <optional><element name="Weight"><text/></element></optional>
      <optional><element name="MailingCost"><text/></element></optional>
      <optional><element name="NumPages"><text/></element></optional>
</element>

Then, each sub-group in the book community uses just the elements they
need, ignoring the others.  

Thus,

- the book seller creates XML instance documents comprised of Title,
Author, Date, ISBN, and Publisher, e.g.

---------------------------------------------------------
book-seller.xml
---------------------------------------------------------
<?xml version="1.0" encoding="UTF-8"?>
<Book xmlns="http://www.books.org">
    <Title>The Wisdom of Crowds</Title>
    <Author>James Surowiecki</Author>
    <Date>2005</Date>
    <ISBN>0-385-72170-6</ISBN>
    <Publisher>Anchor Books</Publisher>
</Book>

- the book distributor creates XML instance documents comprised of
Title, Author, Size, Weight, and MailingCost, e.g.

---------------------------------------------------------
book-distributor.xml
---------------------------------------------------------
<?xml version="1.0" encoding="UTF-8"?>
<Book xmlns="http://www.books.org">
    <Title>The Wisdom of Crowds</Title>
    <Author>James Surowiecki</Author>
    <Size>5" x 8"</Size>
    <Weight>15oz</Weight>
    <MailingCost>$3.90</MailingCost>
</Book>

- and the book printer creates XML instance documents comprised of Size
and NumPages, e.g.

---------------------------------------------------------
book-printer.xml
---------------------------------------------------------
<?xml version="1.0" encoding="UTF-8"?>
<Book xmlns="http://www.books.org">
    <Size>5" x 8"</Size>
    <NumPages>301</NumPages>
</Book>


DISADVANTAGE

The disadvantage of Approach #1 is that validation is very weak.  For
example, a book seller may accidentally add NumPages to his instance
document:

<?xml version="1.0" encoding="UTF-8"?>
<Book xmlns="http://www.books.org">
    <Title>The Wisdom of Crowds</Title>
    <Author>James Surowiecki</Author>
    <Date>2005</Date>
    <ISBN>0-385-72170-6</ISBN>
    <Publisher>Anchor Books</Publisher>
    <NumPages>301</NumPages>
</Book>

Validation would not catch this error: NumPages is a legal optional
child of Book, so book.rng accepts the document.


APPROACH #2 - LAYERED VALIDATION

On July 7, 2008 Rick Jelliffe wrote on the xml-dev list:

> start off with a generic and open/extensible schema, and
> to put version constraints as another layer (you guessed
> it...Schematron).

Yes!

That's it, Rick!

I will use the generic grammar-based schema above, and then add a
Schematron business-rules layer on top to constrain it appropriately.

Here is the Schematron schema that applies the constraints needed by
the book seller:

---------------------------------------------------------
book-seller.sch
---------------------------------------------------------
<?xml version="1.0"?>
<sch:schema xmlns:sch="http://www.ascc.net/xml/schematron">

   <sch:ns uri="http://www.books.org"
           prefix="bk" />

   <sch:pattern name="Book Sellers">

      <sch:p>The book data required for a seller is 
             title, author, date, ISBN, and publisher.</sch:p> 

      <sch:rule context="bk:Book">

         <sch:assert test="count(bk:Title) = 1 and
                           count(bk:Author) = 1 and
                           count(bk:Date) = 1 and
                           count(bk:ISBN) = 1 and
                           count(bk:Publisher) = 1 and
                           count(*[not(self::bk:Title or 
                                       self::bk:Author or 
                                       self::bk:Date or 
                                       self::bk:ISBN or 
                                       self::bk:Publisher)]) = 0">
             The book data required for a seller is 
             title, author, date, ISBN, and publisher.
         </sch:assert>

      </sch:rule>

   </sch:pattern>

</sch:schema>
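
The single assertion above reports the same message whether an element
is missing, repeated, or unexpected.  One possible refinement, sketched
here with the same Schematron namespace and bk prefix, splits the test
so that each kind of failure gets its own message.  The pattern name and
the message wording are my own and are not part of book-seller.sch; the
constraints are intended to be the same.

```xml
<?xml version="1.0"?>
<!-- Sketch only: same constraints as book-seller.sch,
     but one message per kind of failure. -->
<sch:schema xmlns:sch="http://www.ascc.net/xml/schematron">

   <sch:ns uri="http://www.books.org"
           prefix="bk" />

   <sch:pattern name="Book Sellers (one message per failure)">

      <sch:rule context="bk:Book">

         <!-- Each required element must occur exactly once. -->
         <sch:assert test="count(bk:Title) = 1">
             A seller's Book must have exactly one Title.
         </sch:assert>
         <sch:assert test="count(bk:Author) = 1">
             A seller's Book must have exactly one Author.
         </sch:assert>
         <sch:assert test="count(bk:Date) = 1">
             A seller's Book must have exactly one Date.
         </sch:assert>
         <sch:assert test="count(bk:ISBN) = 1">
             A seller's Book must have exactly one ISBN.
         </sch:assert>
         <sch:assert test="count(bk:Publisher) = 1">
             A seller's Book must have exactly one Publisher.
         </sch:assert>

         <!-- A report fires when its test is true: here, when Book
              has any child outside the seller's set. -->
         <sch:report test="*[not(self::bk:Title or
                                 self::bk:Author or
                                 self::bk:Date or
                                 self::bk:ISBN or
                                 self::bk:Publisher)]">
             A seller's Book may contain only Title, Author, Date,
             ISBN, and Publisher.
         </sch:report>

      </sch:rule>

   </sch:pattern>

</sch:schema>
```

With this shape, the seller document that accidentally includes NumPages
would trigger only the final report, which points directly at the
problem.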

Here is the Schematron schema that applies the constraints needed by
the book distributor:

---------------------------------------------------------
book-distributor.sch
---------------------------------------------------------
<?xml version="1.0"?>
<sch:schema xmlns:sch="http://www.ascc.net/xml/schematron">

   <sch:ns uri="http://www.books.org"
           prefix="bk" />

   <sch:pattern name="Book Distributors">

      <sch:p>The book data required for a distributor is 
             title, author, size, weight, and mailing cost.</sch:p> 

      <sch:rule context="bk:Book">

         <sch:assert test="count(bk:Title) = 1 and
                           count(bk:Author) = 1 and
                           count(bk:Size) = 1 and
                           count(bk:Weight) = 1 and
                           count(bk:MailingCost) = 1 and
                           count(*[not(self::bk:Title or 
                                       self::bk:Author or 
                                       self::bk:Size or 
                                       self::bk:Weight or 
                                       self::bk:MailingCost)]) = 0">
             The book data required for a distributor is 
             title, author, size, weight, and mailing cost.
         </sch:assert>

      </sch:rule>

   </sch:pattern>

</sch:schema>

And here is the Schematron schema that applies the constraints needed
by the book printer:

---------------------------------------------------------
book-printer.sch
---------------------------------------------------------
<?xml version="1.0"?>
<sch:schema xmlns:sch="http://www.ascc.net/xml/schematron">

   <sch:ns uri="http://www.books.org"
           prefix="bk" />

   <sch:pattern name="Book Printers">

      <sch:p>The book data required for a printer is 
             the size and number of pages.</sch:p> 

      <sch:rule context="bk:Book">

         <sch:assert test="count(bk:Size) = 1 and
                           count(bk:NumPages) = 1 and
                           count(*[not(self::bk:Size or 
                                       self::bk:NumPages)]) = 0">
             The book data required for a printer is 
             the size and number of pages.
         </sch:assert>

      </sch:rule>

   </sch:pattern>

</sch:schema>

ADVANTAGES

Now we have strong validation. If a book seller accidentally adds
NumPages, the error will be caught.

This approach separates the definition of the community's XML
vocabulary from the constraints needed by each sub-group within the
community.  There is a nice separation of concerns.  New Schematron
rules can be added to support new business needs.  The grammar
schema that defines the XML vocabulary - book.rng - is simple and easy
to maintain.
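
As an illustration of adding a rule, here is a sketch of an extra
pattern that could be placed inside book-seller.sch.  The
four-digit-year requirement is a hypothetical business rule chosen for
the example; it is not one of the constraints stated above, and the
pattern name is likewise made up.

```xml
<!-- Hypothetical extra pattern for book-seller.sch.  It relies on the
     bk prefix already declared there with sch:ns. -->
<sch:pattern name="Seller Date Format"
             xmlns:sch="http://www.ascc.net/xml/schematron">

   <sch:rule context="bk:Book/bk:Date">

      <!-- XPath 1.0: translate() deletes every digit, so an empty
           result means the value held nothing but digits. -->
      <sch:assert test="string-length(normalize-space(.)) = 4 and
                        translate(normalize-space(.), '0123456789', '') = ''">
          Date must be a four-digit year, e.g. 2005.
      </sch:assert>

   </sch:rule>

</sch:pattern>
```

Nothing in book.rng has to change for this: the grammar still says only
that Date holds text, and the tighter constraint lives entirely in the
seller's rules layer.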


CONCURRENT VALIDATION

Now, to tie things together, what is needed is to validate an XML
instance document against the grammar-based schema plus the appropriate
Schematron schema.  This "concurrent validation" is nicely accomplished
using NVDL.

---------------------------------------------------------
book-seller.nvdl
--------------------------------------------------------- 
<?xml version="1.0"?>
<rules xmlns="http://purl.oclc.org/dsdl/nvdl/ns/structure/1.0">

   <namespace ns="http://www.books.org">
     <validate schema="book.rng" />
     <validate schema="book-seller.sch" />
   </namespace>

</rules>

---------------------------------------------------------
book-distributor.nvdl
--------------------------------------------------------- 
<?xml version="1.0"?>
<rules xmlns="http://purl.oclc.org/dsdl/nvdl/ns/structure/1.0">

   <namespace ns="http://www.books.org">
     <validate schema="book.rng" />
     <validate schema="book-distributor.sch" />
   </namespace>

</rules>

---------------------------------------------------------
book-printer.nvdl
--------------------------------------------------------- 
<?xml version="1.0"?>
<rules xmlns="http://purl.oclc.org/dsdl/nvdl/ns/structure/1.0">

   <namespace ns="http://www.books.org">
     <validate schema="book.rng" />
     <validate schema="book-printer.sch" />
   </namespace>

</rules>
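
To see why both layers matter, consider this hypothetical seller
document in which Author appears before Title.  Because the children of
Book in book.rng form an ordered sequence, the grammar layer should
reject it, while book-seller.sch, which only counts elements, would
accept it.  Validating it with book-seller.nvdl is therefore expected to
report an error from the RELAX NG side only.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical instance: correct set of elements, wrong order.
     Expected: fails book.rng, passes book-seller.sch. -->
<Book xmlns="http://www.books.org">
    <Author>James Surowiecki</Author>
    <Title>The Wisdom of Crowds</Title>
    <Date>2005</Date>
    <ISBN>0-385-72170-6</ISBN>
    <Publisher>Anchor Books</Publisher>
</Book>
```

The earlier seller document with the stray NumPages element is the
opposite case: it passes book.rng and is caught by book-seller.sch.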


SUMMARY

The above discussion illustrates how a community can create a single
XML vocabulary that can be appropriately customized to the needs of
differing sub-groups within the community.  The approach used is a
layering approach.  A simple grammar-based schema defines the XML
vocabulary.  Schematron rules are defined to constrain the XML
vocabulary in a way appropriate to each sub-group within the community.
And NVDL is used to tie together the grammar-based schema with the
Schematron schema.  
   

