Creating a single XML vocabulary that is appropriately customized to dif
Hi Folks,

I frequently encounter the situation of a community wanting to create a single XML vocabulary, but within the community are sub-groups that have different perspectives on what data is relevant and needed.

Below is a discussion on how to deal with this situation. I am interested in hearing your thoughts on this.

/Roger


ISSUE

How do you create a single XML vocabulary, and validate that XML vocabulary, for a community that has sub-groups that have overlapping but different data needs?


EXAMPLE

Consider the book community. It is comprised of:

   - book sellers
   - book distributors
   - book printers

They have overlapping, but different data needs. For example, the data needed by a book seller is:

   - the title of the book
   - the author of the book
   - the date of publication
   - the ISBN
   - the publisher

The book distributor has many of the same data needs, but also some differences:

   - the title of the book
   - the author of the book
   - the size of the book
   - the weight of the book
   - the mailing cost

And the book printer has overlapping but different needs:

   - the size of the book
   - the number of pages

How does the book community deal with such differing needs?


APPROACH #1 - MAKE EVERYTHING OPTIONAL

One approach is to define a schema where everything is optional, e.g.
---------------------------------------------------------
           book.rng
---------------------------------------------------------
<?xml version="1.0" encoding="UTF-8"?>
<element name="Book" xmlns="http://relaxng.org/ns/structure/1.0"
         ns="http://www.books.org">
    <optional><element name="Title"><text/></element></optional>
    <optional><element name="Author"><text/></element></optional>
    <optional><element name="Date"><text/></element></optional>
    <optional><element name="ISBN"><text/></element></optional>
    <optional><element name="Publisher"><text/></element></optional>
    <optional><element name="Size"><text/></element></optional>
    <optional><element name="Weight"><text/></element></optional>
    <optional><element name="MailingCost"><text/></element></optional>
    <optional><element name="NumPages"><text/></element></optional>
</element>

Then, each sub-group in the book community uses just the elements they need, ignoring the others. Thus,

- the book seller creates XML instance documents comprised of Title, Author, Date, ISBN, and Publisher, e.g.

---------------------------------------------------------
           book-seller.xml
---------------------------------------------------------
<?xml version="1.0" encoding="UTF-8"?>
<Book xmlns="http://www.books.org">
    <Title>The Wisdom of Crowds</Title>
    <Author>James Surowiecki</Author>
    <Date>2005</Date>
    <ISBN>0-385-72170-6</ISBN>
    <Publisher>Anchor Books</Publisher>
</Book>

- the book distributor creates XML instance documents comprised of Title, Author, Size, Weight, and MailingCost, e.g.

---------------------------------------------------------
           book-distributor.xml
---------------------------------------------------------
<?xml version="1.0" encoding="UTF-8"?>
<Book xmlns="http://www.books.org">
    <Title>The Wisdom of Crowds</Title>
    <Author>James Surowiecki</Author>
    <Size>5" x 8"</Size>
    <Weight>15oz</Weight>
    <MailingCost>$3.90</MailingCost>
</Book>

- and the book printer creates XML instance documents comprised of Size and NumPages, e.g.
---------------------------------------------------------
           book-printer.xml
---------------------------------------------------------
<?xml version="1.0" encoding="UTF-8"?>
<Book xmlns="http://www.books.org">
    <Size>5" x 8"</Size>
    <NumPages>301</NumPages>
</Book>


DISADVANTAGE

The disadvantage of Approach #1 is that validation is very weak. For example, a book seller may accidentally add NumPages to his instance document:

<?xml version="1.0" encoding="UTF-8"?>
<Book xmlns="http://www.books.org">
    <Title>The Wisdom of Crowds</Title>
    <Author>James Surowiecki</Author>
    <Date>2005</Date>
    <ISBN>0-385-72170-6</ISBN>
    <Publisher>Anchor Books</Publisher>
    <NumPages>301</NumPages>
</Book>

Validation would not catch this error.


APPROACH #2 - LAYERED VALIDATION

On July 7, 2008 Rick Jelliffe wrote on the xml-dev list:

> start off with a generic and open/extensible schema, and
> to put version constraints as another layer (you guessed it...Schematron).

Yes! That's it Rick!

I will use the generic grammar-based schema above, and then add a Schematron business-rules layer on top to constrain it appropriately.

Here is the Schematron schema that applies the constraints needed by the book seller:

---------------------------------------------------------
           book-seller.sch
---------------------------------------------------------
<?xml version="1.0"?>
<sch:schema xmlns:sch="http://www.ascc.net/xml/schematron">
    <sch:ns uri="http://www.books.org" prefix="bk" />
    <sch:pattern name="Book Sellers">
        <sch:p>The book data required for a seller is title, author, date, ISBN, and publisher.</sch:p>
        <sch:rule context="bk:Book">
            <sch:assert test="count(bk:Title) = 1 and
                              count(bk:Author) = 1 and
                              count(bk:Date) = 1 and
                              count(bk:ISBN) = 1 and
                              count(bk:Publisher) = 1 and
                              count(*[not(self::bk:Title or
                                          self::bk:Author or
                                          self::bk:Date or
                                          self::bk:ISBN or
                                          self::bk:Publisher)]) = 0">
                The book data required for a seller is title, author, date, ISBN, and publisher.
            </sch:assert>
        </sch:rule>
    </sch:pattern>
</sch:schema>

Here is the Schematron schema that applies the constraints needed by the book distributor:

---------------------------------------------------------
           book-distributor.sch
---------------------------------------------------------
<?xml version="1.0"?>
<sch:schema xmlns:sch="http://www.ascc.net/xml/schematron">
    <sch:ns uri="http://www.books.org" prefix="bk" />
    <sch:pattern name="Book Distributors">
        <sch:p>The book data required for a distributor is title, author, size, weight, and mailing cost.</sch:p>
        <sch:rule context="bk:Book">
            <sch:assert test="count(bk:Title) = 1 and
                              count(bk:Author) = 1 and
                              count(bk:Size) = 1 and
                              count(bk:Weight) = 1 and
                              count(bk:MailingCost) = 1 and
                              count(*[not(self::bk:Title or
                                          self::bk:Author or
                                          self::bk:Size or
                                          self::bk:Weight or
                                          self::bk:MailingCost)]) = 0">
                The book data required for a distributor is title, author, size, weight, and mailing cost.
            </sch:assert>
        </sch:rule>
    </sch:pattern>
</sch:schema>

And here is the Schematron schema that applies the constraints needed by the book printer:

---------------------------------------------------------
           book-printer.sch
---------------------------------------------------------
<?xml version="1.0"?>
<sch:schema xmlns:sch="http://www.ascc.net/xml/schematron">
    <sch:ns uri="http://www.books.org" prefix="bk" />
    <sch:pattern name="Book Printers">
        <sch:p>The book data required for a printer is the size and number of pages.</sch:p>
        <sch:rule context="bk:Book">
            <sch:assert test="count(bk:Size) = 1 and
                              count(bk:NumPages) = 1 and
                              count(*[not(self::bk:Size or
                                          self::bk:NumPages)]) = 0">
                The book data required for a printer is the size and number of pages.
            </sch:assert>
        </sch:rule>
    </sch:pattern>
</sch:schema>


ADVANTAGES

Now we have strong validation. If a book seller accidentally adds NumPages, the error will be caught.
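One possible refinement, offered as a sketch rather than as part of the schemas above: because book-seller.sch packs every constraint into a single assertion, a failure produces the same message whether an element is missing, duplicated, or unexpected. Splitting the test into one assertion per required element, plus a report for unexpected children, should let the message point at the actual problem. The file name book-seller-detailed.sch and the message wording are assumptions made for illustration; the namespace, prefix, and element names are the ones used above.

```xml
<?xml version="1.0"?>
<!-- Hypothetical file: book-seller-detailed.sch (illustrative variant of book-seller.sch) -->
<sch:schema xmlns:sch="http://www.ascc.net/xml/schematron">
    <sch:ns uri="http://www.books.org" prefix="bk" />
    <sch:pattern name="Book Sellers (one assertion per constraint)">
        <sch:rule context="bk:Book">
            <!-- Each assert fails when its element is missing or repeated -->
            <sch:assert test="count(bk:Title) = 1">A seller's Book must contain exactly one Title.</sch:assert>
            <sch:assert test="count(bk:Author) = 1">A seller's Book must contain exactly one Author.</sch:assert>
            <sch:assert test="count(bk:Date) = 1">A seller's Book must contain exactly one Date.</sch:assert>
            <sch:assert test="count(bk:ISBN) = 1">A seller's Book must contain exactly one ISBN.</sch:assert>
            <sch:assert test="count(bk:Publisher) = 1">A seller's Book must contain exactly one Publisher.</sch:assert>
            <!-- A report fires when its test is true: here, when any other child element is present -->
            <sch:report test="*[not(self::bk:Title or
                                    self::bk:Author or
                                    self::bk:Date or
                                    self::bk:ISBN or
                                    self::bk:Publisher)]">
                A seller's Book contains an element other than Title, Author, Date, ISBN, and Publisher.
            </sch:report>
        </sch:rule>
    </sch:pattern>
</sch:schema>
```

The combined constraints are the same as in the single-assertion version, so the choice between the two is a trade-off between brevity and more specific error messages.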
This approach separates the definition of the community's XML vocabulary from the constraints needed by each sub-group within the community. There is a nice separation of concerns.

New Schematron rules can be added to support new data business needs.

The grammar schema that defines the XML vocabulary - book.rng - is simple and easy to maintain.


CONCURRENT VALIDATION

Now, to tie things together, what is needed is to validate an XML instance document against the grammar-based schema plus the appropriate Schematron schema. This "concurrent validation" is nicely accomplished using NVDL.

---------------------------------------------------------
           book-seller.nvdl
---------------------------------------------------------
<?xml version="1.0"?>
<rules xmlns="http://purl.oclc.org/dsdl/nvdl/ns/structure/1.0">
    <namespace ns="http://www.books.org">
        <validate schema="book.rng" />
        <validate schema="book-seller.sch" />
    </namespace>
</rules>

---------------------------------------------------------
           book-distributor.nvdl
---------------------------------------------------------
<?xml version="1.0"?>
<rules xmlns="http://purl.oclc.org/dsdl/nvdl/ns/structure/1.0">
    <namespace ns="http://www.books.org">
        <validate schema="book.rng" />
        <validate schema="book-distributor.sch" />
    </namespace>
</rules>

---------------------------------------------------------
           book-printer.nvdl
---------------------------------------------------------
<?xml version="1.0"?>
<rules xmlns="http://purl.oclc.org/dsdl/nvdl/ns/structure/1.0">
    <namespace ns="http://www.books.org">
        <validate schema="book.rng" />
        <validate schema="book-printer.sch" />
    </namespace>
</rules>


SUMMARY

The above discussion illustrates how a community can create a single XML vocabulary that can be appropriately customized to the needs of differing sub-groups within the community.

The approach used is a layering approach. A simple grammar-based schema defines the XML vocabulary.
Schematron rules are defined to constrain the XML vocabulary in a way appropriate to each sub-group within the community. And NVDL is used to tie together the grammar-based schema with the Schematron schema.
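
A small sketch of why both layers are run together: book.rng lists its optional elements as a sequence, so it still enforces element order, whereas the count-based Schematron assertions do not look at order at all. The hypothetical instance below (the file name is an assumption for illustration) swaps Title and Author. It should satisfy book-seller.sch, since each required element occurs exactly once and nothing else is present, but it should fail book.rng, so validating it with book-seller.nvdl would be expected to reject it.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical file: book-seller-misordered.xml -->
<!-- Author precedes Title: the counts are right, but the order required by book.rng is not -->
<Book xmlns="http://www.books.org">
    <Author>James Surowiecki</Author>
    <Title>The Wisdom of Crowds</Title>
    <Date>2005</Date>
    <ISBN>0-385-72170-6</ISBN>
    <Publisher>Anchor Books</Publisher>
</Book>
```

Conversely, the seller instance with the extra NumPages element shown earlier passes book.rng and is caught only by the Schematron layer, so each layer covers errors the other would miss.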