XML-DEV Mailing List Archive
Re: XML Schemas: Best Practices
This is definitely a topic of interest to me. The two options you give both have their advantages, but I can see your point about the greater control to be gained from using the 'any' element. Allowing a derived type risks giving too much scope: virtually any number of elements could be added.

Sorry I'm not adding much to the discussion here... I just realised I've done nothing more than agree with your suggestions!

As an aside, various papers I have looked at describe XML Schema as having a 'closed' content model. Although these techniques do not make the model strictly 'open', this level of extensibility, and the control over it, suggests a halfway house... perhaps 'semi-open' ;-)

Regards
Caroline

"Roger L. Costello" wrote:
> Hi Folks,
>
> I would like to start on a new issue. I think that this issue will
> generate a lot of interest, as it is critical to designing robust
> schemas.
>
> Issue: What is Best Practice for creating extensible content models?
>
> Below I have jotted down some initial thoughts on this subject. I
> am sure that I have missed many techniques for creating extensible
> content models. What are your thoughts on this topic?
>
> Techniques for Creating Extensible Content Models
>
> [1] Use types to create extensible content models. Consider this
> schema snippet:
>
> <element name="BookCatalogue">
>   <complexType>
>     <sequence>
>       <element name="Book" minOccurs="0" maxOccurs="unbounded">
>         <complexType>
>           <sequence>
>             <element name="Title" type="string"/>
>             <element name="Author" type="string"/>
>             <element name="Date" type="year"/>
>             <element name="ISBN" type="string"/>
>             <element name="Publisher" type="string"/>
>           </sequence>
>         </complexType>
>       </element>
>     </sequence>
>   </complexType>
> </element>
>
> This schema snippet dictates that in instance documents <Book> elements
> must always consist of exactly five elements: <Title>, <Author>,
> <Date>, <ISBN>, and <Publisher>.
> For example:
>
> <Book>
>   <Title>The First and Last Freedom</Title>
>   <Author>J. Krishnamurti</Author>
>   <Date>1954</Date>
>   <ISBN>0-06-064831-7</ISBN>
>   <Publisher>Harper &amp; Row</Publisher>
> </Book>
>
> With this schema, instance documents are completely static and
> non-extensible.
>
> On the other hand, consider this version of the schema, where I have
> defined Book's content model with a type definition:
>
> <complexType name="BookType">
>   <sequence>
>     <element name="Title" type="string"/>
>     <element name="Author" type="string"/>
>     <element name="Date" type="year"/>
>     <element name="ISBN" type="string"/>
>     <element name="Publisher" type="string"/>
>   </sequence>
> </complexType>
>
> <element name="BookCatalogue">
>   <complexType>
>     <sequence>
>       <element name="Book" type="c:BookType"
>                minOccurs="0" maxOccurs="unbounded"/>
>     </sequence>
>   </complexType>
> </element>
>
> Recall that via the mechanism of type substitutability, the content
> of <Book> can be governed by any type that derives from BookType.
> For example, if we create a type which derives from BookType:
>
> <complexType name="BookTypePlusReviewer">
>   <complexContent>
>     <extension base="c:BookType">
>       <sequence>
>         <element name="Reviewer" type="string"/>
>       </sequence>
>     </extension>
>   </complexContent>
> </complexType>
>
> then instance documents can contain a <Book> element that has a
> <Reviewer> element after the other five elements:
>
> <Book xsi:type="BookTypePlusReviewer">
>   <Title>My Life and Times</Title>
>   <Author>Paul McCartney</Author>
>   <Date>1998</Date>
>   <ISBN>94303-12021-43892</ISBN>
>   <Publisher>McMillin Publishing</Publisher>
>   <Reviewer>Roger Costello</Reviewer>
> </Book>
>
> In my example, I defined BookTypePlusReviewer within the same
> schema as BookType. In general, however, this may not be the case.
> Other schemas can import the BookCatalogue schema and define types
> which derive from BookType.
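To make the cross-schema case concrete, here is a minimal sketch of such an importing schema. Several details are hypothetical: the namespace URIs http://www.example.org/catalogue and http://www.example.org/reviews, the file name BookCatalogue.xsd, and the assumption that the catalogue schema declares BookType in the catalogue namespace. The sketch uses the namespace of the final XML Schema 1.0 Recommendation and otherwise follows the same unprefixed style as the snippets above.

```xml
<!-- Reviews.xsd (hypothetical): extends BookType from a separate schema,
     leaving the BookCatalogue schema untouched. URIs are placeholders. -->
<schema xmlns="http://www.w3.org/2001/XMLSchema"
        xmlns:c="http://www.example.org/catalogue"
        targetNamespace="http://www.example.org/reviews"
        elementFormDefault="qualified">

  <!-- Make the catalogue's components, including c:BookType, visible here. -->
  <import namespace="http://www.example.org/catalogue"
          schemaLocation="BookCatalogue.xsd"/>

  <!-- Derivation by extension: Reviewer is appended after Publisher. -->
  <complexType name="BookTypePlusReviewer">
    <complexContent>
      <extension base="c:BookType">
        <sequence>
          <element name="Reviewer" type="string"/>
        </sequence>
      </extension>
    </complexContent>
  </complexType>
</schema>
```

An instance would then name the derived type with a prefix bound to the reviews namespace, e.g. <Book xsi:type="r:BookTypePlusReviewer">. It would write the extra child as <r:Reviewer> (because of elementFormDefault="qualified") and make Reviews.xsd known to the validator, for example through xsi:schemaLocation.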
> Thus, the contents of Book may be extended without modifying the
> BookCatalogue schema!
>
> This type substitutability mechanism is a powerful extensibility
> mechanism. However, it suffers from two problems:
>
> [1] Location Restricted Extensibility: The extensibility is restricted
> to appending elements onto the end of the content model
> (after the <Publisher> element). What if we wanted to extend
> <Book> by adding elements to the beginning (before <Title>), or in
> the middle, etc.? We can't do it with this mechanism.
>
> [2] Unexpected Extensibility: If you look at the declaration for Book:
>
> <element name="Book" type="c:BookType"
>          minOccurs="0" maxOccurs="unbounded"/>
>
> and the definition for BookType:
>
> <complexType name="BookType">
>   <sequence>
>     <element name="Title" type="string"/>
>     <element name="Author" type="string"/>
>     <element name="Date" type="year"/>
>     <element name="ISBN" type="string"/>
>     <element name="Publisher" type="string"/>
>   </sequence>
> </complexType>
>
> it is easy to be fooled into thinking that in instance documents the
> <Book> elements will always contain just <Title>, <Author>, <Date>,
> <ISBN>, and <Publisher>. It is easy to forget that someone could
> extend the content model using the type substitutability mechanism.
> Extensibility is unexpected! Consequently, if you write a program to
> process BookCatalogue instance documents, you may forget to take into
> account the fact that a <Book> element may contain more than five
> children.
>
> It would be nice if there were a way to explicitly flag places where
> extensibility may occur: "hey, instance documents may extend <Book> at
> this point, so be sure to write your code taking this possibility into
> account." In addition, it would be nice if we could extend Book's
> content model at locations other than just the end ...
> The <any> element gives us these capabilities beautifully:
>
> <element name="BookCatalogue">
>   <complexType>
>     <sequence>
>       <element name="Book" minOccurs="0" maxOccurs="unbounded">
>         <complexType>
>           <sequence>
>             <element name="Title" type="string"/>
>             <element name="Author" type="string"/>
>             <element name="Date" type="year"/>
>             <element name="ISBN" type="string"/>
>             <element name="Publisher" type="string"/>
>             <any namespace="##any" minOccurs="0"/>
>           </sequence>
>         </complexType>
>       </element>
>     </sequence>
>   </complexType>
> </element>
>
> In this version of the schema I have made explicit the fact that after
> the <Publisher> element an element may occur, and that element may
> come from any namespace.
>
> Note that I could have put the <any> element within a BookType:
>
> <complexType name="BookType">
>   <sequence>
>     <element name="Title" type="string"/>
>     <element name="Author" type="string"/>
>     <element name="Date" type="year"/>
>     <element name="ISBN" type="string"/>
>     <element name="Publisher" type="string"/>
>     <any namespace="##any" minOccurs="0" maxOccurs="1"/>
>   </sequence>
> </complexType>
>
> and then declared Book to be of type BookType:
>
> <element name="Book" type="c:BookType"
>          minOccurs="0" maxOccurs="unbounded"/>
>
> However, then we are back to the "unexpected extensibility" problem.
> Namely, after the <Publisher> element an element from any namespace
> may occur, and after that a derived type could add anything at all.
>
> Thus, I chose not to use a type so that I could control the
> extensibility.
>
> There is another way to control the extensibility and still use a type.
> I can use the BookType and add a block attribute to Book:
>
> <element name="Book" type="c:BookType" block="#all"
>          minOccurs="0" maxOccurs="unbounded"/>
>
> The block attribute prohibits derived types from being used in place
> of BookType in Book's content model.
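For illustration, here is the kind of instance document that the block="#all" declaration should cause a validator to reject. It reuses the <Book> data from the message above. It assumes, hypothetically, three things:

- the catalogue schema's target namespace is http://www.example.org/catalogue;
- that schema uses elementFormDefault="qualified";
- BookTypePlusReviewer is defined in that same schema, as in the original example.

```xml
<!-- Assumes Book is declared as
       <element name="Book" type="c:BookType" block="#all" .../>
     A validator should reject this <Book>: xsi:type names a type derived
     (by extension) from BookType, and block="#all" forbids that. -->
<BookCatalogue xmlns="http://www.example.org/catalogue"
               xmlns:c="http://www.example.org/catalogue"
               xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <Book xsi:type="c:BookTypePlusReviewer">
    <Title>My Life and Times</Title>
    <Author>Paul McCartney</Author>
    <Date>1998</Date>
    <ISBN>94303-12021-43892</ISBN>
    <Publisher>McMillin Publishing</Publisher>
    <Reviewer>Roger Costello</Reviewer>
  </Book>
</BookCatalogue>
```

Without block="#all" the same document should validate, because BookTypePlusReviewer is derived from BookType by extension. The goal may instead be to stop other schemas from deriving from BookType at all, rather than only refusing the substitution at <Book>. In that case, the final attribute on the complexType (for example final="#all") expresses that instead.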
> I prefer this latter way of controlling extensibility to the in-line
> version because it creates a reusable component (BookType), and yet we
> still have control over the extensibility.
>
> With the <any> element we have complete control over where, and how
> much, extensibility we want to allow. For example, suppose that we
> want to allow at most two new elements at the top of Book's content
> model. Here's how to specify that using the <any> element:
>
> <complexType name="BookType">
>   <sequence>
>     <any namespace="##any" minOccurs="0" maxOccurs="2"/>
>     <element name="Title" type="string"/>
>     <element name="Author" type="string"/>
>     <element name="Date" type="year"/>
>     <element name="ISBN" type="string"/>
>     <element name="Publisher" type="string"/>
>   </sequence>
> </complexType>
>
> Note how I have placed the <any> element at the top of the content
> model, and have set maxOccurs="2". Thus, in instance documents the
> <Book> content will always end with <Title>, <Author>, <Date>, <ISBN>,
> and <Publisher>. Prior to that, up to two other elements may occur.
>
> I must admit that I am biased towards using the <any> element as a
> mechanism for achieving content model extensibility. It provides much
> greater control over where extensibility occurs and how much occurs. In
> addition, I like the fact that it alerts me to where extensibility may
> occur, so I can write my programs to process the content model
> appropriately. I don't like surprises in my data.
>
> What are your thoughts on this topic? I am sure that in my bias, I
> am missing some disadvantages of using the <any> element. Can you
> think of any disadvantages? What other techniques are there for
> extending content models?
>
> /Roger
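There are two caveats on the 'two elements at the top' example, as far as I understand the final XML Schema 1.0 Recommendation.

- With namespace="##any" first in the sequence, a leading <Title> could match either the wildcard or the Title declaration. This breaks the Unique Particle Attribution constraint, so a conforming processor should reject that BookType.
- <any> defaults to processContents="strict". That means each element matched by the wildcard must be validated, which normally requires its declaration to be available. Accepting arbitrary well-formed elements therefore needs processContents="lax" or "skip".

A minimal sketch that keeps the idea under those rules follows. It uses gYear, the Recommendation's name for the year type, and ##other/lax is one possible choice among several.

```xml
<complexType name="BookType">
  <sequence>
    <!-- ##other: only elements in a namespace other than the target
         namespace (unqualified elements are excluded too), so a leading
         <Title> can match nothing but the Title declaration below. -->
    <!-- lax: validate a matched element if its declaration is available,
         otherwise accept it without a declaration. -->
    <any namespace="##other" processContents="lax"
         minOccurs="0" maxOccurs="2"/>
    <element name="Title" type="string"/>
    <element name="Author" type="string"/>
    <element name="Date" type="gYear"/>
    <element name="ISBN" type="string"/>
    <element name="Publisher" type="string"/>
  </sequence>
</complexType>
```

The trade-off is that extension elements at the top of <Book> must now be namespace-qualified, in a namespace other than the catalogue's own.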








