XML Schemas: Best Practices
Hi Folks,

I would like to see if we can collectively come up with a set of "best practices" in designing XML Schemas. I realize that the specifics of designing a schema are heavily dependent upon the task at hand. However, I firmly believe that there are guidelines that can be employed in creating a schema, and those guidelines hold true irrespective of the specific task. It is this set of guidelines that I am hoping we can shed some light upon.

I would like to get things started by listing some of the things that must be considered in designing a schema. It is by no means an exhaustive list. For example, it doesn't address when to block a type from derivation, when to create a schema without a namespace, when to make an element or a type abstract, etc. Nonetheless, it is a start to some hopefully useful discussions.

First, a quick list of the issues:

[1] Element versus Type Reuse
[2] Local versus Global
[3] elementFormDefault - to qualify or not to qualify
[4] Evolvability/versioning
[5] One namespace versus many namespaces (import versus include)
[6] Capturing semantics of elements and types

Now, details of each issue:

[1] Element versus Type Reuse: from my own experience in building schemas I have found that it is oftentimes not obvious whether to declare something as an element and then reuse that element, or to declare it as a type and reuse the type.
Let's consider the two cases by looking at an example:

Element Reuse

- Declare an element for reuse:

|<element name="Elevation">
|   <simpleType base="integer">
|      <minInclusive value="-1290"/>
|      <maxInclusive value="29028"/>
|   </simpleType>
|</element>

- Reusing the element:

|<element name="Boston">
|   <complexType>
|      <sequence>
|         <element ref="city:Elevation"/>
|      </sequence>
|   </complexType>
|</element>

Type Reuse

- Declare a type for reuse:

|<simpleType name="Elevation" base="integer">
|   <minInclusive value="-1290"/>
|   <maxInclusive value="29028"/>
|</simpleType>

- Reusing the type:

|<element name="Boston">
|   <complexType>
|      <sequence>
|         <element name="Elevation" type="city:Elevation"/>
|      </sequence>
|   </complexType>
|</element>

Which is preferred - declare Elevation as an element and reuse that element, or declare Elevation as a type and reuse the type? Here are some things to consider:

- Declaring it as an element will allow equivClasses to be created, thus enabling the Elevation element to be substituted by members of the equivClass.
- Declaring it as a type will allow derived types to be created, thus enabling the Elevation type to be substituted by derived types.
- Someone once said that XML Schemas is a "type-based system". I am not sure what that means, but perhaps it means that the idea behind XML Schemas is to reuse types?
- In programming languages types are the items typically that get reused. Does that apply to XML Schemas, or not?

What are your thoughts on type versus element reuse? What guidelines would you recommend to someone struggling to decide whether he/she should make an item as an element or as a type?

[2] Local versus Global: when should an element or type be declared globally versus when should it be nested within something else (i.e., local)?
Again, let's take an example:

- Everything Global

|<element name="Book" type="cat:Listing"/>
|<complexType name="Listing">
|   <sequence>
|      <element ref="cat:Title"/>
|      <element ref="cat:Author"/>
|   </sequence>
|</complexType>
|<element name="Title" type="string"/>
|<element name="Author" type="string"/>

- Everything Local

|<element name="Book">
|   <complexType>
|      <sequence>
|         <element name="Title" type="string"/>
|         <element name="Author" type="string"/>
|      </sequence>
|   </complexType>
|</element>

What guidance can we provide a schema designer in deciding whether or not to "hide" a type or element (by nesting it)? Someone once asked me when it would be desirable to make an element or type local. I was hard pressed to think of a situation. Thus, I was not able to provide guidance on when to use elements/types locally.

It is easy to see the benefit of declaring elements/types globally - they can be reused, not only within a schema but also across schemas. It is not so easy for me to see the benefit of hiding elements/types. Can someone provide guidance on this issue? Does the OO encapsulation principle apply to XML Schemas? If so, why? If not, why not?

[3] elementFormDefault - to qualify or not to qualify: elementFormDefault is an attribute of <schema>. It is used to dictate which elements are to be namespace-qualified in instance documents: a value of "qualified" means that every element, global or local, is namespace-qualified in the instance document, whereas a value of "unqualified" means that only globally declared elements are namespace-qualified.

Personally, I find that for simplicity it is easiest to use "qualified" and then in the instance document use a default namespace declaration. It is not really clear to me what the advantages of using "unqualified" are. In other words, I would not be able to provide good guidance on when to use "unqualified". If someone asked you to list the scenarios when it would be desirable to use "unqualified" what guidance would you give?
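
To make the difference concrete, here is a small sketch of what the two settings mean for an instance document. It is written in the syntax of the final XML Schema 1.0 Recommendation rather than the older syntax used in the examples above, and the namespace URI and the sample values are made up purely for illustration.

```xml
<!-- Hypothetical schema: Title and Author are local declarations -->
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://www.example.org/catalogue"
            elementFormDefault="qualified">
  <xsd:element name="Book">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="Title" type="xsd:string"/>
        <xsd:element name="Author" type="xsd:string"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
```

With "qualified", a default namespace declaration covers every element:

```xml
<Book xmlns="http://www.example.org/catalogue">
  <Title>Example Title</Title>
  <Author>A. N. Author</Author>
</Book>
```

If the same schema said elementFormDefault="unqualified", the local elements would have to appear in no namespace, so a default namespace declaration on Book would no longer work and a prefix is needed on the global element only:

```xml
<cat:Book xmlns:cat="http://www.example.org/catalogue">
  <Title>Example Title</Title>
  <Author>A. N. Author</Author>
</cat:Book>
```

One consequence worth noting: with "unqualified", the instance author has to know which elements were declared globally and which locally, which is part of why "qualified" feels simpler.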
[4] Evolvability/versioning: in today's rapidly changing marketplace, there is no question that schemas will need to change (evolve). What guidance do you provide a schema designer in engineering his/her schema to support change? When a schema is changed, how do you indicate that it is a new version - with a new namespace?

I have thought quite a bit about schema evolution. At the end of this message I expound quite a bit on this subject. As for versioning, that is something that I would be hard pressed to provide guidance upon. When a new version of a schema is created, what techniques should one use to signify the new version? One idea is to create a new namespace for the new version. Another idea is to simply change the version attribute on <schema>. How would you indicate a new version?

[5] One namespace versus many namespaces (import versus include): I think that in a typical project many schemas will be created. A question will then arise, "shall we define one namespace for all the schemas or shall we create a different namespace for each schema?" What are the tradeoffs in creating multiple namespaces versus a single namespace? What guidance would you give someone starting on a project that will create multiple namespaces - create a namespace for each schema or one umbrella namespace?

[6] Capturing semantics of elements and types: a schema creates elements, defines the relationships between the elements, and defines the datatypes of the elements. However, that by itself doesn't define the semantics of the elements. For example, consider this element declaration:

|<element name="jdkdsfjkds">
|   <simpleType base="string">
|      <pattern value="[a-zA-Z]+\d"/>
|   </simpleType>
|</element>

Does this tell you the meaning of "jdkdsfjkds"? Probably not. Something more is needed. What guidelines would you give someone wishing to document the semantics of the items created in a schema?
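
One place the schema language itself offers for recording such meaning is the annotation element. A minimal sketch follows, written in the syntax of the final XML Schema 1.0 Recommendation rather than the older syntax of the declaration above. The documentation text is invented for illustration; nothing in this message says what "jdkdsfjkds" actually means.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="jdkdsfjkds">
    <!-- annotation must come first inside the declaration -->
    <xsd:annotation>
      <xsd:documentation xml:lang="en">
        Hypothetical definition: a product code made of one or more
        letters followed by a single digit.
      </xsd:documentation>
    </xsd:annotation>
    <xsd:simpleType>
      <xsd:restriction base="xsd:string">
        <xsd:pattern value="[a-zA-Z]+\d"/>
      </xsd:restriction>
    </xsd:simpleType>
  </xsd:element>
</xsd:schema>
```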
Here are some guidelines that Mary Pulvermacher sent to me:

"Our current thinking is to capture as much of the semantics as possible in the XML schema itself. We plan to do this by using the XML Schema provided annotation element and having a convention that every element or attribute has an annotation that provides information on the meaning. Of course this is not perfect but it does carry some advantages.

- The XML schema will capture the data structure, meta-data and relationships between the elements.
- Use of strong typing will capture much of the data content.
- The annotations can capture definitions and other explanatory information.
- The structure of the "definitions" will always be consistent with the structure used in the schema since they are linked.
- Since the schema itself is an XML document, we can use XSL to transform this information into a format suitable for human consumption."

Do you have any other thoughts on capturing the semantics of elements and types created by a schema? What guidance would you give to someone wishing to capture the semantics of the elements and types?

--------------------------------------------------------------------

Some thoughts on enabling schema evolution (expansion of [4] above)

In today's rapidly changing market static schemas will be less commonplace, as the market pushes schemas to quickly support new capabilities. For example, consider the cellphone industry. Clearly, this is a rapidly evolving market. Any schema that the cellphone community creates will soon become obsolete as hardware/software changes extend the cellphone capabilities. For the cellphone community rapid evolution of a cellphone schema is not just a nicety, the market demands it!

Suppose that the cellphone community gets together and creates a schema, cellphone.xsd. Imagine that every week NOKIA sends out to the various vendors an instance document (conforming to cellphone.xsd), detailing its current product set.
Now suppose that a few months after cellphone.xsd is agreed upon NOKIA makes some breakthroughs in their cellphones - they create new memory, call, and display features, none of which are supported by cellphone.xsd. To gain a market advantage NOKIA will want to get information about these new capabilities to its vendors ASAP. Further, they will have little motivation to wait for the next meeting of the cellphone community to consider upgrades to cellphone.xsd. They need results NOW. How does open content help? That is described next.

Suppose that the cellphone schema is declared "open". Immediately NOKIA can extend its instance documents to incorporate data about the new features. How does this change impact the vendor applications that receive the instance documents? The answer is - not at all. In the worst case, the vendor's application will simply skip over the new elements. More likely, however, the vendors are showing the cellphone features in a list box and these new features will be automatically captured with the other features.

Let's stop and think about what has just been described. Without modifying the cellphone schema and without touching the vendor's applications, information about the new NOKIA features has been instantly disseminated to the marketplace! Open content in the cellphone schema is the enabler for this rapid dissemination.

Clearly some types of instance document extensions may require modification to the vendor's applications. Recognize, however, that the vendors are free to upgrade their applications in their own time. The applications do not need to be upgraded before changes can be introduced into instance documents. At the very worst, the vendor's applications will simply skip over the extensions. And, of course, those vendors do not need to upgrade in lock-step.

To wrap up this example - suppose that several months later the cellphone community reconvenes to discuss enhancements to the schema.
The new features that NOKIA first introduced into the marketplace are then officially added into the schema. Thus completes the cycle. Changes to the instance documents have driven the evolution of the schema.
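
This message does not show how cellphone.xsd would actually be declared "open". As a rough sketch only, one way to get a similar effect in the syntax of the final XML Schema 1.0 Recommendation is a wildcard at the end of the content model. The element names, namespace URIs, and values below are all hypothetical.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://www.example.org/cellphone"
            elementFormDefault="qualified">
  <xsd:element name="Cellphone">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="Model" type="xsd:string"/>
        <xsd:element name="Memory" type="xsd:string"/>
        <!-- Extension point: any number of elements from other
             namespaces; "lax" accepts them even when no
             declaration for them is available -->
        <xsd:any namespace="##other" processContents="lax"
                 minOccurs="0" maxOccurs="unbounded"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
```

An instance document can then carry a new feature in the vendor's own namespace without any change to the schema above:

```xml
<Cellphone xmlns="http://www.example.org/cellphone"
           xmlns:nk="http://www.example.org/vendor-extensions">
  <Model>Example 100</Model>
  <Memory>8 MB</Memory>
  <nk:VideoDisplay>color</nk:VideoDisplay>
</Cellphone>
```

Restricting the wildcard to other namespaces is a design choice in this sketch: it keeps the extensions clearly separate from the agreed vocabulary until the community folds them into the schema proper.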