Constrain the Number of Occurrences of Elements in your XML Sc
Hi Folks,

Outstanding discussion! Many thanks for all the comments. I think that this is an important issue. Below I have tried to summarize the discussion (it doesn't include the most recent comments). Also, at the bottom of this message I have a proposal for getting the best of both viewpoints by using a combination of XML Schemas and Schematron.

/Roger

Issue

Should unbounded occurrences be permitted in an XML Schema?

Two Approaches to Allowing an Unbounded Number of Occurrences

There are two approaches in XML Schemas for allowing an unbounded number of occurrences. Below I discuss these two approaches. Following that I discuss two viewpoints on whether an unbounded number of occurrences should or should not be permitted. I then discuss the advantages and disadvantages of each viewpoint. Finally, I propose a compromise between the two viewpoints.

Allowing Unbounded Occurrences using maxOccurs

The first approach to allowing an unbounded number of occurrences is to state it explicitly by using maxOccurs="unbounded". For example, the following declaration says that Bookstore can contain an unbounded number of Book elements:

<element name="Bookstore">
<complexType>
<sequence>
<element name="Book" type="..." maxOccurs="unbounded"/>
</sequence>
</complexType>
</element>
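
For anyone who wants to try the declaration above in a validator, here is a minimal sketch of a complete schema document. It makes assumptions that the fragment leaves open: BookType and its single Title child are hypothetical stand-ins for the elided type="...", the schema has no target namespace, and I use an xs: prefix (rather than the unprefixed style above) so that type="BookType" resolves to the locally defined type.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Bookstore holds one or more Book elements, with no upper limit -->
  <xs:element name="Bookstore">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="Book" type="BookType" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

  <!-- Hypothetical type, used only to make the sketch self-contained -->
  <xs:complexType name="BookType">
    <xs:sequence>
      <xs:element name="Title" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>

</xs:schema>
```

Since minOccurs defaults to 1, this sketch requires at least one Book; only the upper bound is open.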

Allowing Unbounded Occurrences using a Recursive Expression

The second approach to allowing an unbounded number of occurrences is less obvious. Unboundedness occurs implicitly when you create a recursive structure. In the following example there is no limit to the depth of the Section elements. That is, a Section can contain a Section which contains a Section which contains a Section ...

<element name="Section" type="SectionType"/>
<complexType name="SectionType">
<sequence>
<element name="Title" type="..."/>
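<!-- minOccurs="0" makes the nested Section optional, so the recursion can terminate -->
<element name="Section" type="SectionType" minOccurs="0"/>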
</sequence>
</complexType>
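
To make the recursion concrete, here is a sketch of an instance document with three levels of nesting. The Title values are made up, and I am assuming Title holds plain text since its type is elided above. Note that a finite document like this one can only validate if the nested Section declaration is optional (minOccurs="0"), because minOccurs defaults to 1 and the innermost Section here has no Section child.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<Section>
  <Title>Chapter 1</Title>
  <Section>
    <Title>Section 1.1</Title>
    <Section>
      <!-- Innermost level: nothing in the schema stops further nesting here -->
      <Title>Section 1.1.1</Title>
    </Section>
  </Section>
</Section>
```

Nothing in SectionType bounds how many more levels could be added, which is what makes this approach implicitly unbounded.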

Comparison of the Two Approaches

Both of the above approaches allow an unbounded number of occurrences. Let's compare the two approaches:

Permit Unbounded Occurrences or Constrain the Occurrences?

Now that we have seen the two approaches for allowing an unbounded number of occurrences we return to the main issue: when designing an XML Schema should you permit an unbounded number of occurrences? There are two viewpoints on this issue. One viewpoint is that you should design your Schemas to permit an unbounded number of occurrences. The other viewpoint is that you should not permit an unbounded number of occurrences.

To keep the discussion concrete, let's take the above Bookstore Schema as the example. Suppose that it is decided that Bookstore should not allow more than 30,000 Books. Should the Schema be designed to allow an unbounded number of Books:

<element name="Bookstore">
<complexType>
<sequence>
<element name="Book" type="..." maxOccurs="unbounded"/>
</sequence>
</complexType>
</element>

Or, should the Schema be designed to explicitly state 30,000 as the maximum:

<element name="Bookstore">
<complexType>
<sequence>
<element name="Book" type="..." maxOccurs="30000"/>
</sequence>
</complexType>
</element>

Viewpoint 1: Permit an Unbounded Number of Occurrences

This viewpoint says that it's better to use maxOccurs="unbounded".

There is a technical problem with setting maxOccurs="30000". Michael Kay nicely summarizes the problem: "the classical algorithms for turning grammars into finite state machines produce very inefficient machines when there are occurrence limits that are large but finite. Many schema processors break or consume seriously large amounts of memory if you specify a maxOccurs value (other than unbounded) that's greater than a couple of hundred." In other words, a Schema validator may well choke if you specify maxOccurs="30000".

If the Bookstore wants, at a later date, to expand to accommodate, say, 35,000 Books then no change will be needed to the Schema.

The example being considered is just one element. A Schema is, of course, usually composed of many elements. Suppose that each element is constrained as precisely as possible. Then a document may be rejected because one element exceeded its constraints while all the others were within their constraints.

Viewpoint 2: Constrain the Number of Occurrences

This viewpoint says that it's better to use maxOccurs="30000".

It is important to distinguish between theoretical constraints and practical constraints. Theoretically, a Bookstore may have an unbounded number of Books, but for performance/capacity/security reasons this Bookstore can only handle 30,000 Books. Another example of the difference between theoretical and practical limits: theoretically an HTML document may have an infinite number of <p> elements, but in practice a browser supports only a specific maximum.

Every system has practical constraints. There are many possible reasons for a constraint, such as performance, capacity, or security. The particulars are irrelevant. What is relevant is that no system is infinite, and consequently all systems have practical constraints.

Expressing the practical limits in an XML Schema is especially important for Service Level Agreements (SLAs).

So, there are theoretical constraints and there are practical constraints. And as we've seen, typically the two are not identical. The purpose of an XML Schema is to express constraints. Which constraints should a Schema express: theoretical constraints or practical constraints? Viewpoint 2 says that a Schema should express the practical constraints. Thus, for the Bookstore example, set maxOccurs="30000". See Greg Hunt's messages for an excellent discussion of this viewpoint.

Viewpoint 1: Permit an Unbounded Number of Occurrences
Viewpoint 2: Constrain the Number of Occurrences

Proposal: Constraining Data without the Validator Inefficiencies

As was described above, current Schema validator implementations are very inefficient when you specify a large number for the value of maxOccurs. So, even if you want to express practical limits you cannot, due to this Schema validator implementation problem.

Here is a proposal to avoid the inefficiency: express the theoretical limits using XML Schemas and express the practical limits using Schematron assertions.

Let's consider the Bookstore example. Below I use XML Schemas to indicate that Bookstore is composed of an unbounded number of Books, and I use Schematron to indicate that, as a practical limit, the number of Books must not exceed 30,000:

<element name="Bookstore">
<complexType>
<sequence>
<element name="Book" type="..." maxOccurs="unbounded">
<schematron:assert test="count(Book) &lt;= 30000"/>
</element>
</sequence>
</complexType>
</element>
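
The fragment above is only meant to convey the idea. As a rough sketch of how the practical limit might look as a standalone Schematron schema, consider the following. It rests on assumptions of mine: it uses the ISO Schematron namespace, it assumes Bookstore and Book are in no namespace, and the message text is made up. The rule context is stated explicitly as Bookstore so that count(Book) counts the Book children of each Bookstore.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron">
  <sch:pattern>
    <!-- The rule is evaluated once for each Bookstore element -->
    <sch:rule context="Bookstore">
      <!-- The less-than sign must be escaped as &lt; inside an attribute value -->
      <sch:assert test="count(Book) &lt;= 30000">
        A Bookstore must not contain more than 30,000 Book elements.
      </sch:assert>
    </sch:rule>
  </sch:pattern>
</sch:schema>
```

With this split, raising the limit to, say, 35,000 means editing one number in the Schematron rule while the XML Schema stays at maxOccurs="unbounded". If the rule should live inside the XML Schema itself, one option is to place the pattern in an annotation/appinfo element on the Bookstore declaration, though whether a given toolchain extracts and runs it from there is something to check.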
Comments?