Re: [Summary] Constrain the Number of Occurrences of Elements
I agree that Greg Hunt made some good points about "operational
constraints," but patching up your solution with Schematron doesn't
address them.

Still to be discussed:

- What different constraints might be suitable for models of the data
  store vs. models of transactions against the store?

- Different kinds of transaction (e.g., the traditional "batch" and
  "interactive") might impose different constraints.

- What constraints might be better expressed as implementation bounds
  vs. data model? E.g., depth of recursion or sheer number of elements
  may be a problem regardless of element type.

Bob Foster
http://xmlbuddy.com/

Roger L. Costello wrote:
> Hi Folks,
>
> Outstanding discussion! Many thanks for all the comments. I think that
> this is an important issue. Below I have tried to summarize the
> discussion (it doesn't include the most recent comments). Also, at the
> bottom of this message I have a proposal for getting the best of both
> viewpoints by using a combination of XML Schemas and Schematron. /Roger
>
>
> Issue
>
> /Should unbounded occurrences be permitted in an XML Schema?/
>
>
> Two Approaches to Allowing an Unbounded Number of Occurrences
>
> There are two approaches in XML Schemas for allowing an unbounded number
> of occurrences. Below I discuss these two approaches. Following that I
> discuss two viewpoints on whether an unbounded number of occurrences
> should or should not be permitted. I then discuss the advantages and
> disadvantages of each viewpoint. Finally, I propose a compromise of the
> two viewpoints.
>
>
> Allowing Unbounded Occurrences using maxOccurs
>
> The first approach to allowing an unbounded number of occurrences is to
> explicitly state that you want an unbounded number of occurrences by
> using maxOccurs="unbounded". For example, the following declaration says
> that Bookstore can contain an unbounded number of Book elements:
>
> <element name="Bookstore">
>     <complexType>
>         <sequence>
>             <element name="Book" type="..." maxOccurs="unbounded"/>
>         </sequence>
>     </complexType>
> </element>
>
>
> Allowing Unbounded Occurrences using a Recursive Expression
>
> The second approach to allowing an unbounded number of occurrences is
> less obvious. Unboundedness occurs implicitly when you create a
> recursive structure. In the following example there is no limit to the
> depth of the Section elements. That is, a Section can contain a Section
> which contains a Section which contains a Section ... (The nested
> Section is declared with minOccurs="0" so that the recursion can
> terminate; without it no finite document would be valid.)
>
> <element name="Section" type="SectionType"/>
>
> <complexType name="SectionType">
>     <sequence>
>         <element name="Title" type="..."/>
>         <element name="Section" type="SectionType" minOccurs="0"/>
>     </sequence>
> </complexType>
>
>
> Comparison of the Two Approaches
>
> Both of the above approaches allow an unbounded number of occurrences.
> Let's compare the two approaches:
>
>    1. *Explicit vs Implicit:* With the first approach you explicitly
>       state that you are allowing an unbounded number of occurrences.
>       With the second approach unboundedness is implicit. Although it is
>       obvious in the example presented, in a large Schema where the
>       recursion extends through many complexTypes it may not be obvious
>       that an unbounded number of occurrences is being allowed.
>
>    2. *Ability to "throttle back" on the Number of Occurrences:* With
>       the first approach it is easy to reduce the number of occurrences.
>       If you don't want an unbounded number of occurrences, and want,
>       say, only 10 occurrences then you simply specify maxOccurs="10".
>       With the second approach there is no means to control the depth of
>       the recursion. There is no means to say, "Section elements cannot
>       be nested more than 10 deep".
>
>
> Permit Unbounded Occurrences or Constrain the Occurrences?
>
> Now that we have seen the two approaches for allowing an unbounded
> number of occurrences we return to the main issue: when designing an XML
> Schema should you permit an unbounded number of occurrences? There are
> two viewpoints on this issue.
> One viewpoint is that you should design your Schemas to permit an
> unbounded number of occurrences. The other viewpoint is that you should
> not permit an unbounded number of occurrences.
>
> To keep the discussion concrete, let's take the above Bookstore Schema
> as the example. Suppose that it is decided that Bookstore should not
> allow more than 30,000 Books. Should the Schema be designed to allow an
> unbounded number of Books:
>
> <element name="Bookstore">
>     <complexType>
>         <sequence>
>             <element name="Book" type="..." maxOccurs="unbounded"/>
>         </sequence>
>     </complexType>
> </element>
>
> Or, should the Schema be designed to explicitly state 30,000 as the
> maximum:
>
> <element name="Bookstore">
>     <complexType>
>         <sequence>
>             <element name="Book" type="..." maxOccurs="30000"/>
>         </sequence>
>     </complexType>
> </element>
>
>
> Viewpoint 1: Permit an Unbounded Number of Occurrences
>
> This viewpoint says that it's better to use maxOccurs="unbounded".
>
> There is a technical problem with setting maxOccurs="30000". Michael Kay
> nicely summarizes the problem: "the classical algorithms for turning
> grammars into finite state machines produce very inefficient machines
> when there are occurrence limits that are large but finite. Many schema
> processors break or consume seriously large amounts of memory if you
> specify a maxOccurs value (other than unbounded) that's greater than a
> couple of hundred." In other words, many Schema validators will choke if
> you specify maxOccurs="30000".
>
> If the Bookstore wants, at a later date, to expand to accommodate, say,
> 35,000 Books then no change will be needed to the Schema.
>
> The example being considered is just one element. A Schema is, of
> course, usually composed of many elements. Suppose that each element is
> constrained as precisely as possible. Then a document may be rejected
> because one element exceeded its constraints while all the others were
> within their constraints.
>
>
> Viewpoint 2: Constrain the Number of Occurrences
>
> This viewpoint says that it's better to use maxOccurs="30000".
>
> It is important to distinguish between *theoretical constraints* and
> *practical constraints*. Theoretically, a Bookstore may have an
> unbounded number of Books, but for performance/capacity/security reasons
> this Bookstore can only handle 30,000 Books.
>
> Another example of the difference between theoretical and practical
> limits: theoretically an HTML document may have an infinite number of
> <p> elements, but in practice a browser supports only a specific maximum.
>
> Every system has practical constraints. There are many possible reasons
> for the constraint such as performance, capacity, or security
> constraints. The particulars are irrelevant. What is relevant, however,
> is that it is guaranteed that no system is infinite and consequently all
> systems have practical constraints.
>
> Expressing the practical limits in an XML Schema is especially
> important for Service Level Agreements (SLAs).
>
> So, there are theoretical constraints and there are practical
> constraints. And as we've seen, typically the two are not identical. The
> purpose of an XML Schema is to express constraints. Which constraints
> should a Schema express - theoretical constraints or practical
> constraints? Viewpoint 2 says that a Schema should express the practical
> constraints. Thus, for the Bookstore example, set maxOccurs="30000".
>
> See Greg Hunt's messages for an excellent discussion of this viewpoint.
>
>
> Viewpoint 1: Permit an Unbounded Number of Occurrences
>
>   * Advantages
>
>       o *Schema Validator Efficiency:* Schema validators are more
>         efficient when you use maxOccurs="unbounded" than when you
>         set maxOccurs to a large number.
>       o *Accommodates Growth:* as a system is expanded to support
>         larger amounts of data, there is no need to change the Schema.
>
>   * Disadvantages
>
>       o *Pushes the Constraint-Checking Problem to Another Part of
>         the System:* if the practical limits are not expressed in
>         the Schema, then they have to be expressed somewhere else -
>         somewhere less visible, less maintainable, and with less
>         tool support.
>       o *Vulnerable to Denial of Service Attack:* XML guards depend
>         upon an XML Schema to indicate whether an XML instance
>         document should be allowed to pass. A guard will be unable to
>         detect a denial of service attack since the Schema sets a
>         theoretical limit and not a practical limit.
>
>
> Viewpoint 2: Constrain the Number of Occurrences
>
>   * Advantages
>
>       o *Constraints are Visible, Maintainable, and with Tool
>         Support:* since the practical limits are expressed in the
>         Schema, they are visible, maintainable, and with tool support.
>       o *Prevent Denial of Service Attack:* XML guards depend upon
>         an XML Schema to indicate whether an XML instance document
>         should be allowed to pass. A guard will be able to detect a
>         denial of service attack since the Schema sets a practical
>         limit and not a theoretical limit.
>
>   * Disadvantages
>
>       o *Schema Validator is Inefficient:* Schema validators are not
>         efficient when you set maxOccurs to a large number.
>       o *Requires Periodic Update:* as a system is expanded to
>         support larger amounts of data, the Schema will need to be
>         updated.
>       o *Exposing System Limits:* by setting maxOccurs="30000" you
>         are giving information to people about the limitations of
>         your system.
>
>
> Proposal: Constraining Data without the Validator Inefficiencies
>
> As was described above, current implementations of Schema validators
> are very inefficient when you specify a large number for the value of
> maxOccurs. So, even if you want to express practical limits you cannot,
> due to this Schema validator implementation problem.
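
To make the inefficiency concrete, here is a minimal Python sketch. It
is not taken from any real validator: the function names and the
convention of using None for maxOccurs="unbounded" are assumptions made
for illustration. It compares the number of states a naively expanded
finite state machine needs for a content model of "up to N Books" with
the alternative of checking the same bound using a counter.

```python
def naive_state_count(max_occurs):
    """States in a naively expanded automaton for 'Book, 0..max_occurs'.

    None stands for maxOccurs="unbounded" (hypothetical convention).
    """
    if max_occurs is None:
        return 1           # a single state that loops on Book
    return max_occurs + 1  # one state per number of Books seen so far


def accepts_with_counter(children, name, max_occurs):
    """Check the same occurrence bound with a counter, not expanded states."""
    count = 0
    for child in children:
        if child != name:
            return False   # only `name` children are allowed in this model
        count += 1
        if max_occurs is not None and count > max_occurs:
            return False   # bound exceeded
    return True
```

With maxOccurs="30000" the naive expansion of this one particle needs
30,001 states, while the unbounded case needs one; the counter version
uses constant memory either way. Whether a given validator expands
states or counts is an implementation detail this sketch does not
address.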
>
> Here is a proposal to avoid the inefficiency: express the theoretical
> limits using XML Schemas and express the practical limits using
> Schematron assertions. Let's consider the Bookstore example. Below I
> use XML Schemas to indicate that Bookstore is comprised of an
> unbounded number of Books, and I use Schematron to indicate that the
> practical limit to the number of Books must not exceed 30,000:
>
> <element name="Bookstore">
>     <complexType>
>         <sequence>
>             <element name="Book" type="..." maxOccurs="unbounded">
>                 <schematron:assert test="count(Book) &lt;= 30000"/>
>             </element>
>         </sequence>
>     </complexType>
> </element>
>
> Comments?
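
As a rough illustration of the "implementation bounds" alternative
raised at the top of this message, the following Python sketch enforces
an element count and a nesting depth while streaming through a
document, independently of any schema. It uses the standard library's
xml.etree.ElementTree.iterparse; the names check_limits and
LimitExceeded are hypothetical, and the count is a total per element
name across the whole document rather than per parent, which is an
assumption that differs from count(Book) evaluated on a Bookstore.

```python
import io
import xml.etree.ElementTree as ET


class LimitExceeded(Exception):
    """Raised when a document exceeds a practical limit."""


def check_limits(source, max_count=None, max_depth=None):
    """Stream through `source`, enforcing per-name counts and a depth bound.

    max_count: dict mapping element name -> maximum total occurrences.
    max_depth: maximum nesting depth of any element (the root is depth 1).
    Returns a dict of element name -> number of occurrences seen.
    """
    max_count = max_count or {}
    counts = {}
    depth = 0
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            depth += 1
            if max_depth is not None and depth > max_depth:
                raise LimitExceeded(f"depth {depth} exceeds {max_depth}")
            counts[elem.tag] = counts.get(elem.tag, 0) + 1
            limit = max_count.get(elem.tag)
            if limit is not None and counts[elem.tag] > limit:
                raise LimitExceeded(f"more than {limit} <{elem.tag}> elements")
        else:
            depth -= 1
            elem.clear()  # drop the closed element's text and children
    return counts
```

For example, check_limits(io.BytesIO(data), max_count={"Book": 30000},
max_depth=10) stops reading as soon as either bound is crossed, so an
oversized document is rejected before it has been read in full.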