Re: [Summary] Constrain the Number of Occurrences of Elements


I agree that Greg Hunt made some good points about "operational 
constraints," but patching up your solution with Schematron doesn't 
address them.

Still to be discussed:

- What different constraints might be suitable for models of the data 
store vs. models of transactions against the store?

- Different kinds of transaction (e.g., the traditional "batch" and 
"interactive") might impose different constraints.

- What constraints might be better expressed as implementation bounds 
vs. data model? E.g., depth of recursion or sheer number of elements may 
be a problem regardless of element type.

Bob Foster
http://xmlbuddy.com/

Roger L. Costello wrote:
 > Hi Folks,
 >
 > Outstanding discussion!  Many thanks for all the comments.  I think that
 > this is an important issue.  Below I have tried to summarize the
 > discussion (it doesn't include the most recent comments).  Also, at the
 > bottom of this message I have a proposal for getting the best of both
 > viewpoints by using a combination of XML Schemas and Schematron.  /Roger
 >
 >
 >     Issue
 >
 > /Should unbounded occurrences be permitted in an XML Schema?/
 >
 >
 >     Two Approaches to Allowing an Unbounded Number of Occurrences
 >
 > There are two approaches in XML Schemas for allowing an unbounded number
 > of occurrences. Below I discuss these two approaches. Following that I
 > discuss two viewpoints on whether unbounded number of occurrences should
 > or should not be permitted. I then discuss the advantages and
 > disadvantages of each viewpoint. Finally, I propose a compromise of the
 > two viewpoints.
 >
 >
 >     Allowing Unbounded Occurrences using maxOccurs
 >
 > The first approach to allowing an unbounded number of occurrences is to
 > explicitly state that you want an unbounded number of occurrences by
 > using maxOccurs="unbounded". For example, the following declaration says
 > that Bookstore can contain an unbounded number of Book elements:
 >
 > <element name="Bookstore">
 >     <complexType>
 >         <sequence>
 >             <element name="Book" type="..." *maxOccurs="unbounded"*/>
 >         </sequence>
 >     </complexType>
 > </element>
 >
 >
 >     Allowing Unbounded Occurrences using a Recursive Expression
 >
 > The second approach to allowing an unbounded number of occurrences is
 > less obvious. Unboundedness occurs implicitly when you create a
 > recursive structure. In the following example there is no limit to the
 > depth of the Section elements. That is, a Section can contain a Section
 > which contains a Section which contains a Section ...
 >
 > <element name="Section" type="SectionType"/>
 >
 > <complexType name="SectionType">
 >     <sequence>
 >         <element name="Title" type="..."/>
 >         <element name="Section" type="SectionType" minOccurs="0"/>
 >     </sequence>
 > </complexType>
 >
 >
 >     Comparison of the Two Approaches
 >
 > Both of the above approaches allow an unbounded number of occurrences.
 > Let's compare the two approaches:
 >
 >    1. *Explicit vs Implicit:* With the first approach you explicitly
 >       state that you are allowing an unbounded number of occurrences.
 >       With the second approach unboundedness is implicit. Although it is
 >       obvious in the example presented, in a large Schema where the
 >       recursion extends through many complexTypes it may not be obvious
 >       that an unbounded number of occurrences is being allowed.
 >
 >    2. *Ability to "throttle back" on the Number of Occurrences:* With
 >       the first approach it is easy to reduce the number of occurrences.
 >       If you don't want an unbounded number of occurrences, and want,
 >       say, only 10 occurrences, then you simply specify maxOccurs="10".
 >       With the second approach there is no such attribute for
 >       controlling the depth of the recursion. You cannot simply say,
 >       "Section elements cannot be nested more than 10 deep".
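
For what it's worth, recursion depth can be bounded in plain XML Schema by unrolling the recursion into one complexType per level. The sketch below caps nesting at three levels to keep it short; ten levels would need ten types. The type names SectionLevel1 to SectionLevel3, the xs: prefix, and xs:string for Title (the original leaves that type as "...") are assumptions made for illustration, not part of the schema under discussion.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Outermost Section: may contain one level-2 Section -->
  <xs:element name="Section" type="SectionLevel1"/>

  <xs:complexType name="SectionLevel1">
    <xs:sequence>
      <!-- xs:string is a placeholder for the elided Title type -->
      <xs:element name="Title" type="xs:string"/>
      <xs:element name="Section" type="SectionLevel2" minOccurs="0"/>
    </xs:sequence>
  </xs:complexType>

  <xs:complexType name="SectionLevel2">
    <xs:sequence>
      <xs:element name="Title" type="xs:string"/>
      <xs:element name="Section" type="SectionLevel3" minOccurs="0"/>
    </xs:sequence>
  </xs:complexType>

  <!-- Innermost level: no nested Section, so depth stops at three -->
  <xs:complexType name="SectionLevel3">
    <xs:sequence>
      <xs:element name="Title" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>

</xs:schema>
```

The cost is verbosity, and the limit is still baked into the schema, so this is a trade-off rather than a fix.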
 >
 >
 >     Permit Unbounded Occurrences or Constrain the Occurrences?
 >
 > Now that we have seen the two approaches for allowing an unbounded
 > number of occurrences we return to the main issue: when designing an XML
 > Schema should you permit an unbounded number of occurrences? There are
 > two viewpoints on this issue. One viewpoint is that you should design
 > your Schemas to permit an unbounded number of occurrences. The other
 > viewpoint is that you should not permit an unbounded number of
 > occurrences.
 >
 > To keep the discussion concrete, let's take the above Bookstore Schema
 > as the example. Suppose that it is decided that Bookstore should not
 > allow more than 30,000 Books. Should the Schema be designed to allow an
 > unbounded number of Books:
 >
 > <element name="Bookstore">
 >     <complexType>
 >         <sequence>
 >             <element name="Book" type="..." *maxOccurs="unbounded"*/>
 >         </sequence>
 >     </complexType>
 > </element>
 >
 > Or, should the Schema be designed to explicitly state 30,000 as the
 > maximum:
 >
 > <element name="Bookstore">
 >     <complexType>
 >         <sequence>
 >             <element name="Book" type="..." *maxOccurs="30000"*/>
 >         </sequence>
 >     </complexType>
 > </element>
 >
 >
 >     Viewpoint 1: Permit an Unbounded Number of Occurrences
 >
 > This viewpoint says that it's better to use maxOccurs="unbounded".
 >
 > There is a technical problem with setting maxOccurs="30000". Michael Kay
 > nicely summarizes the problem: "the classical algorithms for turning
 > grammars into finite state machines produce very inefficient machines
 > when there are occurrence limits that are large but finite. Many schema
 > processors break or consume seriously large amounts of memory if you
 > specify a maxOccurs value (other than unbounded) that's greater than a
 > couple of hundred." In other words, many Schema validators will slow
 > down badly or fail outright if you specify maxOccurs="30000".
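
To make the cost concrete, here is a hand-written expansion of what a bound of just three looks like once the counter is unrolled, which is roughly what a naive automaton construction does internally. This is an illustrative sketch, not a description of any particular validator; the xs: prefix and xs:string for Book are placeholder assumptions, since the original leaves the type as "...".

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Bookstore">
    <xs:complexType>
      <!-- Accepts the same content as:
           <xs:element name="Book" type="xs:string" maxOccurs="3"/> -->
      <xs:sequence>
        <!-- First Book is required (minOccurs defaults to 1) -->
        <xs:element name="Book" type="xs:string"/>
        <xs:sequence minOccurs="0">
          <!-- Optional second Book -->
          <xs:element name="Book" type="xs:string"/>
          <xs:sequence minOccurs="0">
            <!-- Optional third Book -->
            <xs:element name="Book" type="xs:string"/>
          </xs:sequence>
        </xs:sequence>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

A bound of 30,000 unrolled the same way would need on the order of 30,000 nested particles, whereas maxOccurs="unbounded" needs only a single loop.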
 >
 > If the Bookstore wants, at a later date, to expand to accommodate, say,
 > 35,000 Books then no change will be needed to the Schema.
 >
 > The example being considered is just one element. A Schema is, of
 > course, usually composed of many elements. Suppose that each element is
 > constrained as precisely as possible. Then a document may be rejected
 > because one element exceeded its constraints while all the others were
 > within their constraints.
 >
 >
 >     Viewpoint 2: Constrain the Number of Occurrences
 >
 > This viewpoint says that it's better to use maxOccurs="30000".
 >
 > It is important to distinguish between *theoretical constraints* and
 > *practical constraints*. Theoretically, a Bookstore may have an
 > unbounded number of Books, but for performance/capacity/security reasons
 > this Bookstore can only handle 30,000 Books.
 >
 > Another example of the difference between theoretical and practical
 > limits: theoretically an HTML document may have an infinite number of
 > <p> elements, but in practice a browser supports only a specific maximum.
 >
 > Every system has practical constraints. There are many possible reasons
 > for the constraint such as performance, capacity, or security
 > constraints. The particulars are irrelevant. What is relevant, however,
 > is that it is guaranteed that no system is infinite and consequently all
 > systems have practical constraints.
 >
 > Expressing the practical limits in an XML Schema is especially
 > important for Service Level Agreements (SLAs).
 >
 > So, there are theoretical constraints and there are practical
 > constraints. And as we've seen, typically the two are not identical. The
 > purpose of an XML Schema is to express constraints. Which constraints
 > should a Schema express - theoretical constraints or practical
 > constraints? Viewpoint 2 says that a Schema should express the practical
 > constraints. Thus, for the Bookstore example, set maxOccurs="30000".
 >
 > See Greg Hunt's messages for an excellent discussion of this viewpoint.
 >
 >
 >     Viewpoint 1: Permit an Unbounded Number of Occurrences
 >
 >     * Advantages
 >
 >           o *Schema Validator Efficiency:* Schema validators are more
 >             efficient when you use maxOccurs="unbounded" than when you
 >             set maxOccurs to a large number.
 >           o *Accommodates Growth:* as a system is expanded to support
 >             larger amounts of data, there is no need to change the
 >             Schema.
 >     * Disadvantages
 >
 >           o *Pushes the Constraint-Checking Problem to Another Part of
 >             the System:* if the practical limits are not expressed in
 >             the Schema, then they have to be expressed somewhere else -
 >             somewhere less visible, less maintainable, and with less
 >             tool support.
 >           o *Vulnerable to Denial of Service Attack:* an XML guard
 >             depends upon an XML Schema to indicate whether an XML
 >             instance document should be allowed to pass. It will be
 >             unable to detect a denial of service attack since the Schema
 >             sets a theoretical limit and not a practical limit.
 >
 >
 >     Viewpoint 2: Constrain the Number of Occurrences
 >
 >     * Advantages
 >
 >           o *Constraints are Visible, Maintainable, and Supported by
 >             Tools:* since the practical limits are expressed in the
 >             Schema, they are visible, maintainable, and supported by
 >             tools.
 >           o *Prevent Denial of Service Attack:* an XML guard depends
 >             upon an XML Schema to indicate whether an XML instance
 >             document should be allowed to pass. It will be able to
 >             detect a denial of service attack since the Schema sets a
 >             practical limit and not a theoretical limit.
 >     * Disadvantages
 >
 >           o *Schema Validator is Inefficient:* Schema validators are not
 >             efficient when you set maxOccurs to a large number.
 >           o *Requires Periodic Update:* as a system is expanded to
 >             support larger amounts of data, the Schema will need to be
 >             updated.
 >           o *Exposing System Limits:* by setting maxOccurs="30000" you
 >             are giving information to people about the limitations of
 >             your system.
 >
 >
 >     Proposal: Constraining Data without the Validator Inefficiencies
 >
 > As was described above, current implementations of Schema validators
 > are very inefficient when you specify a large number for the value of
 > maxOccurs. So, even if you want to express practical limits you cannot,
 > due to this Schema validator implementation problem.
 >
 > Here is a proposal to avoid the inefficiency: express the theoretical
 > limits using XML Schemas and express the practical limits using
 > Schematron assertions. Let's consider the Bookstore example. Below I
 > use XML Schemas to indicate that Bookstore is composed of an
 > unbounded number of Books, and I use Schematron to indicate that the
 > number of Books must not exceed the practical limit of 30,000:
 >
 > <element name="Bookstore">
 >     <complexType>
 >         <sequence>
 >             <element name="Book" type="..." *maxOccurs="unbounded"*>
 >                 *<schematron:assert test="count(Book) <= 30000"/>*
 >             </element>
 >         </sequence>
 >     </complexType>
 > </element>
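
As written, the fragment above would not get past an XML parser or a schema processor: "<" has to be escaped as "&lt;" inside an attribute value, and elements from other namespaces belong inside xs:annotation/xs:appinfo. Note also that count(Book) needs to be evaluated with Bookstore as the context, not Book. A sketch of one well-formed way to embed the same assertion follows. It assumes the ISO Schematron namespace, a pattern id invented for illustration, the xs: prefix, and xs:string as a stand-in for the elided Book type.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:sch="http://purl.oclc.org/dsdl/schematron">

  <xs:element name="Bookstore">
    <xs:annotation>
      <xs:appinfo>
        <!-- Practical limit: ignored by the schema validator itself -->
        <sch:pattern id="bookstore-practical-limit">
          <sch:rule context="Bookstore">
            <sch:assert test="count(Book) &lt;= 30000">
              A Bookstore must not contain more than 30,000 Book elements.
            </sch:assert>
          </sch:rule>
        </sch:pattern>
      </xs:appinfo>
    </xs:annotation>
    <xs:complexType>
      <xs:sequence>
        <!-- Theoretical limit: unbounded; xs:string is a placeholder type -->
        <xs:element name="Book" type="xs:string" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

</xs:schema>
```

A schema validator skips the appinfo content, so the assertion only takes effect if a separate Schematron step extracts and runs it.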
 >
 > Comments?
 >
 >
 >


