
Re: Constrain the Number of Occurrences of Elements in your XML


Um, have you noticed the consequences of setting maxOccurs="30000" in 
today's validators? I've seen out-of-memory errors with maxOccurs="1000".

There is a way to avoid the quadratic blowup (probably more than one). I 
talked about one in:

http://jroller.com/comments/bobfoster/FullSpeedAhead/derivatives_of_bounded_repitition

and I believe C. M. Sperberg-McQueen is giving a presentation at the 
next Extreme that covers the topic, but right now, setting a large 
maxOccurs is really not good advice.
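
To make the blowup concrete, here is a minimal Python sketch. The
assumption, not taken from any particular validator, is that a bounded
repetition with maxOccurs="n" is unrolled into n optional copies of the
particle and turned into a position automaton. The function names are
made up for illustration. A counter-based check is shown next to it for
comparison; it is one possible way to avoid the growth, not necessarily
the one described at the link above.

```python
def unrolled_transition_count(n):
    """Count transitions for a? a? ... a? (n optional copies, assumed unrolling)."""
    # State 0 is the start; states 1..n are the copies. Every copy is
    # optional, so from state i the next 'a' may match any later copy j > i.
    # The total is n(n+1)/2, which grows quadratically with the bound.
    return sum(1 for i in range(n + 1) for j in range(i + 1, n + 1))


def counter_accepts(count, min_occurs, max_occurs):
    """Counter-based check: a single integer of state, whatever the bound."""
    return min_occurs <= count <= max_occurs


for n in (10, 1000):
    print(n, unrolled_transition_count(n))
# 10 55
# 1000 500500
```

Under the same assumption, n = 30000 gives n(n+1)/2 = 450,015,000
transitions, while the counter still needs only one integer.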

Bob Foster
http://xmlbuddy.com/

Roger L. Costello wrote:
 > Hi Folks,
 >
 > Below I have jotted down a few thoughts regarding XML Schemas which
 > permit an unbounded number of occurrences.  Namely, I recommend against
 > using maxOccurs="unbounded" in an XML Schema.  I am interested in
 > hearing your thoughts on this.  /Roger
 >
 >
 >
 >   Constrain the Number of Occurrences of Elements in your XML Schema
 >
 > *by Roger L. Costello*
 > August 5, 2005
 >
 >
 >     Constrain your Data!
 >
 > In this message I will argue that you should never create XML Schemas
 > that permit an unbounded number of occurrences.
 >
 > There are two ways in XML Schemas to permit an unbounded number of
 > occurrences. The first way is to explicitly state that you are
 > permitting an unbounded number of occurrences. For example, this
 > declaration says that Bookstore can contain an unbounded number of Book
 > elements:
 >
 > <element name="Bookstore">
 >     <complexType>
 >         <sequence>
 >             <element name="Book" type="..." maxOccurs="unbounded"/>
 >         </sequence>
 >     </complexType>
 > </element>
 >
 > The second way of permitting an unbounded number of occurrences is less
 > obvious. Unboundedness occurs implicitly when you create a recursive
 > structure. In this example there is no limit to the depth of the Section
 > elements. That is, a Section can contain a Section which contains a
 > Section which contains a Section ...
 >
 > <element name="Section" type="SectionType"/>
 >
 > <complexType name="SectionType">
 >     <sequence>
 >         <element name="Title" type="..."/>
 >         <element name="Section" type="SectionType" minOccurs="0"/>
 >     </sequence>
 > </complexType>
 >
 > Both of the above forms permit an unbounded number of occurrences. I
 > recommend that you never use either form. That is, never declare an
 > element with maxOccurs="unbounded", and never declare a recursive
 > structure. Below I explain why.
 >
 >
 >     Writing a Journal Article? Your Word Count is Limited!
 >
 > The situation with specifying the number of occurrences of an element in
 > an XML Schema is analogous to the situation with specifying the number
 > of words authors can use in an article.
 >
 > Suppose that you want to write an article for a journal. How many words
 > can you use in your article? All journals have an upper limit on the
 > number of words that you can use. Why don't the journals set the word
 > limit to unbounded? Answer: there are editors who have to check the
 > articles for correctness, readability, etc. The editors have limited
 > resources (i.e., time). Thus, it is necessary to limit the length of the
 > article. Perhaps at a later date the journal will increase the word
 > limit (perhaps they hire some full-time editors). But they always have a
 > definite upper limit. They never allow articles of unbounded length,
 > because their resources are limited.
 >
 >
 >     Error! Infinite Loop!
 >
 > The situation with specifying the number of occurrences of an element in
 > an XML Schema is analogous to an infinite loop in programming languages.
 > Why are infinite loops deemed "bad" in programming languages, yet
 > unbounded occurrences are embraced in data?
 >
 > Let's see why infinite loops are bad in programming languages. Suppose
 > that a program has a loop, and a computer begins to process the loop. It
 > requires a certain amount of resources (memory, cpu cycles) for the
 > computer to perform one iteration. Two iterations will require a bit
 > more resources. Three iterations require still more. ... Infinite
 > iterations require infinite resources. Thus, infinite loops are bad
 > because they require infinite resources.
 >
 > The situation is analogous with data. Consider the Bookstore declaration
 > above. It declares that an unbounded number of Book elements are
 > permitted within Bookstore. A program that must process XML instances
 > conforming to the declaration must have the necessary resources (memory,
 > cpu cycles). To process one Book element will require a certain amount
 > of resources. To process a second Book element will require a bit more
 > resources. A third book will require still more resources. ... Infinite
 > Books require infinite resources. Even though XML instance documents are
 > always finite, the schema indicates that there is a "potential" for an
 > infinite number of Book elements. A program that is designed to process
 > "any" XML instance document that conforms to the schema must therefore
 > have an infinite amount of resources.
 >
 >
 >     Okay, then what Value should I use for maxOccurs?
 >
 > "Suppose that I anticipate that Bookstore will never have more than
 > 30,000 Books, so I set maxOccurs='30000'. After some time the
 > requirements change and Bookstore now needs to be able to hold 35,000
 > Books. Won't I have to change the Schema every time my needs change?
 > Wouldn't it be easier if I simply declared maxOccurs='unbounded'?"
 >
 > Answer: yes, you will need to change the Schema whenever your
 > requirements change. Yes, it is easier to simply declare
 > maxOccurs='unbounded'. But don't do it! The number that you use for
 > maxOccurs should be as big as your programs are willing and able to cope
 > with, and no more. If at some point the number of actual books exceeds
 > that number then you must either (1) extend your program's resources to
 > handle the expanded number, or (2) refuse to allow more books.
 >
 >
 >     Recap
 >
 >    1. Don't use maxOccurs="unbounded"
 >    2. Don't use recursive constructions
 >    3. Set maxOccurs to a number no larger than your programs have the
 >       resources to handle
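
Option (2) in the quoted message, refusing to allow more books, can
also be enforced in the processing program rather than in the schema.
The following is a minimal Python sketch under stated assumptions: the
limit MAX_BOOKS, the exception TooManyBooks and the function
count_books are hypothetical names, and the Book elements are assumed
to carry no namespace. It streams the document with the standard
library's iterparse and stops as soon as the limit is exceeded.

```python
import io
import xml.etree.ElementTree as ET

MAX_BOOKS = 30000  # hypothetical limit, taken from the example in the post


class TooManyBooks(Exception):
    """Raised when a document holds more Book elements than the limit."""


def count_books(source, limit=MAX_BOOKS):
    """Count Book elements while streaming; fail once the limit is passed."""
    count = 0
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "Book":  # assumes un-namespaced element names
            count += 1
            if count > limit:
                raise TooManyBooks(f"more than {limit} Book elements")
            # Drop the element's content; the emptied element itself
            # stays attached to its parent.
            elem.clear()
    return count


doc = io.BytesIO(b"<Bookstore><Book/><Book/><Book/></Bookstore>")
print(count_books(doc, limit=5))
# 3
```

The check uses a single counter, so its cost does not depend on how
large the limit is.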


