Most XML vocabularies are too large and inevitably have lots of "holes"

  • From: "Costello, Roger L." <costello@mitre.org>
  • To: "xml-dev@lists.xml.org" <xml-dev@lists.xml.org>
  • Date: Sat, 17 Dec 2011 19:50:10 +0000

Hi Folks,

Recently I have been learning Lambda Calculus [1].

A fascinating thing about Lambda Calculus is its richness, despite it being extraordinarily simple.

The set of expressions (lambda-terms) that can be created in Lambda Calculus is defined as follows:

a. All variables are lambda-terms

b. If M and N are any lambda-terms, then (M N) is a lambda-term (called an application)

c. If M is any lambda-term and x is any variable, then (\x -> M) is a lambda-term (called an abstraction) 
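
Rules a, b and c map almost one-to-one onto a small data type. Here is a minimal Python sketch of that idea. It is not taken from the book in [1]; the class names Var, App and Abs and the helper show are hypothetical names picked for illustration.

```python
from dataclasses import dataclass
from typing import Union


# Rule a: every variable is a lambda-term.
@dataclass(frozen=True)
class Var:
    name: str


# Rule b: if M and N are lambda-terms, then (M N) is an application.
@dataclass(frozen=True)
class App:
    func: "Term"
    arg: "Term"


# Rule c: if M is a lambda-term and x is a variable,
# then (\x -> M) is an abstraction.
@dataclass(frozen=True)
class Abs:
    param: str
    body: "Term"


# A lambda-term is exactly one of the three shapes above (illustrative alias).
Term = Union[Var, App, Abs]


def show(term: Term) -> str:
    """Render a term in the parenthesized notation used in rules b and c."""
    if isinstance(term, Var):
        return term.name
    if isinstance(term, App):
        return f"({show(term.func)} {show(term.arg)})"
    return f"(\\{term.param} -> {show(term.body)})"


identity = Abs("x", Var("x"))
print(show(App(identity, Var("y"))))  # ((\x -> x) y)
```

Three constructors and one recursive function cover the entire syntax, which is the kind of smallness that makes a formal treatment feasible.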


With just a few items and a few combination rules, an entire field was spawned.

Because it is so limited, it has been possible to characterize Lambda Calculus formally.

A few days ago Michael Kay made this startling statement regarding XML Schema:

      ... the more you read the XSD spec, the more holes you find.

And on the xmlschema-dev list Michael Kay recently stated this:

      ... the schema construction model is not defined very formally ...

Let's think about this:

1. XML Schema is a comparatively small XML vocabulary. I haven't counted the number of elements and attributes but let me guess that the total is 100 (probably fewer).

2. XML Schema is pretty rigorously specified.

Yet despite its smallness and fairly rigorous specification it still has "holes" in it.

ASSERTION: An XML vocabulary consisting of 100 items (or more) is too much. It can never be formally specified and it will forever have "holes."

Let's do a little math. Suppose an XML vocabulary consists of 5 elements -- A, B, C, D, E -- and one of them must be the root element, which must contain exactly one child element (a different element from the root). Here are some valid instances:

    <A><B/></A>
    <A><C/></A>
    <B><A/></B>

And so forth.

With this extremely constrained XML vocabulary there are: 5 * 4 = 20 permutations (XML instances with differing arrangements of markup).

If we allow the root element to have one or two child elements then there are: 5 * 4 + 5 * 4**2 = 20 + 80 = 100 permutations (each of the two child positions can be any of the 4 non-root elements).
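
The two counts can be checked by brute force. The sketch below rests on assumptions that the text only implies: a child must be a different element from the root, children may repeat, and order matters. The names ELEMENTS and instances are hypothetical.

```python
from itertools import product

ELEMENTS = ["A", "B", "C", "D", "E"]


def instances(max_children: int) -> list[str]:
    """List every document with one root and 1..max_children children.

    Assumptions: a child is never the same element as the root,
    children may repeat, and their order is significant.
    """
    docs = []
    for root in ELEMENTS:
        others = [e for e in ELEMENTS if e != root]
        for n in range(1, max_children + 1):
            for children in product(others, repeat=n):
                inner = "".join(f"<{c}/>" for c in children)
                docs.append(f"<{root}>{inner}</{root}>")
    return docs


print(len(instances(1)))  # 20
print(len(instances(2)))  # 100
```

Under different assumptions (for example, children that must be distinct from each other) the totals change, but the growth pattern is the same.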

The complexity grows at a breathtaking rate as the size of the vocabulary increases and as the number of ways of combining the vocabulary increases.
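
To get a feel for that growth, the counting can be written as a closed form and evaluated for larger vocabularies. This is only a sketch for flat documents (a root plus one level of children), assuming that children differ from the root and may repeat; the function name count_instances and the sample sizes are hypothetical.

```python
def count_instances(vocab_size: int, max_children: int) -> int:
    """Count flat documents: one root plus 1..max_children children.

    Assumptions: children are drawn from the vocab_size - 1 non-root
    elements, may repeat, and their order is significant.
    """
    others = vocab_size - 1
    return vocab_size * sum(others ** k for k in range(1, max_children + 1))


# Illustrative sizes only; the first two rows reproduce the 20 and 100 above.
for size, kids in [(5, 1), (5, 2), (20, 5), (100, 10)]:
    print(f"{size} elements, up to {kids} children: {count_instances(size, kids)}")
```

Even this flat model, with no nesting and no attributes, passes 10**21 for 100 elements and up to ten children; real vocabularies with nesting grow far faster.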

How will you possibly avoid "holes" in an XML vocabulary that has a complexity space that is in the trillions of trillions of trillions of permutations?

You can't.

ASSERTION: Large XML vocabularies must be avoided.

So, what's the solution?

The solution is to do what Lambda Calculus has done and what Simon Peyton Jones has described in his article "How to write a financial contract". That is, create a small set of simple, well-specified primitives and a few combination rules.

So, how many primitives and how many combination rules?

Let me toss out a number: an XML vocabulary should not contain more than a dozen primitive elements and a handful of combination rules. That should be enough to generate all the richness one could possibly ever need. And you just might be able to formally specify your XML vocabulary and ensure that it has no "holes."  

Clearly this is the only way to go for mission-critical applications.



[1] This is a fabulous book on Lambda Calculus (but be prepared to study it, not just read it): http://www.amazon.com/Lambda-Calculus-Combinators-Introduction-Roger-Hindley/dp/0521898854/ref=sr_1_1?ie=UTF8&qid=1324146163&sr=8-1
