
Schema vs Schema-free (was: RE: XML Binary Characterization WG


Elliotte Rusty Harold wrote:
> this does rule out a number of the proposals that have 
> been made, particularly those that try to represent 
> numbers and similar things as binary rather than text,
> and thus lose the distinction between 1, 0001, and +1.
	Certainly, it is sometimes important to ensure that leading
zeros, gratuitous "+" signs, etc. are passed through systems
untouched. However, how much this matters depends on whether or not
you are working with a schema.
	In a schema-based system, if a field is defined as having
"integer" type and if the definition of integer states that "1",
"00001", "+001", and "+1" are equivalent, then no one should feel
compelled to preserve any syntactic sugar unless there is some other
normative information that requires it to be preserved. Basically, the
author of the schema, in defining the schema and using type
definitions that establish an equivalence between representations, has
implicitly "authorized" the substitutions.
	On the other hand, in a schema-free system, there is no way
for a processor to "know" that something "is" a number. In the absence
of a schema, or some other normative document that explicitly defines
equivalent representations of a value, a processor of the data should
faithfully pass the data through, byte for byte.
	Thus, in a schema-free system, "1", "00001", "+001", and "+1"
must be considered distinct, non-equivalent, and non-substitutable
values. On the other hand, in a schema-based system, there is no
problem with substituting one form for another as long as the schema
permits it.
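
The contrast can be sketched in a few lines of Python. This is only an
illustration: it assumes the schema's "integer" type compares values
the way Python's built-in int() does, and the variable names are
invented for the example.

```python
# The four lexical forms discussed above.
forms = ["1", "00001", "+001", "+1"]

# Schema-free view: nothing says these are numbers, so they are
# compared as opaque text and every form is a different value.
distinct_as_text = len(set(forms))

# Schema-based view (assumption: the field is declared as an integer
# and the type definition treats these forms as equivalent).
distinct_as_integers = len({int(form) for form in forms})

print(distinct_as_text)      # 4 distinct strings
print(distinct_as_integers)  # 1 distinct integer value
```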
	Those who try to apply the rules of schema-free systems to
schema-based systems are doing something very, very dangerous. Because
"1" and "001" mean the same thing when a schema declares them to be
integers, those who argue that the precise original forms of values
should be preserved are probably under the false impression that there
are some semantics associated with those forms. Yet, unless the schema
defines such semantics, there are none, and any assumptions concerning
such unspecified semantics are likely to be proved wrong -- potentially
with disastrous impacts.
	These disasters often happen when people get sloppy in their
interpretation of schemas. For instance, a company might have a
five-digit "employee" code that should have leading zeros. Some sloppy
coder might decide to map this to an integer... Of course, it is
inevitable that some downstream code will look at an integer like
"00001" and convert it to "1" (i.e., treat it the way the schema says
it can be treated). The results could be unpleasant. What the coder
*should have done* was recognize that the employee code is *NOT* an
integer; rather, it is a five-character value that is composed only of
numeric characters and is padded with leading "0" characters. The
field should be defined as such and appropriate types provided in the
schema.
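
A small Python sketch of the employee-code problem follows. The
pattern and the function name parse_employee_code are hypothetical,
chosen only to illustrate typing the field as a constrained string
rather than as an integer.

```python
import re

# Hypothetical type definition: exactly five ASCII digits.
EMPLOYEE_CODE = re.compile(r"[0-9]{5}")

def parse_employee_code(text: str) -> str:
    """Accept the code only if it is exactly five digits; keep it as text."""
    if EMPLOYEE_CODE.fullmatch(text) is None:
        raise ValueError(f"not a five-digit employee code: {text!r}")
    return text

# Sloppy mapping: once the field is an integer, downstream code is
# entitled to normalize it, and the leading zeros are gone.
as_integer = str(int("00001"))

# Typed as a constrained string, the code survives unchanged.
as_code = parse_employee_code("00001")

print(as_integer)  # 1
print(as_code)     # 00001
```

With this typing, the normalized form "1" is rejected as an employee
code instead of being silently accepted.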
	The distinction between these two environments is recognized
in the distinction between the ASN.1-based "Fast Schema" and "Fast
Infoset" systems. Fast Schema is only to be used when a schema is
present and, depending on the schema, will convert things that are
declared to be integers to binary forms with no guarantee of
preserving distinctions between things like "1", "001", and "+1". On
the other hand, "Fast Infoset" is for use in schema-free processing.
Thus, Fast Infoset (X.finfo) would preserve the exact forms of "1",
"001", and "+1". This is how it should be.
	Rules which apply to schema-based systems do not necessarily
apply to schema-free systems. Sometimes, but not always, "1", "001",
and "+1" are really the same value.
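
The two encoding strategies described above can be imitated with a
toy round trip in Python. This is emphatically not the Fast Schema or
Fast Infoset wire format; the four-byte integer layout and all four
function names are assumptions made up for the illustration.

```python
def encode_schema_aware(lexical: str) -> bytes:
    """Toy encoder for a field the schema declares to be an integer."""
    # Assumed layout: 4-byte big-endian two's complement.
    return int(lexical).to_bytes(4, "big", signed=True)

def decode_schema_aware(data: bytes) -> str:
    """Decode the toy binary integer back to a lexical form."""
    return str(int.from_bytes(data, "big", signed=True))

def encode_schema_free(lexical: str) -> bytes:
    """Toy schema-free encoder: carry the characters through untouched."""
    return lexical.encode("utf-8")

def decode_schema_free(data: bytes) -> str:
    """Recover exactly the characters that were encoded."""
    return data.decode("utf-8")

for form in ["1", "001", "+1"]:
    binary_trip = decode_schema_aware(encode_schema_aware(form))
    text_trip = decode_schema_free(encode_schema_free(form))
    # The binary trip always yields "1"; the text trip yields the
    # original form.
    print(form, binary_trip, text_trip)
```

The binary round trip keeps the value and drops the spelling, which is
acceptable only because a schema has declared the field an integer.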

		bob wyman

