XML Schema Datatypes - token, string, normalizedString - any difference?
Hi Folks,

Below is a rather intriguing thing I discovered yesterday about the XML Schema datatypes token, string, and normalizedString.

Consider these three declarations of element Title:

    <element name="Title" type="token"/>
    <element name="Title" type="string"/>
    <element name="Title" type="normalizedString"/>

Note that each declaration uses a different datatype: token, string, normalizedString.

Consider this instance of Title:

    <Title>_______</Title>

Will the above declarations produce any differences with regard to validation? That is, are there values that yield "valid" with one datatype but "invalid" with the others?

Scroll down to see the answer ....

One would intuitively think, "Of course they produce different validation results. After all, why have different datatypes if they produce the same results?"

In fact they all yield the same validation results! For example, this yields "valid" for all three declarations:

    <Title>My Life and Times</Title>

And this yields "invalid" for all three declarations:

    <Title>�</Title>

No matter what value you put within <Title>, all three datatypes yield the same result. Pretty weird, right?

Want to know why all three datatypes yield the same validation result? Scroll down ...

Here's what the datatypes specification says:

normalizedString: The value space of normalizedString is the set of strings that do not contain the carriage return (#xD), line feed (#xA) nor tab (#x9) characters.

token: The value space of token is the set of strings that do not contain the carriage return (#xD), line feed (#xA) nor tab (#x9) characters, that have no leading or trailing spaces (#x20) and that have no internal sequences of two or more spaces.

From these descriptions, you might conclude that this is an illegal normalizedString value

    <Title>My Life
    and Times</Title>

since a normalizedString cannot have a carriage return.
And you might conclude that this is an illegal token

    <Title> My Life and Times </Title>

since a token cannot have leading/trailing spaces.

However, they are both legal. Here's why:

The default value of the whiteSpace facet for normalizedString is "replace". This means that before validating this instance

    <Title>My Life
    and Times</Title>

the carriage return is replaced with a space, to produce

    <Title>My Life and Times</Title>

And this is clearly a valid normalizedString.

Likewise, the default value of the whiteSpace facet for token is "collapse". This means that before validating this instance

    <Title> My Life and Times </Title>

the leading/trailing spaces are removed, to produce

    <Title>My Life and Times</Title>

And this is clearly a valid token.

Summary: With regard to validation, these three forms are identical:

    <element name="Title" type="token"/>
    <element name="Title" type="string"/>
    <element name="Title" type="normalizedString"/>

Note: the PSVI for the three forms is different (i.e., the PSVI will tell you in the first case that the datatype is token, in the second case that the datatype is string, and so forth). If you are using RELAX NG (which does not generate a PSVI) then the three forms are identical in every way.

Thanks to Michael Kay, George Bina and Jerry Sheehan for explaining this to me.

/Roger
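
To make the whitespace processing concrete, here is a minimal Python sketch of the two steps described above: first apply the datatype's whiteSpace facet, then test the result against the value space. It is an illustration under simplifying assumptions, not how any particular validator is implemented. The names normalize, in_value_space, is_valid and WHITE_SPACE are made up for this example, and the sketch ignores the XML parser's own checks (for instance, characters that are not legal in XML at all).

```python
import re

# whiteSpace facet of each datatype, as described in the message above
# (string keeps its input unchanged, i.e. "preserve").
WHITE_SPACE = {
    "string": "preserve",
    "normalizedString": "replace",
    "token": "collapse",
}


def normalize(value: str, white_space: str) -> str:
    """Apply a whiteSpace facet value to the raw element content."""
    if white_space == "preserve":
        return value
    # "replace": every tab, line feed and carriage return becomes a space.
    replaced = re.sub(r"[\t\n\r]", " ", value)
    if white_space == "replace":
        return replaced
    if white_space == "collapse":
        # "collapse": after "replace", runs of spaces shrink to one space
        # and leading/trailing spaces are removed.
        return re.sub(r" +", " ", replaced).strip(" ")
    raise ValueError(f"unknown whiteSpace value: {white_space!r}")


def in_value_space(value: str, datatype: str) -> bool:
    """Check a string against the value space quoted from the spec."""
    if datatype == "string":
        return True
    if re.search(r"[\t\n\r]", value):
        return False
    if datatype == "normalizedString":
        return True
    if datatype == "token":
        no_edge_spaces = not (value.startswith(" ") or value.endswith(" "))
        return no_edge_spaces and "  " not in value
    raise ValueError(f"unknown datatype: {datatype!r}")


def is_valid(raw: str, datatype: str) -> bool:
    """Normalize first, then test the value space (the order is the point)."""
    return in_value_space(normalize(raw, WHITE_SPACE[datatype]), datatype)


if __name__ == "__main__":
    samples = ["My Life and Times", "My Life\nand Times", " My Life and Times "]
    for raw in samples:
        results = {dt: is_valid(raw, dt) for dt in WHITE_SPACE}
        print(repr(raw), results)
```

Calling in_value_space directly on " My Life and Times " with "token" reports that the raw string is outside the value space, while is_valid accepts it because the facet is applied first. That ordering is why the three declarations cannot be told apart by validation alone.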