
The String Datatype is the Worst Datatype Ever Created

  • From: "Costello, Roger L." <costello@mitre.org>
  • To: "xml-dev@lists.xml.org" <xml-dev@lists.xml.org>
  • Date: Wed, 23 Sep 2015 11:27:16 +0000

Hi Folks,

Highlights

String elements are cesspools of garbage and hacker exploits.

Use enumerations.

Enumerations are symbols. Symbolic processing is the key to success.

Scope of discussion

The following discussion applies only to data that is to be processed exclusively by machines (i.e., machine-to-machine processing). It does not apply to data that is to be processed by humans.

Lately I have been studying the IP protocol

IP has a header with numerous fields. Some of the fields have numeric values: Version, Header Length, Fragment Offset, etc. Some of the fields are "text" fields (I quote the word "text" because IP is actually a binary format; nonetheless, these "text" fields denote symbols with well-defined semantics): Type of Service, Flags, Protocol, etc. What is significant about these text fields is that their allowable values are enumerated:

Type of Service: Normal Delay, Low Delay, Normal Throughput, High Throughput, Normal Reliability, High Reliability

Flags: May Fragment, Don't Fragment, Last Fragment, More Fragments

Protocol: TCP, UDP
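
To make the point concrete, here is a minimal sketch (in Python, chosen only for illustration) of how a receiver might treat the Protocol field as a closed set of symbols. The values 6 and 17 are the assigned protocol numbers for TCP and UDP, and the Protocol field sits at byte offset 9 of an IPv4 header. The names Protocol and parse_protocol are invented for this example, and real IP defines many more protocol numbers than the two shown.

```python
from enum import IntEnum


class Protocol(IntEnum):
    """Closed set of values accepted for the IP Protocol field (subset only)."""
    TCP = 6
    UDP = 17


def parse_protocol(header: bytes) -> Protocol:
    """Return the Protocol symbol stored at byte offset 9 of an IPv4 header.

    Raises ValueError for any value outside the enumeration.
    """
    if len(header) < 20:
        raise ValueError("IPv4 header must be at least 20 bytes")
    return Protocol(header[9])


# Hypothetical 20-byte header: only the Protocol byte is filled in.
sample = bytearray(20)
sample[9] = 6
print(parse_protocol(bytes(sample)).name)
```

Any value outside the enumeration is rejected at the point of parsing, so downstream code only ever sees a known symbol.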

I've also been studying the TCP protocol

TCP also has a header with numerous fields. Some of the fields are numeric, some are text. The text fields have enumerated values, e.g.

Control Field: URG (urgent), ACK (acknowledgement), PSH (push), RST (reset connection), SYN (synchronize), FIN (finish)
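
The control field differs from the Protocol field in that several values can be set at once, but the set of symbols is still closed. A rough Python sketch follows, assuming the standard bit positions of the six control bits in the TCP header. The names TcpFlag and flag_names are hypothetical.

```python
from enum import IntFlag


class TcpFlag(IntFlag):
    """The six control bits named above, at their TCP header bit positions."""
    FIN = 0x01
    SYN = 0x02
    RST = 0x04
    PSH = 0x08
    ACK = 0x10
    URG = 0x20


def flag_names(control_bits: int) -> list[str]:
    """List the names of the control bits that are set, lowest bit first."""
    return [flag.name for flag in TcpFlag if control_bits & flag]


# 0x12 is the SYN and ACK bits together.
print(flag_names(0x12))
```

Every possible combination decomposes into the six named symbols, so there is no value whose meaning is undefined.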

What do these data formats (protocols) have in common?

Answer: they don't allow text fields to contain arbitrary (unspecified) strings. The allowable values are enumerated and clearly defined.

That makes sense, right? After all, how would machines (routers, gateways) make routing decisions on arbitrary strings? Answer: they can't.

Likewise, machines cannot process XML documents that contain arbitrary strings. Don't use the string datatype in XML Schemas. Ever.
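
The same idea carries over to XML. The sketch below is not a schema validator; it is a small Python stand-in for what an enumeration facet buys the receiver. The <Protocol> element, the ProtocolName class, and the read_protocol function are all assumptions made up for this example.

```python
import xml.etree.ElementTree as ET
from enum import Enum


class ProtocolName(Enum):
    """The only element values the receiver knows how to act on."""
    TCP = "TCP"
    UDP = "UDP"


def read_protocol(xml_text: str) -> ProtocolName:
    """Parse a <Protocol> element and map its text onto the enumeration.

    Raises ValueError if the element name or its value is not recognized.
    """
    element = ET.fromstring(xml_text)
    if element.tag != "Protocol":
        raise ValueError(f"unexpected element: {element.tag}")
    return ProtocolName(element.text or "")


print(read_protocol("<Protocol>UDP</Protocol>"))
```

An arbitrary string such as <Protocol>carrier pigeon</Protocol> fails immediately instead of flowing into code that has no defined behavior for it.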

You might argue …

But Roger, your favorite example is a Book:

<Book>
    <Title>Illusions The Adventures of a Reluctant Messiah</Title>
    <Author>Richard Bach</Author>
    <Date>1977</Date>
    <ISBN>0-440-34319-4</ISBN>
    <Publisher>Dell Publishing Co.</Publisher>
</Book>

How do you intend to enumerate all the authors in the world? All the book titles in the world? All the publishers in the world?

Answer: I will remove those elements. Author, Title, and Publisher have no business being in an XML document that is to be processed by machines. If you can't enumerate it, don't include it. In this example, ISBN is sufficient to identify the book. None of the other fields are needed. The Title, Author, and Publisher (string) elements are simply cesspools for garbage and hacker exploits.
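
One property that sets the ISBN apart from the free-text fields is that a machine can check it. An ISBN-10 carries a check digit: the ten digits, weighted 10 down to 1, must sum to a multiple of 11. The Python sketch below applies that rule to the ISBN from the example. The function name isbn10_is_valid is invented here, and the check confirms only that the number is well formed, not that the book exists.

```python
def isbn10_is_valid(isbn: str) -> bool:
    """Check an ISBN-10 using its weighted sum modulo 11."""
    chars = isbn.replace("-", "")
    if len(chars) != 10:
        return False
    total = 0
    for position, char in enumerate(chars):
        if char in "0123456789":
            value = int(char)
        elif char in "Xx" and position == 9:
            value = 10  # "X" stands for 10 and is allowed only as the check digit
        else:
            return False
        total += (10 - position) * value
    return total % 11 == 0


print(isbn10_is_valid("0-440-34319-4"))
```

For 0-440-34319-4 the weighted sum is 143, which is 13 times 11, so the check passes.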

What about the cost to update enumeration lists?

The objection goes like this: modifying a schema to include a new enumeration value is expensive, and that's why we use strings. I'm not buying that argument. So what if a string datatype lets your XML instances carry new data? If the machines haven't been updated to understand the new data, you have achieved nothing.

Symbolic Processing

An enumeration is a symbol. When you use enumerations, you are in the realm of symbolic processing. A couple of months ago I attended a talk by Stephen Wolfram, and he said a key to his company's success is symbolic processing. I believe this is what he was referring to.

Are constrained strings okay?

No. Suppose you set maxLength to 5 but don't constrain the character set. The number of possible strings of up to 5 characters drawn from the entire Unicode character set is astronomical. There's no way you are going to be able to specify the semantics of each one. Stick with enumerations.
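
For a sense of scale, the count can be worked out directly. The Python sketch below takes the whole Unicode code space (U+0000 through U+10FFFF, which is 1,114,112 code points) as an upper bound. That is an assumption: XML excludes some of those code points, so the real figure is somewhat smaller but of the same order.

```python
# Assumption: the full Unicode code space is used as an upper bound.
CODE_POINTS = 0x110000  # 1,114,112 code points
MAX_LENGTH = 5

# Count every string of length 0 through MAX_LENGTH, repetition allowed.
total = sum(CODE_POINTS ** length for length in range(MAX_LENGTH + 1))
print(f"{total:.3e}")
```

The result is on the order of 10^30 distinct values, against the handful of values in a typical enumeration.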

Comments?

 

/Roger

 


