
Re: The String Datatype is the Worst Datatype Ever Created

  • From: Thomas Passin <list1@tompassin.net>
  • To: "xml-dev@lists.xml.org" <xml-dev@lists.xml.org>
  • Date: Thu, 24 Sep 2015 09:23:13 -0400

Then you will not be able to interchange textual data that comes from a database, because (practically by definition) databases contain variable data, and usually not only enumerated data.

I wonder what kind of XML files you envision that don't contain data from databases. For computer-to-computer processing, it would seem that you could only send pre-arranged command signals. There is not much need for XML for that.

Even the IP packets you brought up as an exemplar are used to transmit varying textual data. It's only for the control slots that enumerations are used, not in the contents that the packets carry.

TomP

On 9/23/2015 7:27 AM, Costello, Roger L. wrote:
Hi Folks,


  Highlights

String elements are cesspools of garbage and hacker exploits.

Use enumerations.

Enumerations are symbols. Symbolic processing is the key to success.


  Scope of discussion

The following discussion applies only to data that is to be processed
exclusively by machines (i.e., machine-to-machine processing). It does
not apply to data that is to be processed by humans.


  Lately I have been studying the IP protocol

IP has a header with numerous fields. Some of the fields have numeric
values: Version, Header Length, Fragment Offset, etc. Some of the fields
are "text" fields (I quote the word "text" because IP is actually a
binary format; nonetheless, these "text" fields denote symbols with
well-defined semantics): Type of Service, Flags, Protocol, etc. What is
significant about these text fields is that their allowable values are
enumerated:

Type of Service: Normal Delay, Low Delay, Normal Throughput, High
Throughput, Normal Reliability, High Reliability

Flags: May Fragment, Don't Fragment, Last Fragment, More Fragments

Protocol: TCP, UDP


  I've also been studying the TCP protocol

TCP also has a header with numerous fields. Some of the fields are
numeric, some are text. The text fields have enumerated values, e.g.

Control Field: URG (urgent), ACK (acknowledgement), PSH (push), RST
(reset connection), SYN (synchronize), FIN (finish)


  What do these data formats (protocols) have in common?

Answer: they don't allow text fields to contain arbitrary (unspecified)
strings. The allowable values are enumerated and clearly defined.

That makes sense, right? After all, how would machines (routers,
gateways) make routing decisions on arbitrary strings? Answer: they can't.

Likewise, machines cannot process XML documents that contain arbitrary
strings. Don't use the string datatype in XML Schemas. Ever.
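
For illustration only, here is a minimal sketch of what an enumerated field might look like in an XML Schema. The element name Protocol and the two values come from the IP example above; the schema itself is an assumption and is not part of any actual IP specification. Note that the enumeration is still declared as a restriction whose base type is xs:string (xs:token would serve equally well); what changes is that only the listed values validate.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Hypothetical element: only the enumerated symbols are valid content -->
  <xs:element name="Protocol">
    <xs:simpleType>
      <xs:restriction base="xs:string">
        <xs:enumeration value="TCP"/>
        <xs:enumeration value="UDP"/>
      </xs:restriction>
    </xs:simpleType>
  </xs:element>
</xs:schema>
```

Under this sketch, <Protocol>TCP</Protocol> validates, while <Protocol>tcp</Protocol> or any other string is rejected by a validating parser.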


  You might argue …

/But Roger, your favorite example is a Book:/

<Book>
<Title>Illusions The Adventures of a Reluctant Messiah</Title>
<Author>Richard Bach</Author>
<Date>1977</Date>
<ISBN>0-440-34319-4</ISBN>
<Publisher>Dell Publishing Co.</Publisher>
</Book>

/How do you intend to enumerate all the authors in the world? All the
book titles in the world? All the publishers in the world?/

Answer: I will remove those elements. Author, Title, and Publisher have
no business being in an XML document that is to be processed by
machines. If you can't enumerate it, don't include it. In this example,
ISBN is sufficient to identify the book. None of the other fields are
needed. The Title, Author, and Publisher (string) elements are simply
cesspools for garbage and hacker exploits.


  What about the cost to update enumeration lists?

/Modifying a schema to include a new enumeration value is expensive,
that's why we use strings/. I'm not buying that argument. So what if a
string datatype enables your XML instances to use new data; if the
machines haven't been updated to understand the new data, you have
achieved nothing.


  Symbolic Processing

An enumeration is a symbol. When you use enumerations, you are in the
realm of symbolic processing. A couple months ago I attended a talk by
Stephen Wolfram and he said a key to his company's success is symbolic
processing. I believe this is what he was referring to.


  Are constrained strings okay?

No.  Suppose you set maxLength to 5 (you don't constrain the character
set). Well, the number of permutations of 5 characters over the entire
Unicode character set is astronomical. There's no way you are going to
be able to specify the semantics of each permutation. Stick with
enumerations.
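
As a sketch of the kind of constrained string being argued against here, the following declaration limits length but not the character set. The element name Code is hypothetical and chosen only for illustration.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Hypothetical element: at most 5 characters, any characters XML allows -->
  <xs:element name="Code">
    <xs:simpleType>
      <xs:restriction base="xs:string">
        <xs:maxLength value="5"/>
      </xs:restriction>
    </xs:simpleType>
  </xs:element>
</xs:schema>
```

Any content of up to five characters validates against this type, so the schema by itself says nothing about what a given value means.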


  Comments?

/Roger






