
RE: The String Datatype is the Worst Datatype Ever Created

  • From: "Renner, Scott A." <sar@mitre.org>
  • To: "xml-dev@lists.xml.org" <xml-dev@lists.xml.org>
  • Date: Fri, 25 Sep 2015 15:32:41 +0000

I see that Roger has started another interesting discussion.  Let me throw in a thought or two.

 

It is impossible to completely separate the data from the people.  People supply the meaning of all data, which is always relative to the purpose of at least one person.  Machines don’t think.  This is Dr. Scott’s Rule of Data #1.

 

It is difficult but useful to distinguish between data for machine processing and data for user presentation.  The key factor is the situation of the person who understands the data and provides the meaning relative to the purpose.  For machine processing, that person is the programmer, who understands the data specification but never sees the actual data at runtime.  I have slides for this concept in various NIEM briefings.

 

Some data is for both machine processing and user presentation.  Consider the street address.  One might think this is always for user presentation, but not so.  Machine processing reads that address and calculates carrier routes for the post office, allocates packages and does route planning for trucks, etc.

 

For some kinds of machine processing it is usually better to use enumerations instead of strings when feasible.  For example, when the data drives a case statement, like this:

 

                switch (country_code) {
                case CAN:
                                printf("I think you mean North Montana :-)\n");
                                break;
                }

 

Database index fields are another place where you would rather have enumerations than strings.  Short messages over a bandwidth-constrained network are yet another.  But there are always exceptions.  Difficulties in configuration management sometimes mean you must allow a string value in addition to your enumeration / controlled vocabulary.  Some data elements cannot be feasibly enumerated, like the National Stock Number.  And so forth...
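To make the "string value in addition to your enumeration" case concrete, here is a rough sketch of one way it might look in C.  Everything in it is made up for illustration: the CountryCode vocabulary, the country_value struct, and the parse_country / describe_country helpers are hypothetical names, and the three codes are only sample entries.  The idea is that known values become enumeration members the program can switch on, while anything outside the vocabulary is kept as free text rather than rejected.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical controlled vocabulary; the entries are illustrative only. */
typedef enum { COUNTRY_CAN, COUNTRY_MEX, COUNTRY_USA, COUNTRY_OTHER } CountryCode;

/* A value is either a known code or, as an escape hatch, free text. */
typedef struct {
    CountryCode code;
    char other_text[32];    /* meaningful only when code == COUNTRY_OTHER */
} country_value;

static const struct { const char *text; CountryCode code; } known_codes[] = {
    { "CAN", COUNTRY_CAN },
    { "MEX", COUNTRY_MEX },
    { "USA", COUNTRY_USA },
};

/* Map incoming text to the enumeration; keep unknown text instead of rejecting it. */
country_value parse_country(const char *text)
{
    country_value v = { COUNTRY_OTHER, "" };
    size_t i;

    assert(text != NULL);
    for (i = 0; i < sizeof known_codes / sizeof known_codes[0]; i++) {
        if (strcmp(text, known_codes[i].text) == 0) {
            v.code = known_codes[i].code;
            return v;
        }
    }
    /* Not in the vocabulary: store the string, truncated to fit if necessary. */
    snprintf(v.other_text, sizeof v.other_text, "%s", text);
    return v;
}

/* Machine processing switches on the enumeration, never on the raw string. */
const char *describe_country(const country_value *v)
{
    switch (v->code) {
    case COUNTRY_CAN:   return "Canada";
    case COUNTRY_MEX:   return "Mexico";
    case COUNTRY_USA:   return "United States";
    case COUNTRY_OTHER: break;
    }
    return v->other_text;
}
```

With this shape, parse_country("CAN") yields a value the switch can act on, while parse_country("North Montana") falls through to COUNTRY_OTHER and carries the original text along for a person to sort out later.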

 

So I think that, properly qualified, a variant of Roger’s rule might make sense, something like:  For machine-processed data, an enumeration / controlled vocabulary is usually preferable to an unconstrained string, when feasible.  Feasibility is high when the set of terms is small and relatively stable.  Feasibility is low when the set is large and fast-changing.  An enumeration of state codes is feasible; an enumeration of book titles, um... not so much.

 

cheers,

-- scott

 

--

Dr. Scott Renner, The MITRE Corporation

+1 703-983-1206 (office); +1 978-831-2598 (cell)

 

 

 


