
Re: The impact of data format selection on application develop

  • From: Norman Gray <norman.gray@glasgow.ac.uk>
  • To: Roger L Costello <costello@mitre.org>
  • Date: Tue, 12 Jul 2022 13:29:33 +0100

Roger, hello.

On 12 Jul 2022, at 12:51, Roger L Costello wrote:

> Missouri River	2,341
>  Mississippi River	2,340
>  Yukon River	1,979
>  Rio Grande	1,759
>
>  When I provide that data (file) to someone I will inform them:
>
>  Hey, the file consists of the lengths of rivers in the U.S. Each line of the file contains two fields: the U.S. name of a river and its length. The fields are separated by a tab. The length is expressed in miles as an integer and groups of digits are separated by the comma symbol (such as 1,759).

A better description would be 'a TSV file with river name in column 1 and integer length in km in column 2'.  That's 'simpler' because it's even shorter than yours, yet it remains clear, to certain people, what's required to process it.

The word 'simpler' is in quotes, there, because, as has already been discussed in this thread, there's significantly more to CSV or TSV than meets the eye (escapes, line-endings, and so on), so this is 'simpler' only for a recipient who has seen this before and knows what to do.  _In that context_, the data description is short, and appears simple.

So 'simple' data formats are actually 'high-context' data formats (compare [1]).

Note that in that description I didn't mention that I'd expect the integer _not_ to include a comma (which is useful only for display, and which would conventionally be regarded as hostile in a transmission format), and I did choose to add a little explicit context in mentioning the units of the second column (you _did_ mean km, didn't you,... hmm?).  So I've thoughtfully chosen what context to make explicit, and expected that the recipient of the description will know the Right Thing To Do.
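
To make that unstated context concrete, here is a minimal Python sketch of what a recipient has to know in order to read the quoted file.  The function name `parse_rivers` and the `SAMPLE` string are made up for illustration, and stripping the commas before conversion is an assumption about how one would cope with the display-style integers; it is one way of doing the Right Thing, not the only one.

```python
import csv
import io

# The four quoted records, with a tab between the two fields.
SAMPLE = (
    "Missouri River\t2,341\n"
    "Mississippi River\t2,340\n"
    "Yukon River\t1,979\n"
    "Rio Grande\t1,759\n"
)


def parse_rivers(text):
    """Return a list of (name, length) pairs from tab-separated text.

    Assumes exactly two fields per line and that any commas in the
    second field are digit-group separators (as in '1,759').
    """
    reader = csv.reader(io.StringIO(text, newline=""), delimiter="\t")
    rows = []
    for record in reader:
        if not record:
            continue  # skip blank lines
        name, length = record
        # Drop the display-only grouping commas before converting.
        rows.append((name.strip(), int(length.replace(",", ""))))
    return rows


rivers = parse_rivers(SAMPLE)
```

Every decision in there (the tab delimiter, the two-field assumption, the comma stripping) is context that the one-line description leaves for the recipient to supply.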

So what JSON or XML would be doing, in the alternative choice of file format, would be providing explicit context in a different conventional way.
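
As a rough sketch of what 'explicit context in a different conventional way' might look like, the same four records could be written out with Python's `json` module.  The key names (`unit`, `rivers`, `name`, `length`) are hypothetical, and the unit is taken from the quoted description (miles); this is an illustration, not a proposed schema.

```python
import json

# The quoted records, with lengths already converted to plain integers.
records = [
    ("Missouri River", 2341),
    ("Mississippi River", 2340),
    ("Yukon River", 1979),
    ("Rio Grande", 1759),
]

# Hypothetical layout: the unit travels with the data rather than
# in a separate explanation.
document = {
    "unit": "miles",
    "rivers": [{"name": name, "length": length} for name, length in records],
}

encoded = json.dumps(document, indent=2)
decoded = json.loads(encoded)
```

The recipient still has to know what `length` means, so the context has moved rather than disappeared, but the integer type and the unit no longer depend on an accompanying note.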

(It also occurs to me that, by saying 'TSV', the explanation above is also arguably _usefully opaque_ to someone who doesn't have the context I'm guessing they have, so if they get back to me, puzzled, I can tell that about them and advise differently.)

>  Without that explanation, the file (data) is useless. But that holds true for an XML file containing the same data and a JSON file containing the same data. One might argue that with XML the tags describe the data, so an accompanying explanation is not needed. But relying on XML tags to explain data is folly (e.g., what if the developer uses generic tags such as <li>, such tags hardly "explain" the data). I would argue, regardless of the data format, there needs to be some accompanying explanation about the data. And if that's the case, then heck, use the simplest possible data format (use the super-simple data format shown above) and take advantage of the plethora of tools available for processing super-simple data formats.

'Simple' formats are great.  If all you were sending me was a list of names and lengths, then I'd thank you for sending me something as simple as the above, because I'm confident I could easily turn it into whatever I wanted.

But -- and I think this is the key point -- simple formats run out of steam really quickly, and if requirements change, then the simple format, hackily extended in this direction and that, will start to look more hellish, faster, than any of the more sophisticated formats.  Or: simplicity is sometimes brittle.

Thus there's a matter of technological taste, and judgement, here.

Best wishes,

Norman


[1] https://en.wikipedia.org/wiki/High-context_and_low-context_cultures

-- 
Norman Gray  :  https://nxg.me.uk
SUPA School of Physics and Astronomy, University of Glasgow, UK

