
How to make better files?

  • From: Roger L Costello <costello@mitre.org>
  • To: "xml-dev@lists.xml.org" <xml-dev@lists.xml.org>
  • Date: Fri, 12 Aug 2022 20:32:21 +0000

Hi Folks,

Scenario: There is a file.

What’s in the file? What kind of file is it? Who produced it? When? What kind of data does it hold? Is it safe to open?

Where will you find answers to those questions?

Old school Unix used a stream-of-bytes metaphor for files.  Every file is just a sequence of bytes. Some authors refer to this as formatless files. Michael Kay points out that, in reality, the files are not formatless; rather, their format is simply not known at some level of the system, and it is up to applications to determine the file’s format. Michael Kay wrote:

Applications are left to guess by making inferences from
the file name extension, or by sniffing the content, all of
which is unreliable and insecure.

Liam pointed out that there is a Unix command called “file” which does a pretty decent job of inspecting files and figuring out what they are.
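
To make "sniffing the content" concrete, here is a minimal Python sketch of the general idea behind tools like "file": compare the first few bytes against known signatures. The table and the names MAGIC_SIGNATURES, sniff, and sniff_file are made up for illustration; the real "file" command consults a much larger database with far more elaborate rules.

```python
# Illustrative signature table (an assumption for this sketch, not the
# database that file(1) actually uses). Each entry is (leading bytes, label).
MAGIC_SIGNATURES = [
    (b"%PDF-", "PDF document"),
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"PK\x03\x04", "ZIP archive (also the container for OOXML, ODF, JAR, EPUB)"),
    (b"\x1f\x8b", "gzip compressed data"),
    (b"<?xml", "XML document (in an ASCII-compatible encoding)"),
]


def sniff(data: bytes) -> str:
    """Guess a file type from its leading bytes; return 'unknown' on no match."""
    for magic, label in MAGIC_SIGNATURES:
        if data.startswith(magic):
            return label
    return "unknown"


def sniff_file(path: str) -> str:
    """Read only the first 16 bytes of the file at `path` and sniff them."""
    with open(path, "rb") as f:
        return sniff(f.read(16))
```

Note that the ZIP entry already shows the limits of this approach: the same leading bytes cover several quite different formats, so a match is an inference about the file, not knowledge of it.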

There is a spectrum of “file knowingness.” At one end of the spectrum is old school Unix: a file is a stream of bytes. Nothing is known about the file. You need to sniff its content and make inferences. What lies at the other end of the spectrum? How would you characterize it? How about this characterization: We know virtually everything about a file. We know its character encoding, what application produced it, how long it is, when and where it was created, what kind of data it contains, what kind of applications can process it, and whether it is safe to open. Do you agree with that characterization? What else would you add?

At which end of the spectrum do you want your files? Is one end of the spectrum better? Better in what way? Should we all strive to transition our files to one end of the spectrum?

Where does XML live in the spectrum? I suspect it lives somewhere in the middle. Michael Kay argues that XML doesn’t do a particularly good job of “file knowingness,” as he wrote:

Conventions like putting the encoding in a header or using
strings like xmlns="..." to identify the vocabulary, are ad-hoc
and unsystematic, and they're very often at the wrong level
of the system (you should know the encoding before you start
trying to interpret the characters).
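
The circularity in that last point can be seen in a small Python sketch, loosely modeled on the non-normative encoding autodetection appendix of the XML 1.0 Recommendation. The function name guess_xml_encoding and the simplifications (no EBCDIC, no UTF-32 without a byte order mark) are assumptions of this sketch, not a conformant implementation. A reader has to guess an encoding family from raw bytes before it can read the declaration that names the encoding.

```python
import codecs
import re

# Matches an encoding declaration, assuming the bytes are ASCII-compatible.
_ENCODING_DECL = re.compile(
    rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def guess_xml_encoding(head: bytes) -> str:
    """Guess the encoding of an XML document from its first bytes."""
    # Step 1: byte order marks. UTF-32 is checked first because the
    # UTF-32 LE mark begins with the same two bytes as the UTF-16 LE mark.
    for bom, name in (
        (codecs.BOM_UTF32_LE, "utf-32-le"),
        (codecs.BOM_UTF32_BE, "utf-32-be"),
        (codecs.BOM_UTF8, "utf-8"),
        (codecs.BOM_UTF16_LE, "utf-16-le"),
        (codecs.BOM_UTF16_BE, "utf-16-be"),
    ):
        if head.startswith(bom):
            return name

    # Step 2: no mark, so look at how the characters "<?" are laid out.
    if head.startswith(b"\x00<\x00?"):
        return "utf-16-be"
    if head.startswith(b"<\x00?\x00"):
        return "utf-16-le"

    # Step 3: assume an ASCII-compatible encoding just long enough to
    # read the declaration, then trust whatever it says.
    match = _ENCODING_DECL.match(head)
    if match:
        return match.group(1).decode("ascii").lower()

    # With no mark and no declaration, XML defaults to UTF-8.
    return "utf-8"
```

Step 3 is exactly the "wrong level" problem: the bytes must be interpreted as characters in order to learn how to interpret the bytes as characters.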

How can we make better XML?

How can we make better files?

/Roger

 


