
Re: The illusion of simplicity and low cost in data designand

  • From: Michael Kay <mike@saxonica.com>
  • To: Roger L Costello <costello@mitre.org>
  • Date: Sun, 14 Aug 2022 15:31:27 +0100



On 14 Aug 2022, at 14:45, Roger L Costello <costello@mitre.org> wrote:

Michael Kay wrote this regarding whether information about a file should be inside or outside the file:

  • Inside when viewed at one layer, outside when viewed at a different layer.
  • Or to put it another way, you don't need to know. You don't care whether the bits you get to see are contiguous on disk or not.

I need a concrete example please. Suppose I have a program that can only process XML documents that are encoded using the 8859-8 character set (Latin/Hebrew). An XML document arrives. How will my program determine whether or not it can process the XML document?

You've already muddled the layers. Your program only cares that it's XML; it doesn't care what the encoding is. Your program calls something like

if (file.hasContentType("application/xml")) {
  parseXml(file);
}

The XML parser does

Reader reader = file.decode();

The operating system knows the encoding of the file and decodes the bytes into characters.
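A minimal Java sketch of this layering, assuming a hypothetical `TypedFile` class that stands in for the operating system's view of a file: the bytes plus a recorded media type and encoding. The names `hasContentType`, `decode` and `parseXml` follow the snippets above but are illustrative, not a real API. The point of the sketch is that only `TypedFile` ever touches the charset; the application and the "parser" see a media type and a stream of characters.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.StringWriter;
import java.nio.charset.Charset;

class LayeredDemo {
    // Stand-in for an XML parser: it only ever sees characters, never the encoding.
    static String parseXml(TypedFile file) throws IOException {
        try (Reader reader = file.decode()) {
            StringWriter out = new StringWriter();
            reader.transferTo(out); // Java 10+
            return out.toString();
        }
    }

    // Application layer: it cares only that the file is XML.
    static String process(TypedFile file) throws IOException {
        if (file.hasContentType("application/xml")) {
            return parseXml(file);
        }
        return null; // not XML, so not this program's concern
    }

    public static void main(String[] args) throws IOException {
        Charset hebrew = Charset.forName("ISO-8859-8");
        // Hebrew "shalom" inside a root element, written as Unicode escapes
        byte[] bytes = "<doc>\u05E9\u05DC\u05D5\u05DD</doc>".getBytes(hebrew);
        TypedFile file = new TypedFile(bytes, "application/xml", hebrew);
        System.out.println(process(file));
    }
}

// Hypothetical stand-in for a file whose metadata is maintained by the OS layer.
class TypedFile {
    private final byte[] content;
    private final String contentType; // e.g. "application/xml"
    private final Charset encoding;   // known to this layer, not to the application

    TypedFile(byte[] content, String contentType, Charset encoding) {
        this.content = content.clone();
        this.contentType = contentType;
        this.encoding = encoding;
    }

    boolean hasContentType(String type) {
        return contentType.equalsIgnoreCase(type);
    }

    // The decoding layer turns bytes into characters using the recorded encoding.
    Reader decode() {
        return new InputStreamReader(new ByteArrayInputStream(content), encoding);
    }
}
```

Note that `process` would work unchanged if the same document had been stored as UTF-8: the program from the question never needs to know which 8-bit character set was used.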

Of course, there's always a possibility that the operating system doesn't know the encoding of the file, because no-one told it. So you need some kind of API like

file.setEncoding("iso-8859-8");

which would normally be done automatically when you write a file using a character-based Writer.

Similarly there's a possibility the operating system doesn't know the media type of the file, so you need an API like

file.setContentType("application/xml");

Again, one would hope that applications that write XML to filestore will call this API to register the media type.
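How the two metadata calls might be made can be sketched with a hypothetical in-memory `MetadataFile` whose metadata map plays the role of extensible inode data. This is an assumption for illustration, not an existing file-system API: opening a character-based `Writer` records the encoding as a side effect, while the XML-writing application registers the media type explicitly.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.Charset;
import java.util.HashMap;
import java.util.Map;

class RecordingDemo {
    public static void main(String[] args) throws IOException {
        MetadataFile file = new MetadataFile();
        // Opening a character-based Writer records the encoding implicitly.
        try (Writer w = file.openWriter(Charset.forName("ISO-8859-8"))) {
            w.write("<doc/>");
        }
        // An application that writes XML registers the media type explicitly.
        file.setContentType("application/xml");
        System.out.println(file.getEncoding() + " " + file.getContentType());
    }
}

// Hypothetical file object; the metadata map stands in for extensible inode data.
class MetadataFile {
    private final ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    private final Map<String, String> metadata = new HashMap<>();

    void setEncoding(String charsetName) { metadata.put("encoding", charsetName); }
    String getEncoding() { return metadata.get("encoding"); }
    void setContentType(String mediaType) { metadata.put("content-type", mediaType); }
    String getContentType() { return metadata.get("content-type"); }
    byte[] content() { return bytes.toByteArray(); }

    // The layer that encodes the characters is the layer that records the encoding,
    // so applications never need to call setEncoding themselves.
    Writer openWriter(Charset charset) {
        bytes.reset();
        setEncoding(charset.name());
        return new OutputStreamWriter(bytes, charset);
    }
}
```

The design choice here is that the encoding is recorded by the same code that performs it, so the two cannot drift apart; the media type, by contrast, depends on what the application meant to write, so only the application can supply it.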

Of course, this can be wrong, just as HTTP content headers can be wrong. But it's a lot more likely to be right than if you just use guesswork.
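A sketch of "recorded metadata first, guesswork only as a fallback", assuming the recorded encoding arrives as a plain string (or `null` when nobody recorded one). The fallback is a deliberately simplified guess (byte order marks, then an ASCII-compatible encoding declaration, then the XML default of UTF-8), not the full autodetection procedure described in the XML specification. `EncodingResolver` and its methods are hypothetical names.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EncodingResolver {
    // Encoding pseudo-attribute of an XML declaration at the very start of the bytes.
    private static final Pattern DECL = Pattern.compile(
        "^<\\?xml[^>]*?\\sencoding\\s*=\\s*[\"']([A-Za-z][A-Za-z0-9._-]*)[\"']");

    // Recorded metadata wins; guesswork is only used when there is none (or it is unusable).
    static Charset resolve(String recordedEncoding, byte[] bytes) {
        if (recordedEncoding != null && Charset.isSupported(recordedEncoding)) {
            return Charset.forName(recordedEncoding);
        }
        return guess(bytes);
    }

    // Simplified guesswork: byte order marks, then the declaration, then UTF-8.
    static Charset guess(byte[] b) {
        if (b.length >= 3 && (b[0] & 0xFF) == 0xEF && (b[1] & 0xFF) == 0xBB && (b[2] & 0xFF) == 0xBF) {
            return StandardCharsets.UTF_8;
        }
        if (b.length >= 2 && (b[0] & 0xFF) == 0xFE && (b[1] & 0xFF) == 0xFF) {
            return StandardCharsets.UTF_16BE;
        }
        if (b.length >= 2 && (b[0] & 0xFF) == 0xFF && (b[1] & 0xFF) == 0xFE) {
            return StandardCharsets.UTF_16LE;
        }
        // ISO-8859-1 maps every byte to one char, so an ASCII declaration survives intact.
        String head = new String(b, 0, Math.min(b.length, 256), StandardCharsets.ISO_8859_1);
        Matcher m = DECL.matcher(head);
        if (m.find() && Charset.isSupported(m.group(1))) {
            return Charset.forName(m.group(1));
        }
        return StandardCharsets.UTF_8;
    }

    public static void main(String[] args) {
        byte[] doc = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><a/>"
            .getBytes(StandardCharsets.US_ASCII);
        System.out.println(resolve("UTF-8", doc).name()); // UTF-8: the recorded metadata wins
        System.out.println(resolve(null, doc).name());    // ISO-8859-1: guessed from the declaration
    }
}
```

If the recorded value is wrong, this code will faithfully use the wrong charset, just as a client trusts a wrong HTTP header; the fallback only helps when nothing was recorded at all.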

Doing this isn't fundamentally difficult; it just means making the inode data (which holds metadata about files) extensible. Only when you start trying to make things secure (for example, restricting access to a file to a particular application) does it start to affect the system architecture.
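Some file systems already offer a limited form of extensible inode data: extended attributes. A sketch using Java NIO's `UserDefinedFileAttributeView`, which exposes them where supported (on Linux the JDK places them in the `user.` namespace). The attribute names `mime_type` and `charset` are assumptions chosen for illustration; nothing here implies that an operating system or XML parser actually consults them, and support varies by file system, so the helpers report failure instead of throwing.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.UserDefinedFileAttributeView;
import java.util.Optional;

class XattrMetadata {
    // Illustrative attribute names (an assumption, not a standard the JDK enforces).
    static final String MEDIA_TYPE = "mime_type";
    static final String CHARSET = "charset";

    // Stores a string attribute; returns false if the file system cannot store it.
    static boolean setAttribute(Path file, String name, String value) {
        UserDefinedFileAttributeView view =
            Files.getFileAttributeView(file, UserDefinedFileAttributeView.class);
        if (view == null) return false;
        try {
            view.write(name, StandardCharsets.UTF_8.encode(value));
            return true;
        } catch (UnsupportedOperationException | IOException e) {
            return false; // e.g. "Operation not supported" on some file systems
        }
    }

    // Reads a string attribute; empty if it is absent or attributes are unsupported.
    static Optional<String> getAttribute(Path file, String name) {
        UserDefinedFileAttributeView view =
            Files.getFileAttributeView(file, UserDefinedFileAttributeView.class);
        if (view == null) return Optional.empty();
        try {
            if (!view.list().contains(name)) return Optional.empty();
            ByteBuffer buf = ByteBuffer.allocate(view.size(name));
            view.read(name, buf);
            buf.flip();
            return Optional.of(StandardCharsets.UTF_8.decode(buf).toString());
        } catch (UnsupportedOperationException | IOException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("xattr-demo", ".xml");
        try {
            Files.write(file, "<doc/>".getBytes(StandardCharsets.UTF_8));
            if (setAttribute(file, MEDIA_TYPE, "application/xml")
                    && setAttribute(file, CHARSET, "UTF-8")) {
                System.out.println(getAttribute(file, MEDIA_TYPE).orElse("(missing)"));
            } else {
                System.out.println("user-defined attributes are not supported on this file system");
            }
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

This shows only the storage half of the idea; the security concerns mentioned above (who is allowed to set or trust these attributes) are not addressed here.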

Michael Kay
Saxonica





