
Re: Why is Encoding Metadata (e.g. encoding="UTF-8) putIns

  • From: Rick Marshall <rjm@z...>
  • To: Dave Pawson <davep@d...>
  • Date: Thu, 20 Sep 2007 23:15:55 +1000

Already there is a significant problem with traditional file systems.

Linux (and Unix generally) keeps a small amount of metadata - user, group,
access permissions, etc. - in the inode structure. SELinux labels and ACLs
are stored alongside it as extended attributes.
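
As a rough illustration of how little the file system records, here is a minimal Python sketch that reads the inode-level metadata through `os.stat`. The helper name `inode_metadata` and the sample document are made up for this example. Note that nothing in the result describes the character encoding of the content.

```python
import os
import stat
import tempfile


def inode_metadata(path):
    """Return the metadata the file system keeps for `path`, outside its content."""
    st = os.stat(path)
    return {
        "owner_uid": st.st_uid,               # numeric user id (0 on Windows)
        "group_gid": st.st_gid,               # numeric group id (0 on Windows)
        "mode": stat.filemode(st.st_mode),    # e.g. "-rw-------"
        "size": st.st_size,                   # length in bytes
        "mtime": st.st_mtime,                 # last modification time
    }


# Hypothetical sample document, written to a temporary file.
payload = '<?xml version="1.0" encoding="UTF-8"?><doc/>'.encode("utf-8")
with tempfile.NamedTemporaryFile(suffix=".xml", delete=False) as f:
    f.write(payload)
    sample_path = f.name

meta = inode_metadata(sample_path)
print(meta)
os.remove(sample_path)
```

The only place the encoding is recorded is inside `payload` itself.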

Windows once had little more than a file name - but now stores ACL
information too.

However, when files are transferred this information is not necessarily
transferred with them, particularly when non-system utilities are used.
What hope is there if even more meta information is stored in the file
indexing system? Even that is a struggle to define, because Linux
directories are files referring to inodes, while Windows has directories...
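
The loss on transfer is easy to reproduce even on a single machine. The following is only a sketch using Python's standard `shutil` module; the file names and the timestamp are arbitrary choices for the example. `shutil.copyfile` copies content only, while `shutil.copy2` also tries to carry over what metadata it can.

```python
import os
import shutil
import tempfile

workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "source.xml")
with open(src, "wb") as f:
    f.write(b"<doc/>")

# Give the source an obviously old modification time (arbitrary value).
old_time = 1_000_000_000
os.utime(src, (old_time, old_time))

plain = os.path.join(workdir, "plain_copy.xml")
shutil.copyfile(src, plain)   # bytes only: the copy gets a fresh timestamp

kept = os.path.join(workdir, "kept_copy.xml")
shutil.copy2(src, kept)       # bytes plus timestamps and permission bits

plain_mtime = int(os.stat(plain).st_mtime)
kept_mtime = int(os.stat(kept).st_mtime)
print("copyfile kept mtime:", plain_mtime == old_time)
print("copy2 kept mtime:   ", kept_mtime == old_time)

shutil.rmtree(workdir)
```

Even `copy2` makes no promise about ownership or ACLs, which is the same problem one level up.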

Then if we move meta information into the files so that it moves with
the files, how do we get it to line up? For example, the owning user may
not exist on the target system.

Meta information in the file also conflicts with the basic concept
that a file is a stream of bytes.

And so it goes.  This is something that will probably remain messy forever.

The current XML solution of PIs is good for XML without breaking the
many other (often not standardised) file structures.
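
To show why the in-file declaration travels well, here is a deliberately simplified Python sketch of reading it back from the first bytes of a file. The function name `sniff_xml_encoding` and its default are assumptions made for this example. It only handles a byte order mark or an ASCII-compatible declaration, so it is not the full autodetection procedure described in the XML specification (no EBCDIC, no UTF-16 without a BOM, no UTF-32).

```python
import codecs
import re

# Matches an encoding pseudo-attribute inside a leading XML declaration.
_DECL = re.compile(
    rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._\-]*)["\']'
)


def sniff_xml_encoding(data, default="utf-8"):
    """Guess the encoding of an XML document from its first bytes (simplified)."""
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8"
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    match = _DECL.match(data[:200])
    if match:
        return match.group(1).decode("ascii").lower()
    # No BOM and no declaration: fall back to the caller's default.
    return default


print(sniff_xml_encoding(b'<?xml version="1.0" encoding="ISO-8859-1"?><doc/>'))
print(sniff_xml_encoding(b"<doc/>"))
```

Because the declaration is part of the byte stream, it survives any copy or transfer that preserves the bytes, whatever happens to the file system metadata.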

Rick

Dave Pawson wrote:
> Rick Jelliffe wrote:
>> Jonathan Robie said:
>>> Michael Kay wrote:
>>>>> Why? Shouldn't metadata be external to a document?
>>>>>
>>>> Sadly, most of us are using file systems based on 1960s thinking that
>>>> don't allow metadata to be held anywhere other than in the content of
>>>> the file (or potentially in its name).
>
>
>> There has always been a split between systems based on "magic numbers"
>> (in the UNIX sense) which the XML encoding header is an elaborate
>> example of, systems based on richer file structures (e.g. old Mac) and
>> systems using registries. But it is the file read and write APIs that
>> are the weak links in the chain: information about encoding is lost when
>> writing out a file, and the only way to maintain it is to write it
>> somewhere. And the only place to write it that is cross-platform and
>> cross-application and transparent is inside the file itself.
>>
>> Actually, it continues the trend of web resources being self-identifying
>> rather than requiring external metadata;
>
>
>> For XML we looked at two different mechanisms: Gavin Nicol suggested
>> that we should just use the existing MIME header syntax at the start of
>> the file. This had two drawbacks: first, when you use EBCDIC it means a
>> file in two different encodings, and second, the file was no longer an
>> acceptable SGML entity. So the PI syntax was adopted instead, even
>> though it meant a disconnect from MIME header syntax.
>
>
> Is there anything that could be proposed, based on these two ideas?
> Clearly XML only half got it right with the PI notation + internal to
> the file metadata.
> Mike rightly decries the ancient filesystems we're using for not
> addressing encoding.
> URIs extend to the file system; what (if anything) has been found
> successful when working with files in order to address encoding?
>
> Do any of the OSes use something else? I can't understand how this
> problem (which must have bugged most readers on this list at one
> time in the past) hasn't been faced up to in IETF or W3C or NISO.
>
> What might it look like when solved? A directory based meta container?
> Rick's idea of something at the file read/write level?
>
> Puzzled.
>

