
Re: Re: Where does the "nothing left but toolkits" myth come fr


Fair enough. I wasn't thinking at that level of round-tripping, which I 
agree is problematic. What worried me about ERH's example was the 
potential for not even being able to round-trip text -- an issue that 
hasn't come up before (modulo entity references).

The problem is not limited just to values, such as would occur with 
binary representations of real numbers. It also applies to formats. 
Dates and numbers have multiple formats, some of which may inadvertently 
carry information.

For example, French genealogical data might represent dates from the 
Napoleonic period using the Napoleonic calendar; since this is how the 
data was originally recorded, it should probably continue to be 
represented that way, even though these dates can be converted to modern 
date systems.

Similarly, a transcription of notes written by a criminal suspect might 
include dates in a particular format. Since this format might be a clue 
to the suspect's nationality or background, changing the format would 
mean losing information.
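
As a rough illustration of the point above (not part of the original discussion), the following sketch normalizes two differently formatted date strings to the same ISO 8601 value. The `normalize` helper and the sample strings are hypothetical; the point is only that the normalized value no longer records which format the source used.

```python
from datetime import datetime

def normalize(text, fmt):
    """Parse a date string in the given format and return it in ISO 8601 form."""
    return datetime.strptime(text, fmt).date().isoformat()

# Hypothetical transcriptions of the same day, written two different ways.
month_first = normalize("03/04/2004", "%m/%d/%Y")  # month/day/year order
day_first = normalize("04.03.2004", "%d.%m.%Y")    # day.month.year order

# Both collapse to one value; the original format (the possible clue) is gone.
print(month_first, day_first)
```

Once only the normalized value is stored, nothing short of extra metadata can recover how the date was originally written.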

Obviously, this additional information could be represented by 
additional metadata. But it is naive to think that all document 
designers will add such metadata.

-- Ron

Bob Foster wrote:
> Ronald Bourret wrote:
>  > This points out something that should be a requirement for binary XML:
>  > lossless roundtripping. In other words, you should be able to go from
>  > the text serialization to the binary serialization and back losslessly
>  > (within the confines of canonical XML). Same is true for binary <=>
>  > text, binary <=> binary, and (of course) text <=> text.
> 
> Of course text <=> text? This doesn't work today. I don't keep a list, 
> but off the top of my head: information in the text, such as character 
> references and internal general entity references in attribute values, 
> is removed by parsers (e.g., SAX) and is not available to write back 
> out again. This is a perennial source of XSLT questions. Until SAX2 
> Extensions 1.1, SAX didn't report the XML declaration, so the 
> application didn't know the original encoding. The application couldn't 
> tell which attribute values were specified in the document and which 
> came from the DTD as defaults. As ERH points out, canonicalization loses 
> the DOCTYPE declaration. And so on.
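
A minimal sketch of the text <=> text losses described in the quoted message, using Python's standard `xml.etree.ElementTree` rather than SAX (an assumption made for brevity; the `note` document and the `co` entity are invented for illustration). The parser expands the character references and the internal entity, and the serializer has no record of them or of the DOCTYPE to write back out.

```python
import xml.etree.ElementTree as ET

# Hypothetical input: a DOCTYPE with an internal entity, an entity reference
# and a character reference in attribute values, and a character reference
# in element content.
source = (
    '<!DOCTYPE note [<!ENTITY co "Acme">]>'
    '<note owner="&co;" initial="&#65;">caf&#233;</note>'
)

root = ET.fromstring(source)

# The application only ever sees the expanded values.
print(root.get("owner"), root.get("initial"), root.text)

# Serializing the tree again cannot restore what the parser discarded.
round_tripped = ET.tostring(root, encoding="unicode")
print(round_tripped)
```

The reserialized text is equivalent as far as the parsed tree is concerned, but it is not the text that went in.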

