
Re: Unrecognized encodings (was Re: XML 1.0 Conformance Test Results)

  • From: Rob Lugt <roblugt@e...>
  • To: Eric Vermetten <EVermetten@n...>, 'Tim Bray' <tbray@t...>
  • Date: Mon, 11 Jun 2001 23:55:01 +0100

Firstly, I have to admit that the ElCel validator does not accept UTF8 as an alias for UTF-8.  In my earlier post I stated that it accepts some encoding aliases.  In fact it doesn't currently accept any aliases, only the IANA names.  I should look at the code before making such assertions!
 
I was curious why I had this false memory.  Looking back over our decisions, I see that we did consider accepting aliases, mainly because Java's InputStreamReader works this way and we modelled some of our C++ I/O classes on Java.  However, we decided that the XML 1.0 rec recommends being strict, so that is what we implemented.  Tim Bray's comments have raised some doubt that this is the best approach.
 
Our general philosophy when writing the XML Validator was to be as strict as possible.  After all, one task of an XML validator is to give as much assurance as possible that documents it accepts will not be rejected by another conforming processor down the line.  However, we do accept the ISO-8859-1 and US-ASCII encodings, which other processors are not guaranteed to accept, so that partially weakens this assurance.
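
To make the two policies under discussion concrete, here is a minimal sketch in Python. It is not the ElCel validator's code, which is C++ and not shown in this thread. The names `STRICT_NAMES`, `check_strict` and `check_lenient` are hypothetical, and the list of accepted names is an assumption built from the encodings mentioned above. The strict check accepts only the listed names (compared case-insensitively, as the XML 1.0 rec suggests), while the lenient check also accepts any alias known to Python's codec registry but warns about it, roughly the "accept UTF8 but complain" approach.

```python
import codecs
import warnings

# Assumption: names a strict processor accepts. UTF-8 and UTF-16 must be
# supported by every XML processor; ISO-8859-1 and US-ASCII are the extra
# encodings mentioned in this message.
STRICT_NAMES = {"UTF-8", "UTF-16", "ISO-8859-1", "US-ASCII"}


def check_strict(label: str) -> str:
    """Accept only the listed names, compared case-insensitively."""
    name = label.upper()
    if name not in STRICT_NAMES:
        raise ValueError(f"unsupported encoding name: {label!r}")
    return name


def check_lenient(label: str) -> str:
    """Accept any alias Python's codec registry knows, warning on non-listed names."""
    try:
        return check_strict(label)
    except ValueError:
        pass  # not a listed name; fall back to the alias table
    try:
        info = codecs.lookup(label)
    except LookupError:
        raise ValueError(f"unknown encoding name: {label!r}") from None
    warnings.warn(f"non-standard encoding label {label!r} treated as {info.name!r}")
    return info.name
```

With this sketch, `check_strict("UTF8")` raises `ValueError`, whereas `check_lenient("UTF8")` issues a warning and resolves the label through Python's own alias table, which plays the same role here as the Java alias handling mentioned above.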
 
Regards
Rob Lugt
----- Original Message -----
Sent: 11 June 2001 22:42
Subject: RE: Unrecognized encodings (was Re: XML 1.0 Conformance Test Results)

Tim Bray wrote:
>is the word "should".  In any case, I'd write software to accept
>UTF8, but I'd complain at anyone who sent me data so labeled. -Tim

It is perhaps a bit hard to argue with a veteran such as Tim Bray, but
from what I know of the history of SGML and XML, I wonder: when
designing XML, was not one of the main goals to make something with
fewer optional features than SGML?
XML has made a clear choice for standard support of the Unicode/UCS
character set. Shouldn't the (most commonly used?) Unicode encodings
"UTF-8" and "UTF-16", and their labeling, be treated as one of the
cornerstones for XML (parsers)?

Personally I like it when something complains heavily (i.e. with a
fatal error). It contributes to clarity and stability, for XML parser
writers as well as for users who switch between one brand of XML
parser and another.
For such issues, flexibility leads to less security IMHO.

Furthermore, I don't quite see the difference between:
a) writing flexible software (by one's own hand, I presume)
while at the same time
b) complaining when a not-so-accurate encoding label
is received.
Is this perhaps perceived as a bit more personal?

Regards,
Eric Vermetten

