Re: relax UTF-8 default? was: Towards XML 2.0

  • From: Andrew Welch <andrew.j.welch@gmail.com>
  • To: David Carlisle <davidc@nag.co.uk>
  • Date: Fri, 10 Dec 2010 10:43:32 +0000

On 10 December 2010 09:28, David Carlisle <davidc@nag.co.uk> wrote:
> On 10/12/2010 08:56, Stephen Green wrote:
>> Does newXML being treatable as a string mean the *UTF-8 default*
>> requirement is better relaxed in some way? I mean, a developer writing
>> a string doesn't want to have to ensure it is all written in UTF-8 do
>> they?
> why would any person ever have to know what the utf8 encoding is? If you
> want an "a" then you can enter an a without knowing what the latin1 or ascii
> or utf8 encodings of an a are. They happen to all be the same in that case.
> If you pick another letter such as pound sign, or e acute they happen to be
> different, but since typically a human doesn't know any of the numbers it
> doesn't make any difference, it's just a matter of what your text editor
> does when you hit save.

Yep - the "UTF-8/16 only" suggestion is to solve the problem of the
potential mismatch between the encoding declared in the prolog and the
actual encoding. Add to that the Content-Type header when HTTP is
involved and you have three places to look at to determine the encoding.
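
To make the three places concrete, here is a rough Python sketch that
gathers what each one claims. The function name `declared_encodings` and
the sample header and body are made up for illustration, and the regex
only copes with ASCII-compatible documents, so treat it as a diagnostic
aid rather than a real encoding detector.

```python
import re
from email.message import Message


def declared_encodings(content_type: str, body: bytes) -> dict:
    """Collect what each of the three places says about the encoding."""
    # 1. The charset parameter of the HTTP Content-Type header.
    msg = Message()
    msg["Content-Type"] = content_type
    http_charset = msg.get_content_charset()  # lower-cased, or None

    # 2. The encoding pseudo-attribute in the XML declaration (prolog).
    match = re.match(
        rb'<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z0-9._-]+)["\']', body
    )
    prolog = match.group(1).decode("ascii").lower() if match else None

    # 3. The actual bytes: do they at least decode as UTF-8?
    try:
        body.decode("utf-8")
        bytes_are_utf8 = True
    except UnicodeDecodeError:
        bytes_are_utf8 = False

    return {"http": http_charset, "prolog": prolog, "bytes_are_utf8": bytes_are_utf8}


# Hypothetical response: header says Latin-1, prolog says UTF-8,
# and the editor actually saved the file as Windows-1252.
sample_body = '<?xml version="1.0" encoding="UTF-8"?><p>£</p>'.encode("cp1252")
report = declared_encodings("application/xml; charset=ISO-8859-1", sample_body)
print(report)
```

In this made-up case all three disagree, which is exactly the situation
a UTF-8/16-only rule would rule out.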

This manifests itself as the common problem of "funny characters" in
the output, where UTF-8 has been parsed as Windows-1252 or Latin-1.
Or vice versa, where you get the "invalid byte sequence" error message.
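
Both failure modes are easy to reproduce. A small Python sketch, using
David's pound sign and e acute as the sample text (the variable names
are just for illustration):

```python
text = "£ é"

# UTF-8 bytes read as Windows-1252: no error, just "funny characters".
utf8_bytes = text.encode("utf-8")
mojibake = utf8_bytes.decode("cp1252")
print(mojibake)  # Â£ Ã©

# Latin-1 bytes read as UTF-8: the decoder rejects the byte sequence.
latin1_bytes = text.encode("latin-1")
try:
    latin1_bytes.decode("utf-8")
    decode_failed = False
except UnicodeDecodeError as exc:
    decode_failed = True
    print("decode error:", exc.reason)
```

The first direction fails silently, which is why it tends to reach end
users; the second fails loudly but with a message that says nothing
about encodings.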

One common cause of this is simply someone editing the XML file in a
text editor such as Notepad: someone updates a value in a config
file and bang, the XML won't parse any more.

Making it UTF-8/16 only fixes the widespread "funny characters"
problem by always parsing in UTF-8/16, and on the flip side can
replace the obscure "Invalid byte sequence..." error message with "This
document is not UTF-8/16, please fix this by blah blah blah" or some
other more helpful message.
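
As a sketch of what that could look like on the parser side, assuming
UTF-16 documents always carry a byte order mark and everything else
must be UTF-8. The names `NotUtf8Or16Error` and `decode_strict` are
hypothetical, not taken from any existing parser:

```python
import codecs


class NotUtf8Or16Error(ValueError):
    """Raised when a document is neither UTF-8 nor BOM-marked UTF-16."""


def decode_strict(data: bytes) -> str:
    # Assumption: UTF-16 input starts with a BOM; anything else is UTF-8.
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        encoding = "utf-16"      # consumes the BOM and picks the byte order
    else:
        encoding = "utf-8-sig"   # UTF-8, tolerating an optional BOM
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as exc:
        # Swap the obscure codec error for one a config-file editor can act on.
        raise NotUtf8Or16Error(
            f"This document is not UTF-8/16 (bad byte at offset {exc.start}); "
            "please re-save it as UTF-8 in your editor."
        ) from exc
```

With no other encodings allowed, the parser never has to guess, so any
decode failure can be reported as "wrong encoding" with some confidence.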

It also fixes the three-way XML-over-HTTP what's-the-encoding fun.

It also makes removing the prolog easier, and should allow a better
error message when parsing an empty file etc.

Andrew Welch
