RE: Heed this warning about Postel's Prescription

  • From: Rick Jelliffe <rjelliffe@allette.com.au>
  • To: "Roger L. Costello" <costello@mitre.org>
  • Date: Tue, 30 Jun 2015 11:41:07 +1000

Nonono! 

Way 1, the BOM, is allowed by XML. It arose because of a gap in the Unicode specifications, and is therefore an early ambiguity that XML inherited.
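As a rough illustration of way 1, here is a minimal Python sketch of a service that accepts a leading UTF-8 BOM (the bytes EF BB BF) on input and never writes one on output. The function names `strip_utf8_bom` and `emit_utf8` and the sample document are made up for this example; they are not from any particular XML library.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Accept an optional leading UTF-8 BOM (EF BB BF) and drop it."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def emit_utf8(text: str) -> bytes:
    """Serialize outgoing text as UTF-8; the "utf-8" codec writes no BOM."""
    return text.encode("utf-8")


# Hypothetical sample input, with and without the BOM.
without_bom = b'<?xml version="1.0" encoding="UTF-8"?><a/>'
with_bom = codecs.BOM_UTF8 + without_bom

# Both forms yield the same bytes to hand to the parser.
print(strip_utf8_bom(with_bom) == strip_utf8_bom(without_bom))
```

Nothing is lost or guessed here: the BOM carries no document content, so tolerating it on input does not hide an error.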

But way 2 goes completely against XML's draconian well-formedness (WF) error rules, and is the kind of muddle-headed hacking that has made i18n too difficult for most developers to understand or ever get right, with systems acting differently. Developers are, in general, fantastically willing to come up with the wrong theory about what is causing an encoding error, matched only by their determination to avoid looking at the actual bytes directly, using a hex editor. The most common cause of 'decoding errors' is that the XML is being read using an encoding that does not match the actual encoding of the resource [i.e., the XML header was generated wrongly at write time, and/or is not being used at read time]: allowing silent 'resynchronizing' corrupts the data, delays problem detection, and allows the developer to defraud their bosses by claiming to have implemented XML when all they have done is disable error detection.
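To make the difference concrete, here is a small Python sketch, assuming a document that was written as Latin-1 but is read as UTF-8. The helper names `hex_dump` and `decode_strict` and the sample element are invented for this illustration. Strict decoding fails at the offending byte and shows it in hex; decoding with replacement quietly substitutes U+FFFD and carries on.

```python
def hex_dump(data: bytes) -> str:
    """Render bytes as space-separated hex, as a hex editor would show them."""
    return " ".join(f"{b:02x}" for b in data)


def decode_strict(data: bytes, encoding: str = "utf-8") -> str:
    """Decode, failing loudly with the offset and hex of the bad bytes."""
    try:
        return data.decode(encoding, errors="strict")
    except UnicodeDecodeError as exc:
        bad = data[exc.start:exc.end]
        raise ValueError(
            f"not valid {encoding} at byte offset {exc.start}: {hex_dump(bad)}"
        ) from exc


# Hypothetical mislabelled input: written as Latin-1, read as UTF-8.
latin1_bytes = "<name>José</name>".encode("latin-1")

# Way 2: the bad byte is silently replaced and the original character is gone.
lossy = latin1_bytes.decode("utf-8", errors="replace")
print(lossy)

# Draconian handling: stop at the first bad byte and report it.
try:
    decode_strict(latin1_bytes)
except ValueError as err:
    print(err)
```

In the lossy result the é cannot be recovered, and nothing tells the sender that the declared and actual encodings disagree. The strict path points at byte e9, which is exactly what a look in a hex editor would reveal.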

Rick

On 29/06/2015 5:59 AM, "Costello, Roger L." <costello@mitre.org> wrote:
How might Postel's Law be applied to web services that receive XML and send out XML?

Here are two ways:

1. The web service is willing to receive UTF-8 XML documents containing a pseudo-BOM. The web service sends out UTF-8 XML documents without a pseudo-BOM. [1]

2. The web service is willing to receive XML character streams with Unicode decoding errors: it processes the character stream by replacing the offending bytes by the Unicode replacement character U+FFFD until it manages to resynchronize the UTF-{8,16} byte stream. The web service sends out XML documents without character decoding errors. [2]

/Roger

[1] See Rick Jelliffe's post on the xml-dev list: http://lists.xml.org/archives/xml-dev/201506/msg00065.html

[2] See Daniel Bunzli's post on the unicode list: http://www.unicode.org/mail-arch/unicode-ml/y2015-m06/0247.html

