
Re: Possible changes for XML 2nd Edition

  • From: Rick JELLIFFE <ricko@g...>
  • To: xml-editor@w..., "xml-dev@x..." <xml-dev@x...>
  • Date: Thu, 25 May 2000 04:04:41 +0800

John Cowan wrote:
 
> Issue PE28:
> 
> Currently the XML Recommendation is silent about the handling of
> documents that contain "impossible" bytes.  For example, the byte 0xFF
> cannot appear in any UTF-8 encoded document.  We are considering making
> such violations of the encoding a fatal error.
> 
> PRO: an improperly encoded document is not really a text document at all;
> nothing should be done on the basis of it.  XML's draconian error handling rule
> should lead to a "fatal error", which means the rest of the document must
> not be parsed.
> 
> CON: Some parsers may be relying on libraries supplied by the OS, which may
> not properly signal erroneous input.  Is it too great a burden on the
> parser implementor to impose this restriction?
 
I think this goes too far, for basic WF.

Instead, I would propose another level of validity "character validity"
which XML processors should be encouraged, but not required, to support,
or to support as much as they can. Unlike validity, which sits on top
of well-formedness, "character validity" sits more-or-less underneath
well-formedness as XML's soft underbelly.

An XML document that was "character valid" would
 1) not have any impossible bytes in any entity
 2) not have a BOM if the encoding is "UTF-16LE" or "UTF-16BE" (and would
meet any other encoding constraints)
 3) have all names in markup follow the NAMECHAR conventions
 4) have all data Unicode-normalized
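
As a rough illustration of checks 1, 2 and 4, here is a minimal Python
sketch. The function name character_validity_problems is hypothetical,
NFC is an assumed choice of normalization form (the list above does not
name one), and the name check in item 3 is left out. It takes the raw
bytes of an entity plus a Python codec name and returns a list of
problems found.

```python
import codecs
import unicodedata
from typing import List

# BOM byte sequences that should be absent when the byte order is
# already fixed by the declared encoding.
_FIXED_ORDER_BOMS = {
    "utf-16-le": codecs.BOM_UTF16_LE,
    "utf-16-be": codecs.BOM_UTF16_BE,
}

def character_validity_problems(data: bytes, codec: str) -> List[str]:
    """Return human-readable problems; an empty list means none found.

    Hypothetical helper: covers impossible bytes, a BOM under
    UTF-16LE/UTF-16BE, and NFC normalization (an assumed form).
    """
    problems = []
    name = codecs.lookup(codec).name  # canonical Python codec name

    bom = _FIXED_ORDER_BOMS.get(name)
    if bom is not None and data.startswith(bom):
        problems.append("BOM present although byte order is declared")

    try:
        text = data.decode(name, errors="strict")
    except UnicodeDecodeError as exc:
        # Strict decoding rejects byte sequences the encoding cannot produce.
        problems.append(f"impossible byte sequence at offset {exc.start}")
        return problems

    if unicodedata.normalize("NFC", text) != text:
        problems.append("data is not NFC-normalized")
    return problems
```

For example, character_validity_problems(b"<a>\xff</a>", "utf-8") reports
one problem, since 0xFF cannot occur in UTF-8.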

This would keep a basic XML implementation that did not support
"character validity" simple:
 1) it can use any library for transcoding
 2) it does not have to have any special BOM handling for utf16xe
 3) it can tokenize tags based on whitespace and delimiters rather than
NAMECHAR or NAMESTRT
 4) it does not have to check or enforce normalization
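
To make item 3 concrete, here is a small Python sketch of
delimiter-based tag tokenizing. The helper loose_tag_name is
hypothetical, not taken from any existing parser. It treats everything
between "<" and the first whitespace, "/" or ">" as the element name and
makes no attempt to check name characters.

```python
import re
from typing import Optional

# The name is whatever follows "<" up to the first whitespace, "/" or ">".
_LOOSE_START_TAG = re.compile(r"<([^\s/>]+)")

def loose_tag_name(tag: str) -> Optional[str]:
    """Return the element name of a start tag without validating it.

    Hypothetical helper: returns None if the text does not begin
    with "<" followed by at least one name character candidate.
    """
    match = _LOOSE_START_TAG.match(tag)
    return match.group(1) if match else None
```

A tokenizer like this accepts <1bad/> and returns "1bad", although a
name cannot start with a digit; rejecting it is the job of a
character-validating processor.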

A character-validating processor should be the goal for any XML
processor not specifically aimed at ultra-lightweight uses.


Rick Jelliffe

***************************************************************************
This is xml-dev, the mailing list for XML developers.
To unsubscribe, mailto:majordomo@x...&BODY=unsubscribe%20xml-dev
List archives are available at http://xml.org/archives/xml-dev/
***************************************************************************
