Subject: Re: Namespace Identifiers - URI, URN, URL?
From: "Michael Beddow" <mbnospam@xxxxxxxxxxx>
Date: Wed, 29 Aug 2001 17:25:00 +0100
UTF-8 encoded document
Sorry about the previous empty posting!

> No.  Just be sure your document uses only UTF-8 characters if you
> don't put it, because that's the default character set defined by the
> XML Recommendation.  Any non-UTF-8 character sequences in your XML
> document (such as extended ASCII/ISO-8859-1 characters) will cause
> your XML document to become invalid, and hence unparseable by any
> conformant XML parser.  It's better to put the XML declaration in and
> explicitly state the character set you use, e.g.:

David has already commented on this, but to spell out the misleading
points a bit more explicitly:

It's probably best to avoid a phrase like "ISO-8859-1 characters"
altogether, because it's dangerously ambiguous. It can mean (1)
"abstract characters to which ISO-8859-1 assigns code points" OR (2)
"character data which has been encoded using the code points assigned to
abstract characters by ISO-8859-1, where those code points are
represented as single 8-bit numbers". Sense (1) simply means that such
abstract characters are present, but makes no statement about how they
are encoded. So in sense (1) you can have as many "ISO-8859-1"
characters as you like in an XML document declared explicitly or by
default to be utf-8 encoded, provided they are indeed utf-8 encoded. But
if you try to use "ISO-8859-1 characters" in sense (2) in a supposedly
utf-8 encoded document, the parser will almost certainly throw a fatal
error if your characters include any outside the ASCII subset, because
every 8-bit value above 0x7F is either illegal in utf-8 altogether or
legal only at a particular position in a multi-byte sequence.
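
To make the two senses concrete, here is a minimal sketch in Python
(Python and its standard xml.etree.ElementTree module are my choice for
illustration; nothing in this thread depends on them). It serialises the
same abstract character, é (U+00E9), once as utf-8 and once as
ISO-8859-1 bytes, and feeds each to a parser with no XML declaration, so
the parser has to assume utf-8. The helper name `parses` is hypothetical.

```python
import xml.etree.ElementTree as ET

# The same abstract character, é (U+00E9), serialised two ways.
text = "<p>caf\u00e9</p>"
utf8_bytes = text.encode("utf-8")         # é -> two bytes, 0xC3 0xA9
latin1_bytes = text.encode("iso-8859-1")  # é -> one byte, 0xE9


def parses(data: bytes) -> bool:
    """Hypothetical helper: True if `data` parses as XML.

    With no XML declaration and no byte order mark, the parser
    must treat the bytes as utf-8.
    """
    try:
        ET.fromstring(data)
        return True
    except ET.ParseError:
        return False


print(parses(utf8_bytes))    # sense (1), utf-8 encoded -> True
print(parses(latin1_bytes))  # sense (2), fatal error -> False
```

The second parse fails because the lone byte 0xE9 is a utf-8 lead byte
that is not followed by the continuation bytes it requires. The failure
is not strictly guaranteed, though: the ISO-8859-1 byte pair 0xC3 0xA9
("Ã©") happens to be valid utf-8, so some sense (2) text can slip
through silently as the wrong characters.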

The trouble is it's very easy to end up with sense (2) characters if you
have to process data handed to you by people who neither know nor care
about encoding issues.
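
When you are handed bytes of unknown provenance, one common (and
admittedly crude) defence is to check whether they are already valid
utf-8 and, if not, to assume they are ISO-8859-1 and transcode them. A
sketch, assuming the data carries no encoding declaration of its own
(otherwise that declaration would need updating too); the function name
`to_utf8` and the ISO-8859-1 fallback are my assumptions, not anything
proposed in this thread:

```python
def to_utf8(data: bytes, fallback: str = "iso-8859-1") -> bytes:
    """Return `data` as utf-8 bytes (hypothetical helper).

    Assumption: bytes that are not valid utf-8 are in the `fallback`
    encoding. This is a guess, not detection: any byte sequence at all
    decodes without error as ISO-8859-1.
    """
    try:
        data.decode("utf-8")  # strict decoding raises on any invalid sequence
        return data  # already valid utf-8, pass through unchanged
    except UnicodeDecodeError:
        # Reinterpret each byte via the fallback code page, then re-encode.
        return data.decode(fallback).encode("utf-8")


print(to_utf8(b"caf\xe9"))      # ISO-8859-1 input -> b'caf\xc3\xa9'
print(to_utf8(b"caf\xc3\xa9"))  # already utf-8    -> b'caf\xc3\xa9'
```

If the data comes from Windows tools, "cp1252" is often a better
fallback than "iso-8859-1", since it maps most bytes in 0x80-0x9F to
printable characters such as curly quotes rather than to control
characters.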

So the original posting ought to have read:
"Just be sure your document uses UTF-8 encoding if you
don't put it, because that's the default encoding..."
then continued:
"Any bytes in your XML document that are not part of a valid
utf-8 encoding sequence will make your XML document not well-formed"
and concluded with:
"It's better to put the XML declaration in and explicitly state the
encoding used"
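
Following that advice, a sketch (again Python, purely illustrative)
showing that once the declaration states the encoding actually used,
both an ISO-8859-1 and a utf-8 serialisation of the same text parse to
the same characters. The variable names and the helper `parse_text` are
hypothetical:

```python
import xml.etree.ElementTree as ET

BODY = "<p>caf\u00e9</p>"  # contains é (U+00E9), outside ASCII

# Each document declares the encoding it is then actually written in.
latin1_doc = ('<?xml version="1.0" encoding="ISO-8859-1"?>' + BODY).encode("iso-8859-1")
utf8_doc = ('<?xml version="1.0" encoding="UTF-8"?>' + BODY).encode("utf-8")


def parse_text(doc: bytes) -> str:
    """Parse raw bytes, letting the XML declaration select the decoder."""
    return ET.fromstring(doc).text


print(repr(parse_text(latin1_doc)))  # 'café'
print(repr(parse_text(utf8_doc)))    # 'café'
```

The bytes differ (0xE9 versus 0xC3 0xA9), but the abstract character
each parse reports is the same, which is exactly the distinction between
senses (1) and (2).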

Apologies again to those who know all this, but until the enigma of how
to get it into the FAQ in a generally understandable way is solved, the
struggle has to continue...

Michael
---------------------------------------------------------
Michael Beddow   http://www.mbeddow.net/
XML and the Humanities page:  http://xml.lexilog.org.uk/
---------------------------------------------------------



 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list

