
Re: Document encodings

  • From: Rick Jelliffe <ricko@a...>
  • To: xml-dev@l...
  • Date: Fri, 06 Jul 2001 19:20:48 +0800

Yes. There is a succession of features, examined one after another until a
definite result is determined.

 1) EXTERNAL: Information sent in the MIME header
 2) BOM: Presence or absence of a Byte Order Mark (BOM), a Unicode
signal that tells you whether you are using 16- or 32-bit
characters, and their "endianness"
 3) FAMILY SIGNATURE: Presence of the expected codes for "<?xml" at the
beginning of the file (enough to know whether 8-bit codes are used,
and whether they are ASCII-based or EBCDIC-based)
 4) ENCODING: Knowing the family signature is enough to read
the encoding parameter of the XML header
 5) DEFAULT: Otherwise UTF-8 (which also encompasses ASCII)
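
As a rough illustration, the five steps above could be sketched in Python
as below. This is only a sketch under stated assumptions, not how any
particular parser does it: the function name detect_xml_encoding, the
"external" argument (standing in for the MIME charset), the 200-byte
look-ahead and the use of the cp037 codec as a stand-in for the EBCDIC
family are all choices made for this example.

```python
import re

# Step 2: Byte Order Marks. UTF-32 comes first because its little-endian
# BOM starts with the same two bytes as the UTF-16 little-endian BOM.
BOMS = [
    (b"\x00\x00\xfe\xff", "UTF-32BE"),
    (b"\xff\xfe\x00\x00", "UTF-32LE"),
    (b"\xef\xbb\xbf", "UTF-8"),
    (b"\xfe\xff", "UTF-16BE"),
    (b"\xff\xfe", "UTF-16LE"),
]

# Step 3: how the start of "<?xml" looks in each encoding family.
# Each entry: (leading bytes, codec good enough to read the declaration,
# result if the declaration has no encoding parameter).
SIGNATURES = [
    (b"\x00\x00\x00<", "utf-32-be", "UTF-32BE"),
    (b"<\x00\x00\x00", "utf-32-le", "UTF-32LE"),
    (b"\x00<\x00?", "utf-16-be", "UTF-16BE"),
    (b"<\x00?\x00", "utf-16-le", "UTF-16LE"),
    (b"<?xm", "latin-1", "UTF-8"),        # ASCII-based 8-bit family
    (b"\x4c\x6f\xa7\x94", "cp037", None),  # EBCDIC family (assumed codec)
]

# Step 4: the encoding parameter inside the XML declaration.
ENCODING_RE = re.compile(
    r"<\?xml[^>]*?encoding\s*=\s*[\"']([A-Za-z][A-Za-z0-9._\-]*)[\"']"
)


def detect_xml_encoding(data, external=None):
    """Return the encoding name for the bytes of an XML entity."""
    # 1) EXTERNAL: e.g. the charset from a MIME header wins outright.
    if external:
        return external
    # 2) BOM
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    # 3) FAMILY SIGNATURE, then 4) ENCODING parameter
    for signature, codec, fallback in SIGNATURES:
        if data.startswith(signature):
            head = data[:200].decode(codec, errors="replace")
            match = ENCODING_RE.match(head)
            if match:
                return match.group(1)
            if fallback is None:
                raise ValueError("cannot handle this entity")
            return fallback
    # 5) DEFAULT
    return "UTF-8"
```

For example, detect_xml_encoding(b'<?xml version="1.0" encoding="ISO-8859-1"?><a/>')
returns "ISO-8859-1", while a document with neither BOM nor declaration,
such as b'<a/>', falls through to "UTF-8".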

The important thing is that this is not guesswork. There is no scope for
one parser determining one encoding and another parser determining another
encoding: all XML processors should be able to say "Yes I can handle this
entity" or "no I cannot handle this entity".

All processors are required to support UTF-8 and UTF-16 encodings.

A few character sets have some instability about them (see
http://www.w3.org/TR/japanese-xml/), but these are the exception.

Cheers
Rick Jelliffe

----- Original Message -----
From: "Phil Ruelle" <philr@i...>
To: <xml-dev@l...>
Sent: Friday, 6 July 2001 PM 04:16
Subject: Document encodings


> A quick question:
>
> How do parsers work out what encoding an XML document is in
> (i.e. how is it able to read the 'encoding' attribute of the
> declaration)?
>
> I'm guessing that all the encodings XML supports have a common
> 'root' so the XML declaration can always be read using the 'base'
> character set. Is this correct or am I way off the mark?
>
> Many thanks,
>
> Phil Ruelle