Re: [Summary] Why is Encoding Metadata (e.g. encoding="UTF-8") put Inside the XML Document?

From: "Pete Cordell" <petexmldev@t...>
To: "David Carlisle" <davidc@n...>, <costello@m...>
Date: Thu, 20 Sep 2007 16:46:45 +0100

----- Original Message From: "David Carlisle" <davidc@n...>

>> Now that it knows the "real" encoding it interprets the rest of the
>> document using the encoding it found in the XML declaration.
>
> That still makes it sound as if the encoding declaration is read using a
> different encoding from the rest of the document. Once an encoding has
> been determined then the encoding declaration line itself must be
> consistent with that encoding.

For me, the above statement isn't correct.  If an XML document starts out 
with:

<?xml version="1.0" encoding="iso-8859-2"?>

The parser will analyse the first character by reading up to 4 bytes of
input (as described in the algorithm mentioned).  In this case it will work
out that the first character corresponds to the single-byte ASCII code for
'<'.  On that basis, it will assume that the encoding is UTF-8.  It will then
proceed to read the rest of the XML decl and, on interpreting the encoding
attribute, will revise its guess to iso-8859-2.
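
To make the two steps concrete, here is a rough sketch in Python of the sort of thing I mean.  It is not taken from any real parser: the names guess_family and detect_encoding are made up for illustration, and it covers only a handful of the byte patterns from the detection algorithm (no UTF-32, no EBCDIC).

```python
import re

# Matches an XML declaration at the start of the input and captures the
# value of its encoding pseudo-attribute (hypothetical helper pattern).
DECL_RE = re.compile(
    rb'<\?xml[^>]*?\bencoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def guess_family(head: bytes) -> str:
    """Provisional guess from the first (up to) 4 bytes of the input."""
    if head.startswith(b"\xef\xbb\xbf"):
        return "utf-8"        # UTF-8 byte order mark
    if head.startswith(b"\xfe\xff") or head.startswith(b"\x00<\x00?"):
        return "utf-16-be"    # big-endian 16-bit code units
    if head.startswith(b"\xff\xfe") or head.startswith(b"<\x00?\x00"):
        return "utf-16-le"    # little-endian 16-bit code units
    # '<' seen as a single ASCII byte (or nothing recognisable at all):
    # assume UTF-8 for now, to be revised by the encoding declaration.
    return "utf-8"


def detect_encoding(data: bytes) -> str:
    """First guess from the leading bytes, then revise from the XML decl."""
    guess = guess_family(data[:4])
    if not data.startswith(b"<?xml"):
        return guess          # no declaration readable as ASCII bytes
    match = DECL_RE.match(data)
    if match is None:
        return guess          # declaration present, but no encoding given
    return match.group(1).decode("ascii").lower()


doc = b'<?xml version="1.0" encoding="iso-8859-2"?><a>\xb1</a>'
first_guess = guess_family(doc[:4])    # provisional guess
final_guess = detect_encoding(doc)     # revised after reading the decl
text = doc.decode(final_guess)
```

For the example document, the first guess is "utf-8" and the revised guess is "iso-8859-2"; only the revised encoding can decode the 0xB1 byte in the element content.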

In general, having guessed UTF-8 (variable number of bytes per character),
the encoding attribute could change it to one of the various Latin character
sets (iso-8859-* - single byte, but using values 0-255), or to something like
Shift-JIS, which uses lead bytes above the ASCII range to introduce two-byte
characters (or ISO-2022-JP, which uses escape sequences to switch out of
ASCII).
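
A small Python check may help show why revising the guess is safe.  This is only an illustration using Python's built-in codecs, and the list of candidate encodings is my own choice: the declaration itself contains only ASCII characters, so it comes out as the same bytes in every ASCII-compatible encoding, while a byte above 0x7F elsewhere in the document depends on which encoding was declared.

```python
decl = '<?xml version="1.0" encoding="iso-8859-2"?>'

# Encode the declaration in several ASCII-compatible encodings.
candidates = ["ascii", "utf-8", "iso-8859-1", "iso-8859-2", "shift_jis"]
encoded = {name: decl.encode(name) for name in candidates}

# True if every candidate produced exactly the same byte sequence.
same_bytes = len(set(encoded.values())) == 1

# A single byte outside the ASCII range reads differently per encoding.
payload = b"\xb1"
as_latin1 = payload.decode("iso-8859-1")   # U+00B1 PLUS-MINUS SIGN
as_latin2 = payload.decode("iso-8859-2")   # U+0105 LATIN SMALL LETTER A WITH OGONEK

try:
    payload.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False                     # a lone 0xB1 is not valid UTF-8
```

So the parser can read the declaration under its provisional UTF-8 guess and still get the right answer, but it has to switch to the declared encoding before it reaches any non-ASCII bytes.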

Pete.
--
=============================================
Pete Cordell
Codalogic
for XML Schema to C++ data binding visit
 http://www.codalogic.com/lmx/
=============================================
