----- Original Message
From: "David Carlisle" <davidc@n...>

>> Now that it knows the "real" encoding it interprets the rest of the
>> document using the encoding it found in the XML declaration.
>
> That still makes it sound as if the encoding declaration is read using a
> different encoding from the rest of the document. Once an encoding has
> been determined then the encoding declaration line itself must be
> consistent with that encoding.

For me, the above statement isn't correct. If an XML document starts out with:

<?xml version="1.0" encoding="iso-8859-2"?>

The parser will analyse the first character by reading up to 4 bytes of input (as described in the algorithm mentioned). In this case it will work out that the first character corresponds to the single-byte ASCII code for '<'. On that basis, it will assume that the document is UTF-8. It will then proceed to read the rest of the XML declaration and, on interpreting the encoding attribute, will revise its guess to be iso-8859-2.

In general, having guessed UTF-8 (variable number of bytes per character), the encoding attribute could change it to one of the various Latin character sets (iso-8859-*: single byte, but using values 0-255), or to something like Shift-JIS, which uses lead bytes to step outside the single-byte ASCII range.

Pete.
-- 
=============================================
Pete Cordell
Codalogic
for XML Schema to C++ data binding visit
http://www.codalogic.com/lmx/
=============================================
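
As a rough illustration of the two-stage guess described above, here is a minimal Python sketch. It is not taken from any real parser: the function names (sniff_encoding, detect_encoding, decode_xml) are made up for this example, the byte patterns are a simplified subset of the autodetection appendix in the XML 1.0 spec (UTF-32 and EBCDIC are ignored), and the 200-byte prefix is an arbitrary assumption.

```python
import re

# Matches the encoding pseudo-attribute in an already-decoded XML declaration.
_DECL = re.compile(
    r'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)

def sniff_encoding(data: bytes) -> str:
    """Stage 1: guess from up to 4 leading bytes (simplified subset)."""
    head = data[:4]
    if head.startswith(b"\xef\xbb\xbf"):
        return "utf-8-sig"            # UTF-8 with a byte order mark
    if head.startswith((b"\xff\xfe", b"\xfe\xff")):
        return "utf-16"               # the codec reads the BOM itself
    if head == b"<\x00?\x00":
        return "utf-16-le"            # "<?" as little-endian 16-bit units
    if head == b"\x00<\x00?":
        return "utf-16-be"            # "<?" as big-endian 16-bit units
    return "utf-8"                    # single-byte '<' and everything else

def detect_encoding(data: bytes) -> str:
    """Stage 2: revise a UTF-8 guess using the encoding declaration."""
    guess = sniff_encoding(data)
    if guess != "utf-8":
        return guess                  # sketch only revises the plain UTF-8 guess
    # The declaration itself is ASCII, so the UTF-8 guess is good enough to
    # read it; later non-ASCII bytes are tolerated with errors="replace".
    prefix = data[:200].decode(guess, errors="replace")
    match = _DECL.match(prefix)
    return match.group(1).lower() if match else guess

def decode_xml(data: bytes) -> str:
    """Decode the whole document with the final encoding."""
    return data.decode(detect_encoding(data))
```

For the example declaration in the message, sniff_encoding() reports utf-8 from the leading '<', and detect_encoding() then revises that to iso-8859-2 once the declaration has been read.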