Re: [Summary] Why is Encoding Metadata (e.g. encoding="UTF-8")
Hi Roger,

Thanks for distilling this kind of information.

Costello, Roger L. wrote:
> I have incorporated your comments. Please let me know if I am missing
> anything, or have incorrectly interpreted your comments:
>
> http://www.xfront.com/specifying-encoding/
>
> I am particularly interested in hearing if you agree with the
> recommendations that I list.

When discussing encoding detection, you write:

    If the external information is unreliable or unavailable then a
    parser examines the first 4 bytes of the document. XML and HTML
    documents optionally have a Byte Order Mark (BOM) in the first 4
    bytes. The BOM may indicate the encoding. So if the document has a
    BOM then the parser may be able to determine the document's
    encoding.

This is not technically correct, because a BOM is not required for the auto-detection algorithm to work. See [1], which describes cases both with and without a BOM, for encodings including UCS-4 and UTF-16 (big-endian and little-endian), EBCDIC, as well as UTF-8, ISO 646, ASCII, and other encodings that have the ASCII characters in their normal positions.

See also David Carlisle's comments, which cover some of the same issues, and arrived while I was composing this message.

It would also be useful for your document to link to relevant specs, for example [1].

Jim Ancona

[1] XML 1.0 Recommendation, Appendix F: Autodetection of Character Encodings, http://www.w3.org/TR/REC-xml/#sec-guessing
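
To illustrate the point about detection with and without a BOM, here is a rough Python sketch of the kind of first-four-bytes check that Appendix F describes. It is only an illustration under stated assumptions, not a parser's actual implementation: the function name guess_encoding_family and the returned labels are made up for this example, and a real parser would go on to read the encoding declaration to pick the exact encoding within the family.

```python
def guess_encoding_family(data: bytes) -> str:
    """Guess an encoding family from the first bytes of an XML entity.

    Hypothetical helper, loosely following the byte patterns listed in
    XML 1.0 Appendix F. The returned labels are illustrative only.
    """
    head = data[:4]

    # Cases WITH a Byte Order Mark. The four-byte patterns are checked
    # first, because FF FE 00 00 would otherwise look like a UTF-16 BOM.
    four_byte_boms = {
        b"\x00\x00\xfe\xff": "UCS-4 big-endian",
        b"\xff\xfe\x00\x00": "UCS-4 little-endian",
        b"\x00\x00\xff\xfe": "UCS-4 unusual byte order",
        b"\xfe\xff\x00\x00": "UCS-4 unusual byte order",
    }
    if head in four_byte_boms:
        return four_byte_boms[head]
    if head.startswith(b"\xef\xbb\xbf"):
        return "UTF-8"
    if head.startswith(b"\xfe\xff"):
        return "UTF-16 big-endian"
    if head.startswith(b"\xff\xfe"):
        return "UTF-16 little-endian"

    # Cases WITHOUT a BOM. These rely on the document starting with
    # '<' or '<?xm', as those characters appear in each encoding family.
    no_bom = {
        b"\x00\x00\x00\x3c": "UCS-4 big-endian",
        b"\x3c\x00\x00\x00": "UCS-4 little-endian",
        b"\x00\x00\x3c\x00": "UCS-4 unusual byte order",
        b"\x00\x3c\x00\x00": "UCS-4 unusual byte order",
        b"\x00\x3c\x00\x3f": "UTF-16 big-endian",
        b"\x3c\x00\x3f\x00": "UTF-16 little-endian",
        b"\x3c\x3f\x78\x6d": "ASCII-compatible (read the encoding declaration)",
        b"\x4c\x6f\xa7\x94": "EBCDIC (read the encoding declaration)",
    }
    # Anything else: Appendix F suggests UTF-8 without a declaration,
    # or a mislabeled or corrupt stream.
    return no_bom.get(head, "UTF-8 assumed (no BOM, no declaration recognized)")
```

For instance, guess_encoding_family(b"<?xml version='1.0'?>") falls into the "ASCII-compatible" case even though no BOM is present, which is the situation the quoted paragraph leaves out.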