Character Encoding Detection
I am new to character encodings, and am trying to implement support for them in my XML parser.

As I understand it, UCS has two flavors, UCS-2 and UCS-4, either of which can optionally have a UCS transformation applied to it. It is my understanding that you could author an XML document in either of these without applying a transformation.

The UTF-16 spec at:
http://www.stonehand.com/unicode/standard/wg2n1035.html
states:
"In UTF-16, any UCS character from the BMP shall be represented by its UCS-2 coded representation."

Now in UCS-2:
'<' is 00 3C
'?' is 00 3F

So the start of a UCS-2 or UTF-16 encoded XML document would be
00 3C 00 3F

In the section on autodetection of character encodings, the XML spec states:
"00 3C 00 3F: UTF-16, big-endian, no Byte Order Mark (and thus, strictly speaking, in error)"

My question is: why is this an error, rather than a perfectly acceptable untransformed UCS-2 document?

<?xml version="1.0" encoding="ISO-10646-UCS-2"?>

---
Chris Hubick
mailto:chris@h...
http://www.hubick.com/

xml-dev: A list for W3C XML Developers.
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
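For anyone following along, here is a minimal Python sketch of the first-bytes check discussed above. It only illustrates the idea and is not a complete autodetection routine. The function name `guess_encoding_family` and the label strings are hypothetical and worded for this example. Apart from the 00 3C 00 3F pattern quoted above, the byte signatures (the two byte order marks, the little-endian mirror image, and the ASCII bytes for `<?xm`) are assumptions added for illustration, and four-byte UCS-4 patterns are deliberately left out.

```python
# Hypothetical helper: guess an encoding family from the first bytes of an
# XML document. The table is an illustrative subset, not a full spec table.
SIGNATURES = [
    (b"\xfe\xff", "UTF-16, big-endian, with BOM"),
    (b"\xff\xfe", "UTF-16, little-endian, with BOM"),
    # '<' '?' as 16-bit big-endian units: the pattern quoted in the message.
    (b"\x00\x3c\x00\x3f", "UTF-16 or UCS-2, big-endian, no BOM"),
    # The same two characters with the bytes of each unit swapped.
    (b"\x3c\x00\x3f\x00", "UTF-16 or UCS-2, little-endian, no BOM"),
    # '<?xm' in ASCII: the declaration itself must be read to learn more.
    (b"\x3c\x3f\x78\x6d", "ASCII-compatible encoding, read the declaration"),
]


def guess_encoding_family(prefix: bytes) -> str:
    """Return a label for the first signature that matches the prefix."""
    for signature, label in SIGNATURES:
        if prefix.startswith(signature):
            return label
    return "unknown, no signature matched"


# Encode the declaration from the message as big-endian 16-bit units.
# Every character in it is in the BMP, so UCS-2 and UTF-16 give the same bytes.
sample = '<?xml version="1.0" encoding="ISO-10646-UCS-2"?>'.encode("utf-16-be")
print(" ".join(f"{b:02X}" for b in sample[:4]))
print(guess_encoding_family(sample))
```

The sketch can only report what the bytes look like. It cannot tell untransformed UCS-2 from UTF-16 when every character is in the BMP, because the two byte sequences are identical in that case.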