UTF-8
Does the UTF-8 encoding require that the minimum byte count be used when a character is encoded? Recall that the form of a UTF-8 encoding is:

    0xxxxxxx
    110xxxxx 10xxxxxx
    1110xxxx 10xxxxxx 10xxxxxx
    11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    111110xx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx
    1111110x 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx

So one could, for example, claim that:

    00111111
and
    11000000 10111111

represent the same character, #x3F, or that

    11110001 10111111 10111111 10111111
and
    11111000 10000001 10111111 10111111 10111111

both represent #x7FFFF (note: x10000 < x7FFFF < x10FFFF, and so it is legal).

The reason I ask is whether an XML parser has to worry about 5- and 6-byte UTF-8 encodings, or whether it can *always* assume that the values represented by such encodings are not legal Unicode characters.

Thanks.

Richard Emberson

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@i...
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo@i... the following message; (un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo@i... the following message; subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@i...)
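
The bit patterns in the question can be checked mechanically. The following is only a minimal Python sketch, not taken from any XML parser: `decode_generic` and `min_length` are hypothetical helper names, and the decoder deliberately applies the one-to-six-byte patterns above with no shortest-form check, so that the two pairs of byte sequences from the message can be compared.

```python
def decode_generic(data: bytes) -> int:
    """Decode ONE sequence using the 1- to 6-byte bit patterns.

    Hypothetical helper: it performs no shortest-form check, so
    overlong sequences are accepted on purpose.
    """
    first = data[0]
    if first < 0x80:                      # 0xxxxxxx
        length, value = 1, first
    elif (first & 0xE0) == 0xC0:          # 110xxxxx
        length, value = 2, first & 0x1F
    elif (first & 0xF0) == 0xE0:          # 1110xxxx
        length, value = 3, first & 0x0F
    elif (first & 0xF8) == 0xF0:          # 11110xxx
        length, value = 4, first & 0x07
    elif (first & 0xFC) == 0xF8:          # 111110xx
        length, value = 5, first & 0x03
    elif (first & 0xFE) == 0xFC:          # 1111110x
        length, value = 6, first & 0x01
    else:
        raise ValueError("invalid lead byte")
    if len(data) != length:
        raise ValueError("wrong number of bytes for this lead byte")
    for b in data[1:]:
        if (b & 0xC0) != 0x80:            # every trailing byte is 10xxxxxx
            raise ValueError("invalid continuation byte")
        value = (value << 6) | (b & 0x3F)
    return value


def min_length(value: int) -> int:
    """Smallest number of bytes whose pattern has enough x bits for value."""
    for length, limit in enumerate(
        (0x80, 0x800, 0x10000, 0x200000, 0x4000000), start=1
    ):
        if value < limit:
            return length
    return 6


short_3f = bytes([0b00111111])
long_3f = bytes([0b11000000, 0b10111111])
short_7ffff = bytes([0b11110001, 0b10111111, 0b10111111, 0b10111111])
long_7ffff = bytes([0b11111000, 0b10000001, 0b10111111, 0b10111111, 0b10111111])

for seq in (short_3f, long_3f, short_7ffff, long_7ffff):
    value = decode_generic(seq)
    shortest = len(seq) == min_length(value)
    print(f"{seq.hex(' ')} -> #x{value:X}, shortest form: {shortest}")
```

Under these assumptions, both members of each pair decode to the same value, and only the first of each pair uses the minimum byte count. A parser that wanted to reject the longer forms could compare `len(seq)` with `min_length(value)` as the loop does. For comparison, Python's built-in strict decoder (`bytes.decode("utf-8")`) raises `UnicodeDecodeError` on the two-byte form `c0 bf`.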