RE: How to read the encoding of an XML document
> > while UTF-16 uses 2 bytes for most characters.
>
> since it's gone midnight and I no longer need to be helpful in this
> thread I could query the definition of most here, xFFFF not being most
> of x10FFFF by some definitions of most. (Although depending whether you
> view an unallocated unicode slot as a character, the numbers might be
> different)

If the Unicode scalar value is 0xFFFF or less, UTF-16 needs only two bytes to encode it. If it's greater than 0xFFFF, UTF-16 represents the value using a "surrogate pair", which is four bytes in total. Since most characters in everyday use have a value of 0xFFFF or less, most characters will only require two bytes to encode.

UTF-16 can encode all characters in the 0 to 0x10FFFF range, and so can UTF-8 and UTF-32. UCS-2, however, cannot encode characters above 0xFFFF.

Jason.

XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list
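The byte counts discussed in the message can be checked with a short Python sketch. It relies only on Python's built-in codecs. The helper names `byte_lengths` and `surrogate_pair` are hypothetical, chosen for this illustration, and the sample code points are arbitrary picks on either side of the 0xFFFF boundary.

```python
def byte_lengths(cp: int) -> tuple[int, int, int]:
    """Return the (UTF-8, UTF-16, UTF-32) byte counts for one code point.

    The "-be" codec variants are used so that no byte order mark is
    prepended and counted by mistake.
    """
    ch = chr(cp)
    return (
        len(ch.encode("utf-8")),
        len(ch.encode("utf-16-be")),
        len(ch.encode("utf-32-be")),
    )


def surrogate_pair(cp: int) -> tuple[int, int]:
    """Split a code point above 0xFFFF into its UTF-16 surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only code points above 0xFFFF use surrogate pairs")
    v = cp - 0x10000                 # 20 significant bits remain
    high = 0xD800 + (v >> 10)        # top 10 bits -> high surrogate
    low = 0xDC00 + (v & 0x3FF)       # bottom 10 bits -> low surrogate
    return high, low


if __name__ == "__main__":
    # Arbitrary sample code points on both sides of the 0xFFFF boundary.
    for cp in (0x41, 0x20AC, 0xFFFF, 0x10000, 0x1F600, 0x10FFFF):
        u8, u16, u32 = byte_lengths(cp)
        print(f"U+{cp:06X}  utf-8={u8}  utf-16={u16}  utf-32={u32}")

    high, low = surrogate_pair(0x1F600)
    print(f"U+1F600 -> {high:04X} {low:04X}")  # D83D DE00
```

Everything up to and including U+FFFF takes two bytes in UTF-16, and everything above it takes four. UCS-2 has no surrogate mechanism, so the last three sample code points have no UCS-2 representation at all.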