Re: Unicode confusion
> No one's disagreeing with the use of Unicode; we're talking about
> which character encoding we'll use to represent it. You can represent
> Unicode in variable-width 8-bit or 16-bit encodings or in fixed-width
> 32-bit encodings.

My reading of the Unicode 2.x standard is that the above isn't strictly correct. It is correct if you change "Unicode" to "the ISO 10646 Universal Character Set" though.

> Note that Java uses UTF-16, which isn't quite fixed-width, though no
> one really notices.

It seems to me that Java uses Unicode, which maintains the semantics that 16 bits equals one character. Surrogates are characters in Unicode, whereas those code points are not legal UCS characters, but only artifacts of the UTF-16 encoding. Unicode looks like UTF-16, but the semantics are slightly different.

So a file using UTF-16 encoding containing a single "astral plane" character of the UCS would be interpreted by Unicode as a file containing *two* surrogate characters.

(I think it's a strange tack to take, but it seems fairly clear to me that this was their position as of Unicode 2.x. I haven't looked at 3.0 yet, so things may have changed since then.)

The XML character set is the UCS, not Unicode.

Cheers,
-Peter-
housel@a...

xml-dev: A list for W3C XML Developers.
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1
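
To make the surrogate point in the message above concrete, here is a minimal Python sketch. It assumes Python 3.3 or later, where a `str` is a sequence of code points; the helper names `utf16_code_units` and `is_surrogate` are made up for illustration and are not from any standard or library. It encodes one character from outside the 16-bit range (U+10000) as UTF-16 and lists the 16-bit units that end up in the file.

```python
import struct


def utf16_code_units(text):
    """Return the 16-bit code units of the big-endian UTF-16 encoding of text.

    Hypothetical helper, for illustration only.
    """
    data = text.encode("utf-16-be")
    # Each UTF-16 code unit is two bytes; unpack them as unsigned shorts.
    return list(struct.unpack(">%dH" % (len(data) // 2), data))


def is_surrogate(unit):
    """True if a 16-bit unit falls in the surrogate range D800..DFFF."""
    return 0xD800 <= unit <= 0xDFFF


# One "astral plane" character: a single code point above U+FFFF.
astral = "\U00010000"
units = utf16_code_units(astral)

print(len(astral))                       # code points in the string
print([hex(u) for u in units])           # 16-bit units in the UTF-16 file
print([is_surrogate(u) for u in units])  # which of those units are surrogates
```

Read as the UCS, the encoded data holds one character. Read under the "16 bits equals one character" view described above, the same bytes hold two units, both in the surrogate range.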