Timothaeus Bray scripsit:

> [D]id you know the BOM was legal in UTF-8?

The BOM isn't just a BOM, it's also the ZWNBSP (zero-width non-breaking space; no, I do not know how to pronounce that acronym) character, and is interpreted as a BOM only at the beginning of UCS-2 or UTF-16 documents. Not to worry; the character is as near to a no-op as Unicode allows for.

> And of course by the fact that Unicode/10646 is a moving target.

Only sort of. 8859-1 is theoretically a moving target too, except that all the slots are full; CP 1252 is a moving target that has just moved (by adding the euro at 0x80). In all these cases, characters can be added (in principle) but not moved or deleted (any more).

> In practice,
> I've never actually seen anything outside of the BMP, but the
> experts agree they're showing up real soon now.

Not until Unicode 4.0, unless someone wants to use the private-use planes 15 and 16.

> How to get it in? Something like &#x10333; I expect.

Exactly so. Or the decimal NCR equivalent. Two NCRs representing the surrogates separately would be erroneous by both Unicode/10646 definitions and XML definitions.

--
John Cowan http://www.ccil.org/~cowan cowan@c...
You tollerday donsk? N. You tolkatiff scowegian? Nn.
You spigotty anglease? Nnn. You phonio saxo? Nnnn.
Clear all so! 'Tis a Jute.... (Finnegans Wake 16.5)

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@i...
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1
To (un)subscribe, mailto:majordomo@i... the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo@i... the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@i...)
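
To make the NCR and BOM points in the message above concrete, here is a minimal Python sketch. It is not part of the original post: the helper names (ncr_forms, utf16_surrogates, parses) are hypothetical, the code point U+10333 is taken from the quoted example, and the parser behaviour shown is that of Python's standard-library xml.etree.ElementTree, which is only one possible XML processor.

```python
import xml.etree.ElementTree as ET

CODE_POINT = 0x10333  # the character quoted in the message, outside the BMP


def ncr_forms(cp):
    """Return the hexadecimal and decimal NCRs for one code point."""
    return "&#x{:X};".format(cp), "&#{};".format(cp)


def utf16_surrogates(cp):
    """Split a supplementary code point into its UTF-16 surrogate pair."""
    offset = cp - 0x10000
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


def parses(xml_text):
    """Report whether the standard-library parser accepts the document."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


hex_ncr, dec_ncr = ncr_forms(CODE_POINT)
high, low = utf16_surrogates(CODE_POINT)

# One NCR for the whole code point resolves to a single character.
single = ET.fromstring("<a>{}</a>".format(hex_ncr)).text

# Two NCRs, one per surrogate, are not accepted as well-formed XML.
split_ok = parses("<a>&#x{:X};&#x{:X};</a>".format(high, low))

# U+FEFF in UTF-8: a plain decode keeps it as an ordinary character,
# while Python's "utf-8-sig" codec treats a leading one as a signature.
raw = "\ufeffabc".encode("utf-8")
kept = raw.decode("utf-8")
stripped = raw.decode("utf-8-sig")
```

Under these assumptions, hex_ncr and dec_ncr are the two equivalent spellings of the same reference, while the pair in (high, low) is what UTF-16 would store on the wire; the surrogates are an encoding detail and never valid as character references on their own.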