From: Baden Hughes <bmhughes@o...>

>I know that XML 1.0 allows you to use 'special' characters as included in
>the Unicode 2.0 specification. With the upcoming release of Unicode 3.0, how
>will we be able to refer to characters in 3.0 which were not in 2.0? The
>same way (meaning the actual version of the Unicode spec is irrelevant as long
>as the method used is included in XML) or some new way?
>
>For instance, the Sinhala character set was not in Unicode 2.0 but will be
>in 3.0. How do I get one of those characters in an XML document? Or is that
>inconsequential to the document per se, as it is simply a reference and it's
>really up to the application to render it correctly?

The document character set of XML is ISO 10646, the same character set the Unicode Consortium publishes as Unicode. I think most people strongly expect that XML will track ISO 10646, just as Unicode tracks it.

In fact, I think it is essential that XML automatically tracks ISO 10646: people will always try to do strange and interesting things with characters and codes, and XML should allow them as much freedom to do this as possible. Developers should be very wary of building type-checking into their systems that will cause future, legitimate ISO 10646 characters to fail. For example, when a new character is invented, like the Euro, the only difficulties it should cause are a font that has not been upgraded or a sort/type system that doesn't allow new characters to be registered.

We certainly need to abandon the expectation that the number of characters is fixed or knowable, which is how some might interpret material from the Unicode Consortium. A character set standard tries to include what is generally useful against some criteria; if your criteria do not match, then you can quite legitimately decide that your character is not found in the set. Is Apple's "apple" character a real character? Are variant kanji characters real characters? Are roman, fraktur, italic and uncial "a" characters different?
Is English "W" a different character (i.e., "UU") from German "W" (i.e., "VV") when using historical material? In my book I use a dinosaur glyph as a word, and would have liked to put it in the index too: why is it not a character? Such questions can never be resolved, but a character set must make a decision based on some selection criteria, and those criteria will not be appropriate in every situation.

The nice thing about markup is that it lets us simulate the existence of a character missing from a character set; however, we have no markup conventions yet to do this systematically. There are no standard methods for saying "when you find 'a' in this context, collate it differently", for example (apart from, perhaps, language-tagged elements).

Rick Jelliffe

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@i...
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1
To (un)subscribe, mailto:majordomo@i... the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo@i... the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@i...)
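A minimal Python sketch of the two practical points in this exchange, assuming a Python 3 runtime with its standard library. First, a character added to ISO 10646 after Unicode 2.0 can still be written as a numeric character reference, whatever the author's editor or file encoding supports; SINHALA LETTER AYANNA (U+0D85) is used here only as an example. Second, a validity check can follow the code point ranges of the XML 1.0 Char production instead of asking whether the runtime's own Unicode tables have assigned a code point. The helper names `is_xml_char` and `is_known_to_runtime` are hypothetical, chosen for illustration.

```python
import unicodedata
import xml.etree.ElementTree as ET

# A numeric character reference names an ISO 10646 code point directly,
# so it does not depend on the editor, font or file encoding.
# U+0D85 (SINHALA LETTER AYANNA) is only an illustrative choice.
doc = "<p>&#x0D85;</p>"
text = ET.fromstring(doc).text


def is_xml_char(cp: int) -> bool:
    """Hypothetical helper: test a code point against the XML 1.0 Char ranges.

    No Unicode character database is consulted, so characters added to
    ISO 10646 in later versions are accepted without any code change.
    """
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def is_known_to_runtime(ch: str) -> bool:
    """Hypothetical helper showing the brittle alternative: reject anything
    the runtime's Unicode tables report as unassigned (category 'Cn').
    Its answer changes whenever those tables are upgraded."""
    return unicodedata.category(ch) != "Cn"


if __name__ == "__main__":
    print(hex(ord(text)))               # 0xd85
    print(unicodedata.name(text))       # SINHALA LETTER AYANNA
    print(is_xml_char(ord(text)))       # True
    print(is_xml_char(0xFFFE))          # False: excluded by the Char production
    print(is_known_to_runtime(text))    # True with Unicode 3.0+ tables, False with 2.0-era tables
```

The range check is the kind of test that keeps working as ISO 10646 grows; the category check silently changes its answers whenever the runtime's Unicode data is updated, which is exactly the sort of type-checking that can make legitimate new characters fail.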