RE: Character Entities: An XML Core WG View
Tim Bray wrote:
> John Cowan wrote:
> > >Why not Unicode.org? It could create short name "aliases"
> > >of the long name descriptions.
> > Are you really prepared to create short names

No -- that's why I suggested Unicode.org. :)

> > (other than ones involving hex digits) for all 95,156
> > characters in Unicode 3.2? Or even if we leave out the Han
> > and Hangul characters, the 13,791 characters that are left?
> > It is a biiiiiiiiiiiiiiiiiiiig job.
> Yes, but it sure would be nice if it were done. If this were done,
> I think that a lot of people would be willing to focus support
> on this and nothing else. I wonder how much could be automated?
> Hmm... -Tim

None of the Latin, Greek, or Math used in today's markup should, IMO, be automated. Those should come from the XHTML, Docbook, MathML traditions, as "unified" by David C. & Co.

As to the rest, the writing groups are, well, different -- especially as to case, letters, characters, vowel signs, intent, etc. A few random samples from UnicodeData.txt:

BOX DRAWINGS RIGHT LIGHT AND LEFT VERTICAL HEAVY
RECYCLING SYMBOL FOR TYPE-4 PLASTICS
UPWARDS HARPOON WITH BARB LEFT BESIDE DOWNWARDS HARPOON WITH BARB RIGHT
CYRILLIC CAPITAL LETTER GHE WITH UPTURN
ARABIC LETTER DAL WITH DOT BELOW AND SMALL TAH
ARABIC LIGATURE FEH WITH KHAH WITH MEEM INITIAL FORM
SINHALA LETTER MAHAAPRAANA PAYANNA
SINHALA VOWEL SIGN KOMBUVA HAA AELA-PILLA
TIBETAN MARK GTER YIG MGO -UM GTER TSHEG MA
TIBETAN SUBJOINED LETTER CHA
TIBETAN VOWEL SIGN REVERSED II
TIBETAN SIGN NYI ZLA NAA DA
HANGUL CHOSEONG CEONGCHIEUMSSANGCIEUC
HANGUL JUNGSEONG SSANGARAEA
HANGUL LETTER KAPYEOUNSSANGPIEUP
PARENTHESIZED HANGUL MIEUM A

You might possibly automate *some* of it group by group. A lot of them don't seem to yield very well to "entification", automatic or otherwise. :) And the alternative underscore trick could cause too many to end it all with an &UPWARDS_HARPOON_WITH_BARB_LEFT_BESIDE_DOWNWARDS_HARPOON_WITH_BARB_RIGHT;.
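For anyone who wants to see what the underscore trick produces, here is a minimal sketch in Python. It assumes the "trick" means nothing more than swapping the spaces in a character's formal Unicode name for underscores; the helper name `underscore_entity` is made up for illustration and is not part of any proposal in this thread. The names come from whatever Unicode version ships with the local Python, not specifically 3.2.

```python
import unicodedata


def underscore_entity(ch: str) -> str:
    """Return a hypothetical entity reference built from ch's Unicode name.

    Spaces become underscores; hyphens are left alone, since XML names
    may contain them. Raises ValueError if the character has no name.
    """
    name = unicodedata.name(ch)
    return "&" + name.replace(" ", "_") + ";"


if __name__ == "__main__":
    # Look the character up by its formal name rather than hard-coding a code point.
    harpoons = unicodedata.lookup(
        "UPWARDS HARPOON WITH BARB LEFT BESIDE DOWNWARDS HARPOON WITH BARB RIGHT"
    )
    print(underscore_entity(harpoons))
    print(underscore_entity("\u00e9"))  # LATIN SMALL LETTER E WITH ACUTE
```

The mechanical part is a one-liner; the hard part, as the samples suggest, is that the resulting names are too long to type, and nothing in this sketch shortens them.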
It's probably best to start with a single unified western set from XHTML, Docbook, and MathML that people can bring in -- *if they desire* -- and ten years or so from now, we'll rarely need it (or any other entified Unicode) anyway.

/Jelks