Re: Unicode and XML (was Re: Remembering the origina
Previously swallowed by the xml-dev bit bucket:

Tony Graham wrote at 17 Feb 2003 17:13:36 +0000:
> Gavin Thomas Nicol wrote at 16 Feb 2003 14:16:23 -0500:
> > On Sunday 16 February 2003 12:35 pm, Mike Champion wrote:
> > > Stupid question: Why couldn't XML incorporate Unicode by reference rather
> > > than spending half of the spec defining the "unicode-character apparatus"?
> >
> > There are a fair number of characters that really don't make much sense as
> > markup... and XML 1.0 is pretty conservative, but generally sensible. At the
> > time, there were no good guidelines from the Unicode consortium on what
> > should/should not be allowed, which is something they have addressed
> > recently.
>
> The Unicode Standard, Version 2.0, was published in 1996. Section
> 5.14, Identifiers, contains guidelines for "the definition of
> identifier syntax."
>
> Unicode 2.1, which was approved eight days after XML was approved, did
> add the simplifying mapping of syntactic classes in the "Identifier"
> section to the character classes in the Unicode Character Database
> (UCD) but didn't change the substance of the guidelines.
>
> Section 5.16, Identifiers, of the Unicode Standard, Version 3.0, kept
> the verbiage about what makes a good identifier, kept the mapping to
> character classes, and dropped most of the syntactic classes, for no
> real change in the guidelines.
>
> The Unicode Standard didn't, and still doesn't, prescribe the
> identifier syntax because "each programming language standard has its
> own identifier syntax".
>
> XML 1.0 was always going to have to define its identifier syntax,
> i.e., its name characters, because XML allows ":", "_", "-", and "."
> in names (whereas other, non-XML standards have their own lists of
> extras).
>
> XML 1.0 names are mostly defined in terms of UCD character classes,
> and the suggestions for XML 1.1 names are still mostly based on those
> character classes. The hard work in defining XML 1.0 names would have
> been resolving the inconsistencies in the Unicode Character Database,
> both w.r.t. canonical equivalence (x0387) and because the UCD used to
> contain a 'PropList.txt' file that was provided without explanation
> and that sometimes contradicted the information in the main
> 'UnicodeData.txt' file.
>
> (And the status of 'PropList.txt' is something that has been addressed
> recently.)
>
> Regards,
>
>
> Tony Graham
> ------------------------------------------------------------------------
> XML Technology Center - Dublin
> Sun Microsystems Ireland Ltd                       Phone: +353 1 8199708
> Hamilton House, East Point Business Park, Dublin 3            x(70)19708
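
The two points in the message that lend themselves to a quick check are the class-based definition of names and the x0387 canonical-equivalence case. The Python sketch below is only an illustration under stated assumptions: `approx_is_name` is a hypothetical helper that approximates names from general categories plus the four extra characters listed above, and it is not the XML 1.0 Name production, which also enumerates individual characters and exceptions. It relies on whatever Unicode version ships with the local Python, not on the 2.0-era data discussed in the message.

```python
import unicodedata

# General categories used for this approximation (an assumption of this
# sketch, not the exact XML 1.0 character tables).
START_CATEGORIES = {"Lu", "Ll", "Lt", "Lo", "Nl"}
CONTINUE_CATEGORIES = START_CATEGORIES | {"Mc", "Me", "Mn", "Lm", "Nd"}

# The XML-specific extras mentioned in the message.
START_EXTRAS = {":", "_"}
CONTINUE_EXTRAS = START_EXTRAS | {"-", "."}


def approx_is_name(text):
    """Rough, category-based test of whether text looks like an XML name."""
    if not text:
        return False
    first, rest = text[0], text[1:]
    if first not in START_EXTRAS and unicodedata.category(first) not in START_CATEGORIES:
        return False
    return all(
        ch in CONTINUE_EXTRAS or unicodedata.category(ch) in CONTINUE_CATEGORIES
        for ch in rest
    )


def canonical_decomposition(ch):
    """Return the UCD decomposition field for one character ('' if none)."""
    return unicodedata.decomposition(ch)


ANO_TELEIA = "\u0387"  # GREEK ANO TELEIA, the "x0387" case
MIDDLE_DOT = "\u00b7"  # MIDDLE DOT

if __name__ == "__main__":
    print(approx_is_name("xsl:template"), approx_is_name("-bad"))
    # U+0387 is canonically equivalent to U+00B7, so normalization replaces it.
    print(canonical_decomposition(ANO_TELEIA))
    print(unicodedata.normalize("NFC", ANO_TELEIA) == MIDDLE_DOT)
```

Because U+0387 normalizes to U+00B7, a name rule built purely on character classes has to decide how to treat both code points consistently, which is one way to read the "canonical equivalence" remark above.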