Re: Blueberry/Unicode/XML
> However, I presume there was a good reason why the current name character
> scheme was implemented. The reasons I can think of are easily dismissed or
> dealt with. Are there any other more serious implications?

Yes.

0) To catch markup errors. If someone is using a symbol, they must think they are in content, a marked section, an attribute value, a comment or a PI. This way, the error will be detected closer to its source.

1) To catch transcoding problems. Japanese experts report this is good.

2) To promote readable (and read-aloud-able) markup. Symbols are not appropriate for markup characters. (If you want to use symbols for markup characters, use ISO SGML's short references or Simon St.Laurent's Regular Fragmentations.)

3) To minimize the chances of delimiter clashes in languages (hence no ' in names, even if it would be useful for French; ' the ASCII character, that is).

4) To prevent the use of symbols whose glyphs look like letters. For example, the maths x or the maths alpha are symbols, and they have distinct codes from the letters. If you used the maths x for whatever reason, the document might be visually correct on inspection but be coded wrong: your name comparisons would not work.

5) There is every chance that people will only use their native scripts when the document is used regionally or locally. Prudence dictates it. A Cyrillic user can assume that other readers of the document have Cyrillic alphabet fonts, but she cannot assume which symbols may be used.

6) There are several function characters and characters which are not suitable for markup: for example, the new language-tagging characters from plane 14 and perhaps even the BIDI characters. (See Unicode TR 20.) If we keep them out, we prevent people trying to do strange things; people will always insist that there is no intent behind a technology and that they can get by just on the mechanism, so the mechanism needs to hard-code the intent.

7) Because it is not really very expensive to implement.
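To make point 4 concrete, here is a minimal sketch in Python, not anything taken from an XML parser. It assumes that "the maths x" means U+1D465 MATHEMATICAL ITALIC SMALL X; the names `same_name` and `fold_compat` are hypothetical helpers, and using NFKC to fold the look-alike is my choice for illustration, not something XML well-formedness checking does.

```python
import unicodedata

LATIN_X = "x"           # U+0078 LATIN SMALL LETTER X
MATH_X = "\U0001D465"   # U+1D465 MATHEMATICAL ITALIC SMALL X (assumed "maths x")


def same_name(a: str, b: str) -> bool:
    """Compare two names code point by code point (hypothetical helper)."""
    return a == b


def fold_compat(name: str) -> str:
    """Fold compatibility look-alikes with NFKC (an illustrative choice only)."""
    return unicodedata.normalize("NFKC", name)


plain = LATIN_X + "-axis"
lookalike = MATH_X + "-axis"   # may render almost identically to `plain`

# The two names can look alike on screen but are coded differently.
print(same_name(plain, lookalike))               # False
# Only after an explicit normalization step do they compare equal.
print(same_name(plain, fold_compat(lookalike)))  # True
```

The point of the sketch is only that the mismatch is invisible on inspection; keeping such characters out of names means the problem is reported at parse time instead.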
But just allowing any surrogate without nitpicking is fine.

But I agree with James Clark that perhaps name checking (and normalization) should ultimately be some kind of different layer to WF. (I note that James has been thinking about this issue almost as long as anyone: he implemented native-language markup capabilities in SP early on, probably the first person to release a markup-based application that treated other scripts with equity.)

Cheers
Rick Jelliffe