Re: Unicode and XSL (was substring())
David Carlisle wrote:
> > combining characters are not necessarily the main problem.
> I'd argue that they ought to count as separate characters as that is
> what they are in the character data of the XML spec.

The problem is that many characters can be represented in Unicode both

- as a base character and one or more combining characters
- as a single precomposed character

Is "a acute" one character or two? This problem is particularly severe when documents use a legacy encoding (i.e. one not based on UCS). When converting to Unicode, which of the alternative representations of a character should a converter choose?

There are two issues:

(a) How do you define a canonical form so that there's a single answer to questions like this?
(b) Where does the canonicalization happen?

Historically the answer to (a) has been that you canonicalize by decomposing precomposed characters into their base+combining form. More recently it has been proposed that canonicalization should instead compose base+combining sequences wherever a precomposed character is available in a particular version of Unicode (probably 3.0).

For (b) the problem is that canonicalization is quite an expensive, complex process. The cost of requiring all Web clients (including very lightweight clients like mobile phones and PDAs) always to canonicalize data themselves would be prohibitive. So the current proposal is that all data gets canonicalized as early as possible, ideally when it is produced but in any case before it is sent over the Web.

There is another significant problem that I haven't touched on, which is compatibility characters.

See:

http://www.w3.org/TR/WD-charmod
http://www.unicode.org/unicode/reports/tr15/tr15-10.html

for more background.

James

XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list
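To make the "one character or two" question concrete, here is a minimal sketch in Python. It is an illustration only, and it rests on assumptions not in the message above: Python's standard unicodedata module stands in for a converter, and its "NFC", "NFD" and "NFKC" forms stand in for the composing, decomposing and compatibility canonicalizations discussed there. The module follows whatever Unicode version ships with the interpreter, not necessarily 3.0.

```python
import unicodedata

# Two representations of "a acute" (assumed example strings):
precomposed = "\u00e1"   # LATIN SMALL LETTER A WITH ACUTE
decomposed = "a\u0301"   # 'a' followed by COMBINING ACUTE ACCENT

# Without canonicalization the two differ in length and compare unequal.
lengths = (len(precomposed), len(decomposed))
equal_raw = precomposed == decomposed

# Decomposing canonical form: precomposed -> base + combining.
nfd = unicodedata.normalize("NFD", precomposed)

# Composing canonical form: base + combining -> precomposed, where one exists.
nfc = unicodedata.normalize("NFC", decomposed)

# Compatibility characters are a separate matter: the "fi" ligature
# (U+FB01) is left alone by NFC and only folded by a compatibility form.
ligature = "\ufb01"
nfc_ligature = unicodedata.normalize("NFC", ligature)
nfkc_ligature = unicodedata.normalize("NFKC", ligature)

print(lengths, equal_raw)
print(nfd == decomposed, nfc == precomposed)
print(nfc_ligature == ligature, nfkc_ligature == "fi")
```

Whichever form is picked, comparisons and character counts only agree once both strings have been put through the same canonicalization, which is why the question of where it happens matters.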