Re: Unicode and XSL (was substring())
Paul Prescod wrote:
>
> David Carlisle wrote:
> >
> > Harder are characters out of the basic plane of Unicode. These are a
> > single character in XML, e.g. accessed by a single character reference, but since
> > they don't fit into 16 bits, they take up two slots when the Unicode
> > is encoded in UTF-16. So the natural thing to do is to count these
> > characters as single characters, but that means string indexing requires
> > walking the string and thus takes time proportional to the index rather than being
> > a constant-time array lookup. It also means that indexing and string
> > length give different values if you use a `pure XML' approach or if
> > you escape out to some language that treats strings as an array of 16-bit
> > quantities.
>
> Why are you worrying about the encoding? If your programming language is
> broken in its handling of the platonic ideal concept of characters then
> that is the XSL implementor's problem. There are ways of getting this
> right: you can just use 32-bit characters or you can switch your character
> width or iteration algorithm based on the actual contents of a string. This
> isn't trivial but it is an implementor's problem and should not be
> reflected in XSL.

I basically agree with this. Counting characters by counting the 16-bit
quantities that encode characters in UTF-16 makes about as much sense as
counting characters by counting the 8-bit quantities that encode characters
in UTF-8 (which would mean, for example, that a dollar sign counts as one
character and a pound sterling sign counts as two characters).

The counter-argument to this is that the DOM counts using UTF-16. I would
respond by saying that the DOM is not counting characters but counting 16-bit
quantities; there's nothing wrong with counting 16-bit quantities any more
than there is with counting 8-bit quantities, it just isn't the same thing as
counting characters. The XML Rec defines what a character is for XML and that
is what we should count.
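A minimal Python sketch of the three ways of "counting" discussed above, assuming Python 3, where `len()` on a `str` counts code points (i.e. XML characters). The helper names are hypothetical and chosen only for illustration. The sample string is a dollar sign, a pound sterling sign, and U+10000, the first character outside the basic plane. `char_at_utf16` shows David's point that finding the n-th character in a UTF-16 code-unit array means walking past surrogate pairs, so the cost grows with the index; the code-point list at the end shows Paul's "just use 32-bit characters" alternative, where indexing is a direct lookup.

```python
def utf16_units(s: str) -> int:
    """Number of 16-bit code units in the UTF-16 form (what the DOM counts)."""
    return len(s.encode("utf-16-le")) // 2


def utf8_bytes(s: str) -> int:
    """Number of 8-bit code units in the UTF-8 form."""
    return len(s.encode("utf-8"))


def to_utf16_units(s: str) -> list[int]:
    """Split a string into its UTF-16 code units (no byte-order mark)."""
    data = s.encode("utf-16-le")
    return [int.from_bytes(data[k:k + 2], "little") for k in range(0, len(data), 2)]


def char_at_utf16(units: list[int], index: int) -> int:
    """Return the code point of the index-th character in a UTF-16 unit array.

    Runs in time proportional to `index`: every surrogate pair before the
    target has to be recognised before it can be counted as one character.
    """
    i = 0
    count = 0
    while i < len(units):
        u = units[i]
        if (0xD800 <= u <= 0xDBFF and i + 1 < len(units)
                and 0xDC00 <= units[i + 1] <= 0xDFFF):
            # High + low surrogate: combine into one supplementary code point.
            cp = 0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00)
            step = 2
        else:
            cp = u
            step = 1
        if count == index:
            return cp
        count += 1
        i += step
    raise IndexError("character index out of range")


sample = "$\u00a3\U00010000"

print(len(sample))           # characters (code points): 3
print(utf16_units(sample))   # 16-bit units: 4 (U+10000 needs a surrogate pair)
print(utf8_bytes(sample))    # 8-bit units: 7 (1 + 2 + 4)

# 32-bit approach: one slot per character, so indexing is a constant-time lookup.
code_points = [ord(c) for c in sample]
print(hex(code_points[2]))                               # 0x10000
print(hex(char_at_utf16(to_utf16_units(sample), 2)))     # 0x10000, found by walking
```

Only the first count matches what the XML Rec calls a character; the other two are counts of encoding units, which is the distinction drawn above.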
James

XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list