
Re: Unicode and XSL (was substring())

Subject: Re: Unicode and XSL (was substring())
From: James Clark <jjc@xxxxxxxxxx>
Date: Sun, 06 Jun 1999 11:50:54 +0700
David Carlisle wrote:
> 
> combining characters are not necessarily the main problem.
> I'd argue that they ought to count as separate characters as that is
> what they are in the character data of the XML spec.

The problem is that many characters can be represented in Unicode both

- as a base character and one or more combining characters
- as a single precomposed character

Is "a acute" one character or two? This problem is particularly severe
when documents are using a legacy encoding (i.e. not one based on UCS).
When converting to Unicode, which of the alternative methods for
representing a character in Unicode should a converter choose?
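
To make the ambiguity concrete, here is a minimal sketch in Python (not
part of the original discussion). It assumes Python 3 strings, whose
length counts code points, and uses ISO 8859-1 as an example legacy
encoding; the variable names are purely illustrative.

```python
# Two Unicode representations of "a acute".
precomposed = "\u00e1"   # LATIN SMALL LETTER A WITH ACUTE
decomposed = "a\u0301"   # "a" followed by COMBINING ACUTE ACCENT

# A naive character count gives different answers for the same text.
print(len(precomposed), len(decomposed))

# A plain code point comparison treats the two forms as different strings.
print(precomposed == decomposed)

# A converter from a legacy encoding has to pick one representation.
# Python's ISO 8859-1 decoder happens to produce the precomposed form.
legacy_bytes = b"\xe1"
converted = legacy_bytes.decode("latin-1")
print(converted == precomposed)
```

Anything that counts or indexes characters, such as substring(), will
see one character in the first string and two in the second.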

There are two issues:

(a) How do you define a canonical form so that there's a single answer
to questions like this?

(b) Where does the canonicalization happen?

Historically the answer to (a) has been that you canonicalize by
decomposing precomposed characters into their base+combining form.  More
recently it has been proposed that canonicalization should compose
base+combining combinations wherever there is a precomposed combination
available in a particular version of Unicode (probably 3.0).
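
The two approaches can be sketched in Python with the standard
unicodedata module. This assumes that the decomposing and composing
forms correspond to what the Unicode normalization report calls NFD and
NFC, and it uses whatever Unicode version ships with the interpreter
rather than a fixed one such as 3.0. The helper names are hypothetical.

```python
import unicodedata

def to_decomposed(text: str) -> str:
    """Canonicalize by splitting precomposed characters (NFD)."""
    return unicodedata.normalize("NFD", text)

def to_composed(text: str) -> str:
    """Canonicalize by composing base+combining sequences (NFC)."""
    return unicodedata.normalize("NFC", text)

precomposed = "\u00e1"   # a acute as a single code point
decomposed = "a\u0301"   # a + combining acute accent

# Either direction yields a single answer for both inputs.
print(to_decomposed(precomposed) == to_decomposed(decomposed))
print(to_composed(precomposed) == to_composed(decomposed))

# Where no precomposed character exists, composition leaves the
# base+combining sequence alone (there is no precomposed "q acute").
print(len(to_composed("q\u0301")))
```

Note that even in the composing form some text stays as base+combining
sequences, so composition does not make every accented letter a single
code point.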

For (b) the problem is that canonicalization is quite an expensive,
complex process.  The cost of requiring all Web clients (including very
lightweight clients like mobile phones and PDAs) always to canonicalize
data themselves would be prohibitive. So the current proposal is that
all data gets canonicalized as early as possible, ideally when it is
produced but in any case before it is sent over the Web.
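
As a rough illustration of that division of labour, the sketch below
normalizes once on the producing side and leaves the receiving side with
nothing to do beyond decoding. The function names publish and receive
are hypothetical, and the choice of the composing form (NFC) and UTF-8
is an assumption made for the example, not something the proposal fixes
here.

```python
import unicodedata

def publish(text: str) -> bytes:
    """Producer side: canonicalize once, before the data is sent."""
    canonical = unicodedata.normalize("NFC", text)
    return canonical.encode("utf-8")

def receive(data: bytes) -> str:
    """Lightweight client: decode only, trusting the producer."""
    return data.decode("utf-8")

# Both spellings of "a acute" arrive at the client as the same string.
from_precomposed = receive(publish("\u00e1"))
from_decomposed = receive(publish("a\u0301"))
print(from_precomposed == from_decomposed)
```

The client never touches the normalization tables; the cost is paid once
by whoever produces the data.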

There is another significant problem that I haven't touched on, which is
compatibility characters.
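
For readers unfamiliar with the term, a short Python sketch may help. It
assumes the standard unicodedata module and uses two example characters,
the "fi" ligature and the circled digit one; canonical normalization
(NFC) leaves them alone, while the compatibility form (NFKC) folds them
into ordinary characters.

```python
import unicodedata

ligature = "\ufb01"   # LATIN SMALL LIGATURE FI
circled = "\u2460"    # CIRCLED DIGIT ONE

# Canonical composition does not touch compatibility characters.
print(unicodedata.normalize("NFC", ligature) == ligature)

# Compatibility normalization replaces them with plain equivalents,
# which changes both the code points and the character count.
folded_ligature = unicodedata.normalize("NFKC", ligature)
folded_circled = unicodedata.normalize("NFKC", circled)
print(folded_ligature, len(folded_ligature))
print(folded_circled)
```

Whether such folding is desirable depends on the application, since it
discards distinctions that may have been intended.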

See:

  http://www.w3.org/TR/WD-charmod
  http://www.unicode.org/unicode/reports/tr15/tr15-10.html

for more background.

James


 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list

