Re: Unicode question
On Wed, May 23, 2018 at 05:41:44PM -0000, Erik Siegel erik@xxxxxxxxxxx scripsit:
> I have a problem that is Unicode related. Some Unicode characters (for
> instance emojis) can have some code *following* the actual character to
> indicate a variant. For instance in the following stylesheet, the emoji
> character in $x (U+1F61C) is followed by U+DE1C. When I look in oXygen it
> shows me this. But when I run the stylesheet it reports a string length of
> 1 and only a single codepoint.
>
> I suppose that is true, it is only a single character. But how can I find
> out (in XPath) what the value of the second "character" (indicator?) is?
> Or is that impossible anyway?

If I try to look up U+DE1C, I am informed that this is not a Unicode code
point. It is the second half of the UTF-16 surrogate pair -- D83D DE1C --
used to represent U+1F61C in UTF-16.

(See <https://apps.timwhitlock.info/unicode/inspect?s=%F0%9F%98%9C> )

I would suppose that oXygen is showing you UTF-16 source but the processing
is happening in UTF-8, where the emoji is a single code point and
corresponding glyph.

-- Graydon
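
The surrogate-pair arithmetic mentioned above can be checked by hand. The following is only a rough sketch in Python, not anything taken from the original stylesheet; the function name `utf16_surrogates` is made up for illustration. It splits U+1F61C into its two UTF-16 code units and compares the result with what Python's own UTF-16 encoder produces.

```python
def utf16_surrogates(code_point):
    """Split a code point above U+FFFF into its UTF-16 (high, low) surrogates.

    Hypothetical helper for illustration only.
    """
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points above U+FFFF use surrogate pairs")
    offset = code_point - 0x10000        # 20-bit value to be split in two
    high = 0xD800 + (offset >> 10)       # top 10 bits go into the high surrogate
    low = 0xDC00 + (offset & 0x3FF)      # bottom 10 bits go into the low surrogate
    return high, low


emoji = "\U0001F61C"                     # the character from the question
high, low = utf16_surrogates(ord(emoji))

print(len(emoji))                        # 1: one code point, as the stylesheet reported
print(f"{high:04X} {low:04X}")           # D83D DE1C
print(emoji.encode("utf-16-be").hex().upper())  # D83DDE1C, the same two code units
```

So DE1C is not a separate variant indicator following the emoji; it is the low half of the emoji's own UTF-16 encoding. XPath 2.0 functions such as string-length() and string-to-codepoints() count code points rather than UTF-16 code units, which would explain the reported length of 1.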