
Asian, UTF-8, markup, extensions and d-o-e

Subject: Asian, UTF-8, markup, extensions and d-o-e
From: Frikkie Swardt <Frikkie.Swardt@xxxxxxxxx>
Date: Thu, 30 May 2002 15:58:12 -0500

I posted this on the Saxon forum at SourceForge. I got one reply, but nothing
since May 22. I'm hoping someone on this list may be able to help.

We are using Saxon 6.5 (I tried 6.5.2 with the same results).
I am trying to display Chinese (and other languages) with HTML markup.
The text is loaded into a HashMap.
The text contains HTML markup (line breaks, color, class, etc.).
It appears that disable-output-escaping="yes" has no effect on "<"
and ">" when the text contains a Unicode character with a value above 255.

sample HashMap for en:
label.test1=Simplified
label.test2=Traditional
label.test3=Accommodation
label.test4=Thank you for using <i>Our Website</i>

sample HashMap for zh_CN:
label.test1=\u7b80\u5316
label.test2=\u4f20\u7edf
label.test3=\u4F4F\u5BBF
label.test4=\u611F\u8C22\u60A8\u4F7F\u7528 <i>Our Website</i>\u3002
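The `\uXXXX` sequences above are the escape syntax of Java `.properties` files. As a minimal sketch, assuming the messages are stored in that format and read with `java.util.Properties` (the post only says they end up in a HashMap), this shows that label.test4 reaches the stylesheet as an ordinary Java string: five CJK ideographs above U+00FF followed by a literal `<i>...</i>`. The class name `LabelDecodeDemo` is made up for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical demo class: decodes the zh_CN sample entries the way
// java.util.Properties would (assumption: the real loader uses .properties files).
class LabelDecodeDemo {
    // Double backslashes keep the escapes as literal text in the Java source,
    // exactly as they would appear in a .properties file on disk.
    static final String ZH_CN =
            "label.test1=\\u7b80\\u5316\n"
          + "label.test4=\\u611F\\u8C22\\u60A8\\u4F7F\\u7528 <i>Our Website</i>\\u3002\n";

    static Properties load(String text) {
        Properties p = new Properties();
        try {
            // Properties.load(Reader) expands backslash-u escapes into real characters.
            p.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected from a StringReader
        }
        return p;
    }

    // True if any character lies outside ISO-8859-1 (code point above 255).
    static boolean hasCharAbove255(String s) {
        return s.codePoints().anyMatch(cp -> cp > 0xFF);
    }

    public static void main(String[] args) {
        String msg = load(ZH_CN).getProperty("label.test4");
        System.out.println("length = " + msg.length());
        System.out.println("above 255? " + hasCharAbove255(msg));
        System.out.println("contains <i>? " + msg.contains("<i>"));
    }
}
```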

output statement:
<xsl:output method="html" indent="no" encoding="iso-8859-1"
saxon:character-representation="entity;entity" />
Setting saxon:character-representation to native, entity, decimal, or hex makes no difference for the markup text.

We call a custom extension (not a Saxon extension) to get the text:
<xsl:value-of disable-output-escaping="yes"
select="java:getMessage($vtExtension,$locale,string('label.test4'))"/>
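From the stylesheet's point of view, whatever the extension returns is a plain string: the `<i>` in it is just the characters `<`, `i`, `>`, which is why disable-output-escaping is needed at all. A minimal sketch of such an extension follows, assuming a hypothetical `MessageExtension` class with a static three-argument `getMessage`. The real extension's code is not in the post, and how Saxon 6.5 binds `java:getMessage` depends on the namespace declaration, which is also not shown.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the custom extension called as
// java:getMessage($vtExtension, $locale, 'label.test4').
// Class and method names are assumptions, not the poster's code.
class MessageExtension {
    // locale -> (key -> message text, possibly containing HTML markup)
    private final Map<String, Map<String, String>> bundles = new HashMap<>();

    void put(String locale, String key, String text) {
        bundles.computeIfAbsent(locale, k -> new HashMap<>()).put(key, text);
    }

    // The extension object is passed explicitly as the first argument.
    // Falls back to the key itself so a missing label stays visible on the page.
    static String getMessage(MessageExtension ext, String locale, String key) {
        Map<String, String> bundle = ext.bundles.get(locale);
        String text = (bundle == null) ? null : bundle.get(key);
        return (text == null) ? key : text;
    }

    public static void main(String[] args) {
        MessageExtension ext = new MessageExtension();
        ext.put("en", "label.test4", "Thank you for using <i>Our Website</i>");
        // The result is ordinary text; '<' and '>' are plain characters here.
        System.out.println(getMessage(ext, "en", "label.test4"));
    }
}
```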

For label.test4 I expected to see "Our Website" in italics, but I saw the
markup instead.
It never works without disable-output-escaping="yes".
With it, the markup still shows up, but only when the text contains
characters with values above 255 (outside ISO-8859-1; accented Latin
characters such as ó work).
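The cut-off at 255 matches the range of ISO-8859-1, the encoding named in the `xsl:output` above: ó (U+00F3) fits, the CJK ideographs do not. As a sketch, Java's `CharsetEncoder.canEncode` can test whether a label is representable in a given output encoding. `EncodabilityCheck` is an illustrative name, and this only models the condition; it is not what Saxon does internally.

```java
import java.nio.charset.Charset;

// Sketch: can a label be written verbatim in a given output encoding?
// Models the condition only; this is not Saxon's serializer code.
class EncodabilityCheck {
    static boolean encodable(String text, String encoding) {
        // A fresh encoder per call, since CharsetEncoder instances keep state.
        return Charset.forName(encoding).newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        String[] samples = {
            "nothing funny <i>Our Website</i>",
            "electr\u00f3nico<i>test17</i>",
            "something funny <i>Our Website</i> with unicode: \u7b80\u5316"
        };
        for (int i = 0; i < samples.length; i++) {
            System.out.println("sample " + i
                    + ": ISO-8859-1=" + encodable(samples[i], "ISO-8859-1")
                    + ", UTF-8=" + encodable(samples[i], "UTF-8"));
        }
    }
}
```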

So I'm looking for a solution that lets me use both Unicode characters and
markup, while still reading the HashMap through the Java extension.

some other results:

(snapshots at http://frik.50megs.com/xsl/thetext.jpg and
http://frik.50megs.com/xsl/theresult.jpg)
Text:
test01=nothing funny <i>Our Website</i>
test02=nothing funny <i>Our Website</i>
test03=something funny <i>Our Website</i> with unicode: \u7b80\u5316
test04=something funny <i>Our Website</i> with unicode: \u7b80\u5316
test05=with amper lt and gt &lt;i&gt;Our Website&lt;/i&gt; with unicode: \u7b80\u5316
test06=with amper lt and gt &lt;i&gt;Our Website&lt;/i&gt; with unicode: \u7b80\u5316
test07=with unicode for lt and gt \u003ci\u003eOur Website\u003c/i\u003e with unicode: \u7b80 \u5316
test08=with unicode for lt and gt \u003ci\u003eOur Website\u003c/i\u003e with unicode: \u7b80 \u5316
test09=with unicode for lt and gt \u003ci\u003eOur Website\u003c/i\u003e with no other unicode
test10=with unicode for lt and gt \u003ci\u003eOur Website\u003c/i\u003e with no other unicode
test11=\u0041\u006C\u006C\u0020\u0069\u006E\u0020\u0055\u006E\u0069\u0063\u006F\u0064\u0065\u0020\u003C\u0069\u003E\u0020\u004F\u0075\u0072\u0020\u0057\u0065\u0062\u0073\u0069\u0074\u0065\u0020\u003C\u002F\u0069\u003E\u0020\u7b80\u5316
test12=\u0041\u006C\u006C\u0020\u0069\u006E\u0020\u0055\u006E\u0069\u0063\u006F\u0064\u0065\u0020\u003C\u0069\u003E\u0020\u004F\u0075\u0072\u0020\u0057\u0065\u0062\u0073\u0069\u0074\u0065\u0020\u003C\u002F\u0069\u003E\u0020\u7b80\u5316
test13=\u0041\u006C\u006C\u0020\u0069\u006E\u0020\u0055\u006E\u0069\u0063\u006F\u0064\u0065\u0020\u003C\u0069\u003E\u0020\u004F\u0075\u0072\u0020\u0057\u0065\u0062\u0073\u0069\u0074\u0065\u0020\u003C\u002F\u0069\u003E\u0020
test14=\u0041\u006C\u006C\u0020\u0069\u006E\u0020\u0055\u006E\u0069\u0063\u006F\u0064\u0065\u0020\u003C\u0069\u003E\u0020\u004F\u0075\u0072\u0020\u0057\u0065\u0062\u0073\u0069\u0074\u0065\u0020\u003C\u002F\u0069\u003E\u0020
test15=electrónico
test16=electr&oacute;nico
test17=electrónico<i>test17</i>
test18=electr&oacute;nico<i>test18</i>
test19=\u611F\u8C22\u60A8\u4F7F\u7528 <i>Our Website</i>\u3002


Result: (yes/no refers to disable-output-escaping)
test01 yes = nothing funny Our Website
test02 no = nothing funny <i>Our Website</i>
test03 yes = something funny <i>Our Website</i> with unicode: ??
test04 no = something funny <i>Our Website</i> with unicode: ??
test05 yes = with amper lt and gt &lt;i&gt;Our Website&lt;/i&gt; with unicode: ??
test06 no = with amper lt and gt &lt;i&gt;Our Website&lt;/i&gt; with unicode: ??
test07 yes = with unicode for lt and gt <i>Our Website</i> with unicode: ? ?
test08 no = with unicode for lt and gt <i>Our Website</i> with unicode: ? ?
test09 yes = with unicode for lt and gt Our Website with no other unicode
test10 no = with unicode for lt and gt <i>Our Website</i> with no other unicode
test11 yes = All in Unicode <i> Our Website </i> ??
test12 no = All in Unicode <i> Our Website </i> ??
test13 yes below 255 = All in Unicode Our Website
test14 no below 255 = All in Unicode <i> Our Website </i>
test15 yes = electrónico
test15 no = electrónico
test16 yes = electrónico
test16 no = electr&oacute;nico
test17 yes = electrónicotest17
test17 no = electrónico<i>test17</i>
test18 yes = electrónicotest18
test18 no = electr&oacute;nico<i>test18</i>
test19 no = ????? <i>Our Website</i>?
test19 yes = ????? <i>Our Website</i>?




Michael Kay stated:
The XSLT spec says that it is an error to output a character not
available in the chosen encoding with disable-output-escaping="yes". The
processor is allowed to signal the error, or to recover by ignoring the
d-o-e="yes" attribute. You are using encoding="iso-8859-1", therefore
outputting characters above 256 is only possible by using character
references. If you use encoding="utf-8", it should work fine.
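Michael Kay's point can be made concrete: in an ISO-8859-1 document, U+7B80 can only appear as a reference such as `&#31616;`. The sketch below applies that rewriting to a message string, leaving `<` and `>` alone. `CharRefs` is a hypothetical helper, not part of Saxon or of the extension in the post. A string processed this way contains only characters up to 255, so it no longer falls under the rule quoted above; whether emitting it with d-o-e="yes" produces the desired page in Saxon 6.5 is an untested assumption.

```java
// Sketch: rewrite every character above U+00FF as a decimal numeric character
// reference, leaving everything else (including '<' and '>') untouched.
// Hypothetical helper; not part of Saxon or of the poster's extension.
class CharRefs {
    static String toLatin1WithCharRefs(String s) {
        StringBuilder out = new StringBuilder(s.length());
        s.codePoints().forEach(cp -> {
            if (cp <= 0xFF) {
                out.appendCodePoint(cp);                 // representable in ISO-8859-1
            } else {
                out.append("&#").append(cp).append(';'); // e.g. a CJK ideograph
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        String msg = "something funny <i>Our Website</i> with unicode: \u7b80\u5316";
        // prints: something funny <i>Our Website</i> with unicode: &#31616;&#21270;
        System.out.println(toLatin1WithCharRefs(msg));
    }
}
```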

So I tried what Michael suggested, but it produces a different, still
undesirable, result.
With encoding="UTF-8", the markup works with d-o-e="yes", but the Asian
characters come out wrong.
Each one comes out as a single-byte character, and from what I could see
with a hex viewer, the first byte of each character is dropped.
Example (test03/04):
characters: \u7b80\u5316
with UTF-8 and d-o-e="yes", I get x'8016' (non-displayable)
I tried saxon:character-representation with native, entity, hex, and
decimal.
All give the same results.
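For comparison, correct UTF-8 for \u7b80\u5316 is six bytes, E7 AE 80 E5 8C 96. The reported x'8016' is exactly the low byte of each UTF-16 code unit (0x7B80 gives 80, 0x5316 gives 16), which fits the "drops the first byte" description and suggests the characters pass through a one-byte-per-character step somewhere instead of being encoded as UTF-8. That is an inference from the bytes alone, not a confirmed diagnosis. The sketch below (class name illustrative) computes both byte sequences.

```java
import java.nio.charset.StandardCharsets;

// Compares a proper UTF-8 encoding with a naive "keep the low 8 bits" conversion.
class Utf8BytesDemo {
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // What a UTF-8 serializer should write.
    static String utf8Hex(String s) {
        return hex(s.getBytes(StandardCharsets.UTF_8));
    }

    // What results if each UTF-16 char is cast to a byte, discarding the high byte.
    static String lowByteHex(String s) {
        byte[] out = new byte[s.length()];
        for (int i = 0; i < s.length(); i++) {
            out[i] = (byte) s.charAt(i);
        }
        return hex(out);
    }

    public static void main(String[] args) {
        String s = "\u7b80\u5316";
        System.out.println("UTF-8:    " + utf8Hex(s));    // E7AE80E58C96
        System.out.println("low byte: " + lowByteHex(s)); // 8016
    }
}
```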


snapshots at:
http://frik.50megs.com/xsl/theresultutf8.jpg
http://frik.50megs.com/xsl/viewsource.jpg



Thanks for any light you can shed on this subject.

 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list

