[XSL-LIST Mailing List Archive Home] [By Thread] [By Date] [Recent Entries] [Reply To This Message]

RE: Asian, UTF-8, markup, extensions and d-o-e

Subject: RE: Asian, UTF-8, markup, extensions and d-o-e
From: "Michael Kay" <michael.h.kay@xxxxxxxxxxxx>
Date: Fri, 31 May 2002 09:52:11 +0100
u0020
Sorry to drop the ball on this thread.

I've posted a reply on the Saxon forum at
https://sourceforge.net/forum/forum.php?thread_id=681805&forum_id=94027

The bottom line is that I can't reproduce the problem from the
information you've given me: it works for me. But I'm afraid I don't
really understand what your Java application is doing with these
HashMaps.

Michael Kay

> -----Original Message-----
> From: owner-xsl-list@xxxxxxxxxxxxxxxxxxxxxx 
> [mailto:owner-xsl-list@xxxxxxxxxxxxxxxxxxxxxx] On Behalf Of 
> Frikkie Swardt
> Sent: 30 May 2002 21:58
> To: XSL-List@xxxxxxxxxxxxxxxxxxxxxx
> Subject:  Asian, UTF-8, markup, extensions and d-o-e
> 
> 
> 
> 
> This was posted at Sourceforge, Saxon. I got one reply but 
> none since May 22. I'm hoping someone on this list may be 
> able to assist.
> 
> We are using Saxon 6.5 (I tried with 6.5.2; same results)
> I am trying to display chinese(and others) with HTML markup. 
> The text gets loaded in a HashMap The text contains html 
> markup (break, color, class etc) It appears the 
> disable-output-escaping="yes" has no affect on the "<" and 
> ">" when there is unicode with a value above 255 in the text.
> 
> sample HashMap for en:
> label.test1=Simplified
> label.test2=Traditional
> label.test3=Accommodation
> label.test4=Thank you for using <i>Our Website</i>
> 
> sample HashMap for zh_CN:
> label.test1=\u7b80\u5316
> label.test2=\u4f20\u7edf
> label.test3=\u4F4F\u5BBF 
> label.test4=\u611F\u8C22\u60A8\u4F7F\u7528 <i>Our Website</i>\u3002
> 
> output statement:
> <xsl:output method="html" indent="no" encoding="iso-8859-1" 
> saxon:character-representation="entity;entity" /> native, 
> entity, decimal or hex produce the same results on markup text.
> 
> We call a custom extension (not saxon extension) to get the 
> text: <xsl:value-of disable-output-escaping="yes" 
> select="java:getMessage($vtExtension,$locale,string('label.test4'))"/>
> 
> On label.test4 I expected to see Our Website in italics, but 
> instead I saw the markup. It never works without 
> disable-output-escaping="yes" It only shows the markup if the 
> text contains unicode for characters with values higher than 
> 255. (non-ASCII)
> 
> So, I'm looking for a solution where I can use both the 
> unicode and markup, and still use the java extension to read 
> the HashMap.
> 
> some other results:
> 
> (snapshots at http://frik.50megs.com/xsl/thetext.jpg and
> http://frik.50megs.com/xsl/theresult.jpg)
> Text:
> test01=nothing funny <i>Our Website</i>
> test02=nothing funny <i>Our Website</i>
> test03=something funny <i>Our Website</i> with unicode: 
> \u7b80\u5316 test04=something funny <i>Our Website</i> with 
> unicode: \u7b80\u5316 test05=with amper lt and gt 
> &lt;i&gt;Our Website&lt;/i&gt; with unicode: \u7b80\u5316 
> test06=with amper lt and gt &lt;i&gt;Our Website&lt;/i&gt; 
> with unicode: \u7b80\u5316 test07=with unicode for lt and gt 
> \u003ci\u003eOur Website\u003c/i\u003e with unicode: \u7b80 
> \u5316 test08=with unicode for lt and gt \u003ci\u003eOur 
> Website\u003c/i\u003e with unicode: \u7b80 \u5316 test09=with 
> unicode for lt and gt \u003ci\u003eOur Website\u003c/i\u003e 
> with no other unicode test10=with unicode for lt and gt 
> \u003ci\u003eOur Website\u003c/i\u003e with no other unicode 
> test11=\u0041\u006C\u006C\u0020\u0069\u006E\u0020\u0055\u006E\
> u0069\u0063\u006F\u0064\u0065\u0020\u003C\u0069\u003E\u0020\u0
> 04F\u0075\u0072\u0020\u0057\u0065\u0062\u0073\u0069\u0074\u006
> 5\u0020\u003C\u002F\u0069\u003E\u0020\u7b80\u5316
> 
> test12=\u0041\u006C\u006C\u0020\u0069\u006E\u0020\u0055\u006E\
> u0069\u0063\u006F\u0064\u0065\u0020\u003C\u0069\u003E\u0020\u0
> 04F\u0075\u0072\u0020\u0057\u0065\u0062\u0073\u0069\u0074\u006
> 5\u0020\u003C\u002F\u0069\u003E\u0020\u7b80\u5316
> 
> test13=\u0041\u006C\u006C\u0020\u0069\u006E\u0020\u0055\u006E\
> u0069\u0063\u006F\u0064\u0065\u0020\u003C\u0069\u003E\u0020\u0
> 04F\u0075\u0072\u0020\u0057\u0065\u0062\u0073\u0069\u0074\u006
> 5\u0020\u003C\u002F\u0069\u003E\u0020
> 
> test14=\u0041\u006C\u006C\u0020\u0069\u006E\u0020\u0055\u006E\
> u0069\u0063\u006F\u0064\u0065\u0020\u003C\u0069\u003E\u0020\u0
> 04F\u0075\u0072\u0020\u0057\u0065\u0062\u0073\u0069\u0074\u006
> 5\u0020\u003C\u002F\u0069\u003E\u0020
> 
> test15=electrónico
> test16=electr&oacute;nico
> test17=electrónico<i>test17</i> test18=electr&oacute;nico<i>test18</i>
> test19=\u611F\u8C22\u60A8\u4F7F\u7528 <i>Our Website</i>\u3002
> 
> 
> Result: (yes/no refers to disable-output-escaping)
> test01 yes = nothing funny Our Website
> test02 no = nothing funny <i>Our Website</i>
> test03 yes = something funny <i>Our Website</i> with unicode: 
> ?? test04 no = something funny <i>Our Website</i> with 
> unicode: ?? test05 yes = with amper lt and gt &lt;i&gt;Our 
> Website&lt;/i&gt; with
> unicode: ??
> test06 no = with amper lt and gt &lt;i&gt;Our Website&lt;/i&gt; with
> unicode: ??
> test07 yes = with unicode for lt and gt <i>Our Website</i> 
> with unicode: ? ? test08 no = with unicode for lt and gt 
> <i>Our Website</i> with unicode: ? ? test09 yes = with 
> unicode for lt and gt Our Website with no other unicode 
> test10 no = with unicode for lt and gt <i>Our Website</i> 
> with no other unicode test11 yes = All in Unicode <i> Our 
> Website </i> ?? test12 no = All in Unicode <i> Our Website 
> </i> ?? test13 yes below 255 = All in Unicode Our Website 
> test14 no below 255 = All in Unicode <i> Our Website </i> 
> test15 yes = electrónico test15 no = electrónico test16 yes = 
> electrónico test16 no = electr&oacute;nico test17 yes = 
> electrónicotest17 test17 no = electrónico<i>test17</i> test18 
> yes = electrónicotest18 test18 no = 
> electr&oacute;nico<i>test18</i> test19 no = ????? <i>Our 
> Website</i>? test19 yes = ????? <i>Our Website</i>?
> 
> 
> 
> 
> Michael Kay stated:
> The XSLT spec says that it is an error to output a character 
> not available in the chosen encoding with 
> disable-output-escaping="yes". The processor is allowed to 
> signal the error, or to recover by ignoring the d-o-e="yes" 
> attribute. You are using encoding="iso-8859-1", therefore 
> outputting characters above 256 is only possible by using 
> character references. If you use encoding="utf-8", it should 
> work fine.
> 
> So I tried what Michael suggested, but it produces a 
> different result, still undesireable. When using 
> encoding="UTF-8" , the markup works with d-o-e="yes", but 
> then the asian characters comes in different. They come in as 
> single characters, and from what I could see (viewed with a 
> hex viewer) is that it drops the first byte. Example (test3/4):
> characters: \u7b80\u5316
> with UTF-8 and d-o-e="yes", I get x'8016' (non-displayable)
> I tried with saxon:character-representation as native, 
> entity, hex and decimal. All have the same results.
> 
> 
> snapshots at:
> http://frik.50megs.com/xsl/theresultutf8.jpg
> http://frik.50megs.com/xsl/viewsource.jpg
> 
> 
> 
> Thanks for any light you can put on this subject.
> 
>  XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
> 


 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list


Current Thread

PURCHASE STYLUS STUDIO ONLINE TODAY!

Purchasing Stylus Studio from our online shop is Easy, Secure and Value Priced!

Buy Stylus Studio Now

Download The World's Best XML IDE!

Accelerate XML development with our award-winning XML IDE - Download a free trial today!

Don't miss another message! Subscribe to this list today.
Email
First Name
Last Name
Company
Subscribe in XML format
RSS 2.0
Atom 0.3
Site Map | Privacy Policy | Terms of Use | Trademarks
Free Stylus Studio XML Training:
W3C Member
Stylus Studio® and DataDirect XQuery ™are products from DataDirect Technologies, is a registered trademark of Progress Software Corporation, in the U.S. and other countries. © 2004-2013 All Rights Reserved.