Subject: Re: Unicode usage
From: "Jonathan Perret" <jonathan@xxxxxxxxxxxx>
Date: Fri, 25 Jan 2002 18:02:04 +0100
> I also loaded each result into Notepad on Win95.  Notepad displayed the
> file correctly, but not the utf-8 result (it showed that "A" character
> with a little circle above it), ahead of the trademark symbol.  This is what I
> was suggesting would happen. BTW, Notepad on the Win2000 computer did
> display both results correctly.

I don't see what this proves that wasn't already obvious. Notepad on
Windows 95 supports only one encoding, the one that matches the installed
code page. On a Western version that is generally windows-1252 (what
Windows calls 'ANSI', or sometimes even 'ASCII' -yuk!-).
Feeding it utf-8 text, regardless of the actual code points used, is akin
to opening a Word document with it: though some text might appear
readable, the general result is garbage.
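
To make the mismatch concrete, here is a minimal Python sketch of what
happens when utf-8 bytes are read as windows-1252. The function name is
made up for illustration, and "cp1252" is simply Python's codec name for
windows-1252; nothing here is taken from Notepad itself.

```python
def misread_as_cp1252(text: str) -> str:
    """Encode text as UTF-8, then decode the bytes as windows-1252.

    This mimics a viewer that only knows the 'ANSI' code page being
    handed a UTF-8 file (hypothetical helper, for illustration only).
    """
    utf8_bytes = text.encode("utf-8")
    return utf8_bytes.decode("cp1252")


# U+2122 TRADE MARK SIGN is three bytes in UTF-8: E2 84 A2.
# Read as windows-1252, each byte becomes its own character.
garbled = misread_as_cp1252("\u2122")

# Plain ASCII survives untouched, which is why "some text might
# appear readable".
unchanged = misread_as_cp1252("Unicode usage")
```

Here `garbled` is the three-character string U+00E2, U+201E, U+00A2,
while `unchanged` is identical to its input.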

On Windows 2000, Notepad has been upgraded to know about UTF-8,
so again it's no surprise that it can display the text correctly, given
that the file probably starts with a BOM (byte order mark), which signals
that it is utf-8 encoded. Note that without the BOM, Windows 2000
Notepad would probably have 'failed' the same way as its Win95
cousin, since it would have assumed an ANSI-encoded file.
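
The BOM-or-fallback behaviour described above can be sketched in Python
as follows. This is only an assumption about the general approach, not
Notepad's actual detection logic; the function names and the cp1252
fallback are illustrative choices.

```python
import codecs


def guess_encoding(data: bytes, fallback: str = "cp1252") -> str:
    """Return 'utf-8-sig' if data starts with the UTF-8 BOM (EF BB BF),
    otherwise the fallback 'ANSI' code page (assumed to be cp1252)."""
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"  # this codec also strips the BOM on decode
    return fallback


def read_text(data: bytes) -> str:
    """Decode data using the guessed encoding."""
    return data.decode(guess_encoding(data))


payload = "\u2122".encode("utf-8")            # trademark sign as UTF-8
with_bom = read_text(codecs.BOM_UTF8 + payload)
without_bom = read_text(payload)
```

With the BOM present, `with_bom` is the single character U+2122; without
it, the same bytes fall through to the code page and come out garbled.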

> Summarizing, what you will see displayed for high-order characters can
> depend on the encoding, OS,  and the viewing program.  On older versions
> of Windows, at least, non-browsers are likely to display the wrong thing.

The fact is that what you will see is completely predictable (give or take
the odd bug). If the viewing program is not told what encoding the
text is in, it will assume an encoding that will quite frequently be wrong.

In the Notepad example, the OS itself has nothing to do with the issue:
Notepad/Win95 and Notepad/Win2000 are two very different programs.
If you were to take the Win95 Notepad binary and run it under Win2000,
it'd behave exactly the same as under Win95. Why not try this?

> In fact, even on my Win2000 machine, using XML Cooktop to run and display
> the transformation gave an incorrect display (and it uses the IE activeX
> control to display the results!), so you can't be sure even on Win2000
> high order characters will display the intended way, depending on the app.

If XML Cooktop (which I've never used) has the same bug as XML Spy,
then it has trouble with MSXML's transformNode method, which always
transforms to UTF-16 regardless of the <xsl:output> element. That would
cause what you've been seeing.


 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
