
Re: 8bit ascii encoding

Subject: Re: 8bit ascii encoding
From: "Thomas B. Passin" <tpassin@xxxxxxxxxxxx>
Date: Fri, 23 Aug 2002 11:40:29 -0400
[Andrew Welch]
[[
> The <meta> element has to specify the encoding that the document is
> in. If you change the specified encoding without changing the encoding
> then you are dead.

ha.. nice.  After some testing it seems that char references display
fine, while characters themselves do not (after the utf-8 string has
gone through the ActiveX).  This is because the refs are simply a string
of single byte ascii chars that get converted by IE, while the unicode
chars are multi-byte and therefore are being displayed as two single
chars.  I think the reason IE isn't picking up that each char is two
bytes (utf-8) is because the BOM is getting overwritten/messed up by the
ActiveX - we can't simply write one in either because one of the
restrictions when hosting IE is that you can only write to the <body>
element (unless anyone can tell me different...)
]]

Let me try to summarize a few of the points that have been made and maybe
shed some more light ...

1) The encoding of the actual output document must agree with its
declaration, which in an HTML document resides in a META element and in an
XML document resides in the XML declaration at the start of the document.  If
the two disagree, you may get display problems or even illegal character
errors.  Without a META element, the browser will assume that the document is
encoded according to its own settings (iso-8859-1 or whatever).

2) Even with the right encoding declared, the browser may not be able to
display a given character depending on the browser settings or code page it
is set to use.

3) If your pre-ActiveX processing silently changes the encoding but does not
change the encoding declaration to match (or if there is no encoding
declaration), you may not be able to recover and get correct operation.

4) Most of Microsoft's processing is said to use UTF-16 internally.  Some
output arrangements respect a request to change the encoding to something
else and some do not.  I do not know the details, but I think they are on
some of the FAQ sites.  In particular, as I recall, streams remain in UTF-16.
There are ways to get non-UTF-16 output, and you probably need to look them
up.

5) If the actual encoding is UTF-16 and the byte order mark gets removed or
never inserted, you will have problems.

6) If the final encoding is declared correctly and has a correct BOM but
does not match the browser's capabilities, you may be able to simply run an
identity transformation using xslt that only changes the encoding (e.g., to
iso-8859-1).
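
To make points 1, 5 and 6 concrete at the byte level, here is a small Python
sketch.  It is only an illustration, not the XSLT or ActiveX code under
discussion: the sample strings are made up, and the last step imitates in
Python what re-serializing to iso-8859-1 roughly amounts to, on the assumption
that characters outside the target encoding get written as numeric character
references.

```python
import codecs

# Hypothetical sample text: "é" (U+00E9) takes two bytes in UTF-8.
text = "caf\u00e9"

# Point 1: bytes written as UTF-8 but read as iso-8859-1.
# Each byte of the two-byte sequence is shown as its own character.
utf8_bytes = text.encode("utf-8")
misread = utf8_bytes.decode("iso-8859-1")

# A numeric character reference is plain ASCII, so it survives the
# same mismatch unchanged (matching what Andrew observed).
char_ref = "caf&#233;".encode("utf-8").decode("iso-8859-1")

# Point 5: UTF-16 with and without a byte order mark.
# The BOM is written explicitly so the result does not depend on the platform.
with_bom = codecs.BOM_UTF16_LE + text.encode("utf-16-le")
has_bom = with_bom[:2] in (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)

# With the BOM stripped, a reader that guesses the wrong byte order
# decodes without error but gets entirely different characters.
without_bom = with_bom[2:]
wrong_order = without_bom.decode("utf-16-be")

# Point 6: re-encode to iso-8859-1.  "é" fits in one byte; the euro sign
# does not, so it is replaced by a numeric character reference.
latin1 = "caf\u00e9 \u20ac".encode("iso-8859-1", errors="xmlcharrefreplace")

print(repr(misread), repr(char_ref))
print(has_bom, repr(wrong_order))
print(latin1)
```

The point of the last step is that the declared encoding and the actual bytes
stay in agreement, while anything the target encoding cannot hold is carried
as ASCII character references rather than being lost.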

Cheers,

Tom P


 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list

