Re: 8bit ascii encoding
[Andrew Welch]
[[
> The <meta> element has to specify the encoding that the document is
> in. If you change the specified encoding without changing the encoding
> then you are dead.

ha.. nice. After some testing it seems that char references display fine, while characters themselves do not (after the utf-8 string has gone through the ActiveX). This is because the refs are simply a string of single byte ascii chars that get converted by IE, while the unicode chars are multi-byte and therefore are being displayed as two single chars.

I think the reason IE isn't picking up that each char is two bytes (utf-8) is because the BOM is getting overwritten/messed up by the ActiveX - we can't simply write one in either because one of the restrictions when hosting IE is that you can only write to the <body> element (unless anyone can tell me different...)
]]

Let me try to summarize a few of the points that have been made and maybe shed some more light ...

1) The encoding of the actual output document must agree with its declaration, which in an HTML document resides in a META element and in an XML document resides in the XML declaration at the start of the document. If the two disagree, you may get display problems or even illegal character errors. Without a META element, the browser will assume that the document is encoded according to its own settings (iso-8859-1 or whatever).

2) Even with the right encoding declared, the browser may not be able to display a given character, depending on the browser settings or the code page it is set to use.

3) If your pre-ActiveX processing silently changes the encoding but does not change the encoding declaration to match (or if there is no encoding declaration), you may not be able to recover and get correct operation.

4) Most of Microsoft's processing is said to use UTF-16 internally. Depending on how the output is produced, a request to change the encoding to something else may or may not be respected. I do not know the details, but I think they are on some of the FAQ sites. In particular, as I recall, streams remain in UTF-16. There are ways to get non-UTF-16 output, and you probably need to look them up.

5) If the actual encoding is UTF-16 and the byte order mark gets removed or is never inserted, you will have problems.

6) If the final encoding is declared correctly and has a correct BOM but does not match the browser's capabilities, you may be able to simply run an identity transformation using XSLT that only changes the encoding (e.g., to iso-8859-1).

Cheers,

Tom P

XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list
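
P.S. The symptom described in the quoted message (character references display fine, raw characters show up as two characters) can be reproduced outside the browser. The following is only a minimal sketch in Python, not the ActiveX/IE code path itself; the sample string and variable names are assumptions chosen for illustration. It shows what happens when correctly encoded UTF-8 bytes are read by a consumer that assumes iso-8859-1, as in point 1.

```python
# Sample text is an assumption; any non-ASCII character in this range behaves the same way.
text = "caf\u00e9"  # "café"

# Correct UTF-8 output: U+00E9 is encoded as the two bytes C3 A9.
utf8_bytes = text.encode("utf-8")

# A consumer that wrongly assumes iso-8859-1 turns each byte into one character,
# so the single accented letter is displayed as two unrelated characters.
misread = utf8_bytes.decode("iso-8859-1")

# Numeric character references are plain ASCII, so they come through unchanged
# even when the consumer guesses the wrong encoding.
as_refs = text.encode("ascii", "xmlcharrefreplace")
survived = as_refs.decode("iso-8859-1")

print(repr(misread))  # repr avoids depending on the console's own encoding
print(survived)
```

If the declaration and the bytes agree (decoding `utf8_bytes` as UTF-8), the original text comes back intact, which is why fixing the declaration or re-encoding the output are the two ways out.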