Re: using xsl:message with UTF-8 characters
On 4/23/07, Abel Braaksma <abel.online@xxxxxxxxx> wrote:
When the Regional settings are set to US or some Western European country, the codepage will default to CP1252 (windows-1252) (which is, like I said, incompatible with the codepage for the console, giving the weird characters in the U+0127+ range).
Apparently Microsoft decided to wedge more characters into the 8-bit range by replacing the control characters in the C1 range (0x80 to 0x9F) with more useful characters, which seems fair enough, but ISO-8859-1 and its relatives leave that range to control characters, so the two disagree about what those bytes mean.

The problem arises when you save a file without being explicit about the encoding and then read it back in any other encoding. This happens a lot (in Windows) when you save an XML file with a non-XML-aware editor (say Notepad) and then open it in an XML-aware editor. The file will be saved in CP1252, with characters like "en dash" and "em dash" saved as #150 and #151 instead of #8211 and #8212 respectively. So when you open the file using an XML-aware editor, it reads the XML prolog and reads the file in, say, UTF-8, and you get non-printable characters instead of the dashes... which can be shown as either a box or a question mark depending on (...I'm not sure what that depends on actually).

To compound the issue, if your XML is specified as ISO-8859-1 in the prolog, some MS tools will see the characters in the control range and auto-switch the encoding to CP1252, giving the impression everything is fine.

The simple rule is: always read and write using the same encoding, and be aware when something is converting between characters and bytes behind the scenes - servlets for example. Make sure the font you're viewing the result in contains the glyphs for the characters you're trying to view (helpfully, the no-glyph character is often the same box or question mark used to mean no-mapping in the encoding... requiring a hex editor to check the underlying bytes), and be certain the viewer is showing the result in the right encoding (the cmd window here, or say the Eclipse output window, is another notorious spot).
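To see the mismatch without a hex editor, here is a minimal Python sketch (Python is just my choice for illustration, and the variable names are made up). It takes the two bytes an editor would write for the en dash and em dash in CP1252 and reads them back under three different encodings. Treat it as a rough demonstration, not a description of what any particular XML tool does internally.

```python
# Bytes 0x96 and 0x97 (#150 and #151) are the en dash and em dash in CP1252.
raw = bytes([0x96, 0x97])

# Read back as CP1252: the bytes map to U+2013 and U+2014 (#8211, #8212).
as_cp1252 = raw.decode("cp1252")

# Read back as ISO-8859-1: the same bytes become C1 control characters.
as_latin1 = raw.decode("iso-8859-1")

# Read back as UTF-8: the bytes are invalid on their own. A lenient decoder
# substitutes U+FFFD; a strict one would raise UnicodeDecodeError instead.
as_utf8 = raw.decode("utf-8", errors="replace")

print([ord(c) for c in as_cp1252])
print([ord(c) for c in as_latin1])
print([hex(ord(c)) for c in as_utf8])

# Writing and reading with the same encoding round-trips cleanly.
dashes = "\u2013\u2014"
round_trip = dashes.encode("utf-8").decode("utf-8")
print(round_trip == dashes)
```

Only the first decode gives you dashes back. The other two are what show up as boxes or question marks in the viewer.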