Re: suppression of the transformation of character entities in
From: "S Woodside" <sbwoodside@y...>

> Probably you are specifying the output to be encoded in UTF-8 or
> something like that where the character is supported in the encoding.

I don't think so. The data goes wrong coming into the XML processor. The character references are supposed to be for various kinds of quotes, but the numbers are not the Unicode numbers. If the characters get through, it will only be by accident.

If the output encoding is set to UTF-8, for example, then &#146; will produce two bytes. (The case where it will *seem to* work is when the output encoding passes the C1 characters through to the same bytes: for example, an ISO 8859-1 transcoder. If the output is then read as CP1252, the characters will come out.)

Bad systems are easy. Fragile, slack, and out of control.

Better to make the character references be for the correct Unicode characters so that the XML coming in is correct. Then make sure the XML coming out is correct.

Also avoid debugging character encodings of generated HTML using a browser: browsers can guess or do all sorts of things (depending on the generation, brand and settings). Use any hex or text editor that lets you select encodings or that understands the XML encoding header. Using a browser to figure out what is happening with encodings is the surest road to insanity.

To see what the character references should be, see http://www.alanwood.net/demos/ansi.html
Instead of the numbers in the "ANSI" column, use the (decimal) numbers in the "Unicode" column.

Cheers
Rick Jelliffe
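To make the byte-level behaviour above concrete, here is a minimal Python sketch. It is an illustration only, not part of the original message: the function names `fix_c1_reference` and `fix_references` are hypothetical, and it assumes the bad references are decimal, fall in the C1 range (128 to 159), and were meant as CP1252 ("ANSI") characters. Hexadecimal references are not handled.

```python
import re

def fix_c1_reference(match):
    """Map one decimal reference in the C1 range to its CP1252-intended Unicode number."""
    number = int(match.group(1))
    if 128 <= number <= 159:
        try:
            # Assumption: the author meant the CP1252 character at this byte value.
            char = bytes([number]).decode("cp1252")
        except UnicodeDecodeError:
            return match.group(0)  # byte is undefined in CP1252; leave the reference alone
        return "&#%d;" % ord(char)
    return match.group(0)  # outside the C1 range; already a plain Unicode number

def fix_references(text):
    """Rewrite every decimal character reference in text that points into the C1 range."""
    return re.sub(r"&#(\d+);", fix_c1_reference, text)

# U+0092 (decimal 146) is a C1 control character; UTF-8 encodes it as two bytes.
two_bytes = chr(146).encode("utf-8")

# The "seems to work" case: ISO 8859-1 passes the byte through, CP1252 reads it as a quote.
accidental_quote = chr(146).encode("latin-1").decode("cp1252")

# The corrected reference points at U+2019, the right single quotation mark.
corrected = fix_references("It&#146;s")
```

With these assumptions, `two_bytes` is `b"\xc2\x92"`, `accidental_quote` is U+2019, and `corrected` is `It&#8217;s`. Checking the resulting bytes this way, or in a hex editor, avoids relying on whatever a browser guesses.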