Re: Transforming ™ to ™
On 7/20/06, Chris Burdess <d09@h...> wrote:
> Sanjay Goel wrote:
> > ... if I put ™ or if I define a entity, the output in html
> > is ™. So this html gets displayed differently on different
> > browsers. I need ™ or ™ in the final html so that the
> > browsers read it correctly.
>
> This may be because you specified "xml" as the XSL output method but
> serve the result as text/html. If you specify "html" as the output
> method the transformer should include a content type with a charset
> parameter in an http-equiv instruction in the generated HTML.
>
> Ensure that you are serving the result correctly, with a charset
> parameter the same as the charset you serialised the XSL result to.
> So if you serialised to UTF-8 and you are serving as text/html you
> should include the header
>
> Content-Type: text/html; charset=UTF-8
>
> See http://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html#sec3.4.1
> for why you need to do this. The default for HTTP, without a charset
> parameter, is ISO-8859-1, but this encoding does not contain the
> trademark symbol and will therefore not work for you.

One thing to be aware of here is browsers auto-switching between
ISO-8859-1 and Windows-1252. Although ISO-8859-1 doesn't contain the
TM character, Windows-1252 does, at #153 (x99), a position that falls
in the C1 control range of ISO-8859-1.

If a browser (HTML parser) is given a page that is apparently encoded
using ISO-8859-1 but contains characters in the C1 control range (such
as x99), it will auto-switch the read encoding to Windows-1252 and
automagically display the characters.

This ability to be "sloppy" with the encoding and have the browser
detect the one you really meant doesn't carry over to XML parsers,
where the policy has rightly shifted towards being strict.

So what does this mean?
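At the byte level, it means the same byte x99 reads as two different characters depending on which encoding the parser settles on. A minimal sketch in Python (the language and the variable names are illustrative assumptions, not part of the original thread) decodes the same bytes both ways:

```python
# The bytes as they might sit in the served file: "somebrand" followed
# by the single byte 0x99 (decimal 153).
raw = b"somebrand\x99"

# Strict ISO-8859-1 reading: 0x99 maps to U+0099, a C1 control character.
as_latin1 = raw.decode("iso-8859-1")

# Windows-1252 reading: 0x99 maps to U+2122, the trade mark sign.
as_cp1252 = raw.decode("cp1252")

print(repr(as_latin1))            # 'somebrand\x99'
print(repr(as_cp1252))            # 'somebrand™'

# Serialising the intended character as UTF-8 removes the ambiguity,
# provided the page is then served with charset=UTF-8.
print(as_cp1252.encode("utf-8"))  # b'somebrand\xe2\x84\xa2'
```

The first reading is what a strict parser produces from an ISO-8859-1 label, and the second is what the browser's auto-switch produces.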
Given the following page, where the meta states the encoding is
ISO-8859-1 but a C1 control character has been used (#153):

    <html xmlns="http://www.w3.org/1999/xhtml">
      <head>
        <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"/>
        <title>Encoding example</title>
      </head>
      <body>somebrand™</body>
    </html>

When served to an HTML parser, the read encoding is auto-switched to
Windows-1252 and the TM character is displayed:

    somebrand™

When the same file is served to an XML parser (which is what will
happen in an XHTML browser), the file is read using ISO-8859-1 and the
non-displayed C1 control character "Single Graphic Character
Introducer" is output (it's there, you just can't see it):

    somebrand

I'm highlighting this here as it caught me: creating test files and
opening them in the browser only compounded the issue, because the
silent auto-switching gave the impression that everything was OK. A
real pain.

cheers
andrew