
Re: Transforming ™ to &trade;


On 7/20/06, Chris Burdess <d09@h...> wrote:
> Sanjay Goel wrote:
> > ... if I put &#x2122; or if I define a entity, the output in html
> > is ™. So this html gets displayed differently on different
> > browsers. I need &trade; or &#x2122; in the final html so that the
> > browsers read it correctly.
>
> This may be because you specified "xml" as the XSL output method but
> serve the result as text/html. If you specify "html" as the output
> method the transformer should include a content type with a charset
> parameter in an http-equiv instruction in the generated HTML.
>
> Ensure that you are serving the result correctly, with a charset
> parameter the same as the charset you serialised the XSL result to.
> So if you serialised to UTF-8 and you are serving as text/html you
> should include the header
>
>    Content-Type: text/html; charset=UTF-8
>
> See http://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html#sec3.4.1
> for why you need to do this. The default for HTTP, without a charset
> parameter, is ISO-8859-1, but this encoding does not contain the
> trademark symbol and will therefore not work for you.
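
To illustrate the point about ISO-8859-1 lacking the trademark symbol, here is a minimal Python sketch (Python is used only for illustration; the helper name `can_encode` is hypothetical, not part of any transformer). It shows that the character cannot be encoded as ISO-8859-1, that UTF-8 handles it, and that a serialiser targeting ISO-8859-1 would have to fall back to a numeric character reference:

```python
# Sketch only: checks which charsets can represent the trademark sign.
tm = "\u2122"  # TRADE MARK SIGN


def can_encode(text, charset):
    """Return True if every character of text exists in charset."""
    try:
        text.encode(charset)
        return True
    except UnicodeEncodeError:
        return False


latin1_ok = can_encode(tm, "iso-8859-1")  # ISO-8859-1 has no trademark sign
utf8_bytes = tm.encode("utf-8")           # UTF-8 can encode it directly

# Falling back to a numeric character reference keeps the output
# representable in ISO-8859-1 (the reference is written in decimal).
escaped = "somebrand\u2122".encode("iso-8859-1", errors="xmlcharrefreplace")

print(latin1_ok, utf8_bytes, escaped)
```

The decimal reference &#8482; produced here is the same character as &#x2122;.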

One thing to be aware of here is browsers auto-switching between
ISO-8859-1 and Windows-1252.  Although ISO-8859-1 doesn't contain the
TM character, Windows-1252 does, at position #153 (x99), which falls
in the C1 control range of ISO-8859-1.
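
As a rough illustration of the difference, the sketch below decodes the single byte x99 with both encodings using Python's standard codecs (the variable names are just for this example):

```python
import unicodedata

raw = bytes([0x99])  # the byte at position #153

as_cp1252 = raw.decode("cp1252")      # Windows-1252 reading
as_latin1 = raw.decode("iso-8859-1")  # ISO-8859-1 reading

# Windows-1252 maps the byte to U+2122 TRADE MARK SIGN.
cp1252_name = unicodedata.name(as_cp1252)

# ISO-8859-1 maps it to U+0099, a C1 control character (category "Cc").
latin1_category = unicodedata.category(as_latin1)

print(hex(ord(as_cp1252)), cp1252_name)
print(hex(ord(as_latin1)), latin1_category)
```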

If a browser (HTML parser) is given a page that is apparently encoded
as ISO-8859-1 but contains characters in the C1 control range (such as
x99), it will auto-switch the read encoding to Windows-1252 and
automagically display the characters.  This ability to be "sloppy"
with the encoding and have the browser detect the one you really
meant doesn't carry over to XML parsers, where the policy has rightly
shifted towards being strict.

So what does this mean?  Given the following page, where the meta
states the encoding is ISO-8859-1 but a C1 control character has been
used (#153):

<html xmlns="http://www.w3.org/1999/xhtml">
   <head>
      <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"/>
      <title>Encoding example</title>
   </head>
   <body>somebrand&#153;</body>
</html>

When served to an HTML parser, the read encoding is auto-switched to
Windows-1252 and the TM character is displayed:

somebrand™

When the same file is served to an XML parser (which is what will
happen in an XHTML browser), the file is read using ISO-8859-1 and the
non-displayed C1 control character "Single Graphic Character
Introducer" is output (it's there, you just can't see it):

somebrand
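
The two behaviours can be approximated outside a browser. The sketch below is an assumption-laden stand-in, not what any particular browser does internally: Python's html.unescape follows the HTML rules for numeric character references (which remap #153 to the Windows-1252 character), while xml.etree.ElementTree plays the part of the strict XML parser:

```python
import html
import xml.etree.ElementTree as ET

# HTML-style handling: the reference &#153; is remapped to U+2122.
html_text = html.unescape("somebrand&#153;")

# XML handling: &#153; is taken literally as code point U+0099,
# an invisible C1 control character.
xml_text = ET.fromstring("<body>somebrand&#153;</body>").text

print(repr(html_text))
print(repr(xml_text))
```

Using repr() makes the invisible character show up as an escape, which is a handy way to check test files without relying on what the browser chooses to display.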

I'm highlighting this here as it caught me - creating test files and
opening them in the browser was only compounding the issue because of
the silent auto-switching giving the impression everything was ok.  A
real pain.

cheers
andrew
