
Re: Global parameters with UTF-8 characters and ???s

Subject: Re: Global parameters with UTF-8 characters and ???s <Disregard Previous>
From: David Carlisle <davidc@xxxxxxxxx>
Date: Thu, 3 Aug 2006 10:32:24 +0100
> Setting the output encoding to "US-ASCII" works.  I no longer see the
> question marks.
> 
> Is this the right solution?  Or does it just point out what the issue
> is?

Most (but not all) encodings used today encode 0-9, a-z and A-Z with the
same byte values, so if you ensure that your file contains only these (and
some punctuation), then most of the time the file will work no matter
what encoding it is declared to have. If I'm generating files that
other people may put on web servers, I usually try to use us-ascii
as the encoding (and, if using XSLT 2.0, omit the encoding declaration
so that the file is taken as UTF-8, which is also correct, as ASCII files
are also valid UTF-8 files). If your file uses non-ASCII characters, then
you need to declare the correct encoding. Most likely your files were in
UTF-8 but your web server was declaring them to be ISO-8859-1. You can
check this in your browser: View > Character Encoding in Firefox, or
something similar in IE. If the file displays incorrectly, but manually
changing the encoding from that menu makes the file display correctly,
then it is almost certainly the server specifying the wrong encoding to
the browser. (Most web servers do _not_ look at the file to determine
what encoding to specify; they just use a site or directory default
encoding for that file type.)
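To illustrate the point above, here is a small sketch using Python's built-in codecs (Python is just a convenient choice for the illustration, not something from this thread). It checks that pure-ASCII bytes decode identically as US-ASCII, UTF-8 and ISO-8859-1, and then shows the typical garbling you get when UTF-8 bytes are read as ISO-8859-1, which is roughly what a browser does when the server declares the wrong charset.

```python
# Pure-ASCII text: the same bytes are valid under all three labels.
ascii_text = "Hello, World 0-9"
raw = ascii_text.encode("ascii")
for label in ("ascii", "utf-8", "iso-8859-1"):
    assert raw.decode(label) == ascii_text
# An ASCII file is therefore also a valid UTF-8 file, byte for byte.
assert raw == ascii_text.encode("utf-8")

# Non-ASCII text saved as UTF-8: "é" becomes the two bytes C3 A9.
utf8_bytes = "caf\u00e9".encode("utf-8")
print(utf8_bytes)  # b'caf\xc3\xa9'

# If the server labels the file ISO-8859-1, each of those two bytes
# is read as a separate character.
garbled = utf8_bytes.decode("iso-8859-1")
print(garbled)  # cafÃ©

# Decoding with the correct label (what choosing UTF-8 from the
# browser's encoding menu amounts to) restores the text.
print(utf8_bytes.decode("utf-8"))  # café
```

The "Ã©" pattern in place of an accented letter is a common sign of UTF-8 content being served with an ISO-8859-1 label.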

Specifying US-ASCII is a good solution for some kinds of files in some
work scenarios, but not all.

Advantages:
* When it works, it works, and it is very simple to do.

Disadvantages:

* The encoding is rather inefficient: an e-acute is one byte in
  ISO-8859-1 and 2 bytes in UTF-8, but at least 6 bytes (&#xe9;) in
  US-ASCII. So if your file is English with the occasional non-breaking
  space or currency symbol, having the occasional character encoded this
  way is not too bad, but in some languages your file will be 5 or 6
  times larger.
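A rough sketch of the size arithmetic, again in Python. The assumption is that an ASCII-only serializer falls back to numeric character references; Python's "xmlcharrefreplace" error handler writes the decimal form (&#233;), which happens to be the same 6 bytes as &#xe9;. The Cyrillic word is only a hypothetical sample of text that is almost entirely non-ASCII.

```python
# Size of one e-acute (U+00E9) in each encoding. For US-ASCII we
# assume the character is written as a numeric character reference.
e_acute = "\u00e9"
sizes = {
    "iso-8859-1": len(e_acute.encode("iso-8859-1")),
    "utf-8": len(e_acute.encode("utf-8")),
    "us-ascii": len(e_acute.encode("ascii", "xmlcharrefreplace")),
}
print(sizes)  # {'iso-8859-1': 1, 'utf-8': 2, 'us-ascii': 6}

# Text that is mostly non-ASCII grows much more: each of these
# Cyrillic letters is 2 bytes in UTF-8 but a 7-byte reference
# such as &#1087; in US-ASCII.
russian = "\u043f\u0440\u0438\u0432\u0435\u0442"  # "привет"
print(len(russian.encode("utf-8")))                       # 12
print(len(russian.encode("ascii", "xmlcharrefreplace")))  # 42
```

The exact growth factor depends on the language and on which encoding you compare against, but the ASCII form is always the largest of the three.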

* You cannot use this mechanism at all if non-ASCII characters are used
  in places where the &# notation is not available: if any such
  characters appear in comments, in processing instructions, or in
  element or attribute names, this is not an option at all.
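The second point follows from XML's own rules, and can be demonstrated with Python's expat-based xml.dom.minidom. This is used purely as an illustration of XML parsing; an XSLT processor's serializer is a separate component and may report the problem differently. A character reference is expanded in element content, is left as literal text inside a comment, and makes the document not well-formed when it appears in an element name.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError

# In element content, &#233; is a character reference and becomes "é".
doc = minidom.parseString("<p>caf&#233;<!-- caf&#233; --></p>")
doc.normalize()  # merge any adjacent text nodes into one
text_node, comment_node = doc.documentElement.childNodes
print(text_node.data)  # café

# Inside a comment there is no reference expansion: the six
# characters "&#233;" are kept literally.
print(repr(comment_node.data))  # ' caf&#233; '

# In an element name a reference is not allowed at all, so this
# input is not well-formed and the parser rejects it.
rejected = False
try:
    minidom.parseString("<caf&#233;/>")
except ExpatError as err:
    rejected = True
    print("not well-formed:", err)
```

So a US-ASCII output file simply has no way to carry an "é" in a comment, a processing instruction, or a name.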

David
