
Re: ANSI encoding

Subject: Re: ANSI encoding
From: "Christopher R. Maden" <crism@xxxxxxxxx>
Date: Thu, 23 May 2002 02:32:21 -0700

At 14:41 22/5/02, Joel Konkle-Parker wrote:
>What's the <?xml version="1.0" encoding=""?> encoding="" string for
>ANSI?

You've already had a few answers addressing various facets of this; I hope 
this will also be useful.

ANSI, as Mike Kay pointed out, is a standards body.  Their best-known 
encoding is ASCII, whose identifier is "US-ASCII".  (The canonical charset 
name is "ANSI_X3.4-1968"; aliases include "ASCII" and "US-ASCII", which is 
preferred for MIME usage.)  ASCII is a 7-bit encoding, covering values from 
0 to 127; if you have any accented characters or other "weird" letters, you 
are not using ASCII.  Since ASCII is identical with UTF-8 for characters 
127 and below, and doesn't cover any other characters, you might as well 
leave the identifier out since UTF-8 is the default.
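
To make the ASCII point concrete, here is a minimal Python sketch. The helper name is_ascii and the sample byte strings are made up for illustration; the only claim being demonstrated is that 7-bit data reads the same as ASCII or as UTF-8.

```python
# Pure ASCII bytes decode to the same text under both codecs,
# which is why the encoding label can be left out.
plain = b'<?xml version="1.0"?><greeting>hello</greeting>'
same_text = plain.decode("ascii") == plain.decode("utf-8")


def is_ascii(data: bytes) -> bool:
    """Return True if every byte is in the 7-bit range 0..127."""
    return all(b < 128 for b in data)


# An accented letter stored as a single 8-bit byte (0xE9) is not ASCII.
accented = b"caf\xe9"

print(same_text)           # True
print(is_ascii(plain))     # True
print(is_ascii(accented))  # False
```

If is_ascii comes back False for your file, one of the 8-bit or Unicode cases below applies.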

As others have mentioned, Windows sometimes calls its encoding 
"ANSI".  This is nonsensical, yet true.  If you are using a US or western 
European system, you are using Windows codepage 1252.  This is identical 
to the ISO western European encoding, ISO 8859-1, except for characters 
128-159 (which are control codes in ISO 8859-1, but printable characters 
such as the euro sign, ellipsis, dagger, em dash, and curly quotes in 
Windows CP 1252).  If you 
aren't using that middle range, use the label "ISO-8859-1"; if you are 
using that range, use the "windows-1252" label.  That's all if you're sure 
that you actually have an 8-bit encoding, and that the information hasn't 
been stored in UTF-8.  The easiest way to determine this is to open the 
document in a very stupid editor, or to display it with "type" at the DOS 
prompt.  If 
your fancy schmancy euro-characters show up as single characters, it's an 
8-bit encoding; if they show up as sequences of multiple characters, 
usually starting with an accented A of some sort, then you're in UTF-8 and 
don't need a label.  If each one always shows up as two characters, one of 
which is null, then it's UTF-16 and, as long as the file starts with a 
byte-order mark, you still shouldn't need a label.
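
The eyeball test above can be roughly automated. The following is a sketch only: guess_xml_encoding_label is a hypothetical helper, not part of any library, and it assumes the data is UTF-16 with a byte-order mark, UTF-8, or a western European 8-bit encoding. A short 8-bit file can happen to be valid UTF-8, so treat the result as a hint rather than proof.

```python
from typing import Optional


def guess_xml_encoding_label(data: bytes) -> Optional[str]:
    """Hypothetical helper: suggest an encoding label, or None if none is needed."""
    # UTF-16 files normally start with a byte-order mark (either byte order).
    if data[:2] in (b"\xff\xfe", b"\xfe\xff"):
        return None
    # Valid UTF-8 (which includes pure ASCII) is the XML default.
    try:
        data.decode("utf-8")
        return None
    except UnicodeDecodeError:
        pass
    # Assumption: anything else is a western European 8-bit encoding.
    # Bytes 128-159 are printable only in Windows codepage 1252.
    if any(0x80 <= b <= 0x9F for b in data):
        return "windows-1252"
    return "ISO-8859-1"


# What a "stupid" 8-bit viewer shows for UTF-8 data: an accented A first.
mojibake = "é".encode("utf-8").decode("latin-1")

print(mojibake)                                           # Ã©
print(guess_xml_encoding_label("café".encode("utf-8")))   # None
print(guess_xml_encoding_label(b"caf\xe9"))               # ISO-8859-1
print(guess_xml_encoding_label("€5".encode("cp1252")))    # windows-1252
```

The order of the checks matters: the UTF-8 test has to come before the 8-bit fallbacks, because every byte sequence is "valid" in an 8-bit encoding and would otherwise never be rejected.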

A complete list of IANA-registered identifiers can be found at <URL: 
http://www.iana.org/assignments/character-sets >.

[This is what happens when charset nerds drink too much espresso.]

~Chris

>-----BEGIN PGP SIGNATURE-----

P.S. Signing your message doesn't help when your public key isn't available 
from any of the usual places.
- -- 
Christopher R. Maden, Principal Consultant, crism consulting
DTDs/schemas - conversion - ebooks - publishing - Web - B2B - training
<URL: http://crism.maden.org/consulting/ >
PGP Fingerprint: BBA6 4085 DED0 E176 D6D4  5DFC AC52 F825 AFEC 58DA

