Re: Problems with characters
Ragulf Pickaxe wrote:
> I have a problem with characters using characterset 8859-1.

You have more problems than that :)

> Rather than displaying the Danish characters of ?,? and ? the presentation
> is the following:
> ? instead of ?
> ? instead of ?
> ? is depicted as ?

As you can see, my email software did not like the fact that your email
contained bytes outside the ASCII range (ASCII = 00-7F, and that's being
generous) and that your email failed to declare what character set to use
when interpreting these bytes.

Let's look at your email with a hex editor:

  72 73 65 74 20 38 38 35 39 2d 31 2e 0a 0a 52 61 |rset 8859-1...Ra|
  74 68 65 72 20 74 68 61 6e 20 64 69 73 70 6c 61 |ther than displa|
  79 69 6e 67 20 74 68 65 20 44 61 6e 69 73 68 20 |ying the Danish |
  63 68 61 72 61 63 74 65 72 73 20 6f 66 20 bf 2c |characters of ¿,|
  b8 20 61 6e 64 20 e5 20 74 68 65 20 70 72 65 73 |¸ and å the pres|
  65 6e 74 61 74 69 6f 6e 20 0a 69 73 20 74 68 65 |entation .is the|
  20 66 6f 6c 6c 6f 77 69 6e 67 3a 0a e6 20 69 6e | following:.æ in|
  73 74 65 61 64 20 6f 66 20 bf 0a f8 20 69 6e 73 |stead of ¿.ø ins|
  74 65 61 64 20 6f 66 20 b8 0a e5 20 69 73 20 64 |tead of ¸.å is d|
  65 70 69 63 74 65 64 20 61 73 20 e5 0a 0a 49 73 |epicted as å..Is|

OK, on the left are the hex notations for the bytes, and on the right are
the raw bytes as my terminal renders them. On the fourth line, the
2nd-to-last byte is BF, which on my terminal looks like an upside-down
question mark. In iso-8859-1, BF is indeed the upside-down question mark
(A1 is the upside-down exclamation mark), but since your email never
declared a character set, what I see and what you see may very well be two
completely different things :)

Therefore, I cannot even begin to answer your questions, because I have no
idea what characters you think you were typing in your email.

If you go to http://www.eki.ee/letter/chardata.cgi?ucode=00a0-00ff you will
probably find the info you seek, and you will also find the official Unicode
names for these characters (e.g. "LATIN CAPITAL LETTER A WITH RING ABOVE")
and their Unicode code points (e.g. "00C5", which would be written
"U-000000C5" or if you say "U+00C5" it's not completely accurate but people
will know what you mean), either of which will help you effectively
communicate what characters you are talking about.

> I get my data from an SQL-database and transform it twice, both with
> <?xml version="1.0" encoding="ISO-8859-1"?>

Well... that's not saying much. The encoding declaration in an XML document
is saying "the bytes in this document map to Unicode characters according to
the iso-8859-1 character map". It is expected to be a truthful assertion,
and is only for the XML parser's benefit.

You need more info about exactly what bytes are going into the database,
what bytes are coming out, and then what you're doing with them after that.
There are many possible points of failure, and I suspect you may be
corrupting or losing encoding information for what goes into your database
in the first place...

> I suspect it is the conversion of data from SQL database characterset to
> output/stylesheet characterset, but I don't know what to do about it.

You should educate yourself about encoding issues, and then trace your
character data through its entire lifetime from its creation to its storage
to its transmission and interpretation... every step of the way introduces
the possibility of confusion with respect to encoding.

Good luck.

http://skew.org/xml/tutorial/ explains encoding w.r.t. XML
http://skew.org/xml/links/ has many good encoding related links

   - Mike
____________________________________________________________________________
  mike j. brown, fourthought.com | xml/xslt: http://skew.org/xml/
  denver/boulder, colorado, usa  | personal: http://hyperreal.org/~mike/

 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
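The byte-by-byte inspection in the reply above, and its point that an XML encoding declaration has to be truthful, can be tried out with a short script. What follows is only a rough sketch in Python, not anything from the original thread: the helper names `describe_bytes` and `parse_text` are made up for illustration, the bytes are the non-ASCII ones from the hex dump, and interpreting them as iso-8859-1 is an assumption, since the original email declared no character set.

```python
import unicodedata
import xml.etree.ElementTree as ET


def describe_bytes(raw: bytes, encoding: str = "iso-8859-1"):
    """Return (hex byte, code point, Unicode name) for each byte in raw.

    Hypothetical helper. Assumes a single-byte encoding, so that every
    byte decodes to exactly one character.
    """
    rows = []
    for value in raw:
        char = bytes([value]).decode(encoding)
        name = unicodedata.name(char, "<unnamed>")  # control chars have no name
        rows.append((f"{value:02X}", f"U+{ord(char):04X}", name))
    return rows


def parse_text(document: bytes) -> str:
    """Parse an XML document given as bytes and return the root element's text."""
    return ET.fromstring(document).text


# The non-ASCII bytes that show up in the hex dump of the original email,
# read here under the ASSUMPTION that they are iso-8859-1.
suspect = bytes([0xBF, 0xB8, 0xE5, 0xE6, 0xF8])
for row in describe_bytes(suspect):
    print(*row)

# The same declaration on two different byte sequences.
declaration = b'<?xml version="1.0" encoding="ISO-8859-1"?>'
truthful = declaration + b"<a>\xe6</a>"        # E6 is iso-8859-1 for U+00E6
untruthful = declaration + b"<a>\xc3\xa6</a>"  # UTF-8 bytes for U+00E6, mislabelled

print(repr(parse_text(truthful)))    # one character, U+00E6
print(repr(parse_text(untruthful)))  # two characters, U+00C3 U+00A6
```

Reporting a code point and Unicode name for each byte removes the ambiguity the reply complains about, because the name does not depend on anyone's terminal or mail software. The second half shows why the declaration alone says little: the parser believes it either way, so bytes that were really written in another encoding come out as different characters without any error being raised.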