XML-DEV Mailing List Archive
Re: UTF-8 use with XML
From: "Long, Craig Z" <craig.long@e...>

> One of the engineers here translates the hex as: <BirthCity>Koln</BirthCity>
> is this correct?

When looking at UTF-8 codes, there are a few easy rules you can apply for ASCII:

1) All ASCII characters (i.e. the characters on a US keyboard) are represented by the same bytes in UTF-8 as in ASCII. So an ASCII string has exactly the same bytes if it is UTF-8.

2) Moreover, there is only one way of coding those ASCII characters. So < does not have two different encodings, one with three bytes and one with just a single byte. *

3) Every byte that is less than 0x80 is an ASCII character. Multi-byte code sequences have all their bytes >= 0x80. So three bytes that are all 0x80 or greater cannot be <.

Now it is also a little strange that the example given is Koln, not Köln. Has the data been transliterated (i.e. to remove umlauts)? If so, that is the stage that may have introduced some problems. (I would have expected the transliteration for Köln to be Koeln, if that is the German city.)

Cheers
Rick Jelliffe

* (However, there could be other, non-ASCII characters which look similar. And there is also a really odd thing called "normalization" which may have some impact too, but probably not here.)
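The three rules can be checked with a short Python sketch. This is only an illustration and not part of the original exchange: the helper name `decodes_as_utf8` and the sample byte sequences are assumptions chosen for the example, and the original hex dump from the question is not reproduced here.

```python
import unicodedata

# Hypothetical helper (not from the original message): True if the bytes
# are accepted by Python's strict UTF-8 decoder.
def decodes_as_utf8(data: bytes) -> bool:
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

# Rule 1: an ASCII-only string has identical bytes in ASCII and UTF-8.
ascii_text = "<BirthCity>Koln</BirthCity>"
same_bytes = ascii_text.encode("utf-8") == ascii_text.encode("ascii")

# Rule 2: '<' is U+003C and has exactly one encoding, the single byte 0x3C.
# A three-byte "overlong" form (E0 80 BC) carries the same bits but is not
# valid UTF-8, so a strict decoder rejects it.
single_lt = "<".encode("utf-8")
overlong_lt = bytes([0xE0, 0x80, 0xBC])
overlong_accepted = decodes_as_utf8(overlong_lt)

# Rule 3: bytes below 0x80 are always ASCII characters; every byte of a
# multi-byte sequence is 0x80 or above.
koeln = "K\u00f6ln".encode("utf-8")           # "Köln" with a precomposed ö
high_bytes = [b for b in koeln if b >= 0x80]  # the two bytes that encode ö

# Footnote: normalization can change the bytes without changing how the
# text looks. In NFD form, ö becomes 'o' followed by U+0308.
koeln_nfd = unicodedata.normalize("NFD", "K\u00f6ln").encode("utf-8")
```

In this sketch `koeln` is `b'K\xc3\xb6ln'`, so the only bytes at 0x80 or above are the pair that encodes ö, while K, l and n stay as single ASCII bytes. The decomposed form `koeln_nfd` contains a plain ASCII `o` followed by a two-byte combining mark, which is one way text that looks identical can end up with different bytes.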