
Re: UTF-8 use with XML


From: "Long, Craig Z" <craig.long@e...>

> One of the engineers here translates the hex as: <BirthCity>Koln</BirthCity>
> is this correct? 

When looking at UTF-8 codes, there are a few easy rules you can apply for ASCII:

1) All ASCII characters (i.e. the characters on a US keyboard) are represented
by the same bytes in UTF-8 as in ASCII.  So an ASCII string has exactly the same
bytes when it is encoded as UTF-8.  

2) Moreover, there is only one way of coding those ASCII characters. So < does
not have two different encodings, one with three bytes and one with just a single
byte. *

3) Every byte that is less than 0x80 is an ASCII character. Multi-byte code
sequences have all their bytes >= 0x80.  

So three bytes that are all greater than 0x7F cannot be <.    
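A minimal sketch of the three rules in Python, in case it helps to check them by hand. The helper names (decodes_as_utf8, could_contain_ascii) and the sample bytes are made up for illustration and are not taken from the original data; it only assumes Python's built-in strict UTF-8 codec.

```python
def decodes_as_utf8(data: bytes) -> bool:
    """Return True if Python's strict UTF-8 decoder accepts the bytes."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def could_contain_ascii(data: bytes) -> bool:
    """Rule 3: only a byte below 0x80 can be an ASCII character."""
    return any(b < 0x80 for b in data)


# Rule 1: ASCII text has the same bytes in ASCII and in UTF-8.
tag = "<BirthCity>"
same_bytes = tag.encode("ascii") == tag.encode("utf-8")

# Rule 2: "overlong" two- and three-byte forms of '<' (0x3C) are not valid UTF-8.
overlong_two = bytes([0xC0, 0xBC])
overlong_three = bytes([0xE0, 0x80, 0xBC])
overlong_accepted = decodes_as_utf8(overlong_two) or decodes_as_utf8(overlong_three)

# Rule 3: every byte of a multi-byte sequence is 0x80 or above.
euro = "\u20ac".encode("utf-8")  # three bytes: E2 82 AC
euro_has_ascii = could_contain_ascii(euro)

print(same_bytes, overlong_accepted, euro_has_ascii)
```

With a strict decoder this should print True False False: the ASCII bytes are unchanged, the overlong forms are rejected, and the three-byte sequence contains no byte that could be an ASCII character.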

Now it is also a little strange that the example given is Koln, not Köln. 
Has the data been transliterated (i.e. to remove umlauts)? If so, that is 
the stage that may have introduced some problems. (I would have expected the 
transliteration for Köln to be Koeln, if that is the German city.) 
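For comparison with a hex dump, here is a small Python sketch that prints the UTF-8 bytes of the three spellings. The hex_dump helper is hypothetical, and the strings are typed in here for illustration; they are not the bytes from the original message, which are not shown in this post.

```python
def hex_dump(text: str) -> str:
    """Return the UTF-8 bytes of text as space-separated uppercase hex."""
    return " ".join(f"{b:02X}" for b in text.encode("utf-8"))


# "K\u00f6ln" is Köln with the precomposed o-umlaut (U+00F6).
for spelling in ("K\u00f6ln", "Koln", "Koeln"):
    print(spelling, "->", hex_dump(spelling))
```

Only the spelling with the umlaut produces bytes of 0x80 or above (C3 B6 for the ö); the two transliterated spellings are plain ASCII.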


Cheers
Rick Jelliffe

* (However, there could be other, non-ASCII characters which look similar.
And there is also a really odd thing called "normalization" which may have some
impact too, but probably not here.)
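
To make the footnote concrete, a short Python sketch using the standard unicodedata module. The choice of Köln and of the fullwidth less-than sign (U+FF1C) as the look-alike is mine, purely for illustration; nothing here says that either occurs in the data being discussed.

```python
import unicodedata

# The same visible text can be stored composed (NFC) or decomposed (NFD).
composed = unicodedata.normalize("NFC", "K\u00f6ln")    # o-umlaut as one code point
decomposed = unicodedata.normalize("NFD", "K\u00f6ln")  # 'o' + combining diaeresis

print(composed.encode("utf-8").hex())    # bytes for the composed form
print(decomposed.encode("utf-8").hex())  # bytes for the decomposed form

# A non-ASCII character that merely looks like '<': FULLWIDTH LESS-THAN SIGN.
fullwidth_lt = "\uff1c"
print(fullwidth_lt.encode("utf-8").hex())  # three bytes, all >= 0x80
print(fullwidth_lt == "<")                 # a different character from ASCII '<'
```

The two normalization forms display the same but have different bytes, and the fullwidth sign is a three-byte sequence that only resembles <.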

