Re: Comparison of URIs: Character encoding.
The mapping to an ASCII character sequence is defined in the URI specifications. However, there is the 'old' way (which allowed only Latin-1 characters in a URI) and the 'new' way (which allows any characters, but requires a choice of charset, with UTF-8 being the recommended one), and the two are quite different.

The formal specifications are spread across several RFCs, including:

http://www.ietf.org/rfc/rfc2396.txt (URI syntax; updates RFCs 1808 and 1738)
http://www.ietf.org/rfc/rfc2718.txt (Guidelines for new URL schemes, with a note on charset issues)

Hope this helps -- Ian

--
Ian Graham .......................... http://www.utoronto.ca/ian/
i a n d o t g r a h a m a t u t o r o n t o d o t c a

On Sun, 26 Nov 2000, Alan Kennedy wrote:

> Hello again,
>
> Another question about identifiers, this time URIs.
>
> I need to compare URIs, both as SYSTEM identifiers and Namespace
> identifiers. The question I need to answer is this:-
>
> What character encoding should I use for encoding and decoding of
> escaped values in URIs?
>
> For example: if I see "%7e" ("~" in USASCII) in a URI, what character
> en(de)coding should I use to map that to a single character for
> comparison purposes? What about "%e9" ("e-acute" in "iso-8859-1")?
>
> Another example: If I see a non-USASCII character in an URI,
> say "ü" ("u-umlaut"), should I escape that as "%fc", as in
> "iso-8859-1"? Or should I be using UTF-8?
>
> Or is there no such universal mapping?
>
> Again, TIA for any enlightenment.
>
> Alan.
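
To make the difference between the 'old' (Latin-1) and 'new' (UTF-8) conventions concrete, here is a minimal Python sketch using the escapes from the question. It relies only on the standard urllib.parse module. The helper name same_octets and the example.org URIs are hypothetical, introduced purely for illustration, and comparing raw octets is just one possible approach, not something the RFCs above prescribe.

```python
from urllib.parse import quote, unquote, unquote_to_bytes

# "%7e" is plain US-ASCII, so both conventions decode it to "~".
tilde_latin1 = unquote("%7e", encoding="latin-1")
tilde_utf8 = unquote("%7e", encoding="utf-8")

# "%e9" is e-acute only under the Latin-1 reading. Under UTF-8 the lone
# byte 0xE9 is an incomplete sequence, so it becomes U+FFFD here.
latin1_char = unquote("%e9", encoding="latin-1")
utf8_char = unquote("%e9", encoding="utf-8", errors="replace")

# Escaping u-umlaut gives different results depending on the charset.
escaped_latin1 = quote("ü", encoding="latin-1")  # one octet
escaped_utf8 = quote("ü", encoding="utf-8")      # two octets


def same_octets(uri_a: str, uri_b: str) -> bool:
    """Hypothetical helper: compare two URIs by their unescaped octets.

    This avoids guessing a charset, but it also treats an escaped
    reserved character (e.g. "%2F") as equal to the literal one,
    which may not be what a given application wants.
    """
    return unquote_to_bytes(uri_a) == unquote_to_bytes(uri_b)


print(tilde_latin1, tilde_utf8)
print(repr(latin1_char), repr(utf8_char))
print(escaped_latin1, escaped_utf8)
print(same_octets("http://example.org/~user", "http://example.org/%7euser"))
print(same_octets("http://example.org/%FC", "http://example.org/%C3%BC"))
```

The last comparison shows why the choice matters: the same character escaped under the two conventions yields different octet sequences, so the two URIs do not match unless the comparer knows which charset each side used.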