RE: Localisation: Character Encodings & RDBMS, Unicode->UTF-8 with Round Tripping
The irony here is just too much. :-) I have appended the content of Matt's message below.

> -----Original Message-----
> From: Matt Sergeant [SMTP:matt@s...]
> Sent: Monday, June 19, 2000 11:28 AM
> To: Dylan Walsh
> Cc: xml-dev@x...
> Subject: RE: Localisation: Character Encodings & RDBMS,
> Unicode->UTF-8 with Round Tripping
>
> This message uses a character set that is not supported by the Internet
> Service. To view the original message content, open the attached
> message. If the text doesn't display correctly, save the attachment to
> disk, and then open it using a viewer that can display the original
> character set.
>
> << File: message.txt >>

On Mon, 19 Jun 2000, Dylan Walsh wrote:

> Forwarding, as it is relevant to this thread.
>
> > -----Original Message-----
> > From: Ronald Bourret [SMTP:rpbourret@h...]
> > Sent: Saturday, June 17, 2000 12:35 PM
> > To: mrys@m...; Dylan.Walsh@K...
> > Subject: RE: Localisation: Character Encodings & RDBMS,
> > Unicode->UTF-8 with Round Tripping
> >
> > Michael Rys wrote:
> >
> > >Most databases provide Unicode support (e.g., nchar). Since UTF-8 is an
> > >encoding where the Unicode two-byte characters are mapped into a
> > >single-byte character space such that for some characters two or three
> > >single-byte characters are used, you of course can easily store UTF-8
> > >as well in a single-character string datatype. However, strlen
> > >functions are normally oblivious to the fact that you actually have
> > >UTF-8 stored in the latter case, but just from a storage point of view,
> > >you should be able to roundtrip either UTF-8 or Unicode.
> >
> > Note also that, unless the database knows it is storing UTF-8, any
> > characters that require two bytes to be stored will be unqueriable. For
> > example, suppose the character 'ä' requires two bytes to be stored (I
> > don't actually know if it does or not) and the database thinks it is
> > storing ASCII. If so, the query
> >
> > SELECT * FROM Employees WHERE Name="Schäfer"
> >
> > will fail because the bytes actually stored in the database are:
> >
> > "Sch--fer"
> >
> > where -- represents the two bytes needed to store 'ä', which don't match
> > "Schäfer".

They do if the query is also in UTF-8, and therefore you're requesting:

SELECT * FROM Employees WHERE Name="Sch--fer"

(using your syntax).

--
<Matt/>

Fastnet Software Ltd. High Performance Web Specialists
Providing mod_perl, XML, Sybase and Oracle solutions
Email for training and consultancy availability.
http://sergeant.org | AxKit: http://axkit.org

***************************************************************************
This is xml-dev, the mailing list for XML developers.
To unsubscribe, mailto:majordomo@x...&BODY=unsubscribe%20xml-dev
List archives are available at http://xml.org/archives/xml-dev/
***************************************************************************
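
For anyone reading this in the archive, here is a minimal Python sketch of the byte-level point made in the thread. It is not from any of the posters. It models the database as an encoding-unaware, byte-wise comparator, which is an assumption for illustration rather than how any particular RDBMS behaves, and the name `stored_bytes_match` is hypothetical. The character in question is a-umlaut (U+00E4), written as an escape so the snippet does not depend on the mail encoding.

```python
# Illustrative only: the "database" here is just a byte-wise comparison
# that knows nothing about character encodings.

NAME = "Sch\u00e4fer"  # the name from the example query, with a-umlaut


def stored_bytes_match(stored: bytes, query: bytes) -> bool:
    """Hypothetical stand-in for an encoding-unaware WHERE Name = ... test."""
    return stored == query


stored = NAME.encode("utf-8")          # bytes the application wrote (UTF-8)
query_latin1 = NAME.encode("latin-1")  # same name, query sent as ISO-8859-1
query_utf8 = NAME.encode("utf-8")      # same name, query sent as UTF-8

# The a-umlaut takes two bytes in UTF-8 but one byte in ISO-8859-1.
print("\u00e4".encode("utf-8"))    # b'\xc3\xa4'
print("\u00e4".encode("latin-1"))  # b'\xe4'

# Mismatched encodings do not compare equal; matching ones do.
print(stored_bytes_match(stored, query_latin1))  # False
print(stored_bytes_match(stored, query_utf8))    # True

# A byte-counting length (strlen-style) differs from the character count.
print(len(NAME), len(stored))  # 7 8
```

So the stored value and the query only have to agree on the encoding. The length discrepancy in the last line is the strlen caveat mentioned earlier in the thread.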