Excellent! Thanks, David. I have reworded it:

Example: Microsoft Word uses Windows-1252 encoding. The hex value for the left curly (a.k.a. smart) quote is x93. In UTF-8 encoding the left curly quote is a three-byte sequence of hex codes xE2 x80 x9C, and there is no character corresponding to hex value x93. Copying a left curly quote from a Word document and pasting it into a UTF-8 XML document results in the XML document receiving a byte sequence that cannot be decoded as UTF-8.

Is it stated accurately now? If it is correct, I will update the summary with this version.

/Roger

-----Original Message-----
From: David Carlisle [mailto:davidc@n...]
Sent: Thursday, September 06, 2007 12:45 PM
To: Costello, Roger L.
Cc: xml-dev@l...
Subject: Re: [Summary] Dangers of Copying Text into an XML Document

> . In UTF-8 encoding the hex value for the left curly quote is x201C,

No, that's the Unicode value (in hex), but in UTF-8 the character is represented as a multi-byte sequence (the three bytes with hex values E2 80 9C). The document should be careful to distinguish Unicode from its encodings as sequences of bytes (since it is mainly encoding errors that it is describing).

> Copying a left curly quote from a Word document and pasting it into a
> UTF-8 XML document may result in the XML document receiving an illegal
> character.

That wording makes it sound as if you'd get the same sort of error as if you'd included a control character in the document, that is, a valid Unicode character that is not allowed in XML. What you'd get in this case is a byte stream that could not be decoded using UTF-8, so there would be no characters to pass to the XML parser at all.

David

http://people.w3.org/rishida/scripts/uniview/conversion
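A minimal sketch of the distinction discussed above, assuming Python 3 and its standard-library ElementTree (which parses with expat). It shows that the left curly quote U+201C is the single byte x93 in Windows-1252 but the bytes E2 80 9C in UTF-8, that a lone x93 byte does not decode as UTF-8, and that U+0001 decodes fine yet is still not allowed in an XML 1.0 document. The helper names (decodes_as_utf8, parse_error) are hypothetical, chosen for illustration only.

```python
import xml.etree.ElementTree as ET

LEFT_QUOTE = "\u201c"  # U+201C LEFT DOUBLE QUOTATION MARK (the "smart" quote)

# The same character as bytes in two different encodings.
cp1252_bytes = LEFT_QUOTE.encode("cp1252")  # b'\x93'
utf8_bytes = LEFT_QUOTE.encode("utf-8")     # b'\xe2\x80\x9c'


def decodes_as_utf8(data: bytes) -> bool:
    """Return True if the bytes form a well-formed UTF-8 sequence."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def parse_error(xml_bytes: bytes):
    """Parse with ElementTree; return the error message, or None on success."""
    try:
        ET.fromstring(xml_bytes)
        return None
    except ET.ParseError as err:
        return str(err)


HEADER = b'<?xml version="1.0" encoding="UTF-8"?>'

# Case 1: a Windows-1252 quote byte pasted into a document declared as UTF-8.
pasted = HEADER + b"<p>" + cp1252_bytes + b"quoted</p>"
# Case 2: the same quote correctly encoded as UTF-8.
correct = HEADER + b"<p>" + utf8_bytes + b"quoted</p>"
# Case 3: valid UTF-8, but U+0001 is a character XML 1.0 does not allow.
control = HEADER + b"<p>\x01quoted</p>"

for name, doc in [("pasted", pasted), ("correct", correct), ("control", control)]:
    print(name, "| decodes as UTF-8:", decodes_as_utf8(doc),
          "| parse error:", parse_error(doc))
```

Only the "correct" document parses. The "pasted" and "control" documents both fail, and expat may report them with a similarly generic well-formedness message, but the decode check shows they fail for different reasons: the first never yields characters at all, while the second yields a real character that XML forbids.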