
  • From: David Carlisle <davidc@n...>
  • To: costello@m...
  • Date: Wed, 5 Sep 2007 17:05:34 +0100

> Example: Word uses Windows-1252 encoding.
Word will presumably use whatever encoding it's set to use on that
system. 1252 presumably isn't the default everywhere.


> Consequently, if the text was created
> in an editor that uses a different encoding than the XML document then
> the characters that result from pasting the text into the XML document
> may not be the same. 

That's one thing that can happen, but perhaps more likely is that the
resulting byte string is not a valid UTF-8 sequence, so the resulting
document cannot be parsed at all and will be rejected (with a "fatal
error").
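A minimal sketch of that failure, assuming Python's standard
xml.etree.ElementTree parser and a made-up <note> element; the pasted
string and the helper try_parse are illustrative only:

```python
import xml.etree.ElementTree as ET

# Text as typed in a word processor: a phrase in curly quotes
# (U+201C ... U+201D).
pasted = "\u201cquoted\u201d"

# Assumed scenario: the bytes were produced as Windows-1252 ...
cp1252_bytes = pasted.encode("cp1252")  # b'\x93quoted\x94'

# ... and dropped into a document that has no encoding declaration,
# so the parser treats it as UTF-8.
doc = b"<note>" + cp1252_bytes + b"</note>"


def try_parse(data: bytes) -> str:
    """Return the element text, or the parser's error message."""
    try:
        return ET.fromstring(data).text
    except ET.ParseError as exc:
        return f"fatal error: {exc}"


broken = try_parse(doc)
fixed = try_parse(b"<note>" + pasted.encode("utf-8") + b"</note>")

print(broken)  # the parser refuses the document
print(fixed)   # the same text, correctly encoded, parses
```

The same text parses once it is actually encoded as UTF-8 (or once the
document declares the encoding the bytes are really in).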

> In UTF-8 the hex value x93 corresponds to a control character. 
No. As a byte, x93 is a UTF-8 continuation byte: unless it follows a
lead byte as part of the multi-byte encoding of a character, this would
be a fatal error.
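The difference between the code point x93 and the byte x93 can be
checked with a small sketch. The function utf8_byte_role is a
hypothetical helper written for this illustration, not part of any
library:

```python
def utf8_byte_role(b: int) -> str:
    """Classify a single byte value by its role in well-formed UTF-8."""
    if b < 0x80:
        return "ascii"
    if b < 0xC0:
        return "continuation"      # 10xxxxxx, never starts a character
    if 0xC2 <= b <= 0xDF:
        return "lead of 2 bytes"
    if 0xE0 <= b <= 0xEF:
        return "lead of 3 bytes"
    if 0xF0 <= b <= 0xF4:
        return "lead of 4 bytes"
    return "invalid"               # C0, C1, F5..FF never appear


role = utf8_byte_role(0x93)

# On its own, the byte cannot be decoded.
try:
    b"\x93".decode("utf-8")
    lone_ok = True
except UnicodeDecodeError:
    lone_ok = False

# After suitable lead bytes it is part of an ordinary character.
en_dash = b"\xe2\x80\x93".decode("utf-8")   # U+2013

# The control character U+0093 is a different thing: two bytes in UTF-8.
c1_control = "\u0093".encode("utf-8")       # b'\xc2\x93'

# In Windows-1252 the single byte is a left curly quote.
as_cp1252 = b"\x93".decode("cp1252")        # U+201C
```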

> Can you think of other problems that may result from copying text from
> one document and pasting it into an XML document?

The text might contain the string ]]> (although arguably you have
covered this by including > in the list of characters that "may need to
be escaped")
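As a sketch of the escaping route, assuming Python's standard
xml.sax.saxutils.escape (which replaces &, < and >) and an invented
<code> wrapper element:

```python
from xml.sax.saxutils import escape
import xml.etree.ElementTree as ET

# Pasted text that happens to contain the sequence ]]>
pasted = "if (a[b[0]]> 1) return;"

# Escaping > as &gt; breaks up the forbidden sequence.
escaped = escape(pasted)

# The escaped text is safe as element content and round-trips.
doc = "<code>" + escaped + "</code>"
round_tripped = ET.fromstring(doc).text

print(escaped)
print(round_tripped == pasted)
```

Inside a CDATA section escaping is not available, so there the usual
workaround is to end the section and start a new one in the middle of
the ]]> sequence.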

The text might contain non-XML characters that (in XML 1.0) cannot even
be entered as numeric references (most C0 controls, FFFE, FFFF or the
values corresponding to half a surrogate pair).
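A rough pre-paste check along these lines, based on the Char production
of XML 1.0 (#x9, #xA, #xD, #x20-#xD7FF, #xE000-#xFFFD,
#x10000-#x10FFFF). The function names are made up for this sketch:

```python
def is_xml10_char(ch: str) -> bool:
    """True if ch is allowed by the XML 1.0 Char production."""
    cp = ord(ch)
    return (
        cp in (0x9, 0xA, 0xD)          # tab, line feed, carriage return
        or 0x20 <= cp <= 0xD7FF        # excludes the other C0 controls
        or 0xE000 <= cp <= 0xFFFD      # excludes surrogates, FFFE, FFFF
        or 0x10000 <= cp <= 0x10FFFF
    )


def illegal_chars(text: str) -> list:
    """List (index, code point) for every character XML 1.0 forbids."""
    return [
        (i, f"U+{ord(c):04X}")
        for i, c in enumerate(text)
        if not is_xml10_char(c)
    ]


# A form feed (a C0 control) and U+FFFE hidden in pasted text.
found = illegal_chars("ok\x0cpage\ufffe")
print(found)
```

Characters reported by such a check have to be removed or replaced
before pasting, since writing them as &#xC; or similar is also an error
in XML 1.0.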

David
