 Subject: XML: Editing with embedded space \u00a0 Author: Tony Lavinio Date: 13 Jul 2005 12:08 PM
Okay, the problem is what I suspected.
In order for XML parsers to properly read a file with characters
encoded as 0xA0 bytes, the encoding MUST be specified in the <?xml?>
declaration. In this case, the file SHOULD have
<?xml version="1.0" encoding="iso-8859-1"?> at the top.
Otherwise, it is being processed as UTF-8, which is the XML default,
according to http://www.w3.org/TR/REC-xml/#NT-EncodingDecl
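To see the effect of the declaration, here is a minimal sketch using Python's standard xml.etree.ElementTree (an expat-based parser chosen only for illustration; it is not the Xerces-C++ parser discussed in this thread and may react differently to bad input). The sample documents and the try_parse helper are hypothetical.

```python
import xml.etree.ElementTree as ET

# Same payload in both cases: a raw 0xA0 byte between "a" and "b".
declared = b'<?xml version="1.0" encoding="iso-8859-1"?><doc>a\xa0b</doc>'
undeclared = b'<?xml version="1.0"?><doc>a\xa0b</doc>'


def try_parse(data):
    """Return the root element's text, or None if the parser rejects the input."""
    try:
        return ET.fromstring(data).text
    except ET.ParseError:
        return None


# With the declaration, 0xA0 is read as U+00A0 (non-breaking space).
print(repr(try_parse(declared)))

# Without it, the bytes are treated as UTF-8, where a lone 0xA0 is invalid.
print(repr(try_parse(undeclared)))
```

In this sketch the declared document parses cleanly, while the undeclared one is rejected outright by expat rather than being silently repaired.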
A lone 0xA0 byte is actually invalid in UTF-8, but for some reason
the Xerces-C++ parser that we use turns it into U+FFFD (the Unicode
replacement character) instead of rejecting it. In UTF-8, a
non-breaking space (U+00A0) should be in the file as two bytes,
0xC2 and 0xA0.
For more information on how UTF-8 works, see
http://en.wikipedia.org/wiki/UTF-8
So, either have the person supplying the XML document change the
header to include encoding="ISO-8859-1", or have them change
the emitter so that non-breaking spaces are written as 0xC2 0xA0
(but not both together!)
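For anyone who wants to check the byte values themselves, here is a small Python sketch of how a non-breaking space is encoded and what goes wrong when the bytes and the declared encoding disagree. The helper name latin1_to_utf8 is made up for this example; it assumes the input really is ISO-8859-1.

```python
NBSP = "\u00a0"  # non-breaking space

# One byte in ISO-8859-1, two bytes in UTF-8.
latin1_bytes = NBSP.encode("iso-8859-1")   # b'\xa0'
utf8_bytes = NBSP.encode("utf-8")          # b'\xc2\xa0'

# A lone 0xA0 is not valid UTF-8; a lenient decoder substitutes U+FFFD.
lenient = b"\xa0".decode("utf-8", errors="replace")

# "Both together" goes wrong: UTF-8 bytes read as ISO-8859-1 produce
# two characters (U+00C2 followed by U+00A0) instead of one.
double_decoded = utf8_bytes.decode("iso-8859-1")


def latin1_to_utf8(data: bytes) -> bytes:
    """Hypothetical emitter-side fix: transcode ISO-8859-1 bytes to UTF-8."""
    return data.decode("iso-8859-1").encode("utf-8")


print(latin1_bytes, utf8_bytes, repr(lenient), repr(double_decoded))
print(latin1_to_utf8(b"a\xa0b"))
```

Either fix alone gives the parser consistent input: the declaration tells it to read 0xA0 as one ISO-8859-1 character, while transcoding makes the bytes valid under the UTF-8 default.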