Re: unreadable characters from indesign
Abel Braaksma wrote:
Marc Lambrichs wrote:
I'm reading in an xml-feed from Adobe InDesign and in some nodes
there are three characters that can't be interpreted by my
xsl-translation using utf-8. The codepoints of these 3 are (octal)
226, 128, 169. First of all, I would like to know what these
characters should represent. And secondly, could I filter these
characters out using something like translate?
This is not possible. If 226, 128 and 169 are meant to be octal, you
mistyped at least the digits '8' and '9', which do not exist in octal.
Assuming you meant decimal, and you are indeed talking about
codepoints, then there cannot be any problem in reading them. The
codepoints 226, 128 and 169 represent a three-character string, "â" +
control character + "©" (not sure whether the mailer messes this up),
which are:
U+00E2, LATIN SMALL LETTER A WITH CIRCUMFLEX
U+0080, control
U+00A9, COPYRIGHT SIGN
See http://www.unicode.org/Public/UNIDATA/UnicodeData.txt for a full
list of codepoints.
In UTF-8, these are encoded as the following octets (view your input
in hexadecimal and you can see whether this is indeed what it contains):
U+00E2 >>> C3A2
U+0080 >>> C280
U+00A9 >>> C2A9
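The mapping above can be reproduced with a few lines of Python. This is
only a sketch for checking the octets; the helper name utf8_hex is made
up for illustration and is not part of any XSLT tool.

```python
import unicodedata

# The three values from the question, read as decimal Unicode code points.
code_points = [226, 128, 169]

def utf8_hex(cp: int) -> str:
    """Return the UTF-8 octets of one code point as uppercase hex."""
    return chr(cp).encode("utf-8").hex().upper()

for cp in code_points:
    # Control characters such as U+0080 have no character name,
    # so a placeholder is supplied as the default.
    name = unicodedata.name(chr(cp), "<control>")
    print(f"U+{cp:04X} >>> {utf8_hex(cp)}  {name}")
```

Comparing this output with a hex dump of the input file shows whether
the file really contains these three code points.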
I am not sure what you mean by "can't be interpreted by my
xsl-translation using utf-8", because any conforming XSLT processor
understands at least UTF-8 and UTF-16. However, if what you mean is
that these characters are there and should be removed, you can indeed
use translate() to remove them:
translate($yourinput, '&#xE2;&#x80;&#xA9;', '')
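For comparison outside XSLT, the same per-character removal can be
sketched in Python. The function name remove_chars and the sample string
are hypothetical, chosen only to illustrate the behaviour.

```python
def remove_chars(text: str, unwanted: str) -> str:
    """Delete every character of `unwanted` from `text`."""
    # With two empty strings and a third argument, maketrans builds a
    # table that maps each listed character to None (i.e. deletes it).
    table = str.maketrans("", "", unwanted)
    return text.translate(table)

# U+00E2, U+0080 and U+00A9, the three code points discussed above.
unwanted = "\u00e2\u0080\u00a9"

# Hypothetical sample input containing the three characters in a row.
sample = "caf\u00e2\u0080\u00a9 bar"
cleaned = remove_chars(sample, unwanted)
print(cleaned)
```

As with translate() in XPath, each character is removed wherever it
occurs, not only when the three appear together, so a legitimate "â" or
"©" elsewhere in the text would be dropped as well.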
But if what you mean is that the input somehow has these three values
encoded in such a way that it is not UTF-8, then you will have to
change your input, because it is not possible to process non-UTF-8
input (meaning: input containing illegal UTF-8 sequences) as if it
were UTF-8.
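Whether a run of octets is well-formed UTF-8 can be tested with a strict
decode. The sketch below uses a made-up helper, decode_utf8_or_none, and
reads 226, 128, 169 as octets rather than codepoints, which is an
assumption that the original question does not confirm.

```python
import unicodedata

def decode_utf8_or_none(raw: bytes):
    """Return the decoded text, or None if raw is not well-formed UTF-8."""
    try:
        return raw.decode("utf-8")  # error handling is strict by default
    except UnicodeDecodeError:
        return None

# Assumption: the three values are octets (0xE2 0x80 0xA9), not code points.
as_octets = bytes([226, 128, 169])
text = decode_utf8_or_none(as_octets)
if text is not None:
    for ch in text:
        print(f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))

# A lead byte 0xE2 without its two continuation bytes is ill-formed.
broken = decode_utf8_or_none(bytes([226]))
print(broken)
```

Under that reading the three octets form a single valid UTF-8 sequence
for U+2029, PARAGRAPH SEPARATOR, while a truncated sequence is rejected
outright.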
Cheers,
-- Abel Braaksma
http://www.nuntia.nl
Sorry, no mistype, sheer stupidity on my part. Rereading the message,
I'm sure I should have asked the first half of the question in some
Adobe newsgroup, because I still don't understand how those characters
end up in my XML and what they should represent. The second half shows
how to get rid of them, at least.
Cheers,
Marc