
Re: unreadable characters from indesign

Subject: Re: unreadable characters from indesign
From: Marc Lambrichs <marc.lambrichs@xxxxxxxxxxxxx>
Date: Thu, 18 Jan 2007 02:24:16 +0100
Abel Braaksma wrote:

Marc Lambrichs wrote:

I'm reading in an XML feed from Adobe InDesign, and some nodes contain three characters that my XSL transformation, using UTF-8, can't interpret. The codepoints of these three are (octal) 226, 128, 169. First of all, I would like to know what these characters should represent. Secondly, could I filter these characters out using something like translate()?


This is not possible. If 226, 128 and 169 are octal, you mistyped at least the digits '8' and '9', which do not exist in octal.


Assuming you meant decimal, and that you are indeed talking about codepoints, there cannot be any problem reading them. The codepoints 226, 128 and 169 represent the string &#226;&#128;&#169; (not sure whether the mailer messes this up), which are:

U+00E2, LATIN SMALL LETTER A WITH CIRCUMFLEX
U+0080, control
U+00A9, COPYRIGHT SIGN

See http://www.unicode.org/Public/UNIDATA/UnicodeData.txt for a full list of codepoints.

In UTF-8, these are encoded as the following octets (view your input in a hex viewer and you can see whether this is indeed what it contains):
U+00E2 >>> C3A2
U+0080 >>> C280
U+00A9 >>> C2A9
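
The octets above can be checked with a short Python sketch (Python is used here only for illustration; it is not part of the original thread). It also shows a second reading that the thread does not confirm: if 226, 128 and 169 were raw bytes seen in a hex dump rather than codepoints, they would form one three-byte UTF-8 sequence. The helper name `utf8_hex` is made up for this example.

```python
# The three values from the question, read as decimal numbers.
values = [226, 128, 169]


def utf8_hex(cp: int) -> str:
    """Return the UTF-8 encoding of a single codepoint as uppercase hex."""
    return chr(cp).encode("utf-8").hex().upper()


# Reading 1: the values are codepoints (as assumed in the reply above).
for cp in values:
    print(f"U+{cp:04X} >>> {utf8_hex(cp)}")

# Reading 2 (an assumption, not confirmed in the thread): the values are
# raw bytes, i.e. the hex dump shows E2 80 A9 rather than C3A2 C280 C2A9.
decoded = bytes(values).decode("utf-8")
print([f"U+{ord(ch):04X}" for ch in decoded])  # a single codepoint, U+2029
```

Under the second reading the bytes decode to U+2029, PARAGRAPH SEPARATOR, so comparing the hex dump of the input against both patterns tells you which case applies.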


I am not sure what you mean by "can't be interpreted by my xsl-translation using utf-8", because any conforming XSLT processor understands at least UTF-8 and UTF-16. However, if you mean that these characters are present and should be removed, you can indeed use translate() to remove them:

translate($yourinput, '&#226;&#128;&#169;', '')
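
To see the effect outside an XSLT processor, the same character-by-character removal can be sketched in Python. The function name `strip_unwanted` and the sample string are hypothetical; `str.translate` with a mapping to `None` deletes characters much as XPath translate() does when the third argument is empty.

```python
# Map each unwanted codepoint to None so that str.translate deletes it.
UNWANTED = {226: None, 128: None, 169: None}


def strip_unwanted(text: str) -> str:
    """Delete every occurrence of U+00E2, U+0080 and U+00A9 from text."""
    return text.translate(UNWANTED)


# Hypothetical sample: the three characters sit between two words.
sample = "first\u00e2\u0080\u00a9second"
print(strip_unwanted(sample))  # prints "firstsecond"
```

Like translate(), this works on individual characters, not on the three-character sequence, so a legitimate â or © elsewhere in the text would be removed as well.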

But if you mean that the input somehow has these three values encoded in a way that is not UTF-8, then you will have to change your input, because it is not possible to process non-UTF-8 input (meaning: input containing illegal UTF-8 sequences) as if it were UTF-8.
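
One way to test whether a file contains illegal UTF-8 sequences before handing it to the processor is a strict decode, sketched here in Python. The function name `is_valid_utf8` is an assumption for this example; the byte strings are illustrative and do not come from the original feed.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes as UTF-8 without any illegal sequence."""
    try:
        data.decode("utf-8")  # strict error handling is the default
    except UnicodeDecodeError:
        return False
    return True


# A complete three-byte sequence is legal UTF-8.
print(is_valid_utf8(b"\xe2\x80\xa9"))  # True
# A lead byte followed by an ASCII character is not.
print(is_valid_utf8(b"\xe2("))         # False
# A sequence cut off before its last byte is not legal either.
print(is_valid_utf8(b"\xe2\x80"))      # False
```

If this kind of check fails, the input has to be re-exported or converted from its real encoding first, as described above.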

Cheers,
-- Abel Braaksma
  http://www.nuntia.nl

Sorry, no mistype, sheer stupidity on my part. Rereading the message, I'm sure I should have asked the first half of the question in some Adobe newsgroup, because I still don't understand how those characters end up in my XML and what they should represent. The second half at least shows how to get rid of them.

Cheers,
Marc
