
Re: Character encoding/representation from ISO-8859-1

Subject: Re: Character encoding/representation from ISO-8859-1 to UTF-8
From: "Imsieke, Gerrit, le-tex gerrit.imsieke@xxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Tue, 11 Oct 2016 19:29:08 -0000
But do we know that the characters are just bytes?

Sometimes UTF-8 is read as if it were ISO-8859-1 or CP-1252 (the latter is more likely on Windows) and then saved as UTF-8 again. In that case â€™ really is three separate characters in the file, each stored as its own multibyte UTF-8 sequence.
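A quick way to tell the two situations apart is to look at the raw bytes. The following is only a rough sketch in Python (my choice of language, not something used elsewhere in this thread); the function name `classify` and its return labels are made up for illustration. It checks for the plain UTF-8 sequence E2 80 99 versus the longer sequence you get when those bytes were read as CP-1252 and re-saved as UTF-8.

```python
# Plain UTF-8 encoding of U+2019 RIGHT SINGLE QUOTATION MARK: E2 80 99
RAW = "\u2019".encode("utf-8")

# The same three bytes misread as CP-1252 and saved as UTF-8 again:
# C3 A2 (â), E2 82 AC (€), E2 84 A2 (™)
DOUBLE = RAW.decode("cp1252").encode("utf-8")


def classify(data: bytes) -> str:
    """Hypothetical helper: guess which kind of damage a file has."""
    if DOUBLE in data:
        return "double-encoded"   # iconv-style repair is needed
    if RAW in data:
        return "raw-utf8"         # only the encoding declaration is wrong
    return "unknown"
```

You would call it on the file contents opened in binary mode, e.g. `classify(open("input.xml", "rb").read())`. It only looks for this one character, so treat the answer as a hint, not a proof.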

If this is the case, you can correct it with

iconv -t WINDOWS-1252 -f UTF-8 input.xml | sed -e 's/ encoding="iso-8859-1"/ encoding="UTF-8"/' > output.xml
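If iconv and sed are not at hand, roughly the same repair can be sketched in Python. This is an assumption-laden equivalent, not a tested replacement for the pipeline above: the function name `repair_double_encoded` is hypothetical, and it assumes the whole file is valid UTF-8 whose characters all exist in CP-1252.

```python
import re


def repair_double_encoded(data: bytes) -> bytes:
    """Hypothetical helper mirroring `iconv -f UTF-8 -t WINDOWS-1252 | sed ...`."""
    # Step 1: decode the file as UTF-8, giving text such as "â€™".
    text = data.decode("utf-8")
    # Step 2: encode that text as CP-1252; this restores the original bytes,
    # which are the real UTF-8 encoding (E2 80 99 for the quotation mark).
    original = text.encode("cp1252")
    # Step 3: fix the XML declaration so it matches the actual encoding.
    return re.sub(rb'encoding="iso-8859-1"', b'encoding="UTF-8"',
                  original, count=1, flags=re.IGNORECASE)
```

One caveat: Python's `cp1252` codec leaves five byte values (0x81, 0x8D, 0x8F, 0x90, 0x9D) unmapped, so step 2 raises `UnicodeEncodeError` if the damaged text contains the corresponding control characters.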

Gerrit

On 11.10.2016 21:23, Wolfgang Laun wolfgang.laun@xxxxxxxxx wrote:
The bytes E2 80 99 are the UTF-8 encoding of the Unicode character
RIGHT SINGLE QUOTATION MARK (U+2019).

Simply changing the encoding declaration in your XML file from ISO-8859-1 to UTF-8 should fix this.
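To make the byte-level picture concrete, here is a small Python sketch (Python is just used for illustration; the helper names are made up). It decodes the same three bytes once as UTF-8 and once as ISO-8859-1. The second reading yields U+00E2, U+0080 and U+0099, which matches the character references that show up in the Saxon output quoted below.

```python
import unicodedata

RAW = bytes([0xE2, 0x80, 0x99])


def read_as_utf8(data: bytes) -> str:
    # What a parser sees when the declaration says UTF-8.
    return data.decode("utf-8")


def read_as_latin1(data: bytes) -> str:
    # What a parser sees when the declaration says ISO-8859-1:
    # every byte becomes the code point with the same number.
    return data.decode("iso-8859-1")


good = read_as_utf8(RAW)
bad = read_as_latin1(RAW)

print(unicodedata.name(good))          # RIGHT SINGLE QUOTATION MARK
print([hex(ord(c)) for c in bad])      # ['0xe2', '0x80', '0x99']
```

So when the file bytes are already E2 80 99, nothing needs to be converted; the declaration is the only thing that is wrong.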


On 11 October 2016 at 21:00, Bridger Dyson-Smith bdysonsmith@xxxxxxxxx wrote:

Hi all,

    I'm struggling with a character encoding issue (or a character
    representation issue maybe?): I have input XML that looks like this

    input.xml
    <?xml version="1.0" encoding="iso-8859-1"?>
    <documents>
    <document>The reality of the effect of natural ventilation in a
    residential attic cavity has been the topic of many debates and
    scholarly reports since the 1930â€™s.</document>
    </documents>

    and I would like to get it to a point where the characters are
    represented properly, i.e.

    output.xml
    <?xml version="1.0" encoding="UTF-8"?>
    <documents>
    <document>The reality of the effect of natural ventilation in a
    residential attic cavity has been the topic of many debates and
    scholarly reports since the 1930’s.</document>
    </documents>

    Thanks to Liam's help on irc and reading through the list archives,
    it seems like an identity transform should be the right step towards
    getting the representation corrected, but something isn't working
    (or I have a misunderstanding somewhere).

    If I apply the following identity transform with Saxon HE 9.6.0.7 in
    oXygen 18:
    <?xml version="1.0" encoding="UTF-8"?>
    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="2.0">
    <xsl:output encoding="UTF-8" indent="yes"/>
    <xsl:template match="/"><xsl:copy-of select="/"/></xsl:template>
    </xsl:stylesheet>

    I get the following result:
    <?xml version="1.0" encoding="UTF-8"?>
    <documents>
     <document>The reality of the effect of natural ventilation in a
    residential attic cavity has been the topic of many debates and
    scholarly reports since the 1930â&#x80;&#x99;s.</document>
    </documents>

    Could someone provide some insight into what I've done wrong here?
    Any help would be greatly appreciated.

    Best,
    Bridger


--
Gerrit Imsieke
Geschäftsführer / Managing Director
le-tex publishing services GmbH
Weissenfelser Str. 84, 04229 Leipzig, Germany
Phone +49 341 355356 110, Fax +49 341 355356 510
gerrit.imsieke@xxxxxxxxx, http://www.le-tex.de

Registergericht / Commercial Register: Amtsgericht Leipzig
Registernummer / Registration Number: HRB 24930

Geschäftsführer: Gerrit Imsieke, Svea Jelonek,
Thomas Schmidt, Dr. Reinhard Vöckler
