
Here is how to convert an XML document to a different encoding

  • From: "Costello, Roger L." <costello@mitre.org>
  • To: "xml-dev@lists.xml.org" <xml-dev@lists.xml.org>
  • Date: Wed, 26 Dec 2012 20:02:19 +0000

Hi Folks,

The characters in this XML document are encoded using UTF-8:

<?xml version="1.0"?>
<Name>López</Name>

Its encoding can be changed to another encoding using this simple XSLT program:
---------------------------------------------------
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">

    <xsl:output method="xml"
                encoding="Shift_JIS"/>
    
    <xsl:template match="node()|@*">
        <xsl:copy>
            <xsl:apply-templates select="node()|@*"/>
        </xsl:copy>
    </xsl:template>
    
</xsl:stylesheet>
---------------------------------------------------

The encoding attribute on <xsl:output> specifies the desired encoding. The rest of the XSLT program simply performs an identity copy operation.
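If you don't have an XSLT processor handy, here is a rough Python sketch of the same idea using only the standard library. This is not the XSLT approach above, just an approximation: reencode and SOURCE are names I made up, and ElementTree's default parser drops comments and processing instructions, so it is not a full identity copy.

```python
import xml.etree.ElementTree as ET

# The sample document from this post, as UTF-8 bytes.
SOURCE = '<?xml version="1.0"?>\n<Name>López</Name>'.encode("utf-8")


def reencode(xml_bytes: bytes, encoding: str) -> bytes:
    """Parse an XML document and serialize it again in `encoding`.

    Hypothetical helper for illustration only.
    """
    root = ET.fromstring(xml_bytes)
    # ElementTree writes any character the target encoding cannot
    # represent as a numeric character reference.
    return ET.tostring(root, encoding=encoding, xml_declaration=True)


if __name__ == "__main__":
    for name in ("utf-8", "iso-8859-1", "shift_jis"):
        print(name, reencode(SOURCE, name))
```

The XML declaration in the output names the new encoding, so a parser reading the result knows how to decode it.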

Shift_JIS is a character encoding for the Japanese language.

iso-8859-1 (Latin-1) is a superset of ASCII. It defines 191 printable characters (ASCII defines 95, within its 128 code points) and covers most Western European languages.
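A quick way to check the "superset of ASCII" claim is to decode the 128 ASCII byte values with both codecs and compare. This is only a sketch; the function names are mine, and the printable ranges (0x20-0x7E and 0xA0-0xFF) are written out by hand rather than taken from a library.

```python
def is_ascii_superset(encoding: str) -> bool:
    """True if bytes 0x00-0x7F decode to the same characters as in ASCII."""
    data = bytes(range(128))
    return data.decode(encoding) == data.decode("ascii")


def latin1_printable_count() -> int:
    """Count the byte values ISO 8859-1 assigns to printable characters."""
    # Assumed ranges: 0x20-0x7E (shared with ASCII) and 0xA0-0xFF.
    return sum(1 for b in range(256) if 0x20 <= b <= 0x7E or 0xA0 <= b <= 0xFF)


if __name__ == "__main__":
    print(is_ascii_superset("iso-8859-1"))
    print(latin1_printable_count())
```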

I applied the XSLT program to the above XML document, specifying encoding="iso-8859-1" and then encoding="Shift_JIS".

Then, using a hex editor, I was able to see, at the byte level, the changes that were made to the XML document's encoding.

---------------------------------------------
encoding="utf-8"
L  ó     p  e  z
4C C3 B3 70 65 7A

Two bytes (C3 B3) used to encode ó 
---------------------------------------------
encoding="iso-8859-1"
L  ó  p  e  z
4C F3 70 65 7A

One byte (F3) used to encode ó 
---------------------------------------------
encoding="Shift_JIS"
L  &  #  x  f  3  ;  p  e  z
4C 26 23 78 66 33 3B 70 65 7A

Shift_JIS has no code for ó, so it is written as the character reference &#xf3;
---------------------------------------------
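If you don't have a hex editor, the byte values above can be reproduced with a few lines of Python. This is a sketch, and hex_bytes is a name I made up. One difference to be aware of: Python's xmlcharrefreplace error handler writes a decimal reference (&#243;) where the XSLT output above used hex (&#xf3;). Both refer to the same character, U+00F3.

```python
TEXT = "López"


def hex_bytes(text: str, encoding: str) -> str:
    """Encode `text` and return its bytes as space-separated hex pairs."""
    # Characters the encoding cannot represent become decimal
    # character references instead of raising an error.
    data = text.encode(encoding, errors="xmlcharrefreplace")
    return " ".join(f"{b:02X}" for b in data)


if __name__ == "__main__":
    for name in ("utf-8", "iso-8859-1", "shift_jis"):
        print(f"{name:12} {hex_bytes(TEXT, name)}")
```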

Very cool!

For more info, see this excellent article: http://www.opentag.com/xfaq_enc.htm#enc_conv  

/Roger

