Re: XSLT encoding problem

Subject: Re: XSLT encoding problem
From: Mike Brown <mike@xxxxxxxx>
Date: Tue, 8 Jul 2003 14:20:28 -0600 (MDT)
Venkat Gyambavantha wrote:
> I have an xml with UTF-8 encoding. I want to just change the encoding to ISO
> Latin 1 using XSLT.

Regardless of the encoding of the source XML, this is the stylesheet
you would use:

<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="no" encoding="iso-8859-1"/>
  <xsl:template match="/">
    <xsl:copy-of select="/"/>
  </xsl:template>
</xsl:stylesheet>

>  I see the following error
> 
> "An invalid XML character (unicode 0xfc) "

The XML parser is complaining before the XSLT processor even comes into
the picture. This is a cryptic error message, though, because
Unicode character number (hexadecimal) FC (Latin small letter u with umlaut)
*is* a legal character in XML. It can't appear just anywhere, though.

Regardless, I suspect that you were trying to change the XML's actual
encoding by manually editing the encoding declaration in the XML. The
encoding declaration is just a hint to the XML parser to tell it how
the document's bytes are supposed to be mapped to Unicode characters.
The encoding that was actually used to produce the bytes of the document
is the only one you are allowed to put in the declaration.

That is, if your document contains the single byte FC to represent
Latin small letter u with umlaut, then you must declare the encoding
as iso-8859-1. If it uses the two bytes C3 BC to represent Latin small
letter u with umlaut, then you must declare the encoding as utf-8.
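
To make the byte-level point concrete, here is a minimal sketch in Python (not XSLT; it only assumes the standard library's xml.etree.ElementTree, and the helper name parse_text is made up for illustration). It builds a tiny document with a given encoding declaration and payload bytes, then reports what the parser sees:

```python
import xml.etree.ElementTree as ET

U_UMLAUT = "\u00fc"  # Latin small letter u with umlaut

# The same character becomes different bytes under each encoding.
latin1_bytes = U_UMLAUT.encode("iso-8859-1")  # the single byte FC
utf8_bytes = U_UMLAUT.encode("utf-8")         # the two bytes C3 BC


def parse_text(declared_encoding, payload):
    """Hypothetical helper: parse <doc>payload</doc> under the given
    encoding declaration and return the text, or None if the parser
    rejects the document."""
    doc = (
        b'<?xml version="1.0" encoding="'
        + declared_encoding.encode("ascii")
        + b'"?><doc>'
        + payload
        + b"</doc>"
    )
    try:
        return ET.fromstring(doc).text
    except ET.ParseError:
        return None


# Declaration matches the bytes: the character comes through intact.
ok_latin1 = parse_text("iso-8859-1", latin1_bytes)
ok_utf8 = parse_text("utf-8", utf8_bytes)

# Declaration says utf-8 but the byte is FC: the parser refuses the input.
rejected = parse_text("utf-8", latin1_bytes)

# Declaration says iso-8859-1 but the bytes are C3 BC: it parses, but as
# two wrong characters rather than one u with umlaut.
garbled = parse_text("iso-8859-1", utf8_bytes)
```

The third case is presumably the situation behind the error message above: a lone FC byte in a document the parser has been told to read as UTF-8.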

If you accurately declare the encoding, the XML document can be parsed 
and its important bits fed to the XSLT processor, which can build the
source tree from that information. The stylesheet above can be used to
duplicate the source tree and serialize the result as iso-8859-1 encoded
text in XML syntax. You will lose unimportant lexical details that 
a parser is designed to weed through, such as entity and character
references, CDATA sections, and what kind of quotes you had originally
put around attribute values, but the content will be the same.
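
As a rough analogy to what the stylesheet does, here is a sketch in Python using only the standard library's xml.etree.ElementTree (an assumption for illustration; it is not an XSLT processor, and the function name reencode_to_latin1 and the sample document are made up). It parses a correctly declared UTF-8 document and serializes the resulting tree as iso-8859-1:

```python
import xml.etree.ElementTree as ET


def reencode_to_latin1(source):
    """Hypothetical helper: parse the document using its declared
    encoding, then serialize the tree as iso-8859-1 bytes."""
    root = ET.fromstring(source)
    return ET.tostring(root, encoding="iso-8859-1")


# A UTF-8 document that uses a character reference, a CDATA section,
# and single quotes around an attribute value.
source = (
    '<?xml version="1.0" encoding="utf-8"?>'
    "<doc note='caf\u00e9'>&#252;<![CDATA[ber & out]]></doc>"
).encode("utf-8")

result = reencode_to_latin1(source)

# Re-parsing the output yields the same content as the input.
round_trip = ET.fromstring(result)
```

In the output the u with umlaut is the single byte FC, while the character reference, the CDATA section, and the original quote style are gone. The text and attribute values themselves are unchanged.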

 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list

