
Re: Encoding issue

Subject: Re: Encoding issue
From: Mike Brown <mike@xxxxxxxx>
Date: Mon, 13 Aug 2001 22:29:10 -0600 (MDT)
Jason Macki wrote:
> In Notepad, a line shows up like this:
> 	<value><![CDATA[ sustainable consumption by Gábor
> Náray-Szabó]]></value>
> 
> However, in Visual Interdev, the accented characters are displayed as
> gibberish: 
> 	<value><![CDATA[ sustainable consumption by Gábor Náray-Szabó
> ]]></value>

1. It has nothing to do with CDATA sections, in case you were wondering.
   There is no need for them here, unless the <value> element might
   contain unescaped "&" or "<" characters.

2. The document you are viewing in Notepad and Visual Interdev contains 
   the UTF-8 bytes for each character. That's one byte for each ASCII 
   character and two bytes for each of those particular accented non-ASCII 
   characters (á and ó).

3. The version of Notepad that you are using knows how to interpret
   UTF-8 and is showing you the correct glyphs on your screen.

4. The version of Visual Interdev you are using is misinterpreting the
   document as if it were ISO-8859-1 or Windows-1252 encoded. It thinks
   the two-byte characters are two separate characters, and is showing
   you the glyphs accordingly.
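
To make points 2 through 4 concrete, here is a minimal Python sketch (Python
is my choice for illustration; it is not what Notepad or Visual Interdev run
internally). It encodes the name as UTF-8 and then decodes the same bytes the
wrong way, as Windows-1252. The variable names are made up for this example.

```python
# The name from the quoted document, as Unicode text.
text = "Gábor Náray-Szabó"

# Encode to UTF-8: each ASCII character takes one byte,
# while á and ó take two bytes each.
utf8_bytes = text.encode("utf-8")
print(len(text), len(utf8_bytes))  # 17 characters, 20 bytes

# A viewer that wrongly assumes Windows-1252 maps every byte to its own glyph,
# so each two-byte character shows up as two unrelated characters.
misread = utf8_bytes.decode("windows-1252")
print(misread == "GÃ¡bor NÃ¡ray-SzabÃ³")  # True

# A viewer that correctly assumes UTF-8 recovers the original text.
print(utf8_bytes.decode("utf-8") == text)  # True
```

The bytes on disk are identical in both cases; only the viewer's assumption
about the encoding differs.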

> When I use another application to transform this document, an error
> occurs because the line in question contains invalid characters, and the
> "parseerror.srcText" method displays "sustainable consumption by G?bor
> N?ray-Szab?".

The typical cause of this kind of error is that the document contains
ISO-8859-1 or Windows-1252 bytes but has no encoding declaration, so the
parser interprets it as UTF-8. Certain byte sequences are illegal in
UTF-8, and almost any document that is not UTF-8 and not pure ASCII will
set off this alarm.
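
A small Python sketch of that failure mode, using the plain codec machinery
rather than an XML parser. The names are illustrative, and the lossy decode at
the end is only my guess at the kind of substitution that leads to a display
like "G?bor"; I am not claiming it is what MSXML does internally.

```python
text = "Gábor Náray-Szabó"

# ISO-8859-1 uses one byte per character; á becomes the single byte 0xE1.
latin1_bytes = text.encode("iso-8859-1")

# In UTF-8, 0xE1 announces a multi-byte sequence, but the next byte ("b")
# is not a valid continuation byte, so a strict decoder rejects the data.
try:
    latin1_bytes.decode("utf-8")
    outcome = "decoded"
except UnicodeDecodeError as err:
    outcome = "rejected at byte offset %d" % err.start
print(outcome)

# A lenient decoder substitutes a replacement character for each bad byte.
lossy = latin1_bytes.decode("utf-8", errors="replace")
print(lossy.count("\ufffd"))  # one replacement per accented character

# Pure ASCII is valid UTF-8, which is why unaccented documents never trip this.
print(b"Gabor".decode("utf-8"))
```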

The DOMDocument save method knows how to save properly. From the SDK docs:
"Character encoding is based on the encoding attribute in the XML
declaration, such as <?xml version="1.0" encoding="windows-1252"?>. When
no encoding attribute is specified, the default setting is UTF-8."
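
The same relationship between the declaration and the bytes can be sketched
with Python's xml.etree.ElementTree. To be clear, this is a stand-in and not
MSXML: it only illustrates the general rule that the declared encoding has to
match the bytes, and that a missing declaration means UTF-8 is assumed. I use
ISO-8859-1 here rather than windows-1252 purely as an example choice.

```python
import xml.etree.ElementTree as ET

root = ET.Element("value")
root.text = "sustainable consumption by Gábor Náray-Szabó"

# Serializing with a non-UTF-8 encoding writes a matching XML declaration
# on the first line, followed by the document bytes in that encoding.
declared = ET.tostring(root, encoding="iso-8859-1")
print(declared.split(b"\n", 1)[0])

# With the declaration present, the parser decodes the bytes correctly.
assert ET.fromstring(declared).text == root.text

# Drop the declaration: the same bytes are now assumed to be UTF-8
# and the parser rejects them.
undeclared = declared.split(b"\n", 1)[1]
try:
    ET.fromstring(undeclared)
    parsed_without_declaration = True
except ET.ParseError:
    parsed_without_declaration = False
print(parsed_without_declaration)
```

If the loading step strips or ignores the declaration, the second case is what
the parser sees.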

I am guessing that something is amiss in how you are loading this document
for transformation. I'd like to see the actual error you are getting,
though, and what methods you are calling, because sometimes when MSXML is
involved, UTF-16 becomes an issue.

   - Mike
____________________________________________________________________________
  mike j. brown, fourthought.com  |  xml/xslt: http://skew.org/xml/
  denver/boulder, colorado, usa   |  personal: http://hyperreal.org/~mike/

 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list

