RE: how to make the XP parser recognize xml encoding

Subject: RE: how to make the XP parser recognize xml encoding
From: "George Prezerakos (ETG)" <George.Prezerakos@xxxxxxxxxxxxxxxxxxx>
Date: Wed, 28 Jun 2000 12:28:45 +0200
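I had a similar problem when using iso-8859-7 encoding as input and wanting utf-8 encoding as output. What I did was parse the original XML source and replace each character whose code is greater than 127 with a numeric character reference of the form &#unicode;, where "unicode" is the character's Unicode code point in decimal.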

I can then feed the XML source into xerces and xalan (using the XSLT... classes), and the character references are automatically converted to utf-8 on output. I don't even have to change the xml header to encoding=utf-8 (however, if you are using a different set of tools you might have to). Mind that with these tools, you have no other option than utf-8 for the output encoding.
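
For illustration, the replacement step could look roughly like the sketch below. This is not my original code; the class name NumericRefEscaper and the method escapeNonAscii are made up for the example, and it assumes the source has already been decoded into a Java String (for instance by reading the file with an iso-8859-7 reader), so that each character carries its Unicode code point rather than its iso-8859-7 byte value.

```java
public class NumericRefEscaper {
    // Hypothetical helper: replaces every character above 127 with a
    // decimal numeric character reference such as &#945;.
    static String escapeNonAscii(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); ) {
            int codePoint = text.codePointAt(i);
            if (codePoint > 127) {
                out.append("&#").append(codePoint).append(';');
            } else {
                out.appendCodePoint(codePoint);
            }
            // Advance by one or two chars, depending on surrogate pairs.
            i += Character.charCount(codePoint);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Greek alpha and beta, written as escapes to keep this file ASCII.
        String source = "<a>\u03B1\u03B2</a>";
        System.out.println(escapeNonAscii(source)); // prints <a>&#945;&#946;</a>
    }
}
```

Because the result is pure ASCII, it reads the same under any ASCII-compatible encoding, which is presumably why the encoding in the xml header no longer matters to the parser.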

Hope this helps.

> George Prezerakos, Ph.D.
> Mobile Internet Applications Development
> Ericsson Hellas S.A.	Phone: + 301 96 01 441 (ext. 966)
> 33, Zeppou Str., 	                Mobile: + 3 0945 545282
> 166 75 - Glyfada,                     	
> Athens-Greece                     	
> E-mail: george.prezerakos@xxxxxxxxxxxxxxx

-----Original Message-----
From: Tom Wang [mailto:tomw@xxxxxxxxx]
Sent: Tuesday, June 27, 2000 9:04 PM
To: xsl-list@xxxxxxxxxxxxxxxx
Subject: how to make the XP parser recognize xml encoding


This is an interesting problem.  I would appreciate it if anyone could offer
me some help on the following.  Here's my source xml:

<?xml version="1.0" encoding="iso-8859-1"?>

The xml file contains non-ascii characters and it must use the encoding
specified in the document itself.  I'm using James Clark's XT engine
(com.jclark.xsl.sax.XSLProcessor) and XP parser (com.jclark.xml.sax.Driver).
I construct a FileReader for the above xml file, then use it to construct an
InputSource that feeds into the xsl processor.  But somehow the XP parser is
not recognizing the encoding embedded in the XML decl.  I actually put
garbage there (e.g., encoding="xxx") and the results come out the same.

I traced into James Clark's code and found that because the InputSource is
from a FileReader, it uses encoding "UTF-16" for all character streams.
Help please?
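
One possible way around the behaviour described above is to hand the parser raw bytes instead of a Reader. A SAX InputSource built from a character stream is treated as already decoded, so the encoding in the XML declaration is disregarded, whereas a byte-stream InputSource leaves the decoding to the parser. The sketch below only illustrates that idea: it uses the JDK's built-in SAX parser rather than XP or XT, and the class name ByteStreamParse and the method textOf are invented for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class ByteStreamParse {
    // Hypothetical helper: parses XML from raw bytes and returns the
    // concatenated character data of the document.
    static String textOf(InputStream bytes) throws Exception {
        StringBuilder text = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }
        };
        // Byte stream, not a Reader: the parser decodes the bytes itself
        // using the encoding named in the XML declaration.
        InputSource source = new InputSource(bytes);
        SAXParserFactory.newInstance().newSAXParser().parse(source, handler);
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        // An in-memory stand-in for the iso-8859-1 file; the e-acute is
        // encoded as the single byte 0xE9.
        byte[] doc = "<?xml version=\"1.0\" encoding=\"iso-8859-1\"?><a>caf\u00e9</a>"
                .getBytes(StandardCharsets.ISO_8859_1);
        String text = textOf(new ByteArrayInputStream(doc));
        System.out.println(text.equals("caf\u00e9"));
    }
}
```

With a real file, the same idea would be to pass a FileInputStream (or a system identifier) to the InputSource instead of a FileReader; whether XP then honours the declared encoding is an assumption that would need checking against that parser.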



 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
