
Sax filter encoding problem

  • To: xml-dev@l...
  • Subject: Sax filter encoding problem
  • From: leo zhu <leozhuca@y...>
  • Date: Fri, 13 Feb 2004 09:26:16 -0800 (PST)

I am trying to write a simple SAX filter in Java to
experiment with splitting a large XML file into small ones,
but I found that I can't get the same content as in the
original XML file.

For example, I have the following XML file, which is
encoded as "UTF-8".

<?xml version="1.0" encoding="utf-8"?>
<Root>
<Record>“Multipurpose“ abcd</Record>
<Record>the “banana oil” test</Record>
</Root>

It looks OK when I browse it in IE, but after I use it
as the input file and run my SAX program (which just uses
the SAX API to write the same content to an output file),
the content changes to the following:

<Root>
<Record>“Multipurpose“ abcd</Record>
<Record>the “banana oil” test</Record>
</Root>

I checked the respective bytes: "e2 80 9c" changed
to "93" and "e2 80 9d" changed to "94"! That's not what
I wanted, and I also get an error when I try to browse
the output in IE!
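
Those byte values match the two curly quote characters in two different encodings. A minimal sketch, assuming the output was written in windows-1252 (a guess about the platform default on the machine used; the class name QuoteBytes is only illustrative):

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class QuoteBytes {
    // Render a byte array as space-separated lowercase hex, e.g. "e2 80 9c".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Assumption: windows-1252 stands in for the platform default encoding.
        Charset cp1252 = Charset.forName("windows-1252");
        String open = "\u201c";   // left double quotation mark
        String close = "\u201d";  // right double quotation mark

        System.out.println(hex(open.getBytes(StandardCharsets.UTF_8)));   // e2 80 9c
        System.out.println(hex(close.getBytes(StandardCharsets.UTF_8)));  // e2 80 9d
        System.out.println(hex(open.getBytes(cp1252)));                   // 93
        System.out.println(hex(close.getBytes(cp1252)));                  // 94
    }
}
```

If that assumption holds, the characters themselves survived the parse, and only the encoding used when writing them out differs from the input file.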

For this run, I used

parser.parse(new InputSource(new File(input_file_name).toURL().toString()));

in my program.

And then, I tried another way:

FileReader in_file = new FileReader(input_file_name);
parser.parse(new InputSource(new File(args[0]).toURL().toString()));

After running my program, the output looks like:

<Root>
<Record>“Multipurpose“ abcd</Record>
<Record>the “banana oilâ€? test</Record>
</Root>

In terms of bytes, "e2 80 9c" is OK but "e2 80 9d"
was still changed to "94". It is also an illegal character
when I browse the output in IE.
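
Output of this shape is what one would expect if the UTF-8 bytes were decoded as windows-1252 before reaching the parser, which a FileReader does when windows-1252 is the platform default. A minimal sketch of that effect, assuming windows-1252 is the default here (the class and method names are hypothetical):

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class Mojibake {
    // Encode text as UTF-8, then decode the bytes with the wrong charset,
    // as a Reader built on a windows-1252 platform default would (assumption).
    static String misdecode(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, Charset.forName("windows-1252"));
    }

    // List the UTF-16 code units of a string, e.g. "U+00E2 U+20AC U+0153".
    static String codeUnits(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", (int) c));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The opening quote becomes three separate characters.
        System.out.println(codeUnits(misdecode("\u201c")));  // U+00E2 U+20AC U+0153
        // The closing quote ends in byte 9d, which windows-1252 leaves undefined.
        System.out.println(codeUnits(misdecode("\u201d")));
    }
}
```

Handing the parser a byte stream or a system identifier, rather than a Reader, lets it apply the encoding named in the XML declaration instead.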

Can anybody tell me how to handle this problem? And
which is the best way to wrap the input file in an
InputSource? Any reply would be appreciated!

My test program looks like the following:

public class parseXML extends DefaultHandler {

    public void startElement(java.lang.String namespaceURI,
            java.lang.String localName, java.lang.String qName,
            Attributes atts)
    {
        ......
    }

    public void characters(char[] ch, int start, int length)
    {
        // Echo the character data to standard output.
        for (int i = 0; i < length; i++) {
            System.out.print(ch[start + i]);
        }
        ......
    }
    ......
}
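
System.out encodes characters with the platform default encoding, so one variation is to write through a Writer with an explicit UTF-8 encoding and to give the parser raw bytes. The following is a minimal sketch under those assumptions, not the program above: the class Utf8Echo, the helper echoText, and the use of SAXParserFactory are illustrative choices, and it echoes character data only, without markup or escaping.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class Utf8Echo extends DefaultHandler {
    private final Writer out;

    Utf8Echo(Writer out) {
        this.out = out;
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        try {
            // The Writer, not the platform default, decides the output encoding.
            out.write(ch, start, length);
        } catch (IOException e) {
            throw new SAXException(e);
        }
    }

    // Parse XML bytes and return the character data re-encoded as UTF-8.
    static byte[] echoText(byte[] xml) throws Exception {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(sink, StandardCharsets.UTF_8)) {
            // A byte stream lets the parser honor the declared encoding.
            InputSource source = new InputSource(new ByteArrayInputStream(xml));
            SAXParserFactory.newInstance().newSAXParser().parse(source, new Utf8Echo(w));
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        String doc = "<?xml version=\"1.0\" encoding=\"utf-8\"?>"
                + "<Record>the \u201cbanana oil\u201d test</Record>";
        byte[] result = echoText(doc.getBytes(StandardCharsets.UTF_8));
        // Print the first three output bytes after "the " as hex.
        System.out.printf("%02x %02x %02x%n",
                result[4] & 0xff, result[5] & 0xff, result[6] & 0xff);  // e2 80 9c
    }
}
```

For a file on disk, the same idea would apply with a FileInputStream for the input and a FileOutputStream wrapped in the OutputStreamWriter for the output.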


Thanks.

Leo


