
Re: Encoding problem or what else?

Subject: Re: Encoding problem or what else?
From: "FC" <flavio@xxxxxx>
Date: Wed, 7 Dec 2005 23:25:01 +0100
----- Original Message ----- From: "Geert Josten" <Geert.Josten@xxxxxxxxxxx>
To: <xsl-list@xxxxxxxxxxxxxxxxxxxxxx>
Sent: Wednesday, December 07, 2005 7:22 PM
Subject: Re: Encoding problem or what else?



Hi Flavio,

I expected this from your first post. The three bytes are the (optional) UTF-8 Byte Order Mark (BOM), EF BB BF in hex. The XML parser used by your XSL processor does not consume them as it should, so they end up as character data in the prolog, which is not allowed.

It is typical of Microsoft products to write this BOM. WordPad adds it when saving and consumes it when reading, so you will never see it in that editor. You have three options: switch to a different (XML) parser, get rid of the BOM in your data (can you influence how the files are created?), or patch the reading process so that it consumes the BOM.

The second option is perhaps the easiest.

Regards,
Geert



Geert,
this is interesting to know.
What do you mean by "patch"?
Do you mean perhaps that I should write something that strips out the 3 bytes from the beginning of the file?
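
A minimal sketch of what such a stripping step could look like, written in Python and run before the file is handed to the XSL processor. The function name `strip_utf8_bom` is hypothetical, and the sketch assumes the input fits in memory; it is not part of Saxon or Xerces.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM (EF BB BF), if one is present."""
    # codecs.BOM_UTF8 is the three-byte sequence b'\xef\xbb\xbf'.
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    # No BOM at the start: leave the content untouched.
    return data
```

To apply it to a file, one could read the file in binary mode (`open(path, "rb")`), pass the bytes through the function, and write the result back in binary mode (`open(path, "wb")`). Working on raw bytes avoids any re-encoding of the rest of the document.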


I think the easiest solution is to ask the people who deliver this file to switch to ISO-8859-1. There is no real need to use Unicode for these files: they are never going to contain text with exotic characters.
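
For illustration only, a rough Python sketch of what a switch from UTF-8 to ISO-8859-1 involves if done after the fact rather than by the supplier. The function name `utf8_to_latin1` is hypothetical. The sketch assumes the whole document fits in memory, that any encoding pseudo-attribute sits in an XML declaration at the very start, and that characters outside ISO-8859-1 occur only in text or attribute values, where a numeric character reference is valid.

```python
import re

# Matches the encoding pseudo-attribute of an XML declaration at the start of the text.
_ENCODING_DECL = re.compile(r'''^(<\?xml[^>]*?encoding=)(["'])[^"']*\2''')


def utf8_to_latin1(data: bytes) -> bytes:
    """Re-encode UTF-8 XML bytes (with or without BOM) as ISO-8859-1."""
    # The 'utf-8-sig' codec decodes UTF-8 and drops a leading BOM if present.
    text = data.decode("utf-8-sig")
    # Keep the declaration consistent with the new byte encoding.
    text = _ENCODING_DECL.sub(r"\1\2ISO-8859-1\2", text, count=1)
    # Characters that ISO-8859-1 cannot represent become numeric character references.
    return text.encode("iso-8859-1", errors="xmlcharrefreplace")
```

The declaration has to be rewritten along with the bytes; otherwise a parser would still try to read the ISO-8859-1 content as UTF-8.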

I am bound to this XSL processor for the simple reason that it's the best of the bunch from a performance standpoint (thanks, Michael Kay!).
For days I struggled with the Altova XSLT 2005 engine and Oracle's internal processor, and it was a nightmare.
I had a 32 MB XML file that took *hours* to process with those two processors, until I tried Saxon, which crunched it in less than one minute!
So, as you can easily guess, I am not going to willingly dump Saxon just for those three funny bytes.


Hey Michael, what do you think about this?
Is there any hope that Xerces will "consume" this UTF-8 marker in the near future?


Bye,
Flavio

