Subject: Re: output to iso-8859-1 of non-iso characters, what is required action
From: "bryan rasmussen" <rasmussen.bryan@xxxxxxxxx>
Date: Wed, 7 May 2008 17:08:15 +0200
>  don't mix 'characters' with 'bytes'. iso-8859-1 is a codepage that assigns
> a number of characters to certain bytes in the range of 0-255.
>  In XML a character may be displayed in different ways, all perfectly
> A, &#65; &#x41;

Yes. Was there anything in the question that implied otherwise?

>  I seem to remember that it is totally up to the processor to select a
> method. If you use Saxon there are special options to control that
> (if you prefer native bytes, decimal or hex entities).
OK. But from reading the spec, it seems to me that if you don't specify a
method, the processor has to do it automatically for you when outputting
text nodes in an XML document (personally I think it should do the
same in comment nodes; I'm not sure why it was decided not to), but that
it always has to fail on text output.
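
The contrast described above can be sketched outside XSLT. The following is only an analogy using Python's codec error handlers, not the behavior of any particular XSLT processor; the function names are made up for illustration. The "xml" variant falls back to numeric character references for anything the target encoding cannot represent, while the "text" variant has no escape mechanism and fails.

```python
# Analogy only: Python codecs standing in for an XSLT serializer.
# encode_like_xml_output / encode_like_text_output are hypothetical names.

text = "price: 5 \u20ac"  # U+20AC (euro sign) is not in ISO 8859-1


def encode_like_xml_output(s):
    # Unencodable characters become numeric character references,
    # which is possible because XML has an escape syntax for them.
    return s.encode("iso-8859-1", errors="xmlcharrefreplace")


def encode_like_text_output(s):
    # Plain text has no escape syntax, so strict encoding raises
    # UnicodeEncodeError on the first unencodable character.
    return s.encode("iso-8859-1", errors="strict")


if __name__ == "__main__":
    print(encode_like_xml_output(text))
    try:
        encode_like_text_output(text)
    except UnicodeEncodeError as err:
        print("text output failed:", err.reason)
```

A real XML serializer also escapes markup characters such as `<` and `&`, which this sketch leaves out.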

>  Dropping characters is never an option.
Why not? If I want to go from UTF-8 to ISO 8859-1 for some reason, the
low-level way would be to write something that went through every byte,
checked whether it was in range, and removed it if not. In the case of
text output from XML, it would be nice if declaring the output encoding in
my stylesheet gave me this behavior. But it doesn't, so text
output using XSL 1 isn't useful here, because translate() is a poor solution for
something that a declarative solution should handle well.

I declare that I have something in encoding x and I want something in
encoding y. If I also declare that XML output is required, the processor
finds a solution for me. If I declare text output, it seems to think
there is no possible solution, whereas the common solutions are to
remove what isn't allowed or to
replace what isn't allowed.
In that context, failing doesn't seem like a very good choice.
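
For comparison, here is a minimal sketch of the two "common solutions" mentioned above, again in Python rather than XSLT and with hypothetical function names. It only illustrates what removing and replacing look like when the conversion is done on characters with a declared target encoding.

```python
# Sketch of "remove what isn't allowed" and "replace what isn't allowed".
# Function names are illustrative, not part of any XSLT processor.


def drop_unencodable(s, encoding="iso-8859-1"):
    # Silently removes every character the target encoding cannot represent.
    return s.encode(encoding, errors="ignore")


def replace_unencodable(s, encoding="iso-8859-1"):
    # Substitutes "?" for every character the target encoding cannot represent.
    return s.encode(encoding, errors="replace")


sample = "caf\u00e9 \u20ac"  # e-acute is in ISO 8859-1, the euro sign is not

if __name__ == "__main__":
    print(drop_unencodable(sample))
    print(replace_unencodable(sample))
```

Both variants lose information without telling anyone, which is the usual argument for failing instead.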

> If you want that you could easily
> filter using translate() to remove all unwanted characters from text nodes.

Given that a translate() (in XSL 1) of all non-ISO-8859-1 characters to an
empty string is easy, do you think you could send me one? :)
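
One known XPath 1.0 idiom for this is a double translate(): `translate($s, translate($s, $allowed, ''), '')`, where `$allowed` is a hypothetical variable holding every character you want to keep. The inner call strips the allowed characters and so yields exactly the unwanted ones; the outer call then removes those. The sketch below emulates translate() in Python to show the logic; `xpath_translate` and the `ALLOWED` set are assumptions made for illustration, and in a real stylesheet the allowed characters would have to be written out literally.

```python
# Emulation of XPath 1.0 translate() to demonstrate the double-translate idiom.
# xpath_translate and ALLOWED are illustrative assumptions, not a library API.


def xpath_translate(text, source, target):
    # For each character in `source`, map it to the character at the same
    # position in `target`, or delete it if `target` is too short.
    # Only the first occurrence of a character in `source` counts.
    mapping = {}
    for i, ch in enumerate(source):
        if ch not in mapping:
            mapping[ch] = target[i] if i < len(target) else None
    out = []
    for ch in text:
        if ch not in mapping:
            out.append(ch)  # characters not listed are kept unchanged
        elif mapping[ch] is not None:
            out.append(mapping[ch])
    return "".join(out)


# Assumed allowed set: tab, LF, CR, and U+0020 through U+00FF.
ALLOWED = "\t\n\r" + "".join(chr(c) for c in range(0x20, 0x100))


def keep_only_allowed(text, allowed=ALLOWED):
    unwanted = xpath_translate(text, allowed, "")  # what is NOT allowed
    return xpath_translate(text, unwanted, "")     # remove exactly those


if __name__ == "__main__":
    print(keep_only_allowed("caf\u00e9 \u20ac5 \u2013 ok"))
```

The cost is that the stylesheet has to carry the full list of allowed characters, which is part of why translate() feels like a poor fit here.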

Bryan Rasmussen
