XSL-LIST Mailing List Archive

Re: text extraction

Subject: Re: text extraction
From: "Andrew Welch" <andrew.j.welch@xxxxxxxxx>
Date: Thu, 12 Oct 2006 17:05:30 +0100
On 10/12/06, Abel Braaksma <abel.online@xxxxxxxxx> wrote:
> Andrew Welch wrote:
>> On 10/12/06, mus47@xxxxxxxx <mus47@xxxxxxxx> wrote:
>>> I also want to know how I can set the output file encoding to
>>> ISO-8859-1 instead of UTF-8.
>>> I use the xsltproc tool.
>>
>> You can set the output encoding using <xsl:output/>
>
> But it is not guaranteed that the processor supports any encoding other
> than UTF-8 and UTF-16.

Are you sure? Interestingly, the spec states:

"The value of the encoding attribute provides the value of the
encoding parameter to the serialization method. The default value is
implementation-defined, but in the case of the xml and xhtml methods
it must be either UTF-8 or UTF-16."

(http://www.w3.org/TR/xslt20/#element-output)

...which took me a little by surprise. It seems to say that when the
output method is xml or xhtml, the encoding MUST be either UTF-8 or
UTF-16, although on a second reading "it" may refer only to the default
value. Saxon doesn't seem to mind...
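As a minimal sketch of the <xsl:output/> approach for the original question, the stylesheet below copies its input document unchanged and asks the serializer for ISO-8859-1 instead of UTF-8. It is written as XSLT 1.0, since xsltproc (libxslt) implements XSLT 1.0. It assumes the processor supports ISO-8859-1 for output, which, as noted above, is not guaranteed by every processor.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Request ISO-8859-1 output instead of the default UTF-8.
       With method="xml", the XML declaration written to the output
       should then name this encoding as well. -->
  <xsl:output method="xml" encoding="ISO-8859-1"/>

  <!-- Copy the whole input document; only the output encoding changes. -->
  <xsl:template match="/">
    <xsl:copy-of select="."/>
  </xsl:template>

</xsl:stylesheet>
```

With xsltproc this could be run as `xsltproc -o output.xml latin1.xsl input.xml`, where the three file names are placeholders.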

> Also note that the first 128 code points (0 to 127) are encoded as exactly
> the same bytes in ISO-8859-1 and UTF-8. Only code points 128 and above are
> encoded differently (byte 128 is sometimes displayed as the euro sign, as in
> Windows-1252, but you may see something different).

> Note that ISO-8859-1 can represent only 256 characters, far fewer than
> UTF-8 (which can encode all of Unicode), so you may end up with missing or
> replaced characters in the output stream (though I am not sure what they
> would be replaced with).

No, you don't end up with missing or replaced characters. Any character that cannot be represented in the output encoding should be written as a character reference. It's a well-known technique to use an output encoding of US-ASCII so that all non-ASCII characters are output as character references, which avoids encoding problems when the output is read further down the pipe.
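A minimal sketch of the US-ASCII technique, again as XSLT 1.0. The `greeting` element is a hypothetical example; the point is that "é" (U+00E9) lies outside US-ASCII, so a processor that honors this setting should serialize it as a numeric character reference. Whether that comes out as `&#233;` or `&#xE9;` is up to the processor.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- US-ASCII covers only code points 0-127. Characters outside that
       range in text nodes and attribute values are written as
       character references instead of being dropped. -->
  <xsl:output method="xml" encoding="US-ASCII"/>

  <xsl:template match="/">
    <!-- Hypothetical output element containing one non-ASCII character. -->
    <greeting>café</greeting>
  </xsl:template>

</xsl:stylesheet>
```

Character references only help where XML recognizes them (text and attribute values); a non-ASCII character in an element name or a comment cannot be escaped this way.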

cheers
andrew
