
RE: unparsed-text() and illegal characters

Subject: RE: unparsed-text() and illegal characters
From: "Michael Kay" <mike@xxxxxxxxxxxx>
Date: Thu, 27 Jul 2006 20:21:40 +0100
The spec is very strict that characters not allowed in XML cause an error.
This is a change since the book was written.

However, the spec is very loose about how URIs are resolved. So a conformant
product could take the URI


as a reference to "the document formed by taking thing.txt and substituting
illegal characters with xFFFD."

Perhaps I'll do that.
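
For anyone who needs this today, the substitution described above can be sketched as a small pre-processing filter outside the stylesheet. This is only an illustration in Python, not how Saxon or any other processor does it: the names substitute_illegal_chars and read_text_lenient are made up for the example, and the ISO-8859-1 default is an assumption taken from the description of the file in the original question.

```python
import re

# Anything NOT matched by the XML 1.0 Char production:
#   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_ILLEGAL_XML10 = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)

REPLACEMENT = "\uFFFD"  # Unicode REPLACEMENT CHARACTER


def substitute_illegal_chars(text: str) -> str:
    """Replace every character that XML 1.0 disallows with U+FFFD."""
    return _ILLEGAL_XML10.sub(REPLACEMENT, text)


def read_text_lenient(path: str, encoding: str = "iso-8859-1") -> str:
    """Hypothetical helper: decode a file and substitute illegal characters.

    The default encoding is an assumption for this example only.
    newline="" keeps line endings exactly as they appear in the file.
    """
    with open(path, encoding=encoding, newline="") as handle:
        return substitute_illegal_chars(handle.read())
```

With this filter, a control character such as the hex 11 reported in the error message becomes xFFFD, while tab, line feed and carriage return pass through unchanged. The cleaned text would then have to be written back out (for example as UTF-8) before unparsed-text() reads it.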

Michael Kay


> -----Original Message-----
> From: Abel Braaksma Online [mailto:abel.online@xxxxxxxxx] 
> Sent: 27 July 2006 19:10
> To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> Subject:  unparsed-text() and illegal characters
> Dear List,
> Trying to "import" a non-XML file of an undefined encoding, I 
> received the following error when using Saxon8: "The unparsed 
> text file contains a character illegal in XML (line=1 
> column=4 value=hex 11)". I only found one reference about 
> this error 
> (http://www.stylusstudio.com/xsllist/200510/post90470.html), 
> which is actually a post about illegal characters inside the 
> XSLT document.
> Michael Kay points out in that post that this error is merged 
> into XTDE1190 (see 
> http://www.w3.org/TR/xslt20/#err-XTDE1190). It is claimed in 
> the specs that non-understood characters or byte sequences 
> should result in this non-recoverable dynamic error.
> In his indispensable book, the XSLT 2.0 Programmer's 
> Reference, he states the following:
> "Some processors will provide configuration options that pass 
> this choice on the user. If the file contains characters that 
> are invalid in XML (this applies to most control characters 
> in the range x00 to x1F under XML 1.0, but only to the null 
> character x00 under XML 1.1) then the invalid characters are 
> substituted by the special Unicode character xFFFD, which is 
> specifically intended for such purposes."
> I understand that the book was written before XSLT 2.0 was 
> finalized (it is still a Candidate Recommendation), but I wonder 
> if a treatment like the above is still possible somehow. The 
> contents of the file are ISO-8859-1, apart from the start and end 
> header, which contain control characters. I only need the 
> part that is parsable as text; the rest can be discarded.
> Am I asking too much from XSLT, or is this somehow possible? 
> It would really add to the possibilities, and it means I 
> don't need some extra filter or preparse step.
> Cheers,
> Abel Braaksma
> www.nuntia.nl
