
RE: Unparse-text() string contains ascii chars 29, 30

Subject: RE: Unparse-text() string contains ascii chars 29, 30 and 31
From: "Michael Kay" <mike@xxxxxxxxxxxx>
Date: Wed, 19 Oct 2005 17:43:56 +0100
You might be able to make this work by using an XML 1.1 parser (specifying
version="1.1" in the XML declaration). The current Saxon release is a bit
patchy in its support for XML 1.1 (I've been doing some improvements so it
should be better in 8.6) but the basics are there. XML 1.1 allows characters
in the range x01 to x1F provided they are written as character references.
The only character not allowed is x00, which was excluded as the result of a
coalition between people who wanted to prevent you holding pure binary, and
people who wanted to write their software in C.
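
As a rough sketch of the XML 1.1 route (not tested against any particular
Saxon release): the stylesheet below declares version="1.1" so that the unit
separator can be written as a character reference. The file name data.txt,
the input-uri parameter and the template name "main" are placeholders invented
for illustration, and it can only work if both the XML parser and the XSLT
processor are running with XML 1.1 support enabled.

```xml
<?xml version="1.1" encoding="UTF-8"?>
<!-- Sketch only: assumes an XML 1.1 capable parser and XSLT 2.0 processor. -->
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <!-- Hypothetical location of the delimited text file. -->
  <xsl:param name="input-uri" select="'data.txt'"/>

  <xsl:template name="main">
    <xsl:variable name="str" select="unparsed-text($input-uri)"/>
    <!-- &#31; is the unit separator; as a character reference it is
         legal in XML 1.1 but not in XML 1.0. -->
    <xsl:value-of select="substring-before($str, '&#31;')"/>
  </xsl:template>

</xsl:stylesheet>
```

This would output everything up to the first unit separator in the file.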

substring-before is more likely to work than tokenize, because
substring-before allows any string (any string that you can get through the
XML parser, that is), whereas regexes have their own rules and another layer
of parsing. If necessary use translate() to translate the C0 control
characters into PUA Unicode characters, which are legal in a regex.
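
The translate() fallback might look something like the following. The
private-use code points xE01D to xE01F are an arbitrary choice made for this
sketch, and data.txt, input-uri and "main" are again hypothetical names. The
stylesheet still has to be XML 1.1, because the control characters appear (as
character references) in the translate() call.

```xml
<?xml version="1.1" encoding="UTF-8"?>
<!-- Sketch only: maps the C0 separators to private-use characters so
     that tokenize() never sees a control character in its regex. -->
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <!-- Hypothetical location of the delimited text file. -->
  <xsl:param name="input-uri" select="'data.txt'"/>

  <xsl:template name="main">
    <!-- 29, 30, 31 (group, record, unit) become xE01D, xE01E, xE01F. -->
    <xsl:variable name="mapped"
        select="translate(unparsed-text($input-uri),
                          '&#29;&#30;&#31;',
                          '&#xE01D;&#xE01E;&#xE01F;')"/>
    <!-- Split into groups, then count the records in each group. -->
    <xsl:for-each select="tokenize($mapped, '&#xE01D;')">
      <xsl:text>group </xsl:text>
      <xsl:value-of select="position()"/>
      <xsl:text>: </xsl:text>
      <xsl:value-of select="count(tokenize(., '&#xE01E;'))"/>
      <xsl:text> record(s)&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

This assumes the source data does not itself contain the chosen private-use
characters; if it might, pick different ones.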

Michael Kay
http://www.saxonica.com/

> -----Original Message-----
> From: andrew welch [mailto:andrew.j.welch@xxxxxxxxx] 
> Sent: 19 October 2005 16:50
> To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> Subject: Unparse-text() string contains ascii chars 29, 30 and 31
> 
> I'm trying to process some data that's one long string delimited using
> ascii characters 29, 30 and 31 (which are apparently group, record and
> unit 'separator characters').
> 
> I can get access to the string using unparsed-text(), but when I
> attempt to process the string using any of the functions, e.g.:
> 
> tokenize($str, '&#29;')
> 
> or
> 
> substring-before($str, '&#31;')
> 
> ...the XML parser complains that these aren't legal XML characters
> (when the stylesheet itself is parsed).
> 
> Is there any way around this?  I can't see how I can process the
> string in XSLT without using the characters themselves.
> 
> The two alternatives I can see are to use an XMLFilter to turn it
> into XML using Java, or to go back to the source to get them to export
> their data in a less archaic way...
