
Safe-guarding codepoints-to-string() from wrong input

Subject: Safe-guarding codepoints-to-string() from wrong input
From: Abel Braaksma <abel.online@xxxxxxxxx>
Date: Wed, 20 Dec 2006 15:33:45 +0100
Hi all,

In a translation stylesheet, I take user input (an arbitrary string) and replace bracketed numbers with the characters they stand for, like this:

$input = "some [34]quoted[34] string"
output --> some "quoted" string

<xsl:analyze-string select="$input" regex="\[(\d+)\]">
  <xsl:matching-substring>
    <!-- regex-group(1) holds the digits between the brackets -->
    <xsl:value-of select="codepoints-to-string(xs:integer(regex-group(1)))" />
  </xsl:matching-substring>
  <xsl:non-matching-substring>
    <xsl:value-of select="." />
  </xsl:non-matching-substring>
</xsl:analyze-string>


Because we are talking about tons of data (in text files) containing strings like the one above, I'd like to make the codepoints-to-string() call a bit more rock-solid. As it stands, it fails hard on invalid input. I'd like it to degrade gracefully instead: be liberal in what you accept.

I know that control characters are not allowed and throw an "Invalid XML character" error. Very large numbers (like "1234567") give a variant of the same error (I'm not sure why). Some codepoints (I believe these are the ones that are not assigned in Unicode) result in an empty string (like "12345").

Is there a robust way of allowing/disallowing a set of codepoints (other than making one huge lookup list)?
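One possible direction, as a minimal sketch rather than a tested solution: the XML 1.0 Char production only allows #x9, #xA, #xD, #x20-#xD7FF, #xE000-#xFFFD and #x10000-#x10FFFF, so a handful of range comparisons can stand in for a lookup list. The function name f:is-xml-char, the namespace urn:example:functions and the template name decode are made up for illustration. The sketch assumes an XSLT 2.0 processor working with XML 1.0 character rules, and it only filters out illegal codepoints. It does not detect codepoints that are legal but unassigned.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:f="urn:example:functions"
    exclude-result-prefixes="xs f">

  <!-- Hypothetical helper: true if $cp is a legal XML 1.0 character.
       Decimal bounds: 55295 = #xD7FF, 57344 = #xE000,
       65533 = #xFFFD, 65536 = #x10000, 1114111 = #x10FFFF -->
  <xsl:function name="f:is-xml-char" as="xs:boolean">
    <xsl:param name="cp" as="xs:integer"/>
    <xsl:sequence select="$cp = (9, 10, 13)
        or ($cp ge 32 and $cp le 55295)
        or ($cp ge 57344 and $cp le 65533)
        or ($cp ge 65536 and $cp le 1114111)"/>
  </xsl:function>

  <!-- Hypothetical named template wrapping the original analyze-string -->
  <xsl:template name="decode">
    <xsl:param name="input" as="xs:string"/>
    <xsl:analyze-string select="$input" regex="\[(\d+)\]">
      <xsl:matching-substring>
        <xsl:variable name="cp" select="xs:integer(regex-group(1))"/>
        <!-- Legal codepoint: convert it. Otherwise keep the original
             "[n]" token (the context item) so nothing is lost. -->
        <xsl:value-of select="if (f:is-xml-char($cp))
                              then codepoints-to-string($cp)
                              else ."/>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <xsl:value-of select="."/>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:template>

</xsl:stylesheet>
```

With this check, "[34]" would still become a double quote, while "[7]" or "[1234567]" (which is above 1114111, the highest Unicode codepoint) would pass through unchanged instead of raising an error. Emitting a replacement character or an empty string in the else branch would be an equally reasonable choice.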

Cheers,
Abel
