Subject: Re: Safe-guarding codepoints-to-string() from wrong input
From: Abel Braaksma <abel.online@xxxxxxxxx>
Date: Wed, 20 Dec 2006 16:18:31 +0100

Andrew Welch wrote:


> If you are receiving strings containing literal control characters
> then they're almost definitely encoded in Windows-1252 - just parse
> them using that and you'll be ok.

No, that's not it. The codepoints are encoded using literal numeric hexadecimal strings (compare &#x0A; in XML, which would be [0A] in the original example).


> If the string contains control characters as character references,
> then it's a bit harder because the references get expanded using
> Unicode codepoints, and not those specified in the Windows-1252
> mappings...  So you need to parse/serialize the string to expand the
> references (I personally use JTidy with the CharEncoding set to
> Configuration.RAW, which forces Tidy to output the bytes instead of
> a reference)
>
> It's a pain....

Well, that's encouraging ;)

The project contains strings that are "escaped" in several ways (texts are literal):

C-style:     \x0ASome text \x22between quotes\x22
Local style: <0A>Some text <22>between quotes<22>
Other style: Text with <22,24,54> multiple special chars
XML-like:    &0A;Some text &22;between quotes&22;

In short: the input is rubbish. But we know for a fact how to get the codepoints. However, in the past, users have made mistakes. The original application simply ignored those mistakes, replacing the illegal codepoints with nothingness.
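
To make the intended behaviour concrete, here is a minimal sketch in Python (not XSLT, and not the original application's code) that decodes the four escape styles listed above and silently drops any codepoint outside the XML 1.0 Char production. The names is_xml_char, ESCAPE and decode are made up for this illustration, and the sketch assumes that C-style escapes always carry exactly two hex digits. The same range test could serve as a filter predicate on the integer sequence before it is handed to codepoints-to-string().

```python
import re


def is_xml_char(cp: int) -> bool:
    """True if cp is allowed by the XML 1.0 'Char' production."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


# One alternative per escape style; each group captures the hex digits.
# Assumption: C-style escapes are exactly two hex digits, otherwise
# "\x22between" would swallow the leading "be" of the following word.
ESCAPE = re.compile(
    r"\\x([0-9A-Fa-f]{2})"                    # C-style:  \x0A
    r"|<([0-9A-Fa-f]+(?:,[0-9A-Fa-f]+)*)>"    # <0A> and <22,24,54>
    r"|&([0-9A-Fa-f]+);"                      # XML-like: &0A;
)


def decode(text: str) -> str:
    """Expand the escapes; illegal codepoints are dropped, not reported."""

    def expand(match: re.Match) -> str:
        # Exactly one of the three groups matched; pick it.
        hex_list = next(g for g in match.groups() if g is not None)
        codepoints = [int(h, 16) for h in hex_list.split(",")]
        return "".join(chr(cp) for cp in codepoints if is_xml_char(cp))

    return ESCAPE.sub(expand, text)
```

With this sketch, decode("Text with <22,24,54> multiple special chars") gives 'Text with "$T multiple special chars', while decode("a<00>b<1F>c") gives "abc" because 0x00 and 0x1F are not legal XML 1.0 characters. One caveat: the angle-bracket pattern would also swallow something like <a> or <b>, which is harmless only as long as the texts really are literal and contain no markup.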

The good news is: all codepoints are Unicode codepoints.

Thanks,
-- Abel
