[XSL-LIST Mailing List Archive Home]
Re: unpacking percent-escaped URI components
On 11/7/2022 10:29 PM, Martin Honnen martin.honnen@xxxxxx wrote:
On 11/7/2022 9:55 PM, Graydon graydon@xxxxxxxxx wrote:
Unpacking RFC 3986 percent-escaped strings for code points less than
256 is straightforward:
tokenize($value,'%')[normalize-space()] ! local:H2D(.) !
codepoints-to-string(.)
where local:H2D is a hex-digits-to-decimal-integer function.
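The post does not show local:H2D itself. A minimal sketch of what such a function might look like in XSLT 3.0 follows; the namespace URI bound to the local prefix and the fold-based implementation are assumptions for illustration, not the poster's actual code.

```xslt
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:local="urn:example:local"
    exclude-result-prefixes="#all">

  <!-- Assumed implementation: converts a string of hex digits to an integer.
       Each digit's value is its offset in '0123456789ABCDEF'.
       No validation: a non-hex character is silently counted as 0. -->
  <xsl:function name="local:H2D" as="xs:integer">
    <xsl:param name="hex" as="xs:string"/>
    <xsl:sequence select="
      fold-left(
        string-to-codepoints(upper-case($hex)),
        0,
        function($acc, $cp) {
          $acc * 16
          + string-length(
              substring-before('0123456789ABCDEF', codepoints-to-string($cp)))
        })"/>
  </xsl:function>

</xsl:stylesheet>
```

With this definition, local:H2D('E2') evaluates to 226 and local:H2D('9C') to 156.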
When the escaped value goes above 255, as with U+201C and U+201D, “”,
the escapes start being multi-octet UTF-8, so %E2%80%9C and %E2%80%9D.
Is there a useful way to turn those multi-octet escapes back into single
characters in XPath or XSLT?
I wonder whether with Saxon PE or EE you can use e.g.
(tokenize($value,'%')[normalize-space()] ! local:H2D(.)) =>
saxon:octets-to-hexBinary() => saxon:hexBinary-to-string('UTF8')
Yes, now tested e.g.
(226, 128, 156, 226, 128, 157) => saxon:octets-to-hexBinary() =>
saxon:hexBinary-to-string('UTF8')
gives “”
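The Saxon extension functions above need PE or EE. Where they are not available, the UTF-8 decoding might also be done in plain XSLT 3.0. The following is only a sketch under stated assumptions: the function name local:utf8-to-codepoints and the namespace URI are hypothetical, and the code assumes well-formed UTF-8 (it does not reject stray continuation octets, overlong forms, or truncated sequences).

```xslt
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:local="urn:example:local"
    exclude-result-prefixes="#all">

  <!-- Hypothetical helper: turns a sequence of UTF-8 octets (0..255)
       into Unicode code points. Assumes the input is well-formed UTF-8. -->
  <xsl:function name="local:utf8-to-codepoints" as="xs:integer*">
    <xsl:param name="octets" as="xs:integer*"/>
    <xsl:if test="exists($octets)">
      <xsl:variable name="b" as="xs:integer" select="head($octets)"/>
      <!-- Number of continuation octets implied by the lead octet. -->
      <xsl:variable name="n" as="xs:integer" select="
        if ($b lt 128) then 0
        else if ($b lt 224) then 1
        else if ($b lt 240) then 2
        else 3"/>
      <!-- Payload bits of the lead octet: 5, 4 or 3 bits for n = 1, 2, 3. -->
      <xsl:variable name="lead" as="xs:integer" select="
        if ($n eq 0) then $b else $b mod (32, 16, 8)[$n]"/>
      <!-- Each continuation octet contributes its low 6 bits. -->
      <xsl:sequence select="
        fold-left(
          subsequence($octets, 2, $n),
          $lead,
          function($acc, $o) { $acc * 64 + $o mod 64 })"/>
      <!-- Decode whatever follows this character. -->
      <xsl:sequence select="
        local:utf8-to-codepoints(subsequence($octets, $n + 2))"/>
    </xsl:if>
  </xsl:function>

</xsl:stylesheet>
```

With that in place, codepoints-to-string(local:utf8-to-codepoints((226, 128, 156, 226, 128, 157))) yields “”, since the two octet triples decode to 8220 (U+201C) and 8221 (U+201D). The function recurses once per character, which should be fine for short URI components but is untested on long inputs.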