Re: decoding percent-escaped octet sequences
On 2011-05-20 18:14, Julian Reschke wrote:
On 2011-05-20 17:52, Brandon Ibach wrote:
Generally, when you're doing string manipulations inside XSLT/XPath, there really is no such thing as ISO-8859-1, UTF-8 or any other encoding, since the "string" data type in XPath is just a string of Unicode characters. The encoding of the input is used to map the sequence of octets to Unicode characters on the way in and the requested encoding of the output is used to do the reverse on the way out.

Ok, the following approach isn't quite a pure XSLT/XPath proof of concept, but maybe you'll still find it useful:

===========8<------------------------
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:my="my"
  xmlns:java-urldecode="java:java.net.URLDecoder"
  >

  <xsl:output method="xml" indent="yes" />

  <!-- see comment below for ' escaping -->
  <xsl:variable name='input' as='xs:string*'
    select="(
      'us-ascii''en-us''This%20is%20%2A%2A%2Afun%2A%2A%2A',
      'iso-8859-1''en''%A3%20rates',
      'UTF-8''''%c2%a3%20and%20%e2%82%ac%20rates'
    )" />

  <my:input>
    <val>us-ascii'en-us'This%20is%20%2A%2A%2Afun%2A%2A%2A</val>
    <val>iso-8859-1'en'%A3%20rates</val>
    <val>UTF-8''%c2%a3%20and%20%e2%82%ac%20rates</val>
  </my:input>

  <xsl:template name="decode">
    <test>
      <!-- if you select="$input" in the following for-each,
           please note that the literal ' must be quoted as ''
           when specifying $input literally -->
      <xsl:for-each select="document('')//my:input/val">
        <!-- group 1: charset, group 2: language, group 3: percent-encoded value -->
        <xsl:analyze-string select="." regex="^(.*?)'(.*?)'(.*)$">
          <xsl:matching-substring>
            <string encoding="{regex-group(1)}" lang="{regex-group(2)}" encoded="{regex-group(3)}">
              <xsl:value-of select="java-urldecode:decode(regex-group(3), regex-group(1))" />
            </string>
          </xsl:matching-substring>
        </xsl:analyze-string>
      </xsl:for-each>
    </test>
  </xsl:template>

</xsl:stylesheet>
===========8<------------------------

It requires an XSLT 2.0 processor that can call Java extension functions, such as Saxon or Altova. With Saxon, I think it works only with the PE or EE editions, or with the older 9.1 versions.

Output (invoke Saxon with -it:decode):

<?xml version="1.0" encoding="UTF-8"?>
<test xmlns:xs="http://www.w3.org/2001/XMLSchema"
      xmlns:my="my"
      xmlns:java-urldecode="java:java.net.URLDecoder">
   <string encoding="us-ascii" lang="en-us" encoded="This%20is%20%2A%2A%2Afun%2A%2A%2A">This is ***fun***</string>
   <string encoding="iso-8859-1" lang="en" encoded="%A3%20rates">£ rates</string>
   <string encoding="UTF-8" lang="" encoded="%c2%a3%20and%20%e2%82%ac%20rates">£ and € rates</string>
</test>

-Gerrit

-- 
Gerrit Imsieke
Geschäftsführer / Managing Director
le-tex publishing services GmbH
Weissenfelser Str. 84, 04229 Leipzig, Germany
Phone +49 341 355356 110, Fax +49 341 355356 510
gerrit.imsieke@xxxxxxxxx, http://www.le-tex.de
Registergericht / Commercial Register: Amtsgericht Leipzig
Registernummer / Registration Number: HRB 24930
Geschäftsführer: Gerrit Imsieke, Svea Jelonek, Thomas Schmidt, Dr. Reinhard Vöckler
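
For anyone who wants to check the decoding step outside of an XSLT processor, here is a minimal Java sketch of what the extension call in the stylesheet does: split the value at the first two apostrophes, then hand the percent-encoded part and the charset name to java.net.URLDecoder. The class name PercentDecodeSketch and the helper decodeExtValue are hypothetical names chosen for illustration; they are not part of the stylesheet or of any library. One caveat that applies to the stylesheet as well: URLDecoder implements form decoding, so a literal "+" in the input would also be turned into a space.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical demo class, not part of the original stylesheet.
final class PercentDecodeSketch {

    // Splits "charset'lang'percent-encoded-value" at the first two
    // apostrophes (the same split the regex ^(.*?)'(.*?)'(.*)$ performs)
    // and decodes the value part using the named charset.
    static String decodeExtValue(String extValue) {
        int first = extValue.indexOf('\'');
        int second = first < 0 ? -1 : extValue.indexOf('\'', first + 1);
        if (second < 0) {
            throw new IllegalArgumentException("expected charset'lang'value: " + extValue);
        }
        String charset = extValue.substring(0, first);
        String encoded = extValue.substring(second + 1);
        try {
            // Note: URLDecoder also converts '+' to a space (form decoding).
            return URLDecoder.decode(encoded, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("unknown charset: " + charset, e);
        }
    }

    public static void main(String[] args) {
        String[] inputs = {
            "us-ascii'en-us'This%20is%20%2A%2A%2Afun%2A%2A%2A",
            "iso-8859-1'en'%A3%20rates",
            "UTF-8''%c2%a3%20and%20%e2%82%ac%20rates"
        };
        for (String input : inputs) {
            System.out.println(decodeExtValue(input));
        }
    }
}
```

The three inputs are the same values as in my:input. The helper returns the same strings as the XSLT version; how the pound and euro signs appear when printed depends on the console encoding.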