
Fw: decoding percent-escaped octet sequences

Subject: Fw: decoding percent-escaped octet sequences
From: Hermann Stamm-Wilbrandt <STAMMW@xxxxxxxxxx>
Date: Mon, 23 May 2011 11:47:58 +0200
Trying to send again, this time not as UTF-8 email ...

----- Forwarded by Hermann Stamm-Wilbrandt/Germany/IBM on 05/23/2011 11:47 AM -----

From:   Hermann Stamm-Wilbrandt/Germany/IBM
To:     xsl-list@xxxxxxxxxxxxxxxxxxxxxx
Date:   05/23/2011 10:37 AM
Subject:        Re:  decoding percent-escaped octet sequences


DataPower provides a convert-http action for processing non-XML HTTP form
submissions.
When this action entered the product (before IBM acquired DataPower in 2005),
the default encoding for URL-encoded strings was ISO-8859-1.
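Why the default matters: the same percent-escaped octets decode to different text depending on the charset assumed. The following minimal sketch uses Python's standard `urllib.parse.unquote` purely for illustration; it is not how DataPower implements convert-http.

```python
from urllib.parse import unquote

escaped = "%C3%84rzteblatt"  # "Ärzteblatt", percent-encoded as UTF-8

# Decoding the octets as UTF-8 recovers the intended text.
as_utf8 = unquote(escaped, encoding="utf-8")

# Decoding the same octets as ISO-8859-1 maps each octet to its own
# character: 0xC3 -> "Ã" and 0x84 -> U+0084 (a C1 control character).
as_latin1 = unquote(escaped, encoding="iso-8859-1")

print(as_utf8)          # Ärzteblatt
print(repr(as_latin1))  # 'Ã\x84rzteblatt'
```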

Inside DataPower stylesheets, the equivalent of the convert-http action is
the dp:decode() extension function:
http://publib.boulder.ibm.com/infocenter/wsdatap/v3r8m2/index.jsp?topic=/xa35/extensionfunctions41.htm

Last year a customer asked for support for UTF-8 URL-encoded URIs
(because Google returns URIs encoded that way).

I provided an implementation for that in a technote and a Webcast:
http://www-01.ibm.com/support/docview.wss?uid=swg21412370
http://www-01.ibm.com/support/docview.wss?uid=swg27019118&aid=1#page=15

This implementation is based on the EXSLT extension function str:decode-uri()
(DataPower is an XSLT 1.0 processor):
http://exslt.org/str/functions/decode-uri/index.html
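Conceptually, a UTF-8 aware decode-uri has two steps: turn each %XX escape back into one octet, then decode the whole octet sequence as UTF-8. Decoding each escape separately would break multi-octet characters such as %C3%86. A minimal Python sketch under that assumption; `decode_uri` is a hypothetical name chosen to mirror the EXSLT function, not a port of any particular implementation.

```python
import re

# One percent-escape: '%' followed by exactly two hex digits.
_ESCAPE = re.compile(r"%([0-9A-Fa-f]{2})")


def decode_uri(s: str, encoding: str = "utf-8") -> str:
    """Hypothetical helper: percent-decode s, then decode the octets.

    Step 1: text outside %XX escapes contributes its UTF-8 octets (a no-op
    for ASCII); each %XX escape contributes the single octet 0xXX.
    Step 2: the complete octet sequence is decoded with `encoding`.
    """
    octets = bytearray()
    pos = 0
    for m in _ESCAPE.finditer(s):
        octets += s[pos:m.start()].encode("utf-8")
        octets.append(int(m.group(1), 16))
        pos = m.end()
    octets += s[pos:].encode("utf-8")
    # Strict decoding: malformed sequences raise UnicodeDecodeError.
    return octets.decode(encoding)


print(decode_uri("%C3%86-%C3%98-%C3%85"))  # Æ-Ø-Å
```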


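I modified the stylesheet from the technote to remove the call to
dp:variable(), so it also works with xsltproc, as shown below.
(The stylesheet is also passed as the input document; its content is not
used, since the URL is hard-coded in a variable.)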

$ xsltproc utf8uriDemo.xsl utf8uriDemo.xsl
<?xml version="1.0"?>
<request xmlns:uri="http://uri"><url>/utf8uri?danish=%C3%86-%C3%98-%C3%85&amp;french=%C5%92-%C3%A6&amp;german=%C3%84-%C3%96-%C3%9C-%C3%9F&amp;spanish=%CA%A7-%EA%9D%86-%C3%91</url><base-url>/utf8uri</base-url><args src="url"><arg name="danish">Æ-Ø-Å</arg><arg name="french">Œ-æ</arg><arg name="german">Ä-Ö-Ü-ß</arg><arg name="spanish">ʧ-Ꝇ-Ñ</arg></args></request>
$
$ cat utf8uriDemo.xsl
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:str="http://exslt.org/strings"
  xmlns:uri="http://uri"
  exclude-result-prefixes="str"
>
  <xsl:template match="/">
    <xsl:variable name="url"><![CDATA[/utf8uri?danish=%C3%86-%C3%98-%C3%85&french=%C5%92-%C3%A6&german=%C3%84-%C3%96-%C3%9C-%C3%9F&spanish=%CA%A7-%EA%9D%86-%C3%91]]></xsl:variable>

    <request>
      <url><xsl:copy-of select="$url"/></url>
      <base-url>
        <xsl:copy-of select="substring-before($url,'?')"/>
      </base-url>
      <args src="url">
        <xsl:for-each
          select="str:tokenize(substring-after($url,'?'),'&amp;')">
          <xsl:element name="arg">
            <xsl:attribute name="name">
              <xsl:value-of select="substring-before(.,'=')"/>
            </xsl:attribute>
            <xsl:value-of
              select="str:decode-uri(substring-after(.,'='))"/>
          </xsl:element>
        </xsl:for-each>
      </args>
    </request>
  </xsl:template>

</xsl:stylesheet>
$
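For readers without an XSLT processor at hand, the stylesheet's logic (split at '?', tokenize the query on '&', split each pair at '=', decode the value) can be mirrored in a few lines of Python. This is a sketch, not DataPower's or libxslt's implementation; like the stylesheet it does not decode the names, and it assumes the URL contains '?' and each pair contains '=' (the XPath substring functions return '' otherwise, while `str.partition` keeps the whole string).

```python
from urllib.parse import unquote


def parse_request(url: str) -> dict:
    """Mirror the <request> element built by utf8uriDemo.xsl (sketch)."""
    # <base-url>: substring-before($url, '?')
    base, _, query = url.partition("?")
    args = []
    # str:tokenize(substring-after($url, '?'), '&') yields no empty tokens
    for token in query.split("&"):
        if not token:
            continue
        # name attribute: substring-before(., '='); value: substring-after(., '=')
        name, _, value = token.partition("=")
        # str:decode-uri(): percent-decode, then decode the octets as UTF-8
        args.append((name, unquote(value, encoding="utf-8", errors="strict")))
    return {"url": url, "base-url": base, "args": args}


demo = ("/utf8uri?danish=%C3%86-%C3%98-%C3%85&french=%C5%92-%C3%A6"
        "&german=%C3%84-%C3%96-%C3%9C-%C3%9F&spanish=%CA%A7-%EA%9D%86-%C3%91")

request = parse_request(demo)
print(request["base-url"])  # /utf8uri
for name, value in request["args"]:
    print(name, value)      # danish Æ-Ø-Å, french Œ-æ, german Ä-Ö-Ü-ß, spanish ʧ-Ꝇ-Ñ
```

A list of pairs is used instead of a dict so that repeated parameter names are kept, as the stylesheet would emit one arg element per pair.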


Best wishes,

Hermann Stamm-Wilbrandt
Developer, XML Compiler, L3
Fixpack team lead
WebSphere DataPower SOA Appliances
https://www.ibm.com/developerworks/mydeveloperworks/blogs/HermannSW/
----------------------------------------------------------------------
IBM Deutschland Research & Development GmbH
Chairman of the Supervisory Board: Martin Jetter
Management: Dirk Wittkopp
Registered office: Boeblingen
Register court: Amtsgericht Stuttgart, HRB 243294



From:   Chris Maloney <voldrani@xxxxxxxxx>
To:     xsl-list@xxxxxxxxxxxxxxxxxxxxxx
Cc:     Brandon Ibach <brandon.ibach@xxxxxxxxxxxxxxxxxxx>
Date:   05/20/2011 07:22 PM
Subject:        Re:  decoding percent-escaped octet sequences



On Fri, May 20, 2011 at 12:14 PM, Julian Reschke <julian.reschke@xxxxxx> wrote:
> On 2011-05-20 17:52, Brandon Ibach wrote:
>> Generally, when you're doing string manipulations inside XSLT/XPath,
>> there really is no such thing as ISO-8859-1, UTF-8 or any other
>> encoding, since the "string" data type in XPath is just a string of
>> Unicode characters.

But Julian is right that a percent-encoded string represents a byte
sequence, and that byte sequence can be interpreted under one encoding or
another. I investigated the same problem for the site I work on. I don't
have a solution for converting these to strings inside XSLT, but here are
some of the test cases I worked with, in case they prove useful.
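A short Python illustration of that point, using the standard `urllib.parse` module (not anything from the original thread): the same character produces different escapes depending on the encoding, and percent-decoding by itself only recovers octets. The two escapes correspond to test cases 1 and 3 below.

```python
from urllib.parse import quote, unquote_to_bytes

word = "\u00c4rzteblatt"  # "Ärzteblatt"

# Encoding direction: one character, two different octet sequences.
utf8_escaped = quote(word, encoding="utf-8")         # '%C3%84rzteblatt' (case 1)
latin1_escaped = quote(word, encoding="iso-8859-1")  # '%C4rzteblatt' (case 3)

# Decoding direction: percent-decoding alone yields octets, not characters.
octets = unquote_to_bytes(latin1_escaped)            # b'\xc4rzteblatt'

print(utf8_escaped, latin1_escaped, octets)
```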

1. UTF-8 encoded single character
A. ?term=%C3%84rzteblatt
"Ärzteblatt"

2. Invalid character codes (ASCII control character(s), but not valid
ISO-8859-1 or UTF-8)
A. ?term=%02%03cat

3. Non UTF-8, ISO-8859-1, single character
A. ?term=%C4rzteblatt
"Ärzteblatt"

4. Invalid byte sequence (not valid utf-8 or iso-8859-1)
A. ?term=%C4%83%C4cat

5. Chinese characters, UTF-8 encoded
A. ?term=%e4%bd%a0%e5%a5%bd
Search box: "你好"

6. ISO-8859-1, several non-ASCII bytes: this sequence starts out looking
like UTF-8 (%C4%A0 is a valid UTF-8 pair), but it's not.
A. ?term=%c4%A0%c4rzteblatt
Search box: "Ä Ärzteblatt" (the space is U+00A0, no-break space)
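A common heuristic is to try strict UTF-8 first and fall back to ISO-8859-1. The Python sketch below runs the test cases above through that heuristic; `decode_term` is a hypothetical name, and the fallback strategy is an assumption for illustration, not what any particular site does.

```python
from typing import Tuple
from urllib.parse import unquote_to_bytes


def decode_term(escaped: str) -> Tuple[str, str]:
    """Hypothetical heuristic: strict UTF-8 first, ISO-8859-1 as fallback.

    Returns (charset_used, text). ISO-8859-1 maps every octet to a
    character, so the fallback never fails, which is exactly why it can
    silently produce text nobody intended.
    """
    octets = unquote_to_bytes(escaped)
    try:
        return "utf-8", octets.decode("utf-8")
    except UnicodeDecodeError:
        return "iso-8859-1", octets.decode("iso-8859-1")


cases = ["%C3%84rzteblatt", "%02%03cat", "%C4rzteblatt",
         "%C4%83%C4cat", "%e4%bd%a0%e5%a5%bd", "%c4%A0%c4rzteblatt"]
for case in cases:
    print(case, decode_term(case))
```

Cases 1, 3 and 5 come out as intended. Case 2 decodes without error as UTF-8, although U+0002 and U+0003 are not allowed in XML 1.0 documents. In cases 4 and 6 the first two octets are valid UTF-8 but the third is not, so the whole value falls back to ISO-8859-1, giving a C1 control (U+0083) in case 4 and a no-break space in case 6.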


After working with this for a while, we concluded that it's best to strictly
enforce the rule that percent-encoding in URLs be UTF-8. In other words, I
think it's a bad idea to keep supporting ISO-8859-1 encoded URLs, because
doing so leads to too many problems that are very hard to debug.
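Enforcing that rule means rejecting instead of guessing. A minimal Python sketch, assuming a hypothetical `require_utf8` helper that a request handler would call on each query value: passing `errors="strict"` to `urllib.parse.unquote` makes malformed UTF-8 raise `UnicodeDecodeError` instead of being replaced with U+FFFD (the default `errors="replace"`).

```python
from urllib.parse import unquote


def require_utf8(escaped: str) -> str:
    """Hypothetical strict decoder: accept only UTF-8 percent-encoding.

    An ISO-8859-1 encoded value is rejected with ValueError rather than
    being silently reinterpreted.
    """
    try:
        return unquote(escaped, encoding="utf-8", errors="strict")
    except UnicodeDecodeError as exc:
        raise ValueError(
            f"query value is not UTF-8 percent-encoded: {escaped!r}") from exc


print(require_utf8("%C3%84rzteblatt"))  # Ärzteblatt
try:
    require_utf8("%C4rzteblatt")        # ISO-8859-1 form of the same word
except ValueError as err:
    print(err)
```

Note that this only enforces the encoding; a value such as %02%03cat is still valid UTF-8 and would need a separate check if control characters must be excluded.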
