
Re: (Re-)Escaping entities in input text

Subject: Re: (Re-)Escaping entities in input text
From: David Vergnaud <dvergnaud@xxxxxxxxx>
Date: Wed, 20 Aug 2008 07:38:56 -0700 (PDT)
Hi,

Thanks for the answer, but I think you misunderstood my question. I'm not writing to a file on disk, so no XML writer is involved. I'm simply building my result as an xs:string and passing it (without writing it out anywhere) to another program. For example, given this input file:
<goo>
  <foo>a &lt; b</foo>
</goo>

Saxon (I'm using version 9.0.0.4J) will read it, and in a template that matches "foo", <xsl:value-of select="."/> returns "a < b", which is correct.

Now assume I have this piece of code (I'm writing it on the fly, so please be lenient :-) ):
<xsl:template match="foo">
  <xsl:variable name="my_xml">
    <xsl:text>&lt;bar&gt;</xsl:text>
    <xsl:value-of select="." />
    <xsl:text>&lt;/bar&gt;</xsl:text>
  </xsl:variable>
  <xsl:value-of select="java_class:function(string($my_xml))" />
</xsl:template>

The point of this template is to build a pseudo XML document in a string (my_xml) and pass it to a Java function (java_class:function), which will process it. Done this way, however, my_xml ends up with the following content:
<bar>a < b</bar>
which is not well-formed, and hence cannot be parsed by the XML parser in my Java class.

So what I'm looking for is a way of outputting, *in my internal string*, "a &lt; b" instead of "a < b".

I don't think this is bad practice, is it? There are definitely cases where XSLT cannot handle everything, and the processing of a piece of XML has to be handed over to some other processor :-).

On a related note: could it be that Saxon uses ISO-8859-1 instead of UTF-8 internally? My source file is definitely UTF-8, but when I pass a string containing special characters (in this case German umlauts) to my Java class, I get '?' (question marks) instead of the two-byte UTF-8 sequences. Any idea why this happens, or how to avoid it?
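
To make the umlaut symptom concrete, here is a minimal Java sketch of one common way '?' can appear. It is only a guess at the cause, not a diagnosis of this setup: a Java String holds UTF-16 characters internally, and String.getBytes substitutes '?' for any character the target charset cannot represent. ISO-8859-1 does contain the German umlauts, so a plain Latin-1 conversion would not turn them into '?', whereas US-ASCII would. The class name UmlautDemo and the sample text are made up for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class UmlautDemo {

    // Encode the string to bytes with the given charset, then decode the
    // bytes again with the same charset. Characters the charset cannot
    // represent are replaced by '?' during encoding.
    static String roundTrip(String s, Charset cs) {
        return new String(s.getBytes(cs), cs);
    }

    public static void main(String[] args) {
        String s = "Gr\u00fc\u00dfe"; // "Gruesse" spelled with u-umlaut and sharp s

        // UTF-8 and ISO-8859-1 can both represent these characters.
        System.out.println(s.equals(roundTrip(s, StandardCharsets.UTF_8)));
        System.out.println(s.equals(roundTrip(s, StandardCharsets.ISO_8859_1)));

        // US-ASCII cannot, so the two non-ASCII characters are lost.
        System.out.println(roundTrip(s, StandardCharsets.US_ASCII));
    }
}
```

If the string arrives intact in the Java method and only looks wrong when printed or logged, the conversion to bytes at that output step (console or log file encoding) would be the first place to look.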

David


----- Original Message ----
From: Andrew Welch <andrew.j.welch@xxxxxxxxx>
To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
Sent: Wednesday, August 20, 2008 4:21:18 PM
Subject: Re:  (Re-)Escaping entities in input text

> Of course, there is the possibility of replacing all 5 entities "by hand" by calling a transform function, however this might not be very efficient when the string is getting big. Is there either:

_never_ do that... it's the first step down the wrong road which is
long and painful.

> - a way of disabling entity interpretation with xsl:value-of (actually getting "<" when it's written like this in the input file)

xsl:value-of simply creates text nodes in the result tree, there is no
interpretation going on - that only happens during
parsing/serialisation

> - a function to "reescape" a piece of text so that it's usable in an XML file/string?

that happens during serialisation... for example:

<foo> a &lt; b </foo>

when that's parsed you will get a node "foo" with a single text node
child "a < b".   If you do xsl:value-of on that text node, it will add
to the result tree.  It's still "a < b" at this point.  Then the
serializer operates on the result tree which knows that "<" in a text
node must be escaped, so after that step it becomes "a &lt; b"...
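
The parse / tree / serialize sequence described above can be sketched in Java with the standard JAXP classes. This is an illustrative sketch, not what Saxon does internally; the class name RoundTripDemo and its helper methods are made up for the example. Note that the serializer writes to an in-memory StringWriter, so no file is involved.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class RoundTripDemo {

    // Parsing: entity references such as &lt; are resolved here, so the
    // text node in the tree holds the literal character "<".
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    // Serialization: the tree is written out as markup, and "<" inside a
    // text node is escaped again. The target is a string, not a file.
    static String serialize(Document doc) {
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("could not serialize XML", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<foo>a &lt; b</foo>");
        // In the tree the text is unescaped.
        System.out.println(doc.getDocumentElement().getTextContent());
        // After serialization the text is escaped again.
        System.out.println(serialize(doc));
    }
}
```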

It sounds like you might be skipping the serialization step - perhaps
you're constructing a String and just writing that to disk?  eg

String xml = "<foo>" + someValue + "</foo>";

...which would give you:

<foo> a < b </foo>.

...hence the question?  Doing it that way is A Bad Thing - the golden
rule is to always read and write XML using proper XML readers and
writers.
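
As a contrast to the string concatenation above, here is a minimal sketch of building the same kind of fragment with an XML writer that targets an in-memory string. It assumes the standard StAX API (javax.xml.stream) is available; the class name BarWriter and the method wrapInBar are hypothetical names chosen for the example.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

public class BarWriter {

    // Wrap arbitrary text in a <bar> element. The writer, not the caller,
    // is responsible for escaping "<" and "&" in the text content.
    static String wrapInBar(String text) {
        StringWriter out = new StringWriter();
        try {
            XMLStreamWriter w =
                    XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            w.writeStartElement("bar");
            w.writeCharacters(text);
            w.writeEndElement();
            w.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("could not write XML", e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The unescaped text goes in; well-formed markup comes out.
        System.out.println(wrapInBar("a < b"));
    }
}
```

The point of the design is that the escaping rules live in one place (the writer), so the calling code never has to replace the special characters by hand.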

-- 
Andrew Welch
http://andrewjwelch.com
Kernow: http://kernowforsaxon.sf.net/
