Re: (Re-)Escaping entities in input text
Hi, thanks for the answer, but I think you misunderstood my question. I'm not writing to a file on disk, so I'm not using any XML writer here. I'm simply saving my result to an xs:string and passing it (without writing it out) to another program.

So, for example, if I have this input file:

  <goo>
    <foo>a &lt; b</foo>
  </goo>

Saxon (I'm using version 18.104.22.168J) will read it, and in a template that matches "foo", <xsl:value-of select="."/> returns "a < b", which is correct. Now assume I have this piece of code (I'm writing it on the fly, please be lenient :-) ):

  <xsl:template match="foo">
    <xsl:variable name="my_xml">
      <xsl:text>&lt;bar&gt;</xsl:text>
      <xsl:value-of select="." />
      <xsl:text>&lt;/bar&gt;</xsl:text>
    </xsl:variable>
    <xsl:value-of select="java_class:function($my_xml)" />
  </xsl:template>

The point of this template is to build a pseudo-XML document in a string (my_xml) and pass it to a Java function (java_class:function), which will process it. However, done this way, my_xml has the following content:

  <bar>a < b</bar>

which is not well-formed, and hence can't be parsed by an XML parser in my Java class. So what I'm looking for is a way of outputting, *in my internal string*, "a &lt; b" instead of "a < b".

I don't think this is bad practice, is it? Surely there are cases where XSLT just can't handle everything, and the processing of a piece of XML has to be handed over to some other processor :-).

On a related note: could it be that Saxon uses ISO-8859-1 instead of UTF-8 internally? My source file is definitely UTF-8, but when I pass a string containing special characters (in this case German umlauts) to my Java class, I get '?' (question marks) instead of the 2-byte UTF-8 sequences... Any idea why this is happening, or how to avoid it?
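On the umlaut question, one likely explanation (an assumption, not confirmed in this thread): a Java String holds UTF-16 code units in memory, so the '?' usually appears later, when the string is encoded into bytes with a charset that cannot represent the character, for example a console or writer using the platform default encoding. The minimal sketch below uses only the JDK to show that effect; the class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class UmlautEncodingDemo {

    // Encodes a Java String (UTF-16 in memory) to bytes in the given charset
    // and decodes it again. String.getBytes(Charset) replaces characters the
    // charset cannot represent with its replacement byte, which is '?' for US-ASCII.
    static String roundTrip(String s, Charset charset) {
        return new String(s.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        // "Grüße" written with Unicode escapes so the source file encoding does not matter
        String greeting = "Gr\u00fc\u00dfe";

        System.out.println(greeting.length());                                // 5 UTF-16 code units
        System.out.println(greeting.getBytes(StandardCharsets.UTF_8).length); // 7: ü and ß take two bytes each
        System.out.println(roundTrip(greeting, StandardCharsets.ISO_8859_1).equals(greeting)); // true
        System.out.println(roundTrip(greeting, StandardCharsets.US_ASCII));   // Gr??e
    }
}
```

If that is the cause, the fix would be on the Java side: write the string out with an explicit encoding, e.g. `new OutputStreamWriter(out, StandardCharsets.UTF_8)`, instead of relying on the platform default.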
David

----- Original Message ----
From: Andrew Welch <andrew.j.welch@xxxxxxxxx>
To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
Sent: Wednesday, August 20, 2008 4:21:18 PM
Subject: Re: (Re-)Escaping entities in input text

> Of course, there is the possibility of replacing all 5 entities "by hand" by calling a transform function, however this might not be very efficient when the string is getting big. Is there either:

_never_ do that... it's the first step down the wrong road which is long and painful.

> - a way of disabling entity interpretation with xsl:value-of (actually getting "&lt;" when it's written like this in the input file)

xsl:value-of simply creates text nodes in the result tree, there is no interpretation going on - that only happens during parsing/serialisation

> - a function to "reescape" a piece of text so that it's usable in an XML file/string?

that happens during serialisation... for example:

  <foo> a &lt; b </foo>

when that's parsed you will get a node "foo" with a single text node child "a < b". If you do xsl:value-of on that text node, it will add it to the result tree. It's still "a < b" at this point. Then the serializer operates on the result tree, and it knows that "<" in a text node must be escaped, so after that step it becomes "a &lt; b"...

It sounds like you might be skipping the serialization step - perhaps you're constructing a String and just writing that to disk? eg

  String xml = "<foo>" + someValue + "</foo>";

...which would give you:

  <foo> a < b </foo>

...hence the question?

Doing it that way is A Bad Thing - the golden rule is to always read and write XML using proper XML readers and writers.

--
Andrew Welch
http://andrewjwelch.com
Kernow: http://kernowforsaxon.sf.net/
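Andrew's parse/serialize explanation can be reproduced on the Java side with the JDK's own XML APIs. The sketch below (class and method names are hypothetical) parses `<foo>a &lt; b</foo>` with a DOM parser, then builds the `<bar>` wrapper once with a StAX `XMLStreamWriter` and once by plain string concatenation, the approach Andrew warns against.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class EscapingDemo {

    // Parsing turns the markup "a &lt; b" into the text value "a < b".
    static String parseTextContent(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement().getTextContent();
    }

    // Serializing with a real XML writer escapes "<" in text content as "&lt;".
    static String buildWithWriter(String elementName, String text) throws XMLStreamException {
        StringWriter out = new StringWriter();
        XMLStreamWriter w = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
        w.writeStartElement(elementName);
        w.writeCharacters(text);
        w.writeEndElement();
        w.flush();
        w.close(); // does not close the underlying StringWriter
        return out.toString();
    }

    // String concatenation does no escaping, so the result is not well-formed XML.
    static String buildWithConcatenation(String elementName, String text) {
        return "<" + elementName + ">" + text + "</" + elementName + ">";
    }

    public static void main(String[] args) throws Exception {
        String value = parseTextContent("<foo>a &lt; b</foo>");
        System.out.println(value);                                // a < b
        System.out.println(buildWithWriter("bar", value));        // <bar>a &lt; b</bar>
        System.out.println(buildWithConcatenation("bar", value)); // <bar>a < b</bar>
    }
}
```

The writer's output parses back to the original text value "a < b", while the concatenated string is exactly the non-well-formed `<bar>a < b</bar>` from David's message. The same principle presumably applies on the XSLT side: let a serializer produce the string rather than escaping the five characters by hand.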