Subject: RE: Escaped characters being duplicated
From: "Michael Kay" <mike@xxxxxxxxxxxx>
Date: Tue, 11 Dec 2007 23:22:58 -0000
Perplexing indeed.

I'd be less surprised if the output came out as "&amp;lt;" rather than
"&lt;&lt;". That's much more common, and could be caused by processing text
twice when it should only be processed once.

The conversion from "<" to "&lt;" is done by the XML serializer. The fact
that you're using the Saxon XSLT processor doesn't necessarily mean that
you're using the Saxon serializer (the Saxon output could be sent to a DOM
which is then serialized using the DOM serializer); it would be a good idea
to find out what serializer is actually being used. The easiest way to find
out is to see whether the serialization is affected by xsl:output
declarations in the stylesheet.
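
To illustrate where the escaping happens, here is a minimal Java sketch. It is an illustration under stated assumptions, not code from the DITA Open Toolkit: the class name, the stylesheet and the input document are made up, and it runs against whichever JAXP processor happens to be on the classpath. Sent to a StreamResult, the result is written by the processor's own serializer, which escapes "<" and honours xsl:output. Sent to a DOMResult, the text node holds the raw character, and any later escaping is up to whatever serializes that DOM.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class SerializerCheck {
    // Hypothetical stylesheet: copies the string value of the input into <out>.
    static final String XSL =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      + "<xsl:template match='/'>"
      + "<out><xsl:value-of select='.'/></out>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Hypothetical input containing one escaped angle bracket.
    static final String XML = "<in>a &lt; b</in>";

    static Transformer newTransformer() throws Exception {
        return TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(XSL)));
    }

    // The processor's own serializer writes the result: "<" is escaped here,
    // and the xsl:output declaration should take effect.
    static String viaProcessorSerializer() throws Exception {
        StringWriter out = new StringWriter();
        newTransformer().transform(
            new StreamSource(new StringReader(XML)), new StreamResult(out));
        return out.toString();
    }

    // The result is built as a DOM tree: no serialization has happened yet,
    // so the text node contains the unescaped character.
    static String viaDom() throws Exception {
        DOMResult dom = new DOMResult();
        newTransformer().transform(
            new StreamSource(new StringReader(XML)), dom);
        return dom.getNode().getFirstChild().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("StreamResult: " + viaProcessorSerializer());
        System.out.println("DOMResult text: " + viaDom());
    }
}
```

If changing the xsl:output declaration (for example, toggling omit-xml-declaration) makes no difference to the files the failing setup produces, that suggests something other than the XSLT processor's serializer is writing them.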
How did you satisfy yourself that both the successful and the unsuccessful
runs are using Saxon 6.5.5? JAXP is a wonderful beast, and ensures that many
people are running a different XSLT processor from the one they thought they
were using.
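
One way to check this from inside the web application is to ask JAXP which factory it actually loaded. The sketch below is only an illustration (the class and method names are made up); it uses standard JAXP and JDK calls. Run inside Tomcat and again from the command line, it should show whether the two environments resolve the same processor. For Saxon 6.5.5 the factory class would be expected to be com.icl.saxon.TransformerFactoryImpl.

```java
import java.security.CodeSource;
import javax.xml.transform.TransformerFactory;

class WhichProcessor {
    // Name of the TransformerFactory implementation that JAXP resolved.
    static String factoryClassName() {
        return TransformerFactory.newInstance().getClass().getName();
    }

    // Factory class, the location it was loaded from, and the system
    // property that can override the JAXP lookup.
    static String describe() {
        Class<?> impl = TransformerFactory.newInstance().getClass();
        CodeSource src = impl.getProtectionDomain().getCodeSource();
        String location = (src == null || src.getLocation() == null)
            ? "(no code source, probably built into the JDK)"
            : src.getLocation().toString();
        return impl.getName() + " loaded from " + location
            + "; javax.xml.transform.TransformerFactory="
            + System.getProperty("javax.xml.transform.TransformerFactory");
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

In a stylesheet, system-property('xsl:vendor') gives the same kind of answer from within the transformation itself.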
Michael Kay
http://www.saxonica.com/
> -----Original Message-----
> From: Anderson, Paul [mailto:Paul.Anderson@xxxxxxxxxxxxx]
> Sent: 11 December 2007 23:07
> To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> Subject: Escaped characters being duplicated
>
> Greetings All,
>
> We have a bunch of DITA XML content and we're using the
> open-source DITA Open Toolkit to transform it into a variety
> of outputs. The DITA Open Toolkit is a collection of Java
> classes, XSL stylesheets, and ANT scripts that transform the
> content and create the output.
>
> To shield our users from the command-line invocation of the
> publishing scripts, we deployed a simple web application
> running on Tomcat 5.5 that takes input from a JSP page and
> invokes the necessary ANT script to generate the desired
> output for the user. This methodology has been working quite
> nicely for nearly a year.
>
> Over that time, a few of our users have had a problem where
> characters escaped in the XML content (for example, angle
> brackets and ampersands) are duplicated in the output. For
> example, in the place of one angle-bracket (<), we end up
> with two or sometimes four escaped angle brackets
> (&lt;&lt;&lt;&lt;).
>
> I've been troubleshooting the problem and the duplication
> always appears in the output files generated by one of the
> XSL stylesheets in the DITA Open Toolkit. If the input file
> contained an escaped character, the output file contains two
> of those escaped characters. The most interesting discovery
> so far is this: For each user that has the problem, the
> problem goes away if they invoke the ANT script via the
> command line; the duplication only occurs when the ANT script
> is invoked from the JSP page running on Tomcat 5.5. Having
> said that, the problem only exists for a few users; most
> users never see this problem when they use the JSP page to
> invoke the ANT script and publish the exact same XML content.
>
> Perplexing.
>
> Given all this background, my plea to this list is simple:
> What sort of conditions cause an XSL transformation to
> duplicate an escaped character?
>
> Would the system locale have an impact?
> Would the Java version (1.5 versus 1.6) have an impact?
> All source files use UTF-8 encoding.
> All users are using the same XSL processor: Saxon 6.5.5.
> I don't think the problem is in the XSL stylesheet or any
> other part of the DITA Open Toolkit because all users are
> using the same code and it works for most users.
>
> Any ideas about this issue are appreciated.
>
> Best regards,
>
> Paul Anderson
> Information Developer - Codex Administrator
> Compuware Corporation