Re: GOTCHA!

Subject: Re: GOTCHA!
From: "Oren Ben-Kiki" <oren@xxxxxxxxxxxxx>
Date: Fri, 15 Jan 1999 12:31:52 +0200
James Clark <jjc@xxxxxxxxxx> wrote:

>I wrote:
>> Now this is a hack! You stepped on another XT bug here - or a specs bug. I
>> checked the following:
>>
>> <xsl:template match="A">
>> <xsl:pi name="JavaScript">
>>  <xsl:text><![CDATA[<&>]]></xsl:text></xsl:pi>
>> </xsl:template>
>>
>> And got in the result:
>>
>> <?JavaScript <&>>
>
>Not an XT bug or a specs bug. You would get
>
><?JavaScript <&>?>
>
>which *is* well-formed. Remember that in XML a PI is terminated by ?>.

Checked again. XT version 0.5 emits '>' and not '?>' at the end of a PI.
Also, it emits '?>' inside the content without converting it to '? >'
as per the spec. But you are right - there are three constructs which avoid
markup (comments, PIs and CDATA), and we already have access to two of them.
I'd still argue that being able to generate all possible XML text files (as
opposed to all possible XML in-memory representations) has its value, but I
understand why that would be lower priority.
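
For illustration, here is a minimal Python sketch of the PI output rule discussed above: terminate with '?>' and break up any '?>' inside the content. The function name serialize_pi is hypothetical, and this is not XT's code, only one way the described behavior could look.

```python
def serialize_pi(target: str, data: str) -> str:
    """Serialize a processing instruction as XML text (illustrative helper)."""
    # A PI ends at the first "?>", so split any "?>" found inside the data.
    safe = data.replace("?>", "? >")
    # "<" and "&" are left alone: PI content is not parsed as markup.
    return f"<?{target} {safe}?>"


print(serialize_pi("JavaScript", "<&>"))
print(serialize_pi("JavaScript", "a ?> b"))
```

With this rule the first call yields a well-formed PI even though the content contains '<' and '&'.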

>How often will you get ?> in Javascript? Less often than ]]> I suspect.

I believe that '?>' isn't valid JavaScript. It might appear in strings, of
course... But strings in embedded scripts are a whole painful issue by
itself :-)

>> Or could I expect that an XML/XSL processor to be smart enough to use
>> different character quoting rules within a <SCRIPT> tag?
>
>Right.


I've tried to understand how this works - it does work, to my great
surprise. I went back to the documentation...

The XML spec insists that unadorned '<' and '&' can appear only inside CDATA
sections, a PI, or a comment (section 2.4). Section 2.7 describes CDATA
sections and makes it clear they always begin with "<![CDATA[" and end with
"]]>". Section 3.2 discusses element types. It lists '#PCDATA' as a possible
type in 3.2.2 (without giving its definition, or even a link to somewhere
where it is defined - strange). It does _not_ list 'CDATA' as a valid type.
XSL is expected to always emit valid XML. And yet...

The HTML 4.0 spec does specify CDATA as the content type for the SCRIPT
element (and many other things), with a link to the _SGML_ standard. Obviously
HTML 4.0 isn't XML. Yet it is a valid result-ns for XSL, and the XT processor
emits what seems to be CDATA for SCRIPT tags. Should be illegal...

The explanation is in section 2.2 of the XSL draft. In an editorial note it
states that it is possible to use the result-ns to specify non-XML output, and
lists HTML as an example. Although this is just an editorial note, _it
explicitly caters to non-XML output_. Who said the W3C isn't responsive? They
are just being shy about it, so they put it in small letters :-) In fact, it is
a very elegant way of solving the problem - it limits the damage to a single
attribute of a single tag. Neat!
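
The effect of an HTML result-ns on text output can be sketched roughly as follows. This is a Python illustration under my own assumptions, not XT's implementation: the names RAW_TEXT_ELEMENTS and serialize_text are hypothetical, and the set lists only SCRIPT and STYLE, which HTML 4.0 declares with CDATA content.

```python
from html import escape

# Elements whose content HTML 4.0 declares as CDATA (assumed set for this sketch).
RAW_TEXT_ELEMENTS = {"script", "style"}


def serialize_text(parent_tag: str, text: str, html_output: bool) -> str:
    """Return text as it would be written inside <parent_tag>."""
    if html_output and parent_tag.lower() in RAW_TEXT_ELEMENTS:
        # SGML CDATA content: "<" and "&" are written as-is.
        return text
    # XML rules: markup characters in character data are escaped.
    return escape(text, quote=False)


code = "if (a < b && c) run();"
print(serialize_text("SCRIPT", code, html_output=True))
print(serialize_text("SCRIPT", code, html_output=False))
```

The decision depends only on the output method and the element name, which is why the scripting language never enters into it.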

Even better, this trick has the potential to settle this issue once and for
all. Consider adding an 'http://www.w3c.org/TR/rec-cdata' result-ns. This
result-ns would specify that all output elements have the content type
'CDATA', so that any text emitted by the stylesheet would not be marked up,
ever. This can't be done in an XML DTD, but neither can the HTML one.
Stylesheets using this result-ns would probably not bother to generate
elements, anyway; by using just <xsl:text> etc. they'll generate output in
arbitrary formats - without changing anything in the XSL standard itself.

>> It would also have
>> to examine the LANGUAGE attribute for it...
>
>Huh? SCRIPT in HTML 4.0 is an SGML CDATA element, which means that when
>outputting it, & and < must not be escaped to &amp; and &lt;.  This is
>independent of the scripting language.


Right. Sorry. I was thinking about the quoted strings problem - the need to
take some text and quote it so that it may be safely embedded in a
scripting language string; this would be different between scripting
languages. It's really a variant of the arbitrary text formatting issue. If
<xsl:ecmascript-string> is unacceptable, how about adding a perl-like regexp
capability to <xsl:text>? <xsl:text transform='s:["\\]:\\&amp;:g'> would do
wonders :-)
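
To make the intent of that substitution concrete, here is a small Python sketch of the same transform: put a backslash in front of every double quote and backslash. The function name ecmascript_quote is hypothetical, and the sketch ignores other characters (newlines, for example) that a real string-quoting routine would also need to handle.

```python
import re


def ecmascript_quote(text: str) -> str:
    # Same idea as s:["\\]:\\&:g - prefix each '"' or '\' with a backslash.
    return re.sub(r'["\\]', lambda m: "\\" + m.group(0), text)


# The result can be placed between double quotes in a script string literal.
print('var s = "' + ecmascript_quote('say "hi"') + '";')
```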

BTW, a final hack which works in XT, if the result-ns is HTML, and would
probably work in other processors as well:

<xsl:template match="...">
<SCRIPT>
<xsl:text><![CDATA[</SCRIPT>]]>
Anything you want - &lt;, &amp;, &gt;
<![CDATA[<SCRIPT>]]></xsl:text>
</SCRIPT>
</xsl:template>

Emits:

<SCRIPT></SCRIPT>
Anything you want - <, &, >
<SCRIPT></SCRIPT>

Where there's a will, there's a way :-)

    Oren Ben-Kiki


 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list

