
RE: Variables and HTML

Subject: RE: Variables and HTML
From: Pieter Reint Siegers Kort <pieter.siegers@xxxxxxxxxxx>
Date: Thu, 10 Mar 2005 18:36:46 -0600
Thanx Michael, once again I learned something, but it also raises a
question:

> But there are use cases for it, and a prime one is extracting HTML
> documents or fragments that have been wrapped in an XML wrapper.

Why will d-o-e be deprecated in XSLT 2.0 when there are valid use cases for
it?

Cheers,
<prs/>

-----Original Message-----
From: Michael Kay [mailto:mike@xxxxxxxxxxxx] 
Sent: Thursday, March 10, 2005 05:47 p.m.
To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
Subject: RE:  Variables and HTML

> <xsl:variable name="italicOpen">&lt;i></xsl:variable>

I think that in general double-markup (markup disguised as text) is a bad
idea, because it's very confusing and not well supported by tools. It's much
better wherever possible to exploit the fact that XML is fully hierarchic,
so markup can always be nested.
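
As a minimal sketch of the nested alternative: rather than holding "<i>" as
text in a variable, the stylesheet can write a real i element as a literal
result element and leave both tags to the serializer. The input element names
(para, emph) are hypothetical and do not come from the original question.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>

  <!-- Hypothetical input: <para>Some <emph>stressed</emph> text</para> -->
  <xsl:template match="para">
    <p><xsl:apply-templates/></p>
  </xsl:template>

  <!-- Create a real <i> element node in the result tree.
       The serializer writes the start and end tags, so they always match. -->
  <xsl:template match="emph">
    <i><xsl:apply-templates/></i>
  </xsl:template>
</xsl:stylesheet>
```

Because the i element is a node and not a string, there is no separate
"italicClose" variable to keep in step with the opening one.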

However, the problem does come up quite often. Sometimes it's done for bad
reasons, but there are also some plausible reasons:

(a) the inner markup is HTML and is not well-formed XML

(b) the inner markup represents a complete XML document containing a DTD
internal subset

The process of turning angle brackets into nodes is called parsing. Parsing
also turns an &lt; escape sequence into an angle bracket. If nested markup
like this is to be manipulated by XSLT, then it needs to be turned into
nodes. To get from &lt; via < to a node you need to parse it twice: you can
do this using an extension such as saxon:parse().
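
A sketch of the "parse it twice" route, assuming a Saxon release that provides
the saxon:parse() extension in the http://saxon.sf.net/ namespace
(availability differs between Saxon versions and editions). The wrapper, doc
and title element names are hypothetical. This only works when the inner
markup is well-formed XML, so it covers case (b) but not case (a).

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:saxon="http://saxon.sf.net/"
    exclude-result-prefixes="saxon">

  <!-- Hypothetical input:
       <wrapper>&lt;doc>&lt;title>Hello&lt;/title>&lt;/doc></wrapper> -->
  <xsl:template match="wrapper">
    <!-- The first parse (done when the source was loaded) has already turned
         each &lt; into "<", so string(.) is "<doc><title>Hello</title></doc>".
         saxon:parse() is the second parse: it turns that string into nodes. -->
    <xsl:variable name="inner" select="saxon:parse(string(.))"/>

    <!-- $inner is a document node, so ordinary XPath works on it -->
    <result>
      <xsl:value-of select="$inner/doc/title"/>
    </result>
  </xsl:template>
</xsl:stylesheet>
```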

The reverse of parsing is serialization. During serialization, nodes are
turned into angle brackets, and angle brackets are turned into escape
sequences (entities).

You want the &lt; in the input to become < in the output. This means you
either need to parse it twice and serialize it once, or you need to parse it
once and bypass the usual action of the serializer - which is what
disable-output-escaping does.
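
A sketch of the second route (parse once, bypass the serializer's escaping),
assuming the processor supports disable-output-escaping and that the result is
actually serialized, not passed on as a tree. The wrapper element name is
hypothetical.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>

  <!-- Hypothetical input:
       <wrapper>&lt;p>Hello &lt;br> world&lt;/p></wrapper> -->
  <xsl:template match="wrapper">
    <div>
      <!-- Without the attribute, each "<" in the text node is written as &lt;.
           With it, the serializer is asked to write the characters as-is,
           so the wrapped HTML comes out as markup. -->
      <xsl:value-of select="." disable-output-escaping="yes"/>
    </div>
  </xsl:template>
</xsl:stylesheet>
```

The wrapped HTML here uses an unclosed br, which is case (a) above: it could
not be parsed a second time as XML, so a text-level bypass of this kind is the
option left.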

Disable-output-escaping is generally derided because it's so frequently
misused by beginners who haven't understood that XSLT is dealing with trees.
But there are use cases for it, and a prime one is extracting HTML documents
or fragments that have been wrapped in an XML wrapper. It's very problematic
architecturally because it distorts the interface between the transformer
and the serializer, and that's why not every processor supports it. However,
there are cases where it's the best solution available.

Michael Kay
http://www.saxonica.com/
