
Re: RSS feeds and disable-output-escaping="yes"

Subject: Re: RSS feeds and disable-output-escaping="yes"
From: David Carlisle <davidc@xxxxxxxxx>
Date: Fri, 6 May 2005 12:03:31 +0100
> It's likely that the HTML isn't well-formed XML, so you're going to have to
> extract it as a string, put it through the tidy utility, parse it, and get
> it back into the stylesheet in tree form before you can manipulate it at the
> node level. 
> 
> I would tend to do this as a non-XSLT stage in a processing pipeline; you
> could also do it by calling out to an extension function.
> 

Of course Michael is probably still using XSLT 1. Some of us have moved
up to XSLT 2 (there's a nice implementation called saxon8...), in which
case you can handle a fair amount of "non-well-formed HTML as a string"
using nothing but XSLT 2 functions.


For example:


h.xml:


<greeting><![CDATA[<P>Hello, <i>world!</P>]]></greeting>


h.xsl:

<?xml version="1.0" encoding="iso-8859-1"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:d="data:,dpc"
    exclude-result-prefixes="d">

  <!-- Pure-XSLT2 HTML parser; provides d:htmlparse() -->
  <xsl:import href="http://www.dcarlisle.demon.co.uk/htmlparse.xsl"/>

  <xsl:output method="html"/>

  <xsl:template match="/">
    <html>
      <head>
        <title>Today's greeting</title>
      </head>
      <body>
        <!-- Parse the CDATA string into nodes and copy them to the result -->
        <xsl:copy-of select="d:htmlparse(string(greeting[1]),'',true())/node()"/>
      </body>
    </html>
  </xsl:template>

</xsl:stylesheet>



$ saxon8 h.xml  h.xsl
<html>
   <head>
      <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">

      <title>Today's greeting</title>
   </head>
   <body>
      <p>Hello, <i>world!</i></p><i></i></body>
</html>



The <i></i> there is an artifact of the parser's html "recovery" mode,
which re-opens automatically closed elements (looks like I should improve
that a bit one day). You can turn that off by changing true() in the
above call to false(), and then you get

$ saxon8 h.xml  h.xsl
<html>
   <head>
      <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">

      <title>Today's greeting</title>
   </head>
   <body>
      <P>Hello, <i>world!</i></P>
   </body>
</html>

so now the <i> element has been closed, but no lowercasing or other
html-specific transformations have been applied, and <i> isn't re-opened.
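
To make that one-argument change concrete, here is h.xsl again with the
recovery flag switched off. Treat it as a sketch rather than a tested
listing: it assumes htmlparse.xsl is still reachable at the URL imported
above and that d:htmlparse keeps the three-argument form used in the
original call. The second argument is left as the empty string, exactly
as before.

```xml
<?xml version="1.0" encoding="iso-8859-1"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:d="data:,dpc"
    exclude-result-prefixes="d">

  <!-- Assumption: the parser library is still available at this URL. -->
  <xsl:import href="http://www.dcarlisle.demon.co.uk/htmlparse.xsl"/>

  <xsl:output method="html"/>

  <xsl:template match="/">
    <html>
      <head>
        <title>Today's greeting</title>
      </head>
      <body>
        <!-- Third argument false(): html recovery mode off, so element
             names keep their case and <i> is not re-opened. -->
        <xsl:copy-of select="d:htmlparse(string(greeting[1]),'',false())/node()"/>
      </body>
    </html>
  </xsl:template>

</xsl:stylesheet>
```

Run it the same way as before (saxon8 h.xml h.xsl); the only difference
from the first stylesheet is the final argument of the d:htmlparse call.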

David




