
Re: Selective escaping of special characters

Subject: Re: Selective escaping of special characters
From: "Thomas B. Passin" <tpassin@xxxxxxxxxxxx>
Date: Wed, 13 Mar 2002 10:25:59 -0500
[Kyrre Wathne]

> My apologies if this question has been asked before, I haven't found posts
> that address this exact issue.
>
> My problem is that I want to transform junk HTML generated by Microsoft
> Word. This contains markup, of course, so my first instinct was to use
> disable-output-escaping. However, this also disables escaping of other
> special characters, like the special dash character &#8211;. These are then
> outputted in a format my browser (Internet Explorer) doesn't understand (I
> use "ISO-8859-1" as encoding in output).
>

Not exactly what you asked for, but HTML Tidy has a setting (the "word-2000"
option) that makes it remove all the Microsoft junk from Word 2000 output.
There are Java and C versions, with wrappers for various languages, including
Python.  One fast preprocessing pass with Tidy does a really nice job of
getting rid of all that noise, and it is much easier than trying to get a
stylesheet to do the same job.
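For the specific symptom in the question, a rough sketch follows. It is not part of the original reply and assumes the cleaned-up HTML is already available as a Python string. Characters that ISO-8859-1 cannot represent (such as the en dash U+2013) are written as numeric character references like &#8211;, while everything else, including markup and Latin-1 characters, passes through unchanged. It relies only on Python's built-in "xmlcharrefreplace" codec error handler. The function name to_latin1_with_refs is hypothetical.

```python
def to_latin1_with_refs(html: str) -> bytes:
    """Encode text as ISO-8859-1 (hypothetical helper).

    Any character outside ISO-8859-1 is replaced by a decimal
    character reference such as &#8211;. Markup characters
    (<, >, &) are NOT touched, so only what the target encoding
    cannot hold gets escaped.
    """
    return html.encode("iso-8859-1", errors="xmlcharrefreplace")


if __name__ == "__main__":
    # An en dash (outside Latin-1) and an e-acute (inside Latin-1).
    sample = "<p>1990\u20132002 caf\u00e9</p>"
    print(to_latin1_with_refs(sample))
    # b'<p>1990&#8211;2002 caf\xe9</p>'
```

This is a post-processing step on already-serialized output, in the same spirit as the Tidy preprocessing pass above. It does not change how disable-output-escaping behaves inside the stylesheet itself.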

Cheers,

Tom P


 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list

