
How to Handle Bad XML (or Word HTML)

Subject: How to Handle Bad XML (or Word HTML)
From: Ted Stresen-Reuter <tedmasterweb@xxxxxxx>
Date: Tue, 11 Mar 2003 15:36:33 -0600
Hi,

Thanks again to everyone who answers on this list. You've all been really sweet.

Today's question is about transforming the HTML produced by MS Word into valid XHTML.

In general, the problem is that Word doesn't produce well-formed XML (specifically, many elements have unquoted attribute values). The file I'm working with starts with the following:

<html xmlns:o="urn:schemas-microsoft-com:office:office"
xmlns:w="urn:schemas-microsoft-com:office:word"
xmlns="http://www.w3.org/TR/REC-html40">

Additionally, a typical element might look like this:

<p class=MsoNormal style='text-align:justify;mso-hyphenate:none'><![if !supportEmptyParas]>&nbsp;<![endif]><o:p></o:p></p>
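To make the failure concrete, here is a minimal sketch in Python (standard library only) showing that an XML parser rejects an unquoted attribute value outright. The fragment is a simplified stand-in for the Word output above, and the helper name is_well_formed is hypothetical, chosen only for this illustration.

```python
import xml.etree.ElementTree as ET

# Simplified fragment modeled on the Word output: the class value is unquoted.
fragment = "<p class=MsoNormal>text</p>"


def is_well_formed(text):
    """Return True if text parses as XML, False if the parser rejects it."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


print(is_well_formed(fragment))
print(is_well_formed('<p class="MsoNormal">text</p>'))
```

An XSLT processor reads its source through an XML parser of this kind, so the error is raised before any template gets a chance to run.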

Is it even possible to use such a document as a source document, and if so, how do I handle the errors the XSLT processor returns when it finds unquoted attributes?
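One possible approach is to clean the file up before the XSLT processor ever sees it, since the stylesheet itself cannot recover from a parse error. Established tools such as HTML Tidy exist for this job. The sketch below only illustrates the idea with Python's standard html.parser module, which tolerates unquoted attributes. The names WordHtmlToXml, word_html_to_xml, and VOID are hypothetical, and the sketch assumes that comments, doctype declarations, and Word's conditional sections can simply be dropped (it does not handle them, so the parser discards them).

```python
from html.parser import HTMLParser
from xml.sax.saxutils import escape, quoteattr
import xml.etree.ElementTree as ET

# Assumption: a small set of HTML elements that never have a closing tag.
VOID = {"br", "hr", "img", "meta", "link", "input"}


class WordHtmlToXml(HTMLParser):
    """Re-serialize lenient HTML as well-formed XML (illustrative only)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.open_tags = []

    def handle_starttag(self, tag, attrs):
        seen = {}
        for name, value in attrs:
            # Keep the first occurrence; valueless attributes repeat their name.
            seen.setdefault(name, name if value is None else value)
        attr_text = "".join(" %s=%s" % (n, quoteattr(v)) for n, v in seen.items())
        if tag in VOID:
            self.out.append("<%s%s/>" % (tag, attr_text))
        else:
            self.out.append("<%s%s>" % (tag, attr_text))
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag not in self.open_tags:
            return  # stray closing tag: ignore it
        while self.open_tags:
            top = self.open_tags.pop()
            self.out.append("</%s>" % top)
            if top == tag:
                break

    def handle_data(self, data):
        # Entities such as &nbsp; arrive here already decoded to characters.
        self.out.append(escape(data))


def word_html_to_xml(text):
    parser = WordHtmlToXml()
    parser.feed(text)
    parser.close()
    while parser.open_tags:  # close anything the source left open
        parser.out.append("</%s>" % parser.open_tags.pop())
    return "".join(parser.out)


sample = (
    '<html xmlns:o="urn:schemas-microsoft-com:office:office"\n'
    'xmlns:w="urn:schemas-microsoft-com:office:word"\n'
    'xmlns="http://www.w3.org/TR/REC-html40">\n'
    "<p class=MsoNormal style='text-align:justify;mso-hyphenate:none'>"
    "<![if !supportEmptyParas]>&nbsp;<![endif]><o:p></o:p></p>\n"
    "</html>"
)

xml_text = word_html_to_xml(sample)
root = ET.fromstring(xml_text)  # raises ParseError if the result is not well-formed
print(xml_text)
```

The result is only well-formed XML, not yet XHTML: the elements are still in Word's default namespace, so the stylesheet would need to match on that namespace and emit XHTML elements itself. Attribute names that are not legal XML names are not checked in this sketch.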

Thanks again to all of you who take the time to read and actually answer these queries.

Sincerely,

Ted Stresen-Reuter


XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list
