Re: Converting poorly formed HTML into well-formed XML

Subject: Re: Converting poorly formed HTML into well-formed XML
From: "Steve Muench" <smuench@xxxxxxxxxxxxx>
Date: Tue, 26 Sep 2000 16:55:50 -0700
| Does XSLT have the facilities to directly 
| read in the poorly formed HTML?

No. XSLT has no built-in features to do this; an XSLT processor
expects well-formed XML as its input.

I'd recommend leveraging Andy Quick's excellent (open source)
Java implementation of Dave Raggett's HTML "Tidy" utility called
JTidy.

http://www3.sympatico.ca/ac.quick/jtidy.html

It can expose a DOM API to the "tidied-up" (that is, well-formed)
XML tree for any ill-formed HTML document. You can then pass
the DOM Document into your XSLT engine for transformation.
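As a rough sketch of the second half of that pipeline (handing a DOM
Document to an XSLT engine), the following uses the standard JAXP
javax.xml.transform API that ships with current JDKs. The class name
DomToXslt, the helper parseWellFormed, and the sample markup are
hypothetical and only for illustration. Because JTidy is a separate
jar, the tidy step is shown only as a comment; the JDK parser stands in
for it here, so this runs on already well-formed input.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class DomToXslt {

    /** Applies an XSLT stylesheet (passed as a string) to an in-memory DOM tree. */
    static String transform(Document dom, String stylesheet) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(stylesheet)));
            StringWriter out = new StringWriter();
            // DOMSource lets the engine read the tree directly, with no re-parsing.
            t.transform(new DOMSource(dom), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }

    /**
     * Stand-in for the tidy step (an assumption of this sketch). With the
     * JTidy jar on the classpath, the Document would instead come from
     * something like:
     *   org.w3c.tidy.Tidy tidy = new org.w3c.tidy.Tidy();
     *   tidy.setXmlOut(true);
     *   Document dom = tidy.parseDOM(inputStream, null);
     * Check the JTidy documentation for the options your version supports.
     */
    static Document parseWellFormed(String xml) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true);
            return f.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse input", e);
        }
    }
}
```

A stylesheet with a template matching "/" that does
xsl:value-of select="//td[@class='price']" could then be passed to
DomToXslt.transform(dom, stylesheet) to pull a single value out of
the tree.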

In my about-to-be-released book "Building Oracle XML Applications"
from O'Reilly, I had occasion to use this JTidy library to show
readers how to take ill-formed HTML and use XSLT to "scrape" 
interesting data out of the "tidied-up" XML result from dynamic
web pages like stock quote services or other online sources of 
information.

______________________________________________________________
Steve Muench, Lead XML Evangelist & Consulting Product Manager
BC4J & XSQL Servlet Development Teams, Oracle Rep to XSL WG
Author "Building Oracle XML Applications", O'Reilly
http://www.oreilly.com/catalog/orxmlapp/


| Does XSLT have the facilities to directly read in the poorly formed HTML?
| And if so, what needs to be done?
| 
| Or,
| 
| Will designing a custom parser that builds a DOM from the poorly formed HTML
| to then be output to an XML file, or directly processed by an XSLT document,
| be the best solution?
| 
| I've already begun developing the latter (custom) solution, but thought I'd
| double check to see if there are any HTML -> XHTML converters available.
| 
| Thanks in advance for your help,
| 
| Joe Fourness
| 
| 
|  XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
| 

