
RE: html2xml?

Subject: RE: html2xml?
From: naha@xxxxxxxxxx
Date: Wed, 27 Mar 2002 08:54:02 -0500 (EST)
Quoting Jarno.Elovirta@xxxxxxxxx:

> Hi,
> 
> > Has anyone done html to xml transformation?
> > Is it possible? If yes...how? A small example would be great =)
> 
> Run the HTML document through Tidy/JTidy/SX/OpenXML and then process it
> like a normal XML document.

I recently tried Tidy (http://www.w3.org/People/Raggett/tidy/) for 
this but found it overly aggressive in its enforcement of the HTML
DTD.  For example, it transformed

    <a href="some-url">
        <div class="style">anchor text</div>
    </a>

into

    <a href="some-url">
    </a>

    <div class="style">anchor text</div>

which changes the semantics of the document: the anchor is left 
empty and the text is no longer a link.  I've not found a 
configuration parameter to control this behavior.

Wouldn't it be more correct to transform it to the following?

    <div class="style">
        <a href="some-url">anchor text</a>
    </div>
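
As a rough illustration of that alternative rewrite, here is a minimal 
sketch using only the standard Java DOM API.  It assumes the input is 
already well-formed XML (so it is not a replacement for Tidy), and the 
class and method names (AnchorInverter, invertAnchors, rewrite) are 
made up for this example.  It only handles the simple case where the 
<a> contains exactly one <div> and nothing else but whitespace; 
comments inside such an <a> are dropped.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: turns <a><div>text</div></a> into <div><a>text</a></div>.
class AnchorInverter {

    // Returns the only element child of e, or null if e has several
    // element children or any non-whitespace text of its own.
    static Element soleElementChild(Element e) {
        Element found = null;
        for (Node n = e.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                if (found != null) return null;
                found = (Element) n;
            } else if (n.getNodeType() == Node.TEXT_NODE
                    && !n.getTextContent().trim().isEmpty()) {
                return null;
            }
        }
        return found;
    }

    static void invertAnchors(Document doc) {
        // Copy the live NodeList first, because the tree is modified below.
        NodeList live = doc.getElementsByTagName("a");
        List<Element> anchors = new ArrayList<>();
        for (int i = 0; i < live.getLength(); i++) anchors.add((Element) live.item(i));

        for (Element a : anchors) {
            Element div = soleElementChild(a);
            if (div == null || !"div".equals(div.getTagName())) continue;
            // Empty the anchor (this detaches the div and drops whitespace).
            while (a.getFirstChild() != null) a.removeChild(a.getFirstChild());
            // Put the div where the anchor used to be.
            a.getParentNode().replaceChild(div, a);
            // Move the div's content into the anchor, then nest the anchor.
            while (div.getFirstChild() != null) a.appendChild(div.getFirstChild());
            div.appendChild(a);
        }
    }

    static String rewrite(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            invertAnchors(doc);
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("could not rewrite document", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(rewrite(
            "<body><a href=\"some-url\">\n"
          + "    <div class=\"style\">anchor text</div>\n"
          + "</a></body>"));
    }
}
```

Anchors with mixed content (text plus a <div>, or several block 
children) are left untouched here, since it is less obvious what the 
right rewrite would be in those cases.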

I'm not familiar with any of the other suggested tools.

I was originally hoping for an all-XSL solution to my problem, but 
since it involves capturing and processing a tree (more like a 
shrub) of cross-referenced web pages, all of which need to be 
converted from HTML to XML first, I've started writing a Java program 
for this.
I was hoping to use the HEX parser 
(http://www-uk.hpl.hp.com/people/sth/java/hex.html) but the version 
I fetched appears to be buggy and the author's email address is no 
longer valid.
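
For the capture step mentioned above (walking a set of cross-referenced 
pages), one common approach is a breadth-first traversal with a 
"seen" set, so that pages which link back to each other are fetched 
only once.  The sketch below is an assumption about how such a walk 
could look, not a description of the actual program.  The names 
PageCrawler and linksOf are hypothetical, and linksOf stands in for 
"fetch the page, convert it to XML, and return the href values"; it 
is backed by an in-memory map here so the example runs without a 
network.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

// Hypothetical sketch: visit every page reachable from a start page once.
class PageCrawler {

    // linksOf is a stand-in for fetching a page and extracting its links.
    static List<String> crawl(String start, Function<String, List<String>> linksOf) {
        Set<String> seen = new LinkedHashSet<>(); // remembers discovery order
        Deque<String> queue = new ArrayDeque<>();
        seen.add(start);
        queue.add(start);
        while (!queue.isEmpty()) {
            String page = queue.remove();
            for (String target : linksOf.apply(page)) {
                // add() returns false for pages already discovered,
                // which is what stops cycles from looping forever.
                if (seen.add(target)) queue.add(target);
            }
        }
        return new ArrayList<>(seen);
    }

    public static void main(String[] args) {
        // A tiny made-up site whose pages link to each other.
        Map<String, List<String>> site = Map.of(
            "index.html", List.of("a.html", "b.html"),
            "a.html", List.of("b.html", "index.html"),
            "b.html", List.of("a.html"));
        System.out.println(crawl("index.html", p -> site.getOrDefault(p, List.of())));
    }
}
```

In a real run the returned list would be the set of documents to 
convert and then hand to the XSL processing.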

As for the other converters you suggested, Google found
what are apparently two different "OpenXML"s, one written in Java
and one in Delphi.  Could you provide a URL to the one you suggested?
The only information I found about the Java one was on CNET 
(http://download.cnet.com/downloads/0-14492-100-5565652.html) and the
site it refers to as the "publisher" (http://www.openxml.org/) seems 
to be a shopping site.

This topic would be a great candidate for a FAQ.  I didn't find one 
on Dave Pawson's site.

 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list

