Re: generating DOM from ill-formed HTML docs
[Robert Mena]
> Hi, I am developing an application that will have to
> build a DOM tree of html pages.
>
> I'll use such DOM trees to perform some
> analysis/comparisons.
>
> Since most of the time I'll find ill-formed documents
> I'd like to know if there are any parsers out there
> that "accept" these flaws and build the tree anyway.
>
> I've tried domxml (php) with no luck.

The usual answer is to preprocess with Tidy - see

http://www.w3.org/People/Raggett/tidy/

You may also want to look at NekoHTML, at

http://www.apache.org/~andyc/

It processes HTML, including fixing up some problems, and is built on the Xerces XNI (Xerces Native Interface), so you can build a DOM.

Cheers,

Tom P
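
To make the "build the tree anyway" idea concrete, here is a rough sketch in Python using only the standard library (html.parser and xml.dom.minidom). It is a stand-in for illustration, not how Tidy or NekoHTML work internally: the names LenientDomBuilder and parse_lenient and the synthetic "root" wrapper element are made up for this example. It closes whatever is still open when an end tag or the end of input arrives and ignores stray end tags, but it does not apply HTML's implied-end-tag rules (a second <p> simply nests inside the first).

```python
from html.parser import HTMLParser
from xml.dom.minidom import Document

# HTML elements that never take a closing tag.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class LenientDomBuilder(HTMLParser):
    """Hypothetical helper: builds a minidom tree from tag soup."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.doc = Document()
        # Synthetic wrapper (an assumption of this sketch) so that text or
        # several top-level elements always have a legal parent.
        root = self.doc.createElement("root")
        self.doc.appendChild(root)
        self.stack = [root]  # currently open elements, innermost last

    def handle_starttag(self, tag, attrs):
        node = self.doc.createElement(tag)
        for name, value in attrs:
            # Valueless attributes such as <input disabled> arrive as None.
            node.setAttribute(name, value if value is not None else "")
        self.stack[-1].appendChild(node)
        if tag not in VOID:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Close the nearest open element with this name, together with
        # anything left open inside it. A stray end tag matches nothing
        # and is silently ignored.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tagName == tag:
                del self.stack[i:]
                return

    def handle_data(self, data):
        if data.strip():  # skip whitespace-only runs
            self.stack[-1].appendChild(self.doc.createTextNode(data))


def parse_lenient(html):
    """Return a minidom Document for possibly ill-formed HTML."""
    builder = LenientDomBuilder()
    builder.feed(html)
    builder.close()
    return builder.doc


if __name__ == "__main__":
    doc = parse_lenient("<html><body><p>one<b>bold<p>two</div></body>")
    print(doc.documentElement.toxml())
```

The resulting tree serializes as well-formed XML, so it can be handed to ordinary DOM code for the kind of analysis/comparison described in the question. For real pages a dedicated fixer such as Tidy or NekoHTML will do a much better job of guessing the intended structure.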