
Re: fault tolerant saxon:parse()

Subject: Re: fault tolerant saxon:parse()
From: "Andrew Welch" <andrew.j.welch@xxxxxxxxx>
Date: Mon, 17 Nov 2008 11:58:39 +0000
2008/11/17 David Carlisle <davidc@xxxxxxxxx>:
>> I'm wondering if there's a standard approach for a fault tolerant
>> saxon:parse()   (or alternative equivalent)
> personally I've used tagsoup and htmlparse.xsl, but perhaps the nearest
> to a standard these days is http://about.validator.nu/ which implements
> the HTML5 parsing algorithm in Java and exposes (so I'm told) SAX and
> DOM interfaces as if it were reading XML.

Thanks, but I'm looking more for a way of detecting when it's needed...

For example, in the nasty RSS feed for Transport for London's live
travel updates you can have:

<title> &lt;a href="/tfl/livetravelnews/realtime/tube/default.html"&gt;Today&lt;/a&gt;

<title>Hammersmith &amp; City</title>

The former needs parsing if you want to process the escaped markup,
but if you do the same with the latter you get an error (because the
parser treats the ampersand as the start of an entity reference). It's
the same element, so both escaped and non-escaped markup need to be
handled.

Maybe saxon:try / catch is the only option here...?
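
To make the try/catch idea concrete, here is a rough sketch of the fallback logic. It is written in Python rather than XSLT purely for illustration, and it is not how saxon:parse() or saxon:try behaves internally. The function name parse_title and the <wrapper> element are made up for the example, and the input is assumed to be the string value of <title> after the feed's own XML parser has already unescaped it once.

```python
import xml.etree.ElementTree as ET


def parse_title(text):
    """Classify a <title> string value as parseable markup or plain text.

    Returns ("markup", Element) when the text is a well-formed XML
    fragment containing markup, otherwise ("text", text).
    parse_title and the wrapper element are hypothetical names.
    """
    # Cheap pre-check: without a '<' there is no escaped markup to process.
    if "<" not in text:
        return ("text", text)
    try:
        # Wrap in a dummy root so mixed content and several top-level
        # nodes still form a single well-formed document.
        root = ET.fromstring("<wrapper>" + text + "</wrapper>")
    except ET.ParseError:
        # Not well-formed (for example a bare '&'): keep it as plain text.
        return ("text", text)
    return ("markup", root)


# The two titles from the feed, as their unescaped string values.
first = ' <a href="/tfl/livetravelnews/realtime/tube/default.html">Today</a>'
second = "Hammersmith & City"

for value in (first, second):
    kind, result = parse_title(value)
    print(kind)
```

The pre-check for '<' avoids the parse attempt in the common plain-text case, and the exception handler covers titles that contain a '<' but still aren't well-formed, such as a stray ampersand next to real markup. The trade-off is that a title that only looks like markup is silently treated as text.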

Andrew Welch
Kernow: http://kernowforsaxon.sf.net/
