RE: sets of parsing rules
Belated reply:

Thanks Michael and Philippe for the pointers. I will see how far I can
get with tag soup and pick up from there with the many good links
Philippe sent.

----------->Nathan

Nathan Young
Cisco.com->Interface Development
A: ncy1717
E: natyoung@c...

> -----Original Message-----
> From: Michael Kay [mailto:mike@s...]
> Sent: Thursday, February 08, 2007 2:06 AM
> To: Nathan Young -X (natyoung - Artizen at Cisco); 'XML Developers List'
> Subject: RE: sets of parsing rules
>
> > I have an application that parses a large number of HTML
> > pages. A few of them are well formed XHTML but that's the
> > exception rather than the rule. By grabbing pages,
> > manipulating them a bit (regexps have been sufficient here so
> > far), then tidying them I can get them to a state where they
> > are parsable XML. From there I can use XSL to get them the
> > rest of the way (although I have a process that allows me to
> > run regexps here too, supplementing XSLT 1.0).
>
> I'm not sure why you are doing this yourself, when the job has
> already been done. Pick up John Cowan's TagSoup parser, and just
> plug it in as the parser front-end to Saxon, and you will be able
> to run your stylesheets on the HTML directly.
>
> Michael Kay
> http://www.saxonica.com/
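One way to read the "parser front-end" suggestion is through the standard JAXP API: a SAXSource pairs any SAX XMLReader with an input, and a Transformer consumes it. The sketch below is an illustration, not code from the thread. The class name HtmlTransform and its transform method are hypothetical, and the reader is passed in as a parameter so that the same code works with the JDK's own parser or with TagSoup's org.ccil.cowan.tagsoup.Parser when that jar is on the classpath.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

// Hypothetical helper: runs a stylesheet over a document that is read
// through whichever SAX XMLReader the caller supplies.
final class HtmlTransform {

    static String transform(XMLReader reader, String document, String stylesheet)
            throws TransformerException {
        // newInstance() returns whichever XSLT processor JAXP finds;
        // typically Saxon if its jar is on the classpath, otherwise the
        // JDK's built-in processor.
        TransformerFactory factory = TransformerFactory.newInstance();
        Transformer transformer =
                factory.newTransformer(new StreamSource(new StringReader(stylesheet)));

        // The SAXSource is where the parser gets swapped: the transformer
        // only sees SAX events, not the original markup.
        SAXSource source =
                new SAXSource(reader, new InputSource(new StringReader(document)));

        StringWriter out = new StringWriter();
        transformer.transform(source, new StreamResult(out));
        return out.toString();
    }
}
```

With TagSoup available, the call would presumably look like HtmlTransform.transform(new org.ccil.cowan.tagsoup.Parser(), html, xslt). One caveat worth checking against the TagSoup documentation: as far as I know it reports elements in the XHTML namespace by default, so match patterns in an existing stylesheet may need a prefix bound to http://www.w3.org/1999/xhtml.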