RE: How to parse XML document with default namespace with JDOM
Jack,

It seems to be only the Ant build for TagSoup that requires Saxon 6.5. You can run the binary distribution (tagsoup-1.2.jar) without Saxon and, as Michael has pointed out, the code you posted has no dependency on Saxon.

You may have an environment or classpath issue, given that the following class (which is your code with the typo 'saxbuilder' changed to 'saxBuilder') outputs what you are expecting:

===================
OUTPUT:
===================
hollywood
san jose
san francisco
San diego

===================
JAVA CLASS:
===================
import java.io.BufferedReader;
import java.io.FileReader;
import java.util.Iterator;
import java.util.List;

import org.jdom.Content;
import org.jdom.input.SAXBuilder;
import org.jdom.xpath.XPath;

public class Test {

    @SuppressWarnings("unchecked")
    public static void main(String[] args) {
        try {
            FileReader frInHtml = new FileReader("C:\\Temp\\ABC.html");
            BufferedReader brInHtml = new BufferedReader(frInHtml);

            // Use TagSoup as the SAX parser behind JDOM.
            SAXBuilder saxBuilder = new SAXBuilder(
                    "org.ccil.cowan.tagsoup.Parser");
            org.jdom.Document jdomDocument = saxBuilder.build(brInHtml);

            // Every step carries the "ns" prefix, which is bound to the
            // XHTML namespace below.
            XPath xpath = XPath.newInstance(
                    "/ns:html/ns:body/ns:div[@id='container']/ns:div[@id='content']"
                    + "/ns:table[@class='sresults']/ns:tr/ns:td/ns:a");
            xpath.addNamespace("ns", "http://www.w3.org/1999/xhtml");

            List list = (List) (xpath.selectNodes(jdomDocument));
            Iterator iterator = list.iterator();
            while (iterator.hasNext()) {
                Object object = iterator.next();
                if (object instanceof Content)
                    System.out.println(((Content) object).getValue());
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

===================
INPUT FILE:
===================
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
  </head>
  <body>
    <div id="container">
      <div id="content">
        <table class="sresults">
          <tr>
            <td>
              <a href="http://www.abc.com/areas" title="Hollywood, CA">hollywood</a>
            </td>
            <td>
              <a href="http://www.abc.com/areas" title="San Jose, CA">san jose</a>
            </td>
            <td>
              <a href="http://www.abc.com/areas" title="San Francisco, CA">san francisco</a>
            </td>
            <td>
              <a href="http://www.abc.com/areas" title="San Diego, CA">San diego</a>
            </td>
          </tr>
        </table>
      </div>
    </div>
  </body>
</html>

===================
BUILD/RUN PATH:
===================
JDK 1.6.0_06 (i.e. xerces & xalan)
tagsoup-1.2.jar
jdom.jar
jaxen-core.jar
jaxen-jdom.jar
saxpath.jar

Regards,
Nick Ardlie.
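
A note on why the "ns" prefix in the XPath is needed: in XPath 1.0 an unprefixed element name matches only elements in no namespace, so a document that declares a default namespace has to be queried through a prefix bound to that namespace URI. The sketch below illustrates this with the JDK's own javax.xml.xpath API instead of JDOM/Jaxen, so that it runs without extra jars. The class name DefaultNamespaceXPathDemo, the helper select, and the shortened SAMPLE document are made up for illustration and are not part of the posted code.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class; uses only the JDK, not JDOM or TagSoup.
class DefaultNamespaceXPathDemo {

    static final String XHTML_NS = "http://www.w3.org/1999/xhtml";

    // Shortened stand-in for the input file. The DOCTYPE is left out so
    // the parser does not try to fetch the DTD over the network.
    static final String SAMPLE =
            "<html xmlns=\"http://www.w3.org/1999/xhtml\"><body>"
            + "<a href=\"http://www.abc.com/areas\">hollywood</a>"
            + "<a href=\"http://www.abc.com/areas\">san jose</a>"
            + "</body></html>";

    // Evaluates the expression with the prefix "ns" bound to the XHTML
    // namespace and returns the text of each selected node.
    static List<String> select(String xml, String expression) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // off by default in JAXP
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(new NamespaceContext() {
                @Override
                public String getNamespaceURI(String prefix) {
                    return "ns".equals(prefix) ? XHTML_NS : XMLConstants.NULL_NS_URI;
                }

                @Override
                public String getPrefix(String uri) {
                    return XHTML_NS.equals(uri) ? "ns" : null;
                }

                @Override
                public Iterator<String> getPrefixes(String uri) {
                    String prefix = getPrefix(uri);
                    return (prefix == null
                            ? Collections.<String>emptyList()
                            : Collections.singletonList(prefix)).iterator();
                }
            });

            NodeList nodes = (NodeList) xpath.evaluate(
                    expression, doc, XPathConstants.NODESET);
            List<String> values = new ArrayList<String>();
            for (int i = 0; i < nodes.getLength(); i++) {
                values.add(nodes.item(i).getTextContent().trim());
            }
            return values;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Unprefixed steps look for elements in no namespace: no match.
        System.out.println(select(SAMPLE, "/html/body/a"));          // prints []
        // Prefixed steps resolve to the XHTML namespace: both links match.
        System.out.println(select(SAMPLE, "/ns:html/ns:body/ns:a")); // prints [hollywood, san jose]
    }
}
```

The prefix itself is arbitrary; only the namespace URI it is bound to has to match the one declared in the document, which is the same job xpath.addNamespace("ns", ...) does in the JDOM version.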