Re: How to parse XML document with default namespace with JDOM
Hi Nicholas,
Thanks for responding to my question. I can confirm that the XPath query works with the default namespace when JDOM uses the TagSoup parser ("org.ccil.cowan.tagsoup.Parser"). My mistake was assuming that the XML produced by TagSoup was identical to the output of light_html2xml, which I had used in the past.
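For readers without JDOM on the classpath, the default-namespace point can be sketched with only the JDK's javax.xml.xpath API. This is an illustration, not code from the thread: the class name `DefaultNamespaceXPath` and its `select` helper are hypothetical, and the prefix `ns` is bound to the XHTML namespace the same way Nick's code does with `xpath.addNamespace(...)`.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class DefaultNamespaceXPath {
    static final String XHTML_NS = "http://www.w3.org/1999/xhtml";

    /** Binds the prefix "ns" to the XHTML namespace (same idea as JDOM's XPath.addNamespace). */
    static final NamespaceContext XHTML_CONTEXT = new NamespaceContext() {
        @Override
        public String getNamespaceURI(String prefix) {
            if (prefix == null) {
                throw new IllegalArgumentException("null prefix");
            }
            return "ns".equals(prefix) ? XHTML_NS : XMLConstants.NULL_NS_URI;
        }

        @Override
        public String getPrefix(String namespaceURI) {
            return XHTML_NS.equals(namespaceURI) ? "ns" : null;
        }

        @Override
        public Iterator<String> getPrefixes(String namespaceURI) {
            return XHTML_NS.equals(namespaceURI)
                    ? Collections.singletonList("ns").iterator()
                    : Collections.<String>emptyIterator();
        }
    };

    /** Returns the text content of every node matched by the (prefixed) XPath expression. */
    static List<String> select(String xml, String expression) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            // Required: without namespace awareness the elements carry no namespace URI at all
            dbf.setNamespaceAware(true);
            Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(XHTML_CONTEXT);
            NodeList nodes = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);

            List<String> texts = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                texts.add(nodes.item(i).getTextContent());
            }
            return texts;
        } catch (Exception e) {
            throw new IllegalStateException("XPath selection failed", e);
        }
    }
}
```

For an input such as `<html xmlns="http://www.w3.org/1999/xhtml"><body><a>hollywood</a><a>san jose</a></body></html>`, `select(xml, "/ns:html/ns:body/ns:a")` returns the two link texts, while `select(xml, "//a")` returns an empty list: in XPath 1.0 an unprefixed name test only matches elements in no namespace, which is why every step in Nick's expression carries the `ns:` prefix.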
With that resolved, what is still outstanding (not critical, but nice to have) is ( i ) excluding the DTD from the XML file or, if that is not possible, ( iv ) setting up a local SYSTEM EntityResolver in this JDOM environment. Below is an example of what I am trying to achieve for ( iv ) in a DOM environment:
import java.io.IOException;
Would anyone be able to give me some idea of how to do this?
Thanks a lot again,
Jack

From: Nicholas Ardlie <nicholas.ardlie@p...>
To: netbeansfan@y...; xml-dev@l...
Sent: Thursday, 6 November, 2008 9:40:38 PM
Subject: RE: How to parse XML document with default namespace with JDOM XPath

Jack,
It seems to be only the Ant build for TagSoup that requires Saxon 6.5. You can run the binary distribution (tagsoup-1.2.jar) without Saxon and, as Michael has pointed out, the code you posted has no dependency on Saxon. You may have an environment or classpath issue, given that the following class (your code with the typo 'saxbuilder' changed to 'saxBuilder') outputs what you are expecting:

=================== OUTPUT: ===================
hollywood
san jose
san francisco
San diego

=================== JAVA CLASS: ===================
import java.io.BufferedReader;
import java.io.FileReader;
import java.util.Iterator;
import java.util.List;

import org.jdom.Content;
import org.jdom.input.SAXBuilder;
import org.jdom.xpath.XPath;

public class Test {
    @SuppressWarnings("unchecked")
    public static void main(String[] args) {
        try {
            FileReader frInHtml = new FileReader("C:\\Temp\\ABC.html");
            BufferedReader brInHtml = new BufferedReader(frInHtml);
            // Use TagSoup as the SAX parser behind JDOM's SAXBuilder
            SAXBuilder saxBuilder = new SAXBuilder("org.ccil.cowan.tagsoup.Parser");
            org.jdom.Document jdomDocument = saxBuilder.build(brInHtml);
            // Elements in the default XHTML namespace must be addressed through a prefix
            XPath xpath = XPath.newInstance(
                    "/ns:html/ns:body/ns:div[@id='container']/ns:div[@id='content']"
                    + "/ns:table[@class='sresults']/ns:tr/ns:td/ns:a");
            xpath.addNamespace("ns", "http://www.w3.org/1999/xhtml");
            List list = (List) (xpath.selectNodes(jdomDocument));
            Iterator iterator = list.iterator();
            while (iterator.hasNext()) {
                Object object = iterator.next();
                if (object instanceof Content)
                    System.out.println(((Content) object).getValue());
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

=================== INPUT FILE: ===================
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
  </head>
  <body>
    <div id="container">
      <div id="content">
        <table class="sresults">
          <tr>
            <td> <a href="http://www.abc.com/areas" title="Hollywood, CA">hollywood</a> </td>
            <td> <a href="http://www.abc.com/areas" title="San Jose, CA">san jose</a> </td>
            <td> <a href="http://www.abc.com/areas" title="San Francisco, CA">san francisco</a> </td>
            <td> <a href="http://www.abc.com/areas" title="San Diego, CA">San diego</a> </td>
          </tr>
        </table>
      </div>
    </div>
  </body>
</html>

=================== BUILD/RUN PATH: ===================
JDK 1.6.0_06 (i.e. Xerces & Xalan)
tagsoup-1.2.jar
jdom.jar
jaxen-core.jar
jaxen-jdom.jar
saxpath.jar

Regards,
Nick Ardlie.
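Jack's outstanding points ( i ) and ( iv ) can be sketched with the JDK's own XML parser. This is a hedged sketch, not code from the thread: the class `LocalDtdResolver`, its helper methods and the local DTD directory are all hypothetical. The resolver redirects any `.dtd` lookup to a local copy when one exists and otherwise to an empty DTD, so nothing is fetched from www.w3.org. The second helper uses the Xerces feature `http://apache.org/xml/features/nonvalidating/load-external-dtd`, which the JDK's built-in Xerces-based parser honours, to skip the external DTD entirely. Note that with an empty or skipped DTD, entities such as `&nbsp;` are undefined, so documents that use them need the real DTD copy.

```java
import java.io.StringReader;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

public class LocalDtdResolver implements EntityResolver {
    private static final String LOAD_EXTERNAL_DTD =
            "http://apache.org/xml/features/nonvalidating/load-external-dtd";

    private final Path dtdDir; // hypothetical directory holding local DTD copies
    private int resolvedCount; // number of DTD lookups intercepted so far

    public LocalDtdResolver(Path dtdDir) {
        this.dtdDir = dtdDir;
    }

    public int resolvedCount() {
        return resolvedCount;
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        if (systemId == null || !systemId.endsWith(".dtd")) {
            return null; // not a DTD: let the parser resolve it normally
        }
        resolvedCount++;
        Path local = dtdDir.resolve(systemId.substring(systemId.lastIndexOf('/') + 1));
        if (Files.isRegularFile(local)) {
            // Point the parser at the local copy; relative .ent references then resolve next to it
            InputSource src = new InputSource(local.toUri().toString());
            src.setPublicId(publicId);
            return src;
        }
        // No local copy: hand back an empty DTD instead of fetching it over the network
        InputSource empty = new InputSource(new StringReader(""));
        empty.setPublicId(publicId);
        empty.setSystemId(systemId);
        return empty;
    }

    /** Option ( iv ): parse with DTD lookups routed through the given resolver. */
    public static Document parseWithResolver(String xml, EntityResolver resolver) {
        return parse(xml, true, resolver);
    }

    /** Option ( i ): ask the parser not to load the external DTD at all. */
    public static Document parseIgnoringDtd(String xml) {
        return parse(xml, false, null);
    }

    private static Document parse(String xml, boolean loadExternalDtd, EntityResolver resolver) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            // Xerces-specific feature, recognised by the JDK's built-in parser
            dbf.setFeature(LOAD_EXTERNAL_DTD, loadExternalDtd);
            DocumentBuilder db = dbf.newDocumentBuilder();
            if (resolver != null) {
                db.setEntityResolver(resolver);
            }
            return db.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse document", e);
        }
    }
}
```

In JDOM 1.x, the same resolver object should be usable through `SAXBuilder.setEntityResolver(...)`, and the feature through `SAXBuilder.setFeature(...)`, when the builder is backed by a regular XML parser such as Xerces. Whether TagSoup recognises the Xerces feature is an assumption worth testing before relying on it.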