Re: Finding a name and the resulting
Wolfgang Hoschek wolfgang.hoschek at mac.com
Tue Aug 29 18:05:30 PDT 2006
TagSoup outputs an XML document in the XHTML namespace, so a query needs to declare the XHTML namespace and look for namespaced nodes, for example:

[hoschek /Users/hoschek/unix/devel/nux] curl 'http://finance.yahoo.com/q?s=IBM' > test5.xml
[hoschek /Users/hoschek/unix/devel/nux] fire-xquery --validate=html --query='{declare namespace xhtml="http://www.w3.org/1999/xhtml"; //xhtml:td}' test5.xml

Without the namespace declaration, an unprefixed step such as //td matches only elements in no namespace, so the query returns an empty result sequence.

Wolfgang.

On Aug 29, 2006, at 4:45 PM, Graham Reeds wrote:

> Sorry about the delay in replying to the questions - other matters
> to attend to.
>
> > Hey, that was not a question that can be answered using yes/no! ;-)
>
> Sorry about that. Didn't read your response properly.
>
> >
> > I think you really have to come up with the problem solution in
> > non-XQuery terms before we (the list) can help you implement that in
> > XQuery. E.g. find out how you can determine which table cell relates
> > to which user, what you want to do with multiple values for one
> > user, are there any exceptions etc.
>
> The table that I deemed would be the easiest to work with is 4
> cells wide with the possibility of having just 2 cells populated
> with data.
>
> The cells are simply name-value pairs with the first cell the name
> and the second cell the value (an alpha-numeric). To conserve
> screen space the original authors placed 2 name-value pairs per row
> - awkward I know (they didn't even hyperlink them, instead you had to
> go to another screen to see how far in the workers are on the
> project).
>
> An ASCII example of the layout:
>
> +-------+--------+------+--------+
> | Tom   | ABC123 | Dick | DEF456 |
> +-------+--------+------+--------+
> | Harry | IJK789 |      |        |
> +-------+--------+------+--------+
>
> This table is nested within other tables for layout and really is
> an antiquated system - with the amount of hours I have put into this
> (between other tasks) I think I could have written the features and
> learnt the finer points of Java in the same time (C++ is my first
> language).
>
> Currently I have a program that can read in a page using a
> combination of Nux, Xom, TagSoup and Saxon. In trying to implement
> http://www-128.ibm.com/developerworks/xml/library/j-jtp03225.html
> scraping of the Yahoo stock quote for IBM, the code below
> simply gives the output of <table /> instead of 81.40. I may have
> misinterpreted how to get the value out of results but I should have
> got slightly more than a closed table. It is entirely possible
> though that TagSoup has nuked all possibility of extracting the
> expected value. That is something I need to look into.
>
> Anyway, thanks for all your continued help.
>
> Graham Reeds.
>
> source:
>
> public void getPage()
> {
>     try
>     {
>         XMLReader tagsoup = XMLReaderFactory.createXMLReader("org.ccil.cowan.tagsoup.Parser");
>         Document doc = new Builder(tagsoup).build("http://finance.yahoo.com/q?s=IBM");
>
>         String query = "<table>\n"+
>             "{\n"+
>             " for $d in //td\n"+
>             " where contains($d/text()[1], \"Last Trade\")\n"+
>             " return <tr><td> { data($d/following-sibling::td) } </td></tr>\n"+
>             "}\n"+
>             "</table>";
>         Nodes results = XQueryUtil.xquery(doc, query);
>
>         for (int i=0; i < results.size(); i++)
>         {
>             System.out.println(results.get(i).toXML());
>             // System.out.println(results.get(i));
>         }
>     }
>     catch (/* the various exceptions */)
>     {
>         // ...
>     }
> }
>
> _______________________________________________
> http://x-query.com/mailman/listinfo/talk
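
The namespace effect described in the reply can be reproduced without Nux or TagSoup. The following is only a minimal sketch under stated assumptions: it uses the JDK's built-in XPath 1.0 API instead of XQuery, and the `SAMPLE` markup is a hypothetical stand-in for TagSoup output, not the real Yahoo page. The class and method names are made up for illustration. It shows that an unprefixed `//td` finds nothing in an XHTML-namespaced document, while `//xhtml:td` with a bound prefix finds the cells.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XhtmlNamespaceDemo {
    static final String XHTML_NS = "http://www.w3.org/1999/xhtml";

    // Hypothetical stand-in for TagSoup output: all elements are in the XHTML namespace.
    static final String SAMPLE =
        "<html xmlns=\"http://www.w3.org/1999/xhtml\"><body><table><tr>"
        + "<td>Last Trade:</td><td>81.40</td>"
        + "</tr></table></body></html>";

    // Parse with namespace awareness switched on (it is off by default in JAXP).
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    // Build an XPath object that binds the prefix "xhtml" to the XHTML namespace URI.
    static XPath newXPath() {
        XPath xpath = XPathFactory.newInstance().newXPath();
        xpath.setNamespaceContext(new NamespaceContext() {
            public String getNamespaceURI(String prefix) {
                return "xhtml".equals(prefix) ? XHTML_NS : XMLConstants.NULL_NS_URI;
            }
            public String getPrefix(String uri) {
                return XHTML_NS.equals(uri) ? "xhtml" : null;
            }
            public Iterator<String> getPrefixes(String uri) {
                return Collections.<String>emptyIterator();
            }
        });
        return xpath;
    }

    // Number of nodes selected by the expression.
    static int count(Document doc, String expr) throws Exception {
        NodeList nodes = (NodeList) newXPath().evaluate(expr, doc, XPathConstants.NODESET);
        return nodes.getLength();
    }

    // Text of the cell that follows the "Last Trade" cell, using namespaced steps.
    static String lastTrade(Document doc) throws Exception {
        return newXPath().evaluate(
            "//xhtml:td[contains(text()[1], 'Last Trade')]/following-sibling::xhtml:td[1]",
            doc);
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(SAMPLE);
        System.out.println(count(doc, "//td"));        // prints 0
        System.out.println(count(doc, "//xhtml:td"));  // prints 2
        System.out.println(lastTrade(doc));            // prints 81.40
    }
}
```

The same reasoning would apply to the quoted XQuery: the `//td` and `following-sibling::td` steps would presumably need the `xhtml:` prefix, with the namespace declared in the query prolog as in the `fire-xquery` command above.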






