Subject:Grab html to xml Author:Yaniv Gatigno Date:16 May 2008 06:49 PM
Hi there.
Suppose I have an HTML file such as an Amazon book description.
I wish to get an XML file containing the values of selected fields in that HTML.
For example, for the page referred to at the bottom of the post, I'd like to get an XML file containing the author name, title and price.
1. How do I do that?
2. What if I have many files with the same template? Can I batch the operation?
Subject:Grab html to xml Author:(Deleted User) Date:20 May 2008 09:43 AM
Hi,
You can convert the HTML page into XML by running the "HTML to XML" Document Wizard, then write an XSLT or XQuery program that evaluates XPath expressions such as //*[@class='buying']//*[@id='btAsinTitle'] (to get the title), //*[@class='buying']//a[contains(@href,'field-author')] (to get the author) and //*[@class='buying']//*[@class='priceLarge'] (to get the price).
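To make the extraction step concrete, here is a rough sketch of the same three lookups done outside Stylus Studio, in Python with the standard library's ElementTree. It assumes the page has already been converted to well-formed XML; the SAMPLE markup, the extract_book name and the <book> output layout are made up for illustration and are not what Amazon actually serves. ElementTree only supports a subset of XPath and has no contains(), so the author link is filtered in Python instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified stand-in for the XML that an
# HTML-to-XML conversion might produce; real pages are far larger.
SAMPLE = """<html><body>
<div class="buying">
  <h1><span id="btAsinTitle">Example Title</span></h1>
  <a href="/s?field-author=Jane+Doe">Jane Doe</a>
</div>
<div class="buying">
  <b class="priceLarge">$12.34</b>
</div>
</body></html>"""


def first_text(root, path):
    """Return the stripped text of the first match, or '' if none."""
    node = root.find(path)
    return "".join(node.itertext()).strip() if node is not None else ""


def extract_book(xml_text):
    """Build a small <book> element with title, author and price."""
    root = ET.fromstring(xml_text)
    title = first_text(root, ".//*[@class='buying']//*[@id='btAsinTitle']")
    price = first_text(root, ".//*[@class='buying']//*[@class='priceLarge']")

    # ElementTree has no contains(), so test the href in Python.
    author = ""
    for link in root.findall(".//*[@class='buying']//a"):
        if "field-author" in link.get("href", ""):
            author = "".join(link.itertext()).strip()
            break

    book = ET.Element("book")
    ET.SubElement(book, "title").text = title
    ET.SubElement(book, "author").text = author
    ET.SubElement(book, "price").text = price
    return book


print(ET.tostring(extract_book(SAMPLE), encoding="unicode"))
```

A field that is not found comes back as an empty element rather than raising an error, which makes it easier to spot pages where the markup has changed.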
But as you can guess, this method is easily broken by minor changes in the HTML that Amazon generates (and the web community has discouraged it for several years); the proper way to get this information from Amazon is to use their Web Service interface, as described at http://www.amazon.com/E-Commerce-Service-AWS-home-page/b/ref=sc_fe_l_6?node=12738641
Using Stylus Studio's Web Service Call Composer and the XQuery engine you can then automate the fetching of the data for multiple items and the generation of any document based on them (see http://www.stylusstudio.com/videos/ws-xquery1/ws-xquery1.html for a video describing a similar scenario).
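On the batch question: if the pages are already saved locally and converted to XML, looping over many files that share one template could look roughly like the sketch below. This is plain Python with the standard library, not the Web Service Call Composer workflow; the FIELDS table, the batch_extract name, the *.xml folder layout and the <books> output format are all assumptions for illustration.

```python
import glob
import os
import xml.etree.ElementTree as ET

# Field name -> ElementTree path. Hypothetical; adapt to the real markup.
FIELDS = {
    "title": ".//*[@id='btAsinTitle']",
    "price": ".//*[@class='priceLarge']",
}


def batch_extract(folder):
    """Build one <books> element from every *.xml file in folder."""
    books = ET.Element("books")
    for path in sorted(glob.glob(os.path.join(folder, "*.xml"))):
        root = ET.parse(path).getroot()
        # Record the source file name so each result can be traced back.
        book = ET.SubElement(books, "book", source=os.path.basename(path))
        for name, xpath in FIELDS.items():
            node = root.find(xpath)
            text = "".join(node.itertext()).strip() if node is not None else ""
            ET.SubElement(book, name).text = text
    return books
```

Calling ET.ElementTree(batch_extract("pages")).write("books.xml") would then save the combined result, where "pages" and "books.xml" are placeholder names.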