SSDN - Grab html to xml

Subject: Grab html to xml
Author: Yaniv Gatigno
Date: 16 May 2008 06:49 PM

Hi there.
Suppose I have an HTML file, such as an Amazon book description page.

I would like to get an XML file containing the values of selected fields in that HTML.
For example, for the page linked at the bottom of this post, I'd like to get an XML file containing the author name, title, and price.

1. How do I do that?
2. What if I have many files with the same template? Can I batch the operation?

Thanks!
Yaniv.

http://www.amazon.com/Introduction-Theory-Computation-Second-Michael/dp/0534950973/ref=pd_bbs_sr_1?ie=UTF8&s=books&qid=1210977722&sr=8-1

Subject: Grab html to xml
Author: (Deleted User)
Date: 20 May 2008 09:43 AM

Hi,
You can convert the HTML page into XML by running the Document Wizard "HTML to XML". You can then write an XSLT or XQuery program that evaluates XPath expressions such as //*[@class='buying']//*[@id='btAsinTitle'] (to get the title), //*[@class='buying']//a[contains(@href,'field-author')] (to get the author), and //*[@class='buying']//*[@class='priceLarge'] (to get the price).
But as you can guess, this method is easily broken by minor changes in the HTML generated by Amazon (and this kind of scraping has been discouraged by the web community for several years); the proper way to get such information from Amazon is to use their Web Service interface, as described at http://www.amazon.com/E-Commerce-Service-AWS-home-page/b/ref=sc_fe_l_6?node=12738641
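
For readers who want to try the XPath approach outside Stylus Studio, here is a minimal Python sketch of the same three lookups, run against a page that has already been converted to well-formed XML. It is only an illustration under several assumptions: the converted document has no XML namespace, the SAMPLE markup and its values are invented placeholders rather than real Amazon output, and the helper names (text_of, extract_book_fields, to_book_xml, extract_many) are hypothetical. Python's standard xml.etree.ElementTree module supports only a subset of XPath and has no contains() function, so the author link is filtered in Python code instead.

```python
import xml.etree.ElementTree as ET

# Invented placeholder markup standing in for a converted page (not real Amazon HTML).
SAMPLE = """<html><body>
<div class="buying">
  <h1><span id="btAsinTitle">Example Title</span></h1>
  <a href="/s?field-author=Example">Example Author</a>
  <a href="/other">Unrelated link</a>
</div>
<div class="buying"><b class="priceLarge">$0.00</b></div>
</body></html>"""


def text_of(element):
    """Return all text inside an element, or None if the element is missing."""
    if element is None:
        return None
    return "".join(element.itertext()).strip()


def extract_book_fields(xml_text):
    """Apply the three lookups from the post to one converted document."""
    root = ET.fromstring(xml_text)
    title = root.find(".//*[@class='buying']//*[@id='btAsinTitle']")
    price = root.find(".//*[@class='buying']//*[@class='priceLarge']")
    # ElementTree has no contains(), so test the href attribute in Python.
    author = next(
        (a for a in root.iterfind(".//*[@class='buying']//a")
         if "field-author" in a.get("href", "")),
        None,
    )
    return {
        "title": text_of(title),
        "author": text_of(author),
        "price": text_of(price),
    }


def to_book_xml(fields):
    """Serialize the extracted fields as a small <book> element."""
    book = ET.Element("book")
    for name, value in fields.items():
        ET.SubElement(book, name).text = value or ""
    return ET.tostring(book, encoding="unicode")


def extract_many(xml_texts):
    """Batch case: run the same extraction over several converted documents."""
    return [extract_book_fields(text) for text in xml_texts]


if __name__ == "__main__":
    print(to_book_xml(extract_book_fields(SAMPLE)))
```

Because every page built from the same template goes through the same function, the batch case is just a loop over the converted files. The fragility described above still applies: if the class or id values change, the lookups return None.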

Using Stylus Studio's Web Service Call Composer and the XQuery engine you can then automate the fetching of the data for multiple items and the generation of any document based on them (see http://www.stylusstudio.com/videos/ws-xquery1/ws-xquery1.html for a video describing a similar scenario).

Hope this helps,
Alberto
