XML Editor
Subject: Grab html to xml
Author: Yaniv Gatigno
Date: 16 May 2008 06:49 PM
Hi there.
Suppose I have an HTML file, such as an Amazon book description page.

I want to get an XML file containing the values of selected fields in that HTML.
For example, for the page referred to at the bottom of the post, I'd like to get an XML file containing the author name, title and price.

1. How do I do that?
2. What if I have many files with the same template? Can I batch the operation?



Subject: Grab html to xml
Author: (Deleted User)
Date: 20 May 2008 09:43 AM
You can convert the HTML page into XML by running the "HTML to XML" Document Wizard, then write an XSLT or XQuery program that evaluates XPath expressions such as:
- //*[@class='buying']//*[@id='btAsinTitle'] (to get the title)
- //*[@class='buying']//a[contains(@href,'field-author')] (to get the author)
- //*[@class='buying']//*[@class='priceLarge'] (to get the price)
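To illustrate what those expressions select, here is a minimal sketch of the extraction step. It uses Python's standard xml.etree.ElementTree module instead of XSLT or XQuery, and assumes the page has already been converted to well-formed XML with no namespaces. The sample markup and the names extract_book and text_of are hypothetical, not Amazon's real page or a Stylus Studio API. ElementTree supports only a subset of XPath and has no contains(), so the author filter is done in Python.

```python
import xml.etree.ElementTree as ET

# The answer's XPath expressions, rewritten in ElementTree's limited XPath subset.
TITLE_PATH = ".//*[@class='buying']//*[@id='btAsinTitle']"
PRICE_PATH = ".//*[@class='buying']//*[@class='priceLarge']"
# No contains() in ElementTree: select all links here, filter on href below.
LINK_PATH = ".//*[@class='buying']//a"


def text_of(el):
    """Whitespace-trimmed text content of an element, or '' if it is missing."""
    return "".join(el.itertext()).strip() if el is not None else ""


def extract_book(xml_text):
    """Build a <book> element with title, author(s) and price from converted page XML."""
    root = ET.fromstring(xml_text)
    book = ET.Element("book")
    ET.SubElement(book, "title").text = text_of(root.find(TITLE_PATH))
    # Emulates a[contains(@href,'field-author')].
    for link in root.findall(LINK_PATH):
        if "field-author" in link.get("href", ""):
            ET.SubElement(book, "author").text = text_of(link)
    ET.SubElement(book, "price").text = text_of(root.find(PRICE_PATH))
    return book


# Hypothetical, heavily simplified stand-in for a converted product page.
sample = """<html><body>
<div class="buying">
  <b id="btAsinTitle">Example Title</b>
  <a href="/s?field-author=Jane%20Doe">Jane Doe</a>
  <b class="priceLarge">$19.99</b>
</div>
</body></html>"""

print(ET.tostring(extract_book(sample), encoding="unicode"))
```

The same selections map directly onto an XSLT template or an XQuery FLWOR expression. The sketch only shows which nodes the paths pick out.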
But as you can guess, this method is easily broken by minor changes in the HTML generated by Amazon, and scraping pages this way has been discouraged by the web community for several years. The proper way to get such information from Amazon is to use their Web Service interface, as described at http://www.amazon.com/E-Commerce-Service-AWS-home-page/b/ref=sc_fe_l_6?node=12738641

Using Stylus Studio's Web Service Call Composer and the XQuery engine, you can then automate fetching the data for multiple items and generating any document you need from it (see http://www.stylusstudio.com/videos/ws-xquery1/ws-xquery1.html for a video describing a similar scenario).
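For the batching part of the question (many files built from the same template), here is a rough sketch of the loop using local files rather than the Web Service approach recommended above. It is written in Python with only the standard library. It assumes every page has already been converted to well-formed XML and saved as *.xml in one folder. FIELDS, extract_fields and batch are hypothetical names, and the field paths are the same fragile HTML selectors discussed above.

```python
import glob
import os
import xml.etree.ElementTree as ET

# Hypothetical field map: the answer's XPath expressions, restricted to the
# subset ElementTree understands (no contains()).
FIELDS = {
    "title": ".//*[@class='buying']//*[@id='btAsinTitle']",
    "price": ".//*[@class='buying']//*[@class='priceLarge']",
}
AUTHOR_LINKS = ".//*[@class='buying']//a"


def text_of(el):
    """Whitespace-trimmed text content of an element, or '' if it is missing."""
    return "".join(el.itertext()).strip() if el is not None else ""


def extract_fields(path):
    """Build one <book> record from a converted page saved at `path`."""
    root = ET.parse(path).getroot()
    record = ET.Element("book", source=os.path.basename(path))
    for name, xpath in FIELDS.items():
        ET.SubElement(record, name).text = text_of(root.find(xpath))
    # Emulate contains(@href,'field-author') by filtering links in Python.
    for link in root.findall(AUTHOR_LINKS):
        if "field-author" in link.get("href", ""):
            ET.SubElement(record, "author").text = text_of(link)
    return record


def batch(folder, out_file):
    """Extract every *.xml page in `folder` into a single <books> document."""
    books = ET.Element("books")
    for path in sorted(glob.glob(os.path.join(folder, "*.xml"))):
        books.append(extract_fields(path))
    ET.ElementTree(books).write(out_file, encoding="utf-8", xml_declaration=True)
    return books
```

A call such as batch("pages", "books.xml") would write one <books> document with one <book> per page. Keep the output file outside the input folder, otherwise a second run would also pick it up as a page.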

Hope this helps,
