Subject: Grab html to xml
Author: Yaniv Gatigno
Date: 16 May 2008 06:49 PM
Hi there.
Suppose I have an HTML file, such as an Amazon book description.

I wish to get an XML file containing the values of selected fields in that HTML.
For example, for the page referred to at the bottom of the post, I'd like to get an XML file containing the author name, title, and price.

1. How do I do that?
2. What if I have many files with the same template? Can I batch the operation?



Subject: Grab html to xml
Author: (Deleted User)
Date: 20 May 2008 09:43 AM
You can convert the HTML page into XML by running the Document Wizard "HTML to XML"; then you can write an XSLT or XQuery program that evaluates XPath expressions such as //*[@class='buying']//*[@id='btAsinTitle'] (to get the title), //*[@class='buying']//a[contains(@href,'field-author')] (to get the author) and //*[@class='buying']//*[@class='priceLarge'] (to get the price).
However, as you can guess, this method is easily broken by minor changes in the HTML that Amazon generates (and it has been deprecated by the web community for several years); the proper way to get such information from Amazon is to use their Web Service interface, as described at http://www.amazon.com/E-Commerce-Service-AWS-home-page/b/ref=sc_fe_l_6?node=12738641
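
For illustration only, the XPath extraction described above could be sketched outside Stylus Studio as follows. This is a minimal Python sketch, not the XSLT/XQuery program itself. It assumes the page has already been converted to well-formed XML without namespaces; the SAMPLE document, the function names (extract_book, to_xml) and the output element names (book, title, author, price) are all hypothetical. Python's standard ElementTree supports only a subset of XPath and has no contains(), so the author link is filtered in Python instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for the XML that an HTML-to-XML
# conversion might produce; it is NOT the real Amazon markup.
SAMPLE = """<html><body>
<div class="buying">
  <h1><span id="btAsinTitle">Example Title</span></h1>
  <a href="/s?field-author=Jane+Doe">Jane Doe</a>
  <b class="priceLarge">$12.34</b>
</div>
</body></html>"""


def first_text(nodes):
    """Return the stripped text content of the first node, or None."""
    for node in nodes:
        return "".join(node.itertext()).strip()
    return None


def extract_book(xml_text):
    """Pull title, author and price out of one converted page."""
    root = ET.fromstring(xml_text)
    title = first_text(
        root.findall(".//*[@class='buying']//*[@id='btAsinTitle']"))
    price = first_text(
        root.findall(".//*[@class='buying']//*[@class='priceLarge']"))
    # ElementTree has no contains(), so emulate
    # a[contains(@href,'field-author')] with a Python-side filter.
    author_links = [
        a for a in root.findall(".//*[@class='buying']//a")
        if "field-author" in a.get("href", "")
    ]
    return {"title": title, "author": first_text(author_links), "price": price}


def to_xml(record):
    """Serialize one extracted record as a small XML fragment."""
    book = ET.Element("book")
    for field in ("title", "author", "price"):
        ET.SubElement(book, field).text = record[field]
    return ET.tostring(book, encoding="unicode")


if __name__ == "__main__":
    print(to_xml(extract_book(SAMPLE)))
```

For many files that share the same template, the same function could be called in a loop over the converted files. The fragility noted above still applies: any change to the class or id attributes in the page breaks the expressions.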

Using Stylus Studio's Web Service Call Composer and the XQuery engine you can then automate the fetching of the data for multiple items and the generation of any document based on them (see http://www.stylusstudio.com/videos/ws-xquery1/ws-xquery1.html for a video describing a similar scenario).

Hope this helps,
