Re: identify sections in an xhtml document
At 01:28:30 on 02/11/2005, Dean Maslic <dean.maslic@xxxxxxxxx> wrote to
xsl-list@xxxxxxxxxxxxxxxxxxxxxx:
> Hi,
>
> I'm thinking of a generic approach that works with any site. Some ideas I had: calculate the total number of nodes, then go through the block-level nodes (div, table, tr, ol, etc.) and calculate the ratio of each one's node count to the total node count. If the two numbers are roughly the same (say, a ratio > 0.9), don't label the node; go to its child nodes and apply the same test. If they differ, look for collections of links (e.g. count(descendant::html:a) > 5), the size of text nodes, etc. I'm sure there would be a way to do it for a generic 'standard' site (i.e. a page that contains a top link bar, a left/right sidebar, and some text/image content).

The algorithms you have in mind can be implemented in XSLT, but I doubt they'll ever result in something usable. They will only work with consistent, well-structured, well-designed sites written in XHTML. But such sites typically already have a decent structure and/or well-chosen class attributes from which you can easily derive the sections. I don't think it will ever work with a "standard" website, which tends to be messy, bloated tag soup.
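
To make the ratio-and-link-count heuristic discussed above concrete, here is a minimal sketch. It is written in Python with the standard library rather than XSLT, purely so it can be run as-is; the same tests map directly onto XPath expressions such as count(descendant::*) and count(descendant::html:a). Everything beyond the 0.9 ratio and the "more than 5 links" test is an assumption made for illustration: the function names, the "navigation"/"content" labels, the block-element list, counting element nodes only (not text nodes), and measuring every ratio against the whole-document total.

```python
import xml.etree.ElementTree as ET

XHTML = "{http://www.w3.org/1999/xhtml}"
# Block-level elements to test; the exact list is an assumption.
BLOCKS = {XHTML + name for name in ("div", "table", "tr", "ol", "ul")}


def count_nodes(element):
    """Count element nodes in the subtree, including the element itself."""
    return sum(1 for _ in element.iter())


def label_sections(element, total, out, ratio=0.9, min_links=5):
    """Collect (id, label) pairs for block-level descendants of element."""
    for child in element:
        if child.tag not in BLOCKS:
            # Not a block-level element (html, body, p, ...): keep descending.
            label_sections(child, total, out, ratio, min_links)
            continue
        share = count_nodes(child) / total
        if share > ratio:
            # Holds nearly the whole page: treat it as a wrapper, leave it
            # unlabelled, and apply the same test to its children.
            label_sections(child, total, out, ratio, min_links)
        else:
            # Rough equivalent of count(descendant::html:a) > 5.
            links = sum(1 for _ in child.iter(XHTML + "a"))
            label = "navigation" if links > min_links else "content"
            out.append((child.get("id"), label))
    return out


# Hypothetical sample page: a wrapper div holding a link bar and a text block.
links = "".join(f'<a href="/p{i}">Page {i}</a>' for i in range(6))
paras = "".join(f"<p>Paragraph {i}</p>" for i in range(12))
SAMPLE = (
    '<html xmlns="http://www.w3.org/1999/xhtml"><body>'
    f'<div id="wrapper"><div id="nav">{links}</div>'
    f'<div id="main">{paras}</div></div>'
    "</body></html>"
)

root = ET.fromstring(SAMPLE)
result = label_sections(root, count_nodes(root), [])
print(result)
```

On this tidy sample the wrapper div is skipped and the two inner divs are labelled, but that is exactly the well-structured case. The sketch also requires well-formed XHTML to parse at all, so it says nothing about how the thresholds behave on real tag soup.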