OT: web crawling (was: Re: HGRAB. Syndication. Google. Grey ar
Paul T wrote:
>
> > > Google does *exactly* this (and also Google
> > > provides a cached copy of the original content)
> > >
> > > That means:
> > >
> > > Either both HGRAB and Google should be sued,
> > > because they both sell the content
> > > *which does not belong to them*, or both
> > > HGRAB and Google should be considered
> > > 'just a service'.
> >
> > Have a look at http://www.google.com/robots.txt
>
> I don't understand your point. Could you please
> explain?
>
> Because HGRAB, for example, is
> usually polling only home page of the website,
> they are all allowed for polling.

Not all. Some sites "Disallow: /".

> Also, I'm not sure if search engines do
> really care about the robots.txt, but that's another
> story.

Googlebot does [1], and that answers your question about the
difference between it and HGRAB.

> Also, the interesting twist is that when the
> robot encounters the website with *no*
> robots.txt ( most of the sites have no robots.txt )
> the robot assumes that it is *safe* for him to
> 'steal' the content.

No twist here; "if it [robots.txt] was not present [then] all robots
will consider themselves welcome" [2].

> I think this is really gray area and
> robots.txt is not a solution.
> At the moment, at least.

It isn't. It is just a machine-readable version of [3], kindly
provided by Google for your crawling convenience. robots.txt has no
legal meaning [4]; you probably can't be sued for disregarding it.
But you can be sued for breaking a site's TOS agreement.

Ari.

[1] http://www.google.com/webmasters/faq.html#nocrawl
[2] http://www.robotstxt.org/wc/norobots.html#format
[3] http://www.google.com/terms_of_service.html
[4] http://www.robotstxt.org/wc/norobots.html#status
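
To make the robots.txt cases discussed above concrete, here is a minimal sketch using Python's standard urllib.robotparser. The agent name "ExampleBot", the example.com URLs and the helper allowed() are made up for illustration, and the empty rule set only stands in for a site that serves no robots.txt at all. This says nothing about how HGRAB or Googlebot are actually implemented.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt: str, agent: str, url: str) -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text.

    Hypothetical helper for this sketch; it parses the text locally and
    never touches the network.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# A site that shuts out every robot, home page included.
BLOCK_ALL = "User-agent: *\nDisallow: /\n"

# A site that only fences off one directory.
BLOCK_PRIVATE = "User-agent: *\nDisallow: /private/\n"

# Assumption: an empty rule set models a site with no robots.txt.
NO_RULES = ""

HOME = "http://example.com/"
PRIVATE_PAGE = "http://example.com/private/page.html"

if __name__ == "__main__":
    cases = [
        ("Disallow: /, home page", BLOCK_ALL, HOME),
        ("Disallow: /private/, home page", BLOCK_PRIVATE, HOME),
        ("Disallow: /private/, private page", BLOCK_PRIVATE, PRIVATE_PAGE),
        ("no rules, home page", NO_RULES, HOME),
    ]
    for label, rules, url in cases:
        print(f"{label}: {allowed(rules, 'ExampleBot', url)}")
```

Under "Disallow: /" even a home-page-only poller is excluded, while with no rules at all a well-behaved robot considers itself welcome. Whether a given robot honours the answer is a separate matter, as noted above.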