A standard robots.txt at *.blogspot.com looks different:
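From memory, the default Blogger robots.txt of that era looked roughly like this (a reconstruction, not a copy of the exact file):

    User-agent: Mediapartners-Google
    Disallow:

    User-agent: *
    Disallow: /search

That is, the AdSense crawler gets a free pass, and everything except the /search result pages is open to all crawlers.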
According to the blogger, the blog is not private, which otherwise would have explained the crawler blocking:
It is a public blog. In the past it had a standard robots.txt, but 10 days ago it changed to “Disallow: /”
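Only the Disallow line is quoted above; presumably it sat under a catch-all record, since every engine was affected, making the whole file the canonical two-liner:

    User-agent: *
    Disallow: /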
Copyscape thinks that the blog in question shares a fair amount of content with other Web pages, and blog search agrees:
has a duplicate, posted by the same author, at
is reprinted at
and so on. Further investigation would probably reveal more duplicated content.
It’s understandable that Blogger is not interested in wasting Google’s resources by letting Ms. Googlebot crawl the same content from different sources. But why do they block other search engines too? And why do they block the source (the posts reprinted at business-house.net state “Originally posted at [blogspot URL]”)?
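A minimal Python sketch illustrates the point (example.blogspot.com is a stand-in, since the blog’s address isn’t reproduced here); it feeds the blocking rule to the standard library’s robots.txt parser and asks on behalf of the era’s major crawlers:

    from urllib.robotparser import RobotFileParser

    # Parse the blocking rule locally so the example doesn't depend
    # on the network or on the blog's current robots.txt.
    rp = RobotFileParser()
    rp.parse([
        "User-agent: *",
        "Disallow: /",
    ])

    # The catch-all record bans every compliant crawler, not just
    # Googlebot: Slurp (Yahoo), msnbot (MSN), and Teoma (Ask) all
    # get the same answer.
    for bot in ("Googlebot", "Slurp", "msnbot", "Teoma"):
        print(bot, "allowed:",
              rp.can_fetch(bot, "http://example.blogspot.com/"))

Every can_fetch call returns False: the file locks out Google, Yahoo, MSN, and Ask alike.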
Is this really censorship, or just a software glitch, or is it all the blogger’s fault?
Update 07/26/2007: The robots.txt reverted to standard content for unknown reasons. However, with a shabby link neighborhood like the one displayed in the blog’s footer, I doubt the crawlers will enjoy their visits. At least the indexers will consider this sort of spider fodder nauseating.