Debugging robots.txt with Google Webmaster Tools

Although Google’s Webmaster Console is a really neat toolkit, it can mislead the not-that-savvy crowd every once in a while.

When you go to “Diagnostics::Crawl Errors::Restricted by robots.txt” and you find URIs that aren’t disallow’ed or even noindex’ed in your very own robots.txt, calm down.

Google’s cool robots.txt validator keeps its knowledge of redirects to itself and approves your redirecting URIs, driving you nuts until you check each URI for redirects (HTTP response codes 301, 302 and 307, as well as undelayed meta refreshes).
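If you'd rather not check response codes by hand, a rough Python sketch along these lines might do. The helper names `redirect_target` and `fetch_hop` are made up for this example, it only looks at the three status codes mentioned above, and it doesn't detect meta refreshes at all. It sends HEAD requests, which some servers answer differently from GET, so treat the results as a hint rather than proof.

```python
import http.client
from urllib.parse import urljoin, urlsplit

# Only the redirect status codes named in this post (an assumption of the sketch).
REDIRECT_CODES = {301, 302, 307}


def redirect_target(url, status, location):
    """Return the absolute redirect target, or None if this is not a redirect."""
    if status in REDIRECT_CODES and location:
        # A Location header may be relative, so resolve it against the current URL.
        return urljoin(url, location)
    return None


def fetch_hop(url, timeout=10):
    """Request a single URL without following redirects.

    Returns (status, location); location is None if no Location header was sent.
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()
```

Calling `fetch_hop` on a URI and passing the result to `redirect_target` gives you the next hop, if there is one. Repeat that until `redirect_target` returns `None` and you have the whole chain.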

Google obeys robots.txt even in a chain of redirects. If a URI given in an HTTP header’s Location field is disallow’ed or noindex’ed for Google’s user agent(s), Googlebot doesn’t fetch it, regardless of its position in the current chain of redirects. Even a robots.txt block at the 5th hop stops the greedy Web robot. Those URIs are correctly reported back as “restricted by robots.txt”; Google just refuses to tell you that the blocking crawler directive originates from a foreign server.
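To find out which hop does the blocking, you can test every URI of a redirect chain against the robots.txt of its own host. Here is a minimal sketch using Python's standard `urllib.robotparser`. The function name `first_blocked_hop`, the host names and the robots.txt contents are invented for illustration. Mind the assumptions: a host without a known robots.txt is treated as allowing everything, Python's parser ignores Noindex lines, and its rule matching isn't guaranteed to agree with Googlebot's in every edge case.

```python
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser


def first_blocked_hop(chain, robots_txt_by_host, user_agent="Googlebot"):
    """Return (index, url) of the first hop its host's robots.txt disallows.

    chain: the URLs of a redirect chain, in order.
    robots_txt_by_host: maps a host name to the text of its robots.txt.
    Returns None when no hop is blocked.
    """
    for index, url in enumerate(chain):
        host = urlsplit(url).netloc
        parser = RobotFileParser()
        # Assumption: an unknown host has an empty robots.txt (everything allowed).
        parser.parse(robots_txt_by_host.get(host, "").splitlines())
        if not parser.can_fetch(user_agent, url):
            return index, url
    return None


# Hypothetical example: your own server allows the URI, the foreign server doesn't.
robots = {
    "example.com": "User-agent: *\nDisallow: /private/\n",
    "foreign.example": "User-agent: Googlebot\nDisallow: /\n",
}
chain = [
    "http://example.com/page",
    "http://example.com/moved",
    "http://foreign.example/landing",
]
print(first_blocked_hop(chain, robots))
```

With this sample data the third hop (index 2) is reported, although nothing in the robots.txt of example.com blocks the URI you started with.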




2 Comments to "Debugging robots.txt with Google Webmaster Tools"

  1. Michael Thomas on 21 August, 2009

    I noticed this the other day. Thank you for confirming some of the thoughts I had.

  2. Bryan on 3 October, 2009

    Learning how to use robots.txt also helps when it comes to SEO. You can disallow SEs from crawling some of your pages by using this one. Hope you can also make an article about htaccess. Thanks for the useful information.

    [You’re not even following your own advice (“It is important that you leave good and meaningful comments because that’s what gonna get you clicks from the traffic”) when it comes to blog comment spam. Do you really think dropping such a generic robots.txt comment can pass the spam filter of a blog that provides tons of posts dealing with the REP? By the way, I’ve an .htaccess category, too.]
