Tracking Googlebot-Mozilla is a great way to discover bugs in a Web site. Try it yourself by filtering your logs for her user-agent string:
Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
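If your server writes the common Apache/NCSA "combined" log format, a few lines of Python are enough to pull out her visits. This is only a sketch that assumes that format, where the user agent is the last quoted field. The function name `googlebot_mozilla_hits` is hypothetical, and the match is an exact comparison against the string above.

```python
import re
from typing import Iterable, Iterator

# The Googlebot-Mozilla user-agent string quoted in this post.
GOOGLEBOT_MOZILLA_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

# Assumption: Apache/NCSA "combined" log format:
# host ident user [time] "request" status bytes "referrer" "user agent"
COMBINED_LOG = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) \S+ '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def googlebot_mozilla_hits(lines: Iterable[str]) -> Iterator[dict]:
    """Yield parsed log entries whose user agent is exactly Googlebot-Mozilla.

    Lines that do not match the assumed log format are skipped silently.
    """
    for line in lines:
        m = COMBINED_LOG.match(line)
        if m and m.group("agent") == GOOGLEBOT_MOZILLA_UA:
            yield m.groupdict()
```

Pass it an open log file and print each hit's `host`, `status` and `request` fields to see which URLs she is interested in.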
Although Googlebot-Mozilla can add pages to the index, I mostly see her digging in ‘fishy’ areas. For example, she explores URLs from which I redirect spiders to the same page without a query string, to avoid indexing of duplicate content. She is very interested in pages carrying a robots NOINDEX,FOLLOW tag when she knows of another page with the same content, available under a similar URL but marked INDEX,FOLLOW. She goes after unusual query strings like ‘var=val&&&&’, the result of a script bug fixed months ago but probably still represented by thousands of useless URLs in Google’s index. She fetches a page using two different query strings, checking for duplicate content, and in doing so alerted me to a superfluous input variable used in links on a forgotten page. She fetches dead links to read my very informative error page … and her best friend is the AdSense bot, since they seem to share IPs as well as an interest in page updates before Googlebot is aware of them.
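You can look for the same fishy URLs yourself before she does. The sketch below assumes you have already extracted the request targets (path plus query string) from your log. It flags two of the patterns described above: query strings with empty parameters such as ‘var=val&&&&’, and paths fetched under more than one distinct query string, counting "no query string" as one variant. The function name `audit_crawled_urls` is hypothetical. A path with several variants is only a duplicate-content candidate, not proof of one.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple
from urllib.parse import urlsplit


def audit_crawled_urls(urls: Iterable[str]) -> Tuple[List[str], Dict[str, List[str]]]:
    """Return (malformed, variants) for crawled request targets.

    malformed: URLs whose query string contains an empty parameter,
               e.g. '/page?var=val&&&&'.
    variants:  paths requested with more than one distinct query string
               (the empty string stands for "no query string"),
               mapped to the sorted list of those query strings.
    """
    malformed: List[str] = []
    queries_by_path: Dict[str, set] = defaultdict(set)
    for url in urls:
        parts = urlsplit(url)
        query = parts.query
        # An empty piece between '&' separators means a stray or repeated '&'.
        if query and "" in query.split("&"):
            malformed.append(url)
        queries_by_path[parts.path].add(query)
    variants = {
        path: sorted(queries)
        for path, queries in queries_by_path.items()
        if len(queries) > 1
    }
    return malformed, variants
```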