Getting the most out of Google’s 404 stats

The 404 reports in Google’s Webmaster Central panel are great to debug your site, but they contain URLs generated by invalid –respectively truncated– URL drops or typos of other Webmasters too. Are you sick of wasting the link love from invalid inbound links, just because you lack a suitable procedure to 301-redirect all these 404 errors to canonical URLs?

Your pain ends here. At least when you’re on a *ix server running Apache with PHP 4+ or 5+ and .htaccess enabled. (If you suffer from IIS go search another hobby.)

I’ve developed a tool which grabs all 404 requests, letting you map a canonical URL to each 404 error. The tool captures and records 404s, and you can add invalid URLs from Google’s 404-reports, if these aren’t recorded (yet) from requests by Ms. Googlebot.

It’s kinda layer between your standard 404 handling and your error page. If a request results in a 404 error, your .htaccess calls the tool instead of the error page. If you’ve assigned a canonical URL to an invalid URL, the tool 301-redirects the request to the canonical URL. Otherwise it sends a 404 header and outputs your standard 404 error page. Google’s 404-probe requests during the Webmaster Tools verification procedure are unredirectable (is this a word?).

Besides 1:1 mappings of invalid URLs to canonical URLs you can assign keywords to canonical URLs. For example you can define that all invalid requests go to /fruit when the requested URI or the HTTP referrer (usually a SERP) contain the strings “apple”, “orange”, “banana” or “strawberry”. If there’s no persistent mapping, these requests get 302-redirected to the guessed canonical URL, thus you should view the redirect log frequently to find invalid URLs which deserve a persistent 301-redirect.

Next there are tons of bogus requests from spambots searching for exploits or whatever, or hotlinkers, resulting in 404 errors, where it makes no sense to maintain URL mappings. Just update an ignore list to make sure those get 301-redirected to example.com/goFuckYourself or a cruel and scary image hosted on your domain or a free host of your choice.

Everything not matching a persistent redirect rule or an expression ends up in a 404 response, as before, but logged so that you can define a mapping to a canonical URL. Also, you can use this tool when you plan to change (a lot of) URLs, it can 301-redirect the old URL to the new one without adding those to your .htaccess file.

I’ve tested this tool for a while on a couple of smaller sites and I think it can get trained to run smoothly without too many edits once the ignore lists etcetera are up to date, that is matching the site’s requisites. A couple of friends got the script and they will provide useful input. Thanks! If you’d like to join the BETA test drop me a message.

Disclaimer: All data get stored in flat files. With large sites we’d need to change that to a database. The UI sucks, I mean it’s usable but it comes with the browser’s default fonts and all that. IOW the current version is still in the stage of “proof of concept”. But it works just fine ;)



Share/bookmark this: del.icio.usGooglema.gnoliaMixxNetscaperedditSphinnSquidooStumbleUponYahoo MyWeb
Subscribe to      Entries Entries      Comments Comments      All Comments All Comments
 

2 Comments to "Getting the most out of Google's 404 stats"

  1. Sebastian on 30 December, 2009  #link

    The 404grabber prototype is no longer available. At the moment I’ve no intentions to provide such a script.

  2. […] will make sure my old inbound links are still there and also fix 404 errors on my own […]

Leave a reply


[If you don't do the math, or the answer is wrong, you'd better have saved your comment before hitting submit. Here is why.]

Be nice and feel free to link out when a link adds value to your comment. More in my comment policy.