When your referrer stats turn into a porn TGP

If you wonder why your top referrers are porn galleries, make-you-rich-in-a-second scams, and other pages that don’t link to you but try to sell you something, read on.

Referrer spamming is done by bots that request pages from your site while sending a bogus HTTP_REFERER header. These spam bots come from various IPs, change their user agents on the fly, and use other sneaky techniques to slip through spam protection. Some of them are fairly clever: they scale the number of bogus requests to your site according to your Alexa stats, which ensures that their “visits” show up on short real-time referrer lists and in other stats sorted by referrer. Some even fetch complete pages from your server, and a few even follow redirects.

So what can you do? Not much. You can’t really get rid of these log entries, because every request gets logged no matter how your spam protection handles it. But you can reduce the waste of bandwidth and server resources. If you redirect these requests, your server sends only a short redirect response instead of the page contents. Here is a way to accomplish that:

First of all, extract the bogus referrers from your logs or stats pages and save them in a plain text file. Then reduce that list to bare domains, truncating subdomains like “www” or “galleries”, and turn each domain into a line of .htaccess code:

SetEnvIf Referer \.collegefuckfest\.com GoFuckYourself=1
SetEnvIf Referer \.asstraffic\.com GoFuckYourself=1
SetEnvIf Referer \.allinternal\.com GoFuckYourself=1
SetEnvIf Referer \.mature-lessons\.com GoFuckYourself=1
SetEnvIf Referer \.wildpass\.com GoFuckYourself=1
SetEnvIf Referer \.promote-biz\.net GoFuckYourself=1
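The conversion from a referrer list to SetEnvIf lines can be scripted. Below is a minimal Python sketch, assuming the text file holds one full referrer URL per line (e.g. `http://galleries.example.com/page.html`); the script name and function names are hypothetical. It keeps only the last two labels of each host name, a naive assumption that gets domains like `example.co.uk` wrong, so review the output before pasting it into your .htaccess file.

```python
import sys
from typing import Iterable, List, Optional
from urllib.parse import urlsplit


def referrer_domain(referrer: str) -> Optional[str]:
    """Reduce a referrer URL to its last two host labels,
    e.g. 'http://galleries.example.com/x' -> 'example.com'."""
    try:
        # .hostname is lower-cased, and None when the line has no
        # 'scheme://' prefix (e.g. a bare 'example.com/x')
        host = urlsplit(referrer.strip()).hostname
    except ValueError:  # malformed URL, e.g. an unbalanced '[' in the host
        return None
    if not host:
        return None
    labels = host.split(".")
    # Naive assumption: the domain worth blocking is the last two labels.
    # This is wrong for names like 'example.co.uk'.
    if len(labels) < 2:
        return None
    return ".".join(labels[-2:])


def setenvif_lines(referrers: Iterable[str], var: str = "GoFuckYourself") -> List[str]:
    """Build one SetEnvIf line per distinct domain, sorted, in the same
    style as the hand-written directives (leading dot, escaped dots)."""
    domains = sorted({d for d in map(referrer_domain, referrers) if d})
    return [
        "SetEnvIf Referer \\." + d.replace(".", "\\.") + f" {var}=1"
        for d in domains
    ]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python make_setenvif.py referrers.txt > setenvif.txt
    with open(sys.argv[1], encoding="utf-8") as fh:
        print("\n".join(setenvif_lines(fh)))
```

Duplicate domains collapse into a single line, and blank or unparsable lines are skipped. Like the hand-written directives, each generated pattern expects a dot in front of the domain name.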

For every request whose Referer header matches one of these patterns, this code sets the environment variable “GoFuckYourself” to “1”. Subsequent directives can now act on these marked requests:

RewriteEngine On
RewriteCond %{ENV:GoFuckYourself} =1
RewriteRule .* %{HTTP_REFERER} [R=301,L]

This redirects the request to its referrer, so if the bogus bot follows redirects, it will request a page from the spammer’s domain. Of course you can redirect to a static URL instead:
RewriteRule .* http://www.example.com/gofuckyourself [R=301,L]
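To see which requests these rules would catch, here is a rough Python model of the SetEnvIf and RewriteRule logic. It is only an approximation, assuming Python's `re.search` behaves like Apache's regex matching for these simple patterns; `respond()` and its return values are hypothetical, and unmarked requests are simply modelled as a normal 200 response.

```python
import re
from typing import Optional, Tuple

# The patterns from the SetEnvIf lines above. SetEnvIf searches the
# Referer header unanchored and case-sensitively, like re.search here.
SPAM_PATTERNS = [
    r"\.collegefuckfest\.com",
    r"\.asstraffic\.com",
    r"\.allinternal\.com",
    r"\.mature-lessons\.com",
    r"\.wildpass\.com",
    r"\.promote-biz\.net",
]


def is_marked(referer: str) -> bool:
    """True if any pattern would set GoFuckYourself=1 for this Referer."""
    return any(re.search(p, referer) for p in SPAM_PATTERNS)


def respond(referer: Optional[str], target: Optional[str] = None) -> Tuple[int, Optional[str]]:
    """Return (status, Location) for a request with the given Referer.

    Marked requests get a 301 back to their referrer, or to `target`
    when you use the static-URL variant; everything else is served
    normally (modelled as 200 without a Location header)."""
    if referer and is_marked(referer):
        return 301, target or referer
    return 200, None


print(respond("http://www.asstraffic.com/tgp.html"))  # (301, 'http://www.asstraffic.com/tgp.html')
print(respond("http://asstraffic.com/tgp.html"))      # (200, None)
```

The second call shows a quirk of the leading `\.` in these patterns: a referrer on the bare domain, with no subdomain in front of it, is not marked. Because SetEnvIf is case-sensitive, an upper-case host name slips through as well; Apache's SetEnvIfNoCase directive is the case-insensitive counterpart.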

You could also use the environment variable in deny statements:
Order Allow,Deny
Allow from all
Deny from env=GoFuckYourself

but that serves a complete page, and it may produce an infinite loop. Deny, like the similar RewriteRule .* - [F], enforces a 403 Forbidden response. If you also have an ErrorDocument 403 /getthefuckouttahere.html directive, the request for the error page runs into the 403 itself; this process calls itself over and over until it gets terminated after 20 or so loops.




3 Comments to "When your referrer stats turn into a porn TGP"

  1. softplus on 17 May, 2007

    Hi Sebastian
    You might also want to give eKstreme’s CrawlerController a look, he covers things like this as well. http://ekstreme.com/phplabs/crawlercontroller.php

  2. weird biz on 19 November, 2007

    Interesting… although I don’t think I’ve ever experienced this issue myself.

  3. […] traffic. That goes for behaving bots at least. Others might meet a script that handles them with care. I’d rather serve human users bigger ads than wasting bandwidth for useless bots requesting […]
