How to disagree on Twitter, machine-readable
With standard hyperlinks you can add a rel="crap nofollow" attribute to your A elements. But how do you tell search engine crawlers and other Web robots that you disagree with a link’s content, when you post the URI at Twitter or elsewhere?
You cannot rely on the HTML presentation layer of social media sites. Although most of them add a condom to all UGC (user-generated content) links, crawlers do follow those links. Nowadays crawlers grab tweets and their embedded links long before they bother to fetch the HTML pages. They fatten their indexers with content scraped from feeds. That means indexers don’t (really) take the implicit disagreement into account.
As long as you operate your own URI shortener, there’s a solution.
Condomize URIs, not A elements
Here’s how to nofollow a plain link drop, where you’ve no control over link attributes like rel-nofollow:
- Prerequisite: understanding the anatomy of a URI shortener.
- Add an attribute like shortUri.suriNofollowed, boolean, default=false, to your shortened URIs database table. In the Web form where you create and edit short URIs, add a corresponding checkbox and update your affected scripts.
- Make sure your search engine crawler detection is up-to-date.
- Change the piece of code that redirects to the original URI:
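// Crawlers requesting a nofollow'ed short URI get a 403 instead of the redirect.
if ($isCrawler && $suriNofollowed) {
    header("HTTP/1.1 403 Forbidden redirect target", TRUE, 403);
    // Escape the target URI before echoing it into the error page.
    $safeUri = htmlspecialchars($suriUri, ENT_QUOTES, "UTF-8");
    print "<html><head><title>This link is condomized!</title></head><body><p>Search engines are not allowed to follow this link: <code>$safeUri</code></p></body></html>";
}
else {
    // Everybody else gets redirected permanently to the original URI.
    header("HTTP/1.1 301 Here you go", TRUE, 301);
    header("Location: $suriUri");
}
exit;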
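The $isCrawler flag in the snippet above is only as good as the crawler detection behind it. One common way to make it resistant to faked user agents is a reverse DNS lookup followed by a forward confirmation. What follows is a minimal sketch of that idea, not the detection code actually used here: the function name isVerifiedCrawler, the crawler list with its host name suffixes, and the injectable lookup parameters are all assumptions made for illustration.

```php
<?php
// Hypothetical helper: true only if the user agent claims to be a known
// crawler AND the requesting IP resolves back to that crawler's domain.
// The two lookup callables are injectable so the logic can be tested offline;
// by default they are PHP's gethostbyaddr() and gethostbynamel().
function isVerifiedCrawler(
    string $userAgent,
    string $ip,
    ?callable $reverseLookup = null,
    ?callable $forwardLookup = null
): bool {
    // Assumed mapping of user agent tokens to host name suffixes.
    // Check each search engine's own documentation before relying on it.
    $crawlers = [
        'Googlebot' => ['.googlebot.com', '.google.com'],
        'bingbot'   => ['.search.msn.com'],
    ];
    $reverseLookup = $reverseLookup ?? 'gethostbyaddr';
    $forwardLookup = $forwardLookup ?? 'gethostbynamel';

    foreach ($crawlers as $token => $suffixes) {
        if (stripos($userAgent, $token) === false) {
            continue; // user agent does not claim to be this crawler
        }
        // Step 1: reverse lookup. gethostbyaddr() returns the unmodified IP
        // on failure and false on malformed input.
        $host = $reverseLookup($ip);
        if (!is_string($host) || $host === $ip) {
            return false;
        }
        $host = strtolower($host);
        foreach ($suffixes as $suffix) {
            if (substr($host, -strlen($suffix)) === $suffix) {
                // Step 2: forward lookup must lead back to the same IP.
                $ips = $forwardLookup($host);
                return is_array($ips) && in_array($ip, $ips, true);
            }
        }
        return false; // claimed to be a crawler, but the host name disagrees
    }
    return false; // not a crawler we care about
}
```

In the redirect script this could be called as isVerifiedCrawler($_SERVER['HTTP_USER_AGENT'] ?? '', $_SERVER['REMOTE_ADDR'] ?? ''). DNS lookups are slow, so caching the result per IP is probably worthwhile.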
Here’s an example: This shortened URI takes you to a Bing SEO tip. Search engine crawlers get bagged in a 403 link condom.
Since you can’t test it yourself (user agent spoofing doesn’t work), here’s a header reported by Googlebot (requesting the condomized URI above) today:
HTTP/1.1 403 Forbidden
Date: Thu, 07 Jan 2010 10:19:16 GMT
...
Connection: close
Transfer-Encoding: chunked
Content-Type: text/html
The error page just says:
Title + H1: Link is nofollow'ed
P: Sorry, this shortened URI must not get followed by search engines.
If you can’t roll your own, feel free to make use of my URI Condomizer. Have fun condomizing crappy links on Twitter.
If you check “Nofollow” your URI gets condomized. That means search engines can’t reach the target through the shortened URI, but users and other Web robots get redirected.
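If you roll your own shortener, the “Nofollow” checkbox has to end up in the shortUri.suriNofollowed attribute and come back out at redirect time. Here is a minimal sketch of that round trip, assuming PDO with an in-memory SQLite database as a stand-in for your real one. The suriKey column and the helpers saveShortUri and loadShortUri are hypothetical names, not part of the Condomizer.

```php
<?php
// In-memory SQLite stands in for the real database (assumption).
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// suriNofollowed is the boolean attribute, default false;
// suriKey is a hypothetical column holding the short code.
$db->exec('CREATE TABLE shortUri (
    suriKey        TEXT PRIMARY KEY,
    suriUri        TEXT NOT NULL,
    suriNofollowed INTEGER NOT NULL DEFAULT 0
)');

// Hypothetical helper: stores a short URI along with the checkbox state.
function saveShortUri(PDO $db, string $key, string $uri, bool $nofollowed): void
{
    $stmt = $db->prepare(
        'INSERT INTO shortUri (suriKey, suriUri, suriNofollowed) VALUES (?, ?, ?)'
    );
    $stmt->execute([$key, $uri, $nofollowed ? 1 : 0]);
}

// Hypothetical helper: loads what the redirect script needs,
// or null if the short code is unknown.
function loadShortUri(PDO $db, string $key): ?array
{
    $stmt = $db->prepare(
        'SELECT suriUri, suriNofollowed FROM shortUri WHERE suriKey = ?'
    );
    $stmt->execute([$key]);
    $row = $stmt->fetch(PDO::FETCH_ASSOC);
    if ($row === false) {
        return null;
    }
    return [
        'suriUri'        => $row['suriUri'],
        'suriNofollowed' => (bool) $row['suriNofollowed'],
    ];
}

// An unchecked checkbox sends nothing at all, so isset() is the usual test.
$form = ['uri' => 'http://example.com/crap', 'nofollow' => 'on']; // stands in for $_POST
saveShortUri($db, 'abc', $form['uri'], isset($form['nofollow']));
```

The redirect script would then call loadShortUri() with the requested short code and feed the two values into $suriUri and $suriNofollowed.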
Sebastian | URI shortening, Social Web, Twitter, Nofollow | Related posts
$isCrawler? I smell cloaking.
Why not redirect through a robots.txt’ed out location instead?
Heh. $isCrawler is totally innocent WRT cloaking. Like in “if ($isCrawler) {logThisRequest();}”.
Not trusting search engines is a good habit. I know redirect chains involving disallow’ed scripts work with Google, but I wouldn’t bet that the minor players get the semantics. Also, it’s way too complex, and slower because it adds a totally useless redirect to the chain. Serving users the requested location whilst blocking crawlers is clean, elegant, and 100% safe.
I agree with Sebastian, I don’t really trust search engines.
This is a great article man! I just found your site through Sphinn and I really like your posts, keep up the great content I’ll be following
The Condomizer tool really rocks! I will try to condomize all my URIs ))))
[It seems you didn’t understand it.]
It’s a good term you used, “Condomizer”. You are doing good. Good luck to you.