How to handle a machine-readable pandemic that search engines cannot control

R.I.P. rel-nofollow. If you’re familiar with my various rants on the ever-morphing rel-nofollow microformat infectious link disease, don’t read further. This post is not polemic, ironic, insulting, or otherwise meant to entertain you. I’m just raving about a way to delay the downfall of the InterWeb.

Let’s recap: The World Wide Web is based on hyperlinks. Hyperlinks are supposed to lead humans to interesting stuff they want to consume. This simple and therefore brilliant concept worked great for years. The Internet grew up, bubbled a bit, but eventually it gained world domination. Internet traffic was counted, sold, bartered, purchased, and even exchanged for free in units called “hits”. (A “hit” means one human surfer landing on a sales pitch, that is, a popup hell designed so that somebody involved just has to make a sale.)

Then in the past century two smart guys discovered that links scraped from Web pages can be misused to provide humans with very accurate search results. They even created a new currency on the Web, and quickly assigned their price tags to Web pages. Naturally, folks began to trade green pixels instead of traffic. After a short while the Internet voluntarily transferred its world domination to the company founded by those two smart guys from Stanford.

Of course the huge amount of green pixel trades made the search results based on link popularity somewhat useless, because the webmasters gathering the most incoming links got the top 10 positions on the search result pages (SERPs). Search engines claimed that a few webmasters cheated on their way to the first SERPs, although lawyers say there’s no evidence of any illegal activities related to search engine optimization (SEO).

However, after suffering heavy attacks from a whiny blogger, the Web’s dominating search engine got somewhat upset and required that all webmasters assign a machine-readable tag (a link condom) to links sneakily inserted into their Web pages by other webmasters. “Sneakily inserted links” meant references to authors as well as links embedded in content supplied by users. All blogging platforms, CMS vendors and the like implemented the link condom, eliminating presumably 5.00% of the Web’s linkage at the time.

A couple of months later the world-dominating search engine demanded that webmasters condomize their banner ads, intercompany linkage and other commercial links, as well as all hyperlinked references that don’t count as pure academic citation (aka editorial links). The whole InterWeb complied, since this company controlled nearly all the free traffic available from Web search, as well as the Web’s purchasable traffic streams.

Roughly 3.00% of the Web’s links were condomized when the search giant spotted that their users (searchers) missed out on lots and lots of valuable content covered by link condoms. Ooops. Kinda dilemma. Taking back the link condom requirements was not an option, because this would have flooded the search index with billions of unwanted links empowering commercial content to rank above boring academic stuff.

So the handling of link condoms in the search engine’s crawling engine, as well as in its ranking algorithm, was changed silently. Without telling anybody outside their campus, some condomized links gained power, whilst others were kept impotent. In fact they developed a method to judge each and every link on the whole Web without a little help from their friends, the link condoms. In other words, the link condom became obsolete.

Of course that’s what they should have done in the first place, without asking the world’s webmasters for gazillions of free-of-charge man-years producing shitloads of useless code bloat. Unfortunately, they didn’t have the balls to stand up and admit “sorry folks, we’ve failed miserably, link condoms are history”. Therefore the Web community still has to bother with an obsolete microformat. And if they –the link condoms– are not dead, then they still live today. In your markup. Hurting your rankings.

If you, dear reader, are a Googler, then please don’t feel too annoyed. You may have thought that you didn’t do evil, but the rant above reflects what webmasters outside the ’Plex got from your actions. Don’t ignore it, please think about it from our point of view. Thanks.

Still here and attentive? Great. Now let’s talk about scenarios in WebDev where you still can’t avoid rel-nofollow. If there are any, that is. We’ll see.

PageRank™ sculpting

Dude, PageRank™ sculpting with rel-nofollow doesn’t work for the average webmaster. It might even fail when applied as a highly sophisticated SEO tactic. So don’t even think about it. Simply remove the rel=nofollow from links to your TOS, imprint, and contact page. Cloak away your links to signup pages, login pages, shopping carts and stuff like that.
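A minimal sketch of what that could look like in markup (the class name, URIs, and inline script are my own assumptions, not taken from any particular CMS): a plain, followed link to the imprint, plus a scripted pseudo-link to the cart that crawlers won’t treat as a hyperlink.

<a href="/imprint">Imprint</a>
<!-- no A element below, so crawlers see no link; humans get redirected on click -->
<span class="fake_link" title="/cart" onclick="window.location=this.title;">View your cart</span>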

Link monkey business

I leave this paragraph empty, because if you know what you’re doing, you don’t need advice.

Affiliate links

There’s no point in serving A elements to Googlebot at all. If you haven’t cloaked your aff links yet, go see an SEO doctor.
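For illustration only, one way to keep affiliate URIs out of A elements altogether is the same pseudo-link trick shown in the blogging section below (the class name and affiliate URI here are invented):

<b class="aff_link" title="http://affiliate-network.example.com/click?id=12345" onclick="window.location=this.title;">Buy the blue widget</b>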

Advanced SEO purposes

See above.

So what’s left? User generated content. Let’s concentrate our extremely superfluous condomizing efforts on the one and only occasion that might justify applying rel-nofollow to a hyperlink at the request of a major search engine, if there’s any good reason to paint shit brown at all.

Blogging

If you link out in a blog post, then you vouch for the link’s destination. In case you disagree with the link destination’s content, just put the link as

<strong class="blue_underlined" title="http://myworstenemy.org/" onclick="window.location=this.title;">My Worst Enemy</strong>

or so. The surfer can click the link and lands at the intended URI, but search engines don’t pass reputation. They don’t evaporate link juice either, because they don’t interpret the markup as a hyperlink.
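The post doesn’t show the stylesheet behind that class, so here’s an assumed minimal rule that would make the strong element look and feel like a regular link (the colors and cursor are my guess, not the original CSS):

<style>
  .blue_underlined { color: #0000ee; text-decoration: underline; cursor: pointer; }
</style>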

Blog comments

My rule of thumb is: Moderate, DoFollow quality, DoDelete crap. Install a conditional do-follow plug-in, put everything on moderation, use captchas or something similar, then let the comments’ link juice flow. You can maintain a white list that lets comments from your buddies appear instantly.
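At the markup level the idea boils down to something like this (assuming a WordPress-style blog; the URL, author name, and exact rel values depend on your platform and plug-in): the plug-in strips the condom from the commenter’s link once the comment has passed moderation or your white list.

<!-- default output for an unvetted comment author link -->
<a href="http://example.com/" rel="external nofollow">Jane Doe</a>
<!-- after moderation or white-listing, the conditional do-follow plug-in drops the nofollow -->
<a href="http://example.com/" rel="external">Jane Doe</a>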

Forums, guestbooks and unmoderated stuff like that

Separate all Web site areas that handle user generated content. Serve “index,nofollow” meta tags or X-Robots-Tag headers for all those pages, and link them from a site map or so. If you gather index-worthy content from users, then feed crawlers that content in a parallel –crawlable– structure, without submit buttons, perhaps with links from trusted users, and redirect human visitors to the interactive pages. Vice versa, redirect crawlers requesting the live pages to the spider fodder. All those redirects go with a 301 HTTP response code.
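For those interactive pages the robots directives could look like this (a sketch; the header variant assumes your server lets you set arbitrary HTTP response headers):

<meta name="robots" content="index,nofollow">
<!-- or, sent as an HTTP header: X-Robots-Tag: index,nofollow -->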

If you lack the technical skills to accomplish that, then edit your /robots.txt file as follows:

User-agent: Googlebot
# Dear Googlebot, drop me a line when you can handle forum pages
# w/o rel-nofollow crap. Then I'll allow crawling.
# Treat that as conditional disallow:
Disallow: /forum

As soon as Google can handle your user generated content naturally, they might send you a message in their Webmaster console.

Anything else

Judge yourself. Most probably you’ll find a way to avoid rel-nofollow.

Conclusion

Absolutely nobody needs the rel-nofollow microformat. Not even search engines for the sake of their index. Hence webmasters as well as search engines can stop wasting resources. Farewell rel="nofollow", rest in peace. We won’t miss you.




22 Comments to "How to handle a machine-readable pandemic that search engines cannot control"

  1. Hobo on 19 June, 2009  #link

    mmmm…. that looks like class to me lol

  2. […] Hard Core SEO go here. Claim your FREE SEO guide for beginners quick! Friend us on Twitter. […]

  3. Andy Beard on 20 June, 2009  #link

    This was a much longer comment with lots of html styling, but that appears to be lost.

    CSS styling isn’t passed to all RSS readers and RSS doesn’t use your site stylesheet

    Onclick is stripped out in Greader - still waiting for test to appear in Bloglines

    Index Nofollow or using Robots.txt just creates hanging/dangling pages that will still be in the index but pass no juice. With the meta combination the content will be in the index as well.
    I can’t see the point of this - if worried just link to forum using external javascript, but retain benefit of pages in the index providing juice.

  4. David on 20 June, 2009  #link

    There’s a few new words for the dictionary :)
    I never really looked at nofollow in the way you are portraying it here. Probably as I’ve only ever used it as a way to condomise comment links at my behest. I’ve always used noindex,follow to exclude peripheral pages.

  5. Richard Hearne on 21 June, 2009  #link

    @Andy - of course if RSS feeds are a concern you can always do some funny business with those and leave the links live. Esp. if you’re already mild-cloaking links on your web-pages. But most sculpting I see tends to be in boilerplate anyways ;)

  6. Sebastian on 21 June, 2009  #link

    Sorry Andy. I’ve lost a few books of text this way too. Now I always copy my comments to the clipboard before I hit submit.

    True, other sites displaying syndicated content might render formatting and client-sided code differently, esp. pseudo links. With HTML 5, Google Reader will render the CITE attribute, and that does the trick too: giving the reader a way to visit another URI (the source) without vouching via link. I’ll keep that in mind when I post code examples of impotent links in the future.

    However, my point was not to show all the best syntax for an artificial link (in the sense of Web standards, not as Google has defined it in the context of unrelated “sneaky” promotional links).

    My point is that Google has used links for scoring. That worked while it was simple, that is, while next to nobody was able to game Google’s algo. But links are complex when you have to judge intent and evaluate context to compute and apply scoring. Because they didn’t manage to automate this task in their early years, they came out with guidelines first, and then with a microformat that morphed multiple times, triggered by even more guidelines.

    The Internet owns linking, not Google. They’re free to interpret links as votes, and to rank pages by those, but then they must do it in their very own realm, without changing the link element or the usage of links on the Web.

    As for noindex and nofollow in my examples above: the 301 HTTP response code transports reputation (PageRank™, anchor text etc.) quite well. If Googlebot gets the 301, it doesn’t interpret meta data, not even that in the HTTP header. The robots.txt example was meant as a workaround for the technically challenged out there who don’t care about a few PageRank™ leaks.

    However, depending on the particular site, the procedures to cover your own butt from Google’s link espionage, and from its misinterpretation of links dropped by users as votes from the site owner, will be way more complex than the simplified method templates mentioned in my piece.

    So dear readers, please don’t copy ‘n paste the stuff above unchecked. Perhaps I’ll work out a few more detailed code examples in a future post.

    Thanks for the food for thought, Andy!

  7. Josh on 22 June, 2009  #link

    One possible solution for forums besides disallow, is to use vBulletin’s archive pages. You could feed Google the archive pages which don’t have links on them.

    I’m finding it hard to believe that Google is going to kill sites for having a lot of comments or being a forum. I’m guessing that there’s more to it than what Cutts is saying.

    Google can count the number of comments and forum posts in a thread. I’m guessing that they won’t lower rankings for having a lot of comments/replies.

  8. Sebastian on 22 June, 2009  #link

    Josh, that’s not a question of Google penalizing a site for too much user generated content. The opposite is true. User generated content, as long as it’s somewhat unique and of moderate or even good quality, helps a lot when it comes to generating traffic from SERPs. Link drops from users, signature links and so on that could point to URIs disliked by Google are the problem. If you link out to “crap” you’re in danger of getting downranked or even delisted when the user generated content is spammy enough and links out to enough spammy pages.

  9. Josh on 22 June, 2009  #link

    Has the entire meaning of nofollow changed? I thought they just went back to “nofollow means novouch” and dropped the “PR sculpting” concept.

    I nofollow my outbound links in my forums and whitelist ones from users who post a lot (no spammy links though).

    Another good feature vBulletin has is to only show signatures to logged in users.

  10. Sebastian on 22 June, 2009  #link

    Josh, that’s right. Rel-nofollow means “novouch”. Each outgoing link with a condom lowers the PageRank™ distributed to your own pages and your editorial links as well. That’s not a problem IMO, because linking out a lot attracts lots of inbound links in return. It is, however, the thing most folks rant about in the current discussion.

    My point is that rel-nofollow is useless, period. Your example, “showing signatures to logged-in users and not to robots”, as well as many other sensible methods in site design, demonstrates that. As for the remaining linkage, Google should do its job (ranking Web pages in search results) without demanding machine-readable tagging from webmasters.

  11. Sebastian on 23 June, 2009  #link

    From a private message, with Andy’s permission:

    Andy Beard Been there, read that, do I get a t-shirt? You get too complicated on stuff people will never implement anyway - disallow… just say no [emphasis mine]

    Sorry Andy, you don’t get a t-shirt. I’ve no red-crab-t-shirts in my shop, but I do link un-condomized to everybody sending me a cute/neat/funny/pornographic/… XX(X)L t-shirt that I really like. ;)

    W/o kidding, I totally agree from the bottom of my heart.

    Rel-nofollow is the most annoying alien influence in WebDev, usually implemented under pressure for the sole purpose of helping search engines compensate for their loopholes. It’s plain blackmail that those search engines penalize webmasters for the lack of link condoms pulled over particular links, just because they can’t handle (or rather allege they can’t handle) their own shit.

    It’s not that complicated to identify page areas containing user generated content and to judge embedded links according to whatever guidelines a search engine wants to apply. Forcing webmasters to castrate all –or at least all unmoderated– links dropped by users, on the other hand, doesn’t count as a smart way to go in my book. Such very questionable business practices lead to “too complicated stuff people will never implement” correctly, or at least not without collateral damage in most cases. That goes for link condoms as well as for all other methods that castrate links just for search engines.

    Then when a search engine discovers that their very own actions result in valuable but uncrawlable content that search engine users should find on their SERPs, it’s time to revamp the guidelines, eliminating the root of the problem. Unqualified doctoring on symptoms, like morphing a useless microformat’s semantics again and again, is plain unprofessional.

    Of course I do know that the problem search engines are faced with is a little more complex than outlined here. But as a developer I also do know that this problem is solvable, without putting pressure on webmasters, when a search engine assigns enough resources to this task.

    I mean, the darn link-abuse-for-ranking-purposes thingy has been known since Google exists. If I remember correctly, I got my first sites algorithmically penalized for link monkey business in 2000 or 2001 (saying hello and thanks again for your help and clarifications back then to Matt Cutts, Daniel Dulitz & colleagues!), and since then Google should have become even smarter. Well, in fact they’re way smarter nowadays, and I’m somewhat pissed because they don’t show it in public, still acting as if they’re depending on machine-readable signals from webmasters, which they obviously aren’t.

  12. Timo on 25 June, 2009  #link

    Hmmm I still feel much better using the nofollow tag in my blog. It definitely deters people from spamming me. It also saves me a lot of work when moderating comments.

  13. […] How to handle a machine-readable pandemic that search engines cannot control […]

  14. Melanie Prough on 28 June, 2009  #link

    Your blog post tag is an excellent idea…. I was a bit curious so I played around with it a bit and found that it can even be made valid by using a pre or similar old school tag.

    You know, in case I link to someone who is my worst enemy =-)

  15. Sebastian on 29 June, 2009  #link

    Timo, of course you know that’s not true. Blog comment spammers don’t care whether you condomize their links or not.

  16. […] Andy Beard comments too here – Pageranks Sculpting & Blog Comments and another of my favourite bloggers, Sebastian, pipes in with this gem – http://sebastians-pamphlets.com/rip-rel-nofollow-funeral-party/ […]

  17. Spamming Assclown on 30 June, 2009  #link

    I completely agree with your point that all blogs should become moderated ones and should be do follow, which would keep spammers in the bay and help people who chip in with quality comments. It should be a win-win game isn’t it?

    [Unlinked “Yuva_payday@moerpoundstillpayday.co.uk” spamming me from 203.76.139.118. The same goes for Joan Loan, Ben Mortgage and the whole McViagra family as well. Don’t bother.]

  18. Michael Thomas on 14 August, 2009  #link

    I thought no follow links were so we don’t lose any valuable link juice??

  19. Daniel Martin on 1 September, 2009  #link

    Hmm, I agree with most of this, but I have some worries.

    First of all, you hack together your own nofollow link using strong. Given that Google seems to be able to interpret javascript to some extent is this a good idea? Might it be considered spammy? What is wrong with using a nofollow link in this case?

    Secondly, you suggest that people redirect GoogleBot away from forums and guestbooks etc… Is this not a form of cloaking? Could you not get penalised for this?

    Interesting article though…

  20. Sebastian on 1 September, 2009  #link

    Daniel, although Google can and does interpret JavaScript to some extent, especially onclick trigger code, I consider “strong links” safe (of course you can abuse any other X/HTML element allowed within a P element too). By avoiding the A element the “linking” site makes perfectly clear that it doesn’t vouch for the link. I assume that Google doesn’t count such artificial links in their link graph. Also, the “strong link” is in fact a client-sided redirect, equivalent to a 302 HTTP response code. Thus, even when counted in any way, Google should index the issuing URI. Of course all that is theory, and HTML5 providing proper syntax for such stuff will shred it once the engines have completely implemented HTML5.

    As for cloaking, the answer is no. Of course technically that’s cloaking, but not deceptive cloaking in the sense of the search engine quality guidelines. Usually I would quote some search engine policies to back up my claim, but I’m kinda sick of search engines trying to rule the Web. Therefore, inspired by Edward, I’ll bring my point home by appealing to a higher authority, the W3C:

    Content is “equivalent” to other content when both fulfill essentially the same function or purpose upon presentation to the user [emphasis mine]. In the context of this document, the equivalent must fulfill essentially the same function for the person with a disability (at least insofar as is feasible, given the nature of the disability and the state of technology), as the primary content does for the person without any disability. For example, the text “The Full Moon” might convey the same information as an image of a full moon when presented to users.

    Well, a search engine crawler’s disease is that it’s just another algo faking a human user. As long as robots aren’t more savvy than humans, there’s nothing wrong with feeding them what they can actually digest, instead of the real and highly sophisticated stuff aimed at a human audience.

    A search engine shouldn’t penalize a site for user agent optimization. There’s not much difference to serving browser-optimized code and content. So, as said before, my suggestion isn’t meant to be implemented across the board, but it’s quite safe with the engines. The real shame is that search engines force webmasters to bother with their faulty algos at all.

  21. […] NoFollow Funeral Party […]

  22. […] fully two years after PageRank Sculpting was announced to have been long since useless, and real professionals understood this, here, in 2011, a client was asking about it, as if it were an important, relevant consideration […]
