Archived posts from the 'SEO' Category

The anatomy of a deceptive Tweet spamming Google Real-Time Search

Minutes after the launch of Google’s famous Real-Time Search, the Internet marketing community began to spam the scrolling SERPs. Google gave birth to a new spam industry.

I’m sure Google’s WebSpam team will pull the plug sooner or later, but as of today Google’s real time search results are extremely vulnerable to questionable content.

The somewhat shady approach to creative use of real-time search that I’m outlining below will not work forever. It can be used for truly evil purposes, and Google is aware of the problem. Frankly, if I were the Googler in charge, I’d dump the whole real-time thingy until the spam defense lines are rock solid.

Here’s the recipe from Dr Evil’s WebSpam-Cook-Book:

Ingredients

  • 1 popular topic that pulls lots of searches, but not so many that the results scroll down too fast.
  • 1 landing page that makes the punter pull out the plastic in no time.
  • 1 trusted authority page totally lacking commercial intentions. View its source code: it must have a valid TITLE element in its HEAD section with an appealing call for action related to your topic.
  • 1 short domain, 1 cheap Web hosting plan (Apache, PHP), 1 plain text editor, 1 FTP client, 1 Twitter account, and basic coding skills.

Preparation

Create a new text file and name it hot-topic.php or so. Then code:
<?php
// Cloaking 101: send Googlebot to the trusted authority page,
// send everybody else to the affiliate landing page.
$landingPageUri = "http://affiliate-program.com/?your-aff-id";
$trustedPageUri = "http://google.com/something.py";
if (stristr($_SERVER["HTTP_USER_AGENT"], "Googlebot")) {
    // Temporary redirect for the crawler.
    header("HTTP/1.1 307 Here you go today", TRUE, 307);
    header("Location: $trustedPageUri");
}
else {
    // Permanent redirect for the human visitor.
    header("HTTP/1.1 301 Happy shopping", TRUE, 301);
    header("Location: $landingPageUri");
}
exit;
?>

Provided you’re a savvy spammer, your crawler detection routine will be a little more complex.
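Purely for illustration, here’s a sketch of a sturdier check: it verifies a claimed Googlebot by reverse and forward DNS lookup instead of trusting the user agent string alone (the function name is mine, not part of the recipe):

<?php
// Sketch: verify a claimed Googlebot via reverse/forward DNS lookup.
function isVerifiedGooglebot() {
    $ua = isset($_SERVER["HTTP_USER_AGENT"]) ? $_SERVER["HTTP_USER_AGENT"] : "";
    if (stristr($ua, "Googlebot") === FALSE) {
        return FALSE;
    }
    $ip   = $_SERVER["REMOTE_ADDR"];
    $host = gethostbyaddr($ip);   // e.g. crawl-66-249-71-141.googlebot.com
    if (!preg_match('/\.(googlebot|google)\.com$/i', $host)) {
        return FALSE;
    }
    // The forward lookup must resolve back to the requesting IP.
    return gethostbyname($host) === $ip;
}
?>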

Save the file and upload it, then test the URI http://youspamaw.ay/hot-topic.php in your browser.

Serving

  • Log in to Twitter and submit lots of nicely crafted, not too keyword-stuffed messages carrying your spammy URI. Do not use obscene language (that is, don’t swear), and sail around phrases like ‘buy cheap viagra’ with synonyms like ‘brighten up your girlfriend’s romantic moments’.
  • On their SERPs, Google will display the text from the trusted page’s TITLE element, linked to your URI that leads punters to a sales pitch of your choice.
  • Just for entertainment, closely monitor Google’s real time SERPs, and your real-time sales stats as well.
  • Be happy and get rich by end of the week.

Google removes links to untrusted destinations; that’s why you need to abuse authority pages. As long as you don’t launch f-bombs, Google’s profanity filters make flooding their real-time SERPs with all sorts of crap a breeze.

Hey Google, for the sake of our children, take that as a spam report!




Hard facts about URI spam

I stole this pamphlet’s title (and more) from Google’s post Hard facts about comment spam for a reason. In fact, Google spams the Web with useless clutter, too. You doubt it? Read on. That’s the URI from the link above:

http://googlewebmastercentral.blogspot.com/2009/11/hard-facts-about-comment-spam.html?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+blogspot%2FamDG+%28Official+Google+Webmaster+Central+Blog%29

The canonical URI ends at the first question mark; everything after it is clutter added by Google.

When your Google account lists both Feedburner and GoogleAnalytics as active services, Google will automatically screw your URIs when somebody clicks a link to your site in a feed reader (you can opt out, see below).

Why is it bad?

FACT: Google’s method of tracking traffic from feeds to URIs creates new URIs, and lots of them. Depending on the number of possible values for each query string variable (utm_source, utm_medium, utm_campaign, utm_content, utm_term), the number of cluttered URIs pointing to the same piece of content can add up to dozens or more; just 3 sources × 2 mediums × 4 campaigns already yield 24 distinct URIs for a single page.

FACT: Bloggers (publishers, authors, anybody) naturally copy those cluttered URIs and paste them into their posts. The same goes for user link drops on Twitter and elsewhere. These links get crawled and indexed. Currently Google’s search index is flooded with 28,900,000 cluttered URIs, mostly originating from copy+paste links. Bing and Yahoo haven’t indexed GA tracking parameters yet.

That’s 29 million URIs with tracking variables that point to duplicate content as of today. With every link copied from a feed reader, this number will increase. Matt Cutts said “I don’t think utm will cause dupe issues” and points to John Müller’s helpful advice (methods a site owner can apply to tidy up Google’s mess).

Maybe Google can handle this growing duplicate content chaos in their very own search index. Let’s forget that Google is the search engine that has advocated URI canonicalization for ages, invented sitemaps, rel=canonical, and countless highly sophisticated algos to merge indexed clutter under the canonical URI. It’s all water under the bridge now that Google is in the create-multiple-URIs-pointing-to-the-same-piece-of-content business itself.

So far that’s just disappointing. To understand why it’s downright evil, let’s look at the implications from a technical point of view.

Spamming URIs with utm tracking variables breaks lots of things

Look at this URI: http://www.example.com/search.aspx?Query=musical+mobile?utm_source=Referral&utm_medium=Internet&utm_campaign=celebritybabies

Google added a query string to a query string. Two URI segment delimiters (“?”) can cause all sorts of troubles at the landing page.

Some scripts will process only the variables from Google’s query string, because they extract GET input from the URI’s last question mark up to the fragment delimiter “#” or the end of the URI; some scripts expecting input variables in a particular sequence will be confused, at the very least; some scripts might even use the same variable names … the number of possible errors caused by amateurishly extended query strings is endless, even if there’s only one “?” delimiter in the URI.
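To see the breakage in code, here’s a small PHP sketch using the hypothetical example URI from above; it shows what two common parsing strategies make of a doubled query string:

<?php
// The example URI with a query string appended to a query string.
$uri = "http://www.example.com/search.aspx?Query=musical+mobile"
     . "?utm_source=Referral&utm_medium=Internet&utm_campaign=celebritybabies";

// A naive script that grabs everything after the *last* question mark
// sees only Google's tracking variables and loses the search query.
parse_str(substr($uri, strrpos($uri, "?") + 1), $naiveParams);
print_r($naiveParams);   // utm_source, utm_medium, utm_campaign -- no "Query" at all

// A script that takes everything after the *first* question mark gets a
// polluted search term instead.
parse_str(substr($uri, strpos($uri, "?") + 1), $params);
print_r($params);        // Query => "musical mobile?utm_source=Referral", plus leftovers
?>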

In some cases the page the user lands on will lack the expected content, or will display a prominent error message like a 404, or will consist of white space only, because the underlying script failed so badly that the Web server couldn’t even show a 5xx error.

Regardless of whether a landing page can handle query string parameters added to the original URI or not (most can), changing someone’s URI for tracking purposes is plain evil, IMHO, when implemented as opt-out instead of opt-in.

Appended UTM query strings can make trackbacks vanish, too. When a blog checks whether the trackback URI carries a link to the blog or not, for example with this plug-in, the comparison can fail and the trackback gets deleted on arrival, without notice. If I dug a little deeper, I could probably compile a huge list of other functionality on the Internet that Google’s UTM clutter breaks.

Finally, GoogleAnalytics is not the one and only stats tool out there, and it doesn’t fulfil all needs. Many webmasters rely on simple server reports, for example referrer stats or tools like AWStats, for various technical purposes. Broken. Specialized content management tools fed by real-time traffic data. Broken. Countless tools for link popularity analysis that group inbound links by landing page URI. Broken. URI canonicalization routines. Broken, respectively now acting counterproductively with regard to GA reporting. Google’s UTM clutter has an impact on lots of tools that make sense in addition to Google Analytics. All broken.

What a glorious mess. Frankly, I’m somewhat puzzled. Google has hired tens of thousands of this planet’s brightest minds –I really mean that, literally!–, and they came out with half-assed crap like that? Un-fucking-believable.

What can I do to avoid URI spam on my site?

Boycott Google’s poor man’s approach to link feed traffic data to Web analytics. Go to Feedburner. For each of your feeds click on “Configure stats” and uncheck “Track clicks as a traffic source in Google Analytics”. Done. Wait for a suitable solution.

If you really can’t live with traffic sources gathered from a somewhat unreliable HTTP_REFERER, and you have deep pockets, then hire a WebDev crew to revamp all your affected code. Coward!

As a matter of fact, Google is responsible for this royal pain in the ass. Don’t fix Google’s errors on your site. Let Google do the fault recovery. They own the root of all UTM evil, so they have to fix it. There’s absolutely no reason why a gazillion of webmasters and developers should do Google’s job, again and again.

What can Google do?

Well, that’s quite simple. Instead of adding utterly useless crap to URIs found in feeds, Google can make use of a clever redirect script. When Feedburner serves feed items to anybody, the values of all GA tracking variables are available.

Instead of adding clutter to these URIs, Feedburner could replace them with a script URI that stores the timestamp, the user’s IP addy, and whatnot, then performs a 301 redirect to the canonical URI. The GA script invoked on the landing page can access and process these data quite accurately.
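A minimal sketch of what such a redirect script could look like; the URI scheme, parameter name and log file are mine, purely illustrative:

<?php
// Sketch of a tracking redirect: log the click, then 301 to the canonical URI.
// Invoked as e.g. http://feedproxy.example.com/r.php?dest=<canonical URI>
$dest = isset($_GET["dest"]) ? $_GET["dest"] : "";

// Accept only absolute http(s) URIs to avoid an open-redirect hole.
if (!preg_match('#^https?://#i', $dest)) {
    header("HTTP/1.1 400 Bad Request", TRUE, 400);
    exit;
}

// Store whatever the stats backend needs: timestamp, IP, referrer, destination.
error_log(sprintf("%s\t%s\t%s\t%s\n",
    date("c"),
    $_SERVER["REMOTE_ADDR"],
    isset($_SERVER["HTTP_REFERER"]) ? $_SERVER["HTTP_REFERER"] : "-",
    $dest
), 3, "/var/log/feed-clicks.log");   // hypothetical log file

// Hand the visitor (and the link juice) over to the clean, canonical URI.
header("HTTP/1.1 301 Moved Permanently", TRUE, 301);
header("Location: " . $dest);
exit;
?>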

Perhaps this procedure would even be more accurate, because link drops could no longer mimic feed traffic.

Speak out!

So, if you don’t approve that Feedburner, GoogleReader, AdSense4Feeds, and GoogleAnalytics gang rape your well designed URIs, then link out to everything Google with a descriptive query string, like:

I mean, nicely designed canonical URIs should be the search engineer’s porn, so perhaps somebody at Google will listen. Will ya?

Update:

I’ve just added a “UTM Killer” tool, where you can enter a screwed URI and get a clean URI — all ‘utm_’ crap and multiple ‘?’ delimiters removed — in return. That’ll help when you copy URIs from your feed reader to use them in your blog posts.
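For the do-it-yourself crowd, here’s a stripped-down sketch of the same idea; it’s not the actual tool’s code, just an illustration:

<?php
// Sketch: clean a screwed URI -- drop utm_* parameters and collapse
// multiple '?' delimiters into one proper query string.
function utm_killer($uri) {
    $parts = explode("?", $uri, 2);
    if (count($parts) < 2) {
        return $uri;                       // nothing to clean
    }
    list($base, $query) = $parts;
    // Treat any additional '?' delimiters as '&' so parse_str() sees all variables.
    parse_str(str_replace("?", "&", $query), $params);
    foreach (array_keys($params) as $name) {
        if (stripos($name, "utm_") === 0) {
            unset($params[$name]);         // kill the tracking clutter
        }
    }
    $cleanQuery = http_build_query($params);
    return ($cleanQuery === "") ? $base : $base . "?" . $cleanQuery;
}

echo utm_killer("http://www.example.com/search.aspx?Query=musical+mobile"
              . "?utm_source=Referral&utm_medium=Internet&utm_campaign=celebritybabies");
// => http://www.example.com/search.aspx?Query=musical+mobile
?>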

By the way, please vote up this pamphlet so that I get the 2010 SEMMY Award. Thanks in advance!




Derek Powazek outed himself big-mouthed and ignorant, and why that’s a pity

With childish attacks on his colleagues, Derek Powazek didn’t do professional Web development –as an industry– a favor. As a matter of fact, Derek Powazek insulted savvy Web developers, Web designers, even search engine staff, as well as usability experts and search engine specialists, who team up in countless projects helping small and large Web sites succeed.

I seriously can’t understand how Derek Powazek “has survived 13 years in the web biz” (source) without detailed knowledge of how things get done in Web projects. I mean, if a developer really has worked 13 years in the Web biz, he should know that the task of optimizing a Web site’s findability, crawlability, and accessibility for all user agents out there (SEO) is usually not performed by “spammers evildoers and opportunists”, but by highly professional experts who simply master Web development better than the average designer, copy-writer, publisher, developer or marketing guy.

Boy, what an ego. Derek Powazek truly believes that if “[all SEO/SEM techniques are] not obvious to you, and you make websites, you need to get informed” (source). That translates to “if you aren’t 100% perfect in all aspects of Web development and Internet marketing, don’t bother making Web sites — go get a life”.

Well, I consider very few folks capable of mastering everything in Web development and Internet marketing. Clearly, Derek Powazek is not a member of this elite. With one clueless, uninformed and way too offensive rant he has ruined his reputation in a single day. Shortly after his first thoughtless blog post libelling fellow developers and consultants, Google’s search result page for [Derek Powazek] is flooded with reasonable reactions revealing that Derek Powazek’s pathetic calls for ego food are factually wrong.

Of course calm and knowledgeable experts in the field setting the records straight, like Danny Sullivan (search result #1 and #4 for [Derek Powazek] today) and Peter da Vanzo (SERP position #9), can outrank a widely unknown guy like Derek Powazek at all major search engines. Now, for the rest of his presence on this planet, Derek Powazek has to live with search results that tell the world what kind of an “expert” he really is (example).

He should have read Susan Moskwa’s very informative article about reputation management on Google’s Blog a day earlier. Not that reputation management doesn’t count as an SEO skill … actually, that’s SEO basics (as well as URI canonicalization).

Dear Derek Powazek, guess what all the bright folks you’ve bashed so cleverly will do when you ask them to take down their responses to your uncalled-for dirty talk?

So what can we learn from this gratuitous debacle? Do not piss in someone’s roses when

  • you suffer from an oversized ego,
  • you’ve not the slightest clue what you’re talking about,
  • you can’t make a point with proven facts, so you’ve to use false pretences and clueless assumptions,
  • you tend to insult people when you’re out of valid arguments,
  • willy whacking is not for you, because your dick is, well, somewhat undersized.

Ok, it’s Friday evening, so I’m supposed to enjoy TGIF’s. Why the fuck am I wasting my valuable spare time writing this pamphlet? Here’s why:

Having worked in, led, and coached WebDev teams on crawlability and best practices with regard to search engine crawling and indexing for ages now, I’ve been faced with brain-amputated wannabe geniuses more than once. Such assclowns are able to shipwreck great projects. From my experience, the one and only way to keep teams sane and productive is sacking troublemakers the moment you realize they can’t be convinced. This Powazek dude has perfectly proven that his ignorance is persistent, and that his anti-social attitude is irreversible. He’s the prime example of a guy I’d never hire (except if I worked for my worst enemy). Go figure.


Update 2009-10-19: I consider this a lame excuse. Actually, it’s even more pathetic than the malicious slamming of many good folks in his previous posts. If Derek Powazek really didn’t know what “SEO” means in the first place, his brain farts attacking something he didn’t understand at the time of publishing his rants are indefensible, provided he was anything south of sane then. Danny Sullivan doesn’t agree, and he’s right when he says that every industry has some black sheep, but as much as I dislike comment spammers, I dislike bullshit and baseness.




Search engines should make shortened URIs somewhat persistent

URI shorteners are crap. Each and every shortened URI expresses a design flaw. All –or at least most– public URI shorteners will shut down sooner or later, because shortened URIs are hard to monetize. Making use of 3rd party URI shorteners translates to “put traffic at risk”. Not to speak of link love (PageRank, Google juice, link popularity) lost forever.

Search engines could provide a way out of the sURL dilemma that Twitter & Co created with their crappy, thoughtless and shortsighted software designs. Here’s how:

Most browsers support search queries in the address bar, as well as suggestions (aka search results) on DNS errors, and sometimes even on 404s or other HTTP response codes besides 200/3xx. That means browsers “ask a search engine” when an HTTP request fails.

When a TLD goes out of service, search engines may already have crawled a 301 redirect or meta refresh from a page formerly living on, say, a .yu domain. They know the new address and can lead the user to this (working) URI.

The same goes for shortened URIs created ages ago by URI shortening services that died in the meantime. Search engines have transferred all the link juice from the shortened URI to the destination page already, so why not point users that request a dead short URI to the right destination?

Search engines have all the data required for rescuing out-of-service short URIs in their databases. Not de-indexing “outdated” URIs belonging to URI shorteners would be a minor tweak. At least Google has stored attributes and behavior of all links on the Web since the past century, and most probably other search engines are operated by data rats too.

URI shorteners can be identified by simple patterns. They gather tons of inbound links from foreign domains that get redirected (not always using a 301!) to URIs on other 3rd party domains. Of course that applies to some AdServers too, but rest assured search engines do know the differences.

So why the heck haven’t Google, Yahoo/MSN Bing, and Ask offered such a service yet? I thought it’s all about users, but I might have misread something. Sigh.

By the way, I’ve recorded search engine misbehavior with regard to shortened URIs that could arouse Jack The Ripper, but that’s a completely different story.




The “just create compelling and useful content” lie

I’m so sick of the universal answer to all SEO questions. Each and every search engine rep keeps telling me that “creating a great and useful site with compelling content will gain me all the rankings I deserve”. What a pile of bullshit. Nothing ranks without strong links. I deserve more, and certainly involving less work.

Honestly, why the heck should I invest any amount of time and even money to make others happy? It’s totally absurd to put up a great site with compelling content that’s easy to navigate and all that, just to please an ungrateful crowd of anonymous users! What a crappy concept. I don’t buy it.

I create websites for the sole purpose of making money, and lots of green. It’s all about ME! Here I said it. Now pull out the plastic. ;-)

And there’s another statement that really annoys me: “make sites for users, not for search engines”. Again, with self-serving commandments like this one, search engine quality guidelines insult my intelligence. Why should I make even a single Web page for search engines? Google, Yahoo, Microsoft and Ask staff might all be wealthy folks, but from my experience they don’t purchase much porn on-line.

I create and publish Web content to laugh all the way to the bank at the end of the day. For no other reason. I swear. To finally end the ridiculous discussion of utterly useless and totally misleading search engine guidelines, I’ll guide you step by step through a few elements of a successful website, explaining what I do and for whom, why it’s totally selfish, and what it’s worth.

In some cases that’ll be quite a bit geeky, so just skip the technical stuff if you’re a plain Internet marketer. Also, I don’t do everything by the book on Web development, so please read the invisible fine print carefully.

robots.txt

This gatekeeper keeps useless bot traffic away from my sites. That goes for well-behaved bots, at least. Others might meet a script that handles them with care. I’d rather serve human users bigger ads than waste bandwidth on useless bots requesting shitloads of content.

.htaccess

I’m a big fan of the “one URI = one piece of content” principle. I consider gazillions of URI variations serving similar contents avoidable crap. That’s because I can’t remember complex URIs stuffed with tracking parameters and other superfluous clutter. Like the average bookmarking surfer, I prefer short and meaningful URIs. With a few simple .htaccess directives I make sure that everybody gets the requested piece of content (pardon, advertising) under the canonical URI.
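If you prefer doing that job at the application layer instead of .htaccess, a minimal sketch (assuming a hypothetical canonical host example.com) looks like this:

<?php
// Sketch: enforce the canonical host with a 301, the PHP way.
$canonicalHost = "example.com";
if (strcasecmp($_SERVER["HTTP_HOST"], $canonicalHost) !== 0) {
    header("HTTP/1.1 301 Moved Permanently", TRUE, 301);
    header("Location: http://" . $canonicalHost . $_SERVER["REQUEST_URI"]);
    exit;
}
?>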

ErrorDocuments

Error handling is important. Before I throw a 404 Not Found error at a human visitor, I analyze the request’s context (e.g. REQUEST_URI, HTTP_REFERER). If possible, I then redirect the user to the page s/he should have requested, or at least to a page with related links (read: banner ads). Bouncing punters don’t make me any money.
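A bare-bones sketch of such an error handler, wired up via Apache’s ErrorDocument directive; the lookup table and template name are made up, the real logic is obviously site-specific:

<?php
// Sketch of a custom 404 handler, e.g. "ErrorDocument 404 /404.php".
// Try to guess what the visitor wanted before giving up.
$requestUri = isset($_SERVER["REDIRECT_URL"]) ? $_SERVER["REDIRECT_URL"] : $_SERVER["REQUEST_URI"];

// Site-specific lookup: map dead or misspelled paths to live pages.
// A real implementation would query a table of legacy URIs, the referrer, etc.
$legacyMap = array(
    "/old-product.html" => "/products/new-product/",
);

if (isset($legacyMap[$requestUri])) {
    header("HTTP/1.1 301 Moved Permanently", TRUE, 301);
    header("Location: " . $legacyMap[$requestUri]);
    exit;
}

// Nothing to salvage: serve a real 404 with related links instead of a dead end.
header("HTTP/1.1 404 Not Found", TRUE, 404);
include "404-with-related-links.php";   // hypothetical template
?>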

HTTP headers

There’s more doable with creative headers than just HTTP response codes, for example cost savings. In some cases a single line of text in an HTTP header tells the user agent more than a bunch of markup.
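Two quick illustrations, sketched in PHP: answering a conditional GET with a 304 saves the whole response body, and an X-Robots-Tag header steers indexing without touching the markup.

<?php
// Sketch: honor If-Modified-Since and save the bandwidth of a full response.
$lastModified = filemtime(__FILE__);   // or the content's real timestamp
header("Last-Modified: " . gmdate("D, d M Y H:i:s", $lastModified) . " GMT");
if (isset($_SERVER["HTTP_IF_MODIFIED_SINCE"]) &&
    strtotime($_SERVER["HTTP_IF_MODIFIED_SINCE"]) >= $lastModified) {
    header("HTTP/1.1 304 Not Modified", TRUE, 304);
    exit;                              // no body needed
}

// Steer robots without any markup at all, e.g. for a printer-friendly version.
header("X-Robots-Tag: noindex, follow");
?>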

Sensible page titles and summaries

There’s nothing better to instantly catch the user’s interest than a prominent page title, followed by a short and nicely formatted summary that the average reader can skim in a few seconds. Fast loading and eye-catching graphics can do wonders, too. Just in case someone’s scraping (pardon, bookmarking) my stuff, I duplicate my titles into usually invisible TITLE elements, and provide seductive calls for action in descriptive META elements. Keeping the visitor interested for more than a few seconds results in monetary opportunities.

Unique content

Writing unique, compelling, and maybe even witty product descriptions increases my sales. Those are even more attractive when I add neat stuff that’s not available from the vendor’s data feed (I don’t mean free shipping, that’s plain silly). A good (linkworthy) product page comes with practical use cases and/or otherwise well presented, not too long-windedly outlined USPs. Producers as well as distributors suck at this task.

User generated content

Besides faked testimonials and the usual stuff, asking visitors questions like “Just in case you buy product X here, what will you actually do with it? How will you use it? Whom will you give it to?” leads to unique text snippets that create needs. Of course all user generated content gets moderated.

Ajax’ed “Buy now” widgets

Most probably, a punter who has clicked the “add to shopping cart” link on a page that nicely gathers quite a few products will not buy another one of them if the mouse click invokes a POST request to the shopping cart script, requiring a full round trip to the server. Out of sight, out of mind.

Sitemaps

Both static as well as dynamically themed sitemap pages funnel a fair amount of visitors to appropriate landing pages. Dumping major parts of a site’s structure in XML format attracts traffic, too. There’s no such thing as bad traffic, just weak upselling procedures.

Ok, that’s enough examples to bring my point home. Probably I’ve bored you to death anyway. You see, whatever I do as a site owner, I do it only for myself (inevitably accepting collateral damage like satisfied punters and compliance with search engine quality guidelines). Recap: I don’t need no stinkin’ advice (read: restrictions) from search engines.

Seriously, since you’re still awake and following my long-winded gobbledygook, here’s a goodie:

The Number One SEO Secret

Think like a search engine engineer. Why? Because search engines are as selfish as you are, or at least as you should be.

In order to make money from advertising, search engines need boatloads of traffic every second, 24/7/365. Since there are only so many searchers populating this planet, search engines rely on recurring traffic. Sounds like a pretty good reason to provide relevant search results, doesn’t it?

That’s why search engines develop highly sophisticated algorithms that try to emulate human surfers, supposed to extract the most useful content from the Web’s vast litter boxes. Their engineers tweak those algos on a daily basis, factoring in judgements of more and more signals as communities, services, opportunities, behavior, and techniques used on the Web evolve.

They try really hard to provide their users with the cream of the crop. However, SE engineers are just humans like you and me, therefore their awesome algos –as well as the underlying concepts– can fail. So don’t analyze too many SERPs to figure out how a SE engineer might tick, just be pragmatic and imagine you’ve got enough SE shares to temporarily throw away your Internet marketer hat.

Anytime you have to make a decision on content, design, navigation or whatever, switch to search engine engineer mode and ask yourself questions like “does this enhance the visitor’s surfing experience?”, “would I as a user appreciate this?” or simply “would I buy this?”. (Of course, instead of emulating an imaginary SE engineer, you could also switch to plain user mode. The downside of the latter brain emulation is, unfortunately, that geeks especially tend to trust the former more easily.)

Bear in mind that even if search engines don’t cover particular (optimization) techniques today, they might do so tomorrow. The same goes for newish forms of content presentation etc. Eventually search engines will find a way to work with any signal. Most of our neat little tricks bypassing today’s spam filters will stop working some day.

After completing this sanity check, heal your schizophrenia and evaluate whether it will make you money or not. Eventually, that’s the goal.

By the way, the above said doesn’t mean that only so-called ‘purely white hat’ traffic optimization techniques work with search engines. Actually, SEO = hex’53454F’, and that’s a pretty dark gray. :-)

Related thoughts: Optimize for user experience, but keep the engines in mind. Misleading some folks just like this pamphlet.




Debugging robots.txt with Google Webmaster Tools

Although Google’s Webmaster Console is a really neat toolkit, it can mislead the not-that-savvy crowd every once in a while.

When you go to “Diagnostics::Crawl Errors::Restricted by robots.txt” and you find URIs that aren’t disallow’ed or even noindex’ed in your very own robots.txt, calm down.

Google’s cool robots.txt validator ignores its knowledge of redirects and approves your redirecting URIs, driving you nuts until you check each URI’s HTTP response code for redirects (301, 302 and 307, as well as undelayed meta refreshes).

Google obeys robots.txt even in a chain of redirects. If a URI given in an HTTP Location header is disallow’ed or noindex’ed for Google’s user agent(s), Googlebot doesn’t fetch it, regardless of its position in the current chain of redirects. Even a robots.txt block in the 5th hop stops the greedy Web robot. Those URIs are correctly reported back as “restricted by robots.txt”; Google just refuses to tell you that the blocking crawler directive originates on a foreign server.
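To track down the culprit yourself, here’s a quick sketch that walks a URI’s redirect chain and prints each hop, so you can see which foreign host’s robots.txt does the blocking (the URI is a placeholder):

<?php
// Sketch: follow a URI's redirect chain hop by hop, printing status and target.
function trace_redirects($uri, $maxHops = 10) {
    for ($hop = 1; $hop <= $maxHops; $hop++) {
        $ch = curl_init($uri);
        curl_setopt($ch, CURLOPT_NOBODY, TRUE);          // HEAD request is enough
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, FALSE); // one hop at a time
        curl_exec($ch);
        $status   = curl_getinfo($ch, CURLINFO_HTTP_CODE);
        $location = curl_getinfo($ch, CURLINFO_REDIRECT_URL);
        curl_close($ch);

        echo "$hop. HTTP $status  $uri\n";
        if ($status < 300 || $status >= 400 || !$location) {
            return;                                      // no further redirect
        }
        $uri = $location;
    }
}

trace_redirects("http://example.com/some-reported-uri");  // placeholder URI
?>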




Getting new stuff crawled in real-time

Thanks to PubSubHubbub we’ve got another method to invite Googlebot.

Just add a link to FriendFeed, for example by twittering or bookmarking it. Once the URI hits your FriendFeed account, FriendFeed shares it with Googlebot, and both of them request it before the tweet appears on your timeline:

Date/Time            Request          IP / Host                                           User agent
2009-08-17 15:47:19  Your URI         38.99.68.206 / 38.99.68.206                         Mozilla/5.0 (compatible; FriendFeedBot/0.1; +http://friendfeed.com/about/bot)
2009-08-17 15:47:19  Your robots.txt  66.249.71.141 / crawl-66-249-71-141.googlebot.com   Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
2009-08-17 15:47:19  Your URI         66.249.71.141 / crawl-66-249-71-141.googlebot.com   Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)

 

Neat. Of course there are other methods to invite search engine crawlers at lightning speed, but this one comes in handy because it can run on autopilot […]. Even when the actual fetch is meant to feed Google Reader or something, you’ve got your stuff into Google’s crawling cache, from where other services like the Web indexer can pick it up without requesting another crawl.
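If you’d rather not depend on FriendFeed, you can ping a PubSubHubbub hub yourself whenever your feed updates. A minimal sketch; the hub and feed URLs are placeholders:

<?php
// Sketch: tell a PubSubHubbub hub that the feed has new content,
// so subscribers get notified (and fetch it) right away.
function pubsubhubbub_ping($hubUrl, $feedUrl) {
    $postData = http_build_query(array(
        "hub.mode" => "publish",
        "hub.url"  => $feedUrl,
    ));
    $ch = curl_init($hubUrl);
    curl_setopt($ch, CURLOPT_POST, TRUE);
    curl_setopt($ch, CURLOPT_POSTFIELDS, $postData);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
    curl_exec($ch);
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);
    return $status == 204;   // hubs answer "204 No Content" on success
}

pubsubhubbub_ping("http://pubsubhubbub.appspot.com/", "http://example.com/feed/");
?>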




How to handle a machine-readable pandemic that search engines cannot control

When you’re familiar with my various rants on the ever-morphing rel-nofollow microformat (pardon, infectious link disease), don’t read further. This post is not polemic, ironic, insulting, or otherwise meant to entertain you. I’m just raving about a way to delay the downfall of the InterWeb.

Let’s recap: The World Wide Web is based on hyperlinks. Hyperlinks are supposed to lead humans to interesting stuff they want to consume. This simple and therefore brilliant concept worked great for years. The Internet grew up, bubbled a bit, but eventually it gained world domination. Internet traffic was counted, sold, bartered, purchased, and even exchanged for free in units called “hits”. (A “hit” means one human surfer landing on a sales pitch, that is, a popup hell designed in a way that somebody involved just has to make a sale.)

Then in the past century two smart guys discovered that links scraped from Web pages can be misused to provide humans with very accurate search results. They even created a new currency on the Web, and quickly assigned their price tags to Web pages. Naturally, folks began to trade green pixels instead of traffic. After a short while the Internet voluntarily transferred its world domination to the company founded by those two smart guys from Stanford.

Of course the huge amount of green pixel trades made the search results based on link popularity somewhat useless, because the webmasters gathering the most incoming links got the top 10 positions on the search result pages (SERPs). Search engines claimed that a few webmasters cheated on their way to the first SERPs, although lawyers say there’s no evidence of any illegal activities related to search engine optimization (SEO).

However, after suffering from heavy attacks from a whiny blogger, the Web’s dominating search engine got somewhat upset and required that all webmasters have to assign a machine-readable tag (link condom) to links sneakily inserted into their Web pages by other webmasters. “Sneakily inserted links” meant references to authors as well as links embedded in content supplied by users. All blogging platforms, CMS vendors and alike implemented the link condom, eliminating presumably 5.00% of the Web’s linkage at this time.

A couple of months later the world dominating search engine demanded that webmasters have to condomize their banner ads, intercompany linkage and other commercial links, as well as all hyperlinked references that do not count as pure academic citation (aka editorial links). The whole InterWeb complied, since this company controlled nearly all the free traffic available from Web search, as well as the Web’s purchasable traffic streams.

Roughly 3.00% of the Web’s links were condomized, as the search giant spotted that their users (searchers) missed out on lots and lots of valuable contents covered by link condoms. Ooops. Kinda dilemma. Taking back the link condom requirements was no option, because this would have flooded the search index with billions of unwanted links empowering commercial content to rank above boring academic stuff.

So the handling of link condoms in the search engine’s crawling engine as well as in its ranking algorithm was changed silently. Without telling anybody outside their campus, some condomized links gained power, whilst others were kept impotent. In fact they’ve developed a method to judge each and every link on the whole Web without a little help from their friends (the link condoms). In other words, the link condom became obsolete.

Of course that’s what they should have done in the first place, without asking the world’s webmasters for gazillions of free-of-charge man-years producing shitloads of useless code bloat. Unfortunately, they didn’t have the balls to stand up and admit “sorry folks, we’ve failed miserably, link condoms are history”. Therefore the Web community still has to bother with an obsolete microformat. And if they –the link condoms– are not dead, then they live today. In your markup. Hurting your rankings.

If you, dear reader, are a Googler, then please don’t feel too annoyed. You may have thought that you didn’t do evil, but the above said reflects what webmasters outside the ‘Plex got from your actions. Don’t ignore it, please think about it from our point of view. Thanks.

Still here and attentive? Great. Now let’s talk about scenarios in WebDev where you still can’t avoid rel-nofollow. If there are any; we’ll see.

PageRank™ sculpting

Dude, PageRank™ sculpting with rel-nofollow doesn’t work for the average webmaster. It might even fail when applied as a highly sophisticated SEO tactic. So don’t even think about it. Simply remove the rel=nofollow from links to your TOS, imprint, and contact page. Cloak away your links to signup pages, login pages, shopping carts and stuff like that.

Link monkey business

I leave this paragraph empty, because when you know what you do, you don’t need advice.

Affiliate links

There’s no point in serving A elements to Googlebot at all. If you haven’t cloaked your aff links yet, go see an SEO doctor.

Advanced SEO purposes

See above.

So what’s left? User generated content. Let’s concentrate our extremely superfluous condomizing efforts on the one and only occasion that might allow applying rel-nofollow to a hyperlink on request of a major search engine, if there’s any good reason to paint shit brown at all.

Blogging

If you link out in a blog post, then you vouch for the link’s destination. In case you disagree with the link destination’s content, just put the link as

<strong class="blue_underlined" title="http://myworstenemy.org/" onclick="window.location=this.title;">My Worst Enemy</strong>

or so. The surfer can click the link and lands at the intended URI, but search engines don’t pass reputation. Also, they don’t evaporate link juice, because they don’t interpret the markup as a hyperlink.

Blog comments

My rule of thumb is: Moderate, DoFollow quality, DoDelete crap. Install a conditional do-follow plug-in, set everything on moderation, use captchas or something similar, then let the comment’s link juice flow. You can maintain a white list that allows instant appearance of comments from your buddies.

Forums, guestbooks and unmoderated stuff like that

Separate all Web site areas that handle user generated content. Serve “index,nofollow” meta tags or X-Robots-Tag headers for all those pages, and link them from a site map or so. If you gather index-worthy content from users, then feed crawlers that content in a parallel, crawlable structure (without submit buttons, perhaps with links from trusted users), and redirect human visitors to the interactive pages. Vice versa, redirect crawlers requesting the live pages to the spider fodder. All those redirects go with a 301 HTTP response code.
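Here’s a rough sketch of the redirect glue, assuming a crawler-detection helper (e.g. a reverse-DNS check) and a made-up /crawl/ path scheme for the spider fodder:

<?php
// Sketch: 301 crawlers to the crawlable copy, 301 humans to the live page.
// is_verified_crawler() is assumed to exist (e.g. a reverse-DNS check).
$isCrawler    = is_verified_crawler();
$isSpiderCopy = (strpos($_SERVER["REQUEST_URI"], "/crawl/") === 0);   // hypothetical path scheme

if ($isCrawler && !$isSpiderCopy) {
    // Crawler on the interactive page: send it to the spider fodder.
    header("HTTP/1.1 301 Moved Permanently", TRUE, 301);
    header("Location: /crawl" . $_SERVER["REQUEST_URI"]);
    exit;
}
if (!$isCrawler && $isSpiderCopy) {
    // Human on the crawlable copy: send him/her to the interactive page.
    header("HTTP/1.1 301 Moved Permanently", TRUE, 301);
    header("Location: " . substr($_SERVER["REQUEST_URI"], strlen("/crawl")));
    exit;
}
?>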

If you lack the technical skills to accomplish that, then edit your /robots.txt file as follows:

User-agent: Googlebot
# Dear Googlebot, drop me a line when you can handle forum pages
# w/o rel-nofollow crap. Then I'll allow crawling.
# Treat that as conditional disallow:
Disallow: /forum

As soon as Google can handle your user generated content naturally, they might send you a message in their Webmaster console.

Anything else

Judge yourself. Most probably you’ll find a way to avoid rel-nofollow.

Conclusion

Absolutely nobody needs the rel-nofollow microformat. Not even search engines for the sake of their index. Hence webmasters as well as search engines can stop wasting resources. Farewell rel="nofollow", rest in peace. We won’t miss you.




Vaporize yourself before Google burns your linking power

[PIC-1: Google PageRank(tm) 2007]

I couldn’t care less about PageRank™ sculpting, because a well thought out link architecture does the job with all search engines, not just Google. That’s where Google is right on the money.

They own PageRank™, hence they can burn, evaporate, nullify, and even divide by zero or multiply by -1 as much PageRank™ as they like; of course as long as they rank my stuff nicely above my competitors.

Picture 1 shows Google’s PageRank™ factory as of 2007 or so. Actually, it’s a pretty simplified model, but since they’ve changed the PageRank™ algo anyway, you don’t need to bother with all the geeky details.

As a side note: you might ask why I don’t link to Matt Cutts and Danny Sullivan discussing the whole mess on their blogs. Well, probably Matt can’t afford my advertising rates, and the whole SEO industry has linked to Danny anyway. If you’re nosy, check out my source code to learn more about state-of-the-art linkage that’s fully compliant with Google’s newest guidelines for advanced SEOs (summary: “Don’t trust underlined blue text on Web pages any longer!”).

[PIC-2: Google PageRank(tm) 2009]

What really matters is picture 2, revealing Google’s new PageRank™ facilities, silently launched in 2008. Again, geeky details are of minor interest. If you really want to know everything, then search for [operation bendover] at !Yahoo (it’s still top secret, and therefore not searchable at Google).

Unfortunately, advanced SEO folks (whatever that means; I use this term just because it seems to be an essential property assigned to the participants of the current PageRank™ uprising discussion) always try to confuse you with overcomplicated graphics and formulas when it comes to PageRank™. Instead, I ask you to focus on the (important) hard-core stuff. So go grab a magnifier, and work out the differences:

  • PageRank™ 2009, in comparison to PageRank™ 2007, comes with a pipeline supplying unlimited fuel. Also, it seems they’ve implemented the green new deal, switching from gas to natural gas. That means they can vaporize way more link juice than ever before.
  • PageRank™ 2009 produces more steam, and the clouds look slightly different. Whilst PageRank™ 2007 ignored nofollow crap as well as links put with client-sided scripting, PageRank™ 2009 evaporates not only juice covered with link condoms, but also tons of other permutations of the standard A element.
  • To compensate for the huge overall loss of PageRank™ caused by those changes, Google has decided to pass link juice from condomized links to target URIs hidden from Googlebot with JavaScript. Of course Google formerly recommended the use of JavaScript links to protect webmasters from penalties for so-called “questionable” outgoing links. Just as they not only invented rel-nofollow but heavily recommended the use of this microformat with all links disliked by Google, and now they take that back as if a gazillion links on the Web could magically change just because Google tweaks their algos. Doh! I really hope that the WebSpam team checks the age of such links before they penalize everything implemented according to their guidelines before mid-2009 or the InterWeb’s downfall, whatever comes last.

I guess in the meantime you’ve figured out that I’m somewhat pissed. Not that the secretly changed flow of PageRank™ a year ago in 2008 had any impact on my rankings, or SERP traffic. I’ve always designed my stuff with PageRank™ flow in mind, but without any misuses of rel=”nofollow”, so I’m still fine with Google.

What I can’t stand is a search engine trying to tell me how I have to link (out). Google engineers are really smart folks; they’re perfectly able to develop a PageRank™ algo that can decide how much Google juice a particular link should pass. So dear Googlers, please –with regard to the implementation of hyperlinks– leave us webmasters alone, dump the rel-nofollow crap, and rank our stuff in the best interest of your searchers. No longer bother us with linking guidelines that change yearly. It’s neither our job nor our responsibility to act as your cannon fodder (pardon, slavish code monkeys) whenever you spot a loophole in your ranking or spam-detection algos.

Of course the above said is based on common sense, so Google won’t listen (remember: I’m really upset, hence polemic statements are absolutely appropriate). To prevent webmasters from taking irrational actions prompted by misled search engines, I hereby introduce the

Webmaster guidelines for search engine friendly links

What follows is pseudo-code, implement it with your preferred server sided scripting language.

if (getAttribute($link, 'rel') matches '*nofollow*' &&
    $userAgent matches '*Googlebot*') {
    print '<strong rev="' + getAttribute($link, 'href') + '"'
        + ' style="color:blue; text-decoration:underline;"'
        + ' onmousedown="window.location=this.getAttribute(\'rev\');"'
        + '>' + getAnchorText($link) + '</strong>';
}
else {
    print $link;
}

Probably it’s a good idea to snip both the onmousedown trigger code as well as the rev attribute, when the script gets executed by Googlebot. Just because today Google states that they’re going to pass link juice to URIs grabbed from the onclick trigger, that doesn’t mean they’ll never look at the onmousedown event or misused (X)HTML attributes.

This way you can deliver Googlebot exactly the same stuff that the punter (pardon, surfer) gets. You’re perfectly compliant with Google’s cloaking restrictions. There’s no need to bother with complicated stuff like iFrames or even disabled blog comments, forums or guestbooks.

Just feed the crawlers with all the crap the search engines require, then concentrate all your efforts on your UI for human visitors. Web robots (bots, crawlers, spiders, …) don’t supply your signup forms with credit card details. Humans do. That is, if you find the time to upsell them while search engines keep you busy with thoughtless change requests all day long.




Dump your self-banning CMS

When it comes to cluelessness [silliness, idiocy, stupidity … you name it], you can’t beat CMS developers. You really can’t. There’s literally no way to kill search engine traffic that the average content management system (CMS) developer doesn’t implement. Poor publishers, probably you suffer from the top 10 issues on my shitlist. Sigh.

Imagine you’re the proud owner of a Web site that enables logged-in users to customize the look & feel and whatnot. Here’s how your CMS does the trick:

Unusable user interface

The user control panel offers a gazillion settings that can overwrite each and every CSS property out there. To keep the user-cp pages lean and fast loading, the properties are spread over 550 pages with 10 attributes each, all with very comfortable Previous|Next-Page navigation. Even when the user has chosen a predefined template, the CMS saves each property in the user table. Of course that’s necessary, because the site admin could change a template in use.

Amateurish database design

Not only for this purpose, each user tuple comes with 512 mandatory attributes. Unfortunately, the underlying database doesn’t handle tables with more than 512 columns, so the overflow gets stored in an array, using the large text column #512.

Cookie hell

Since every database access is expensive, the login procedure creates a persistent cookie (today + 365 * 30) for each user property. Dynamic and user specific external CSS files as well as style-sheets served in the HEAD section could fail to apply, so all CMS scripts use a routine that converts the user settings into inline style directives like style="color:red; text-align:bolder; text-decoration:none; ...". The developer consults the W3C CSS guidelines to make sure that not a single CSS property is left out.

Excessive query strings

Actually, not all user agents handle cookies properly. Especially cached pages clicked from SERPs load with a rather weird design. The same goes for standards-compliant browsers. It seems to depend on the user agent string, so the developer adds an if ($well_behaving_user_agent_string <> $HTTP_USER_AGENT) then [read the user record and add each property as a GET variable to the URI’s query string] sanity check. Of course the $well_behaving_user_agent_string variable gets populated with a constant containing the developer’s ancient IE user agent, and the GET inputs overwrite the values gathered from cookies.

Even more sanitizing

Some unhappy campers still claim that the CMS ignores some user properties, so the developer adds a routine that reads the user table and populates all variables that previously were filled from GET inputs overwriting cookie inputs. All clients are happy now.

Covering robots

“Cached copy” links from SERPs still produce weird pages. The developer stumbles upon my blog and adds crawler detection. S/he creates a tuple for each known search engine crawler in the user table of her/his local database and codes if ($isSpider) then [select * from user where user.usrName = $spiderName, populating the current script's CSS property variables from the requesting crawler's user settings]. Testing the rendering with a user agent faker gives fine results: bug fixed. To make sure that all user agents get a nice page, the developer sets the output default to “printer”, which produces a printable page ignoring all user settings that assign style="display:none;" to superfluous HTML elements.

Results

Users are happy, they don’t spot the code bloat. But search engine crawlers do. They sneakily request a few pages as a crawler, and a few as a browser. Comparing the results, they find the “poor” pages delivered to the feigned browser way too different from the “rich” pages serving as crawler fodder. The domain gets banned for poor man’s cloaking (as if cloaking in general could be a bad thing, but that’s a completely different story). The publisher spots decreasing search engine traffic and wonders why. No help available from the CMS vendor. Must be unintentionally deceptive SEO copywriting or so. Crap. That’s self-banning by software design.

Ok, before you read on: get a calming tune.

How can I detect a shitty CMS?

Well, you can’t, at least not as a non-geeky publisher. Not really. Of course you can check the “cached copy” links from your SERPs all night long; if they show results way too different from your browser’s rendering, you’re at risk. You can look at your browser’s address bar to check your URIs for query strings with overlength, and if you can’t find the end of the URI, perhaps you’re toast, search engine wise. You can download tools to check a page’s cookies; if there are more than 50, you’re potentially search-engine-dead. Probably you can’t do a code review yourself coz you can’t read source code natively, and your CMS vendor has delivered spaghetti code. Also, as a publisher, you can’t tell whether your crappy rankings stem from shitty code or from your skills as a copywriter. When you ask your CMS vendor, usually the search engine algo is faulty (especially Google’s, Yahoo’s, Microsoft’s and Ask’s), but some exotic search engine from Togo or so sets the standard for state-of-the-art search engine technology.

Last but not least, as a non-search-geek challenged by Web development techniques, you won’t recognize most of the laughable –but very common– mistakes outlined above. Actually, most savvy developers will not be able to compile a complete shitlist from my scenario. Also, there are tons of other common CMS issues that result in different crawlability problems, each as bad as this one, or even worse.

Now what can you do? Well, my best advice is: don’t click on Google ads titled “CMS”, and don’t look at prices. The cheapest CMS will cost you the most at the end of the day. And if your budget exceeds a grand or two, then please hire an experienced search engine optimizer (SEO) or search savvy Web developer before you implement a CMS.



