Crap

Archived posts from the 'Crap' Category

Hard facts about URI spam

Posted on 1 December, 2009

I stole this pamphlet’s title (and more) from Google’s post Hard facts about comment spam for a reason. In fact, Google spams the Web with useless clutter, too. You doubt it? Read on. That’s the URI from the link above:

http://googlewebmastercentral.blogspot.com/2009/11/hard-facts-about-comment-spam.html?utm_source=feedburner&utm_medium=feed &utm_campaign=Feed%3A+blogspot%2FamDG+%28Official+Google+Webmaster+Central+Blog%29

GA Kraken I’ve bolded the canonical URI, everything after the questionmark is clutter added by Google.

When your Google account lists both Feedburner and GoogleAnalytics as active services, Google will automatically screw your URIs when somebody clicks a link to your site in a feed reader (you can opt out, see below).

Why is it bad?

FACT: Google’s method to track traffic from feeds to URIs creates new URIs. And lots of them. Depending on the number of possible values for each query string variable (utm_source utm_medium utm_campaign utm_content utm_term) the amount of cluttered URIs pointing to the same piece of content can sum up to dozens or more.

FACT: Bloggers (publishers, authors, anybody) naturally copy those cluttered URIs to paste them into their posts. The same goes for user link drops at Twitter and elsewhere. These links get crawled and indexed. Currently Google’s search index is flooded with 28,900,000 cluttered URIs mostly originating from copy+paste links. Bing and Yahoo didn’t index GA tracking parameters yet.

That’s 29 million URIs with tracking variables that point to duplicate content as of today. With every link copied from a feed reader, this number will increase. Matt Cutts said “I don’t think utm will cause dupe issues” and points to John Müller’s helpful advice (methods a site owner can apply to tidy up Google’s mess).

Maybe Google can handle this growing duplicate content chaos in their very own search index. Lets forget that Google is the search engine that advocated URI canonicalization for ages, invented sitemaps, rel=canonical, and countless high sophisticated algos to merge indexed clutter under the canonical URI. It’s all water under the bridge now that Google is in the create-multiple-URIs-pointing-to-the-same-piece-of-content business itself.

So far that’s just disappointing. To understand why it’s downright evil, lets look at the implications from a technical point of view.

Spamming URIs with utm tracking variables breaks lots of things

Look at this URI: http://www.example.com/search.aspx?Query=musical+mobile?utm_source=Referral&utm_medium=Internet&utm_campaign=celebritybabies

Google added a query string to a query string. Two URI segment delimiters (“?”) can cause all sorts of troubles at the landing page.

Some scripts will process only variables from Google’s query string, because they extract GET input from the URI’s last questionmark to the fragment delimiter “#” or end of URI; some scripts expecting input variables in a particular sequence will be confused at least; some scripts might even use the same variable names … the number of possible errors caused by amateurish extended query strings is infinite. Even if there’s only one “?” delimiter in the URI.

In some cases the page the user gets faced with will lack the expected content, or will display a prominent error message like 404, or will consist of white space only because the underlying script failed so badly that the Web server couldn’t even show a 5xx error.

Regardless whether a landing page can handle query string parameters added to the original URI or not (most can), changing someone’s URI for tracking purposes is plain evil, IMHO, when implemented as opt-out instead of opt-in.

Appended UTM query strings can make trackbacks vanish, too. When a blog checks whether the trackback URI is carrying a link to the blog or not, for example with this plug-in, the comparision can fail and the trackback gets deleted on arrival, without notice. If I’d dig a little deeper, most probably I could compile a huge list of other functionalities on the Internet that are broken by Google’s UTM clutter.

Finally, GoogleAnalytics is not the one and only stats tool out there, and it doesn’t fulfil all needs. Many webmasters rely on simple server reports, for example referrer stats or tools like awstats, for various technical purposes. Broken. Specialized content management tools feeded by real-time traffic data. Broken. Countless tools for linkpop analysis group inbound links by landing page URI. Broken. URI canonicalization routines. Broken, respecively now acting counterproductive with regard to GA reporting. Google’s UTM clutter has impact on lots of tools that make sense in addition to Google Analytics. All broken.

What a glorious mess. Frankly, I’m somewhat puzzled. Google has hired tens of thousands of this planet’s brightest minds -I really mean that, literally!-, and they came out with half-assed crap like that? Un-fucking-believable.

What can I do to avoid URI spam on my site?

Boycott Google’s poor man’s approach to link feed traffic data to Web analytics. Go to Feedburner. For each of your feeds click on “Configure stats” and uncheck “Track clicks as a traffic source in Google Analytics”. Done. Wait for a suitable solution.

If you really can’t live with traffic sources gathered from a somewhat unreliable HTTP_REFERER, and you’ve deep pockets, then hire a WebDev crew to revamp all your affected code. Coward!

As a matter of fact, Google is responsible for this royal pain in the ass. Don’t fix Google’s errors on your site. Let Google do the fault recovery. They own the root of all UTM evil, so they have to fix it. There’s absolutely no reason why a gazillion of webmasters and developers should do Google’s job, again and again.

What can Google do?

Well, that’s quite simple. Instead of adding utterly useless crap to URIs found in feeds, Google can make use of a clever redirect script. When Feedburner serves feed items to anybody, the values of all GA tracking variables are available.

Instead of adding clutter to these URIs, Feedburner could replace them with a script URI that stores the timestamp, the user’s IP addy, and whatnot, then performs a 301 redirect to the canonical URI. The GA script invoked on the landing page can access and process these data quite accurately.

Perhaps this procedure would be even more accurate, because link drops can no longer mimick feed traffic.

Speak out!

So, if you don’t approve that Feedburner, GoogleReader, AdSense4Feeds, and GoogleAnalytics gang rape your well designed URIs, then link out to everything Google with a descriptive query string, like:

?utm_source=sebastian&utm_medium=pamphlet&utm_campaign=thou+shalt+not+fuck+with+my+uris

I mean, nicely designed canonical URIs should be the search engineer’s porn, so perhaps somebody at Google will listen. Will ya?

Update:

I’ve just added a “UTM Killer” tool, where you can enter a screwed URI and get a clean URI — all ‘utm_’ crap and multiple ‘?’ delimiters removed — in return. That’ll help when you copy URIs from your feedreader to use them in your blog posts.

By the way, please vote up this pamphlet so that I get the 2010 SEMMY Award. Thanks in advance!

Share/bookmark this: del.icio.us • Google • ma.gnolia • Mixx • Netscape • reddit • Sphinn • Squidoo • StumbleUpon • Yahoo MyWeb
Subscribe to

Entries

Comments

All Comments

19 comments Sebastian | Search Quality, Duplicate Content, Analytics, Internet Marketing, Webspam, Spam, SEO, Crap, Copy+Paste-Penalties, AdSense, Google

The most sexy browsers screw your analytics

Posted on 12 November, 2009

Chrome and Safari fuck with the HTTP_REFERER Now that IE is quite unusable due to the lack of websites that support its non-standard rendering, and the current FireFox version suffers from various maladies, more and more users switch to browsers that are supposed to comply to Web standards, such as Chrome, Safari, or Opera.

Those sexy user agents execute client sided scripts in lightning speed, making surfers addicted to nifty rounded corners very very happy. Of course they come with massive memory leaks, but surfers who shut down their browser every once in a while won’t notice such geeky details.

Why is that bad news for Internet marketers? Because Chrome and Safari screw your analytics. Your stats are useless with regard to bookmarkers and type-in traffic. Your referrer stats lack all hits from Chrome/Safari users who have opened your landing page in a new tab or window.

Google’s Chrome and Apple’s Safari do not provide an HTTP_REFERER. (The typo is standardized, too.)

This bug was reported in September 2008. It’s not yet fixed. Not even in beta versions.

Guess from which (optional) HTTP header line your preferred stats tool compiles the search terms to create all the cool keyword statistics? Yup, that’s the HTTP_REFERER’s query string when the visitor came from a search result page (SERP). Especially on SERPs many users open links in new tabs. That means with every searcher switching to a sexy browser your keyword analysis becomes more useless.

That’s not only an analytics issue. Many sites provide sensible functionality based on the referrer (the Web page a user came from), for example default search terms for site-search facilities gathered from SERP-referrers. Many sites evaluate the HTTP_REFERER to prevent themselves from hotlinking, so their users can’t view the content they’ve paid for when they open a link in a new tab or window.

Passing a blank HTTP_REFERER when this information is available to the user agent is plain evil. Of course lots of so-called Internet security apps do this by default, but just because others do evil that doesn’t mean a top-notch Web browser like Safari or Chrome can get away with crap like this for months and years to come.

Please nudge the developers!

Here you go. Post in this thread why you want them to fix this bug asap. Tell the developers that you can’t live with screwed analytics, and that your site’s users rely on reliable HTTP_REFERERs. Even if you don’t run a website yourself, tell them that your favorite porn site bothers you with countless error messages instead of delivering smut, just because WebKit browsers are buggy.

You can test whether your browser passes the HTTP_REFERER or not: Go to this Google SERP. On the link to this post chose “Open link in new tab” (or window) in the context menu (right click over the link). Scroll down.

Your browser passed this HTTP_REFERER: http://sebastians-pamphlets.com/category/crap/

Share/bookmark this: del.icio.us • Google • ma.gnolia • Mixx • Netscape • reddit • Sphinn • Squidoo • StumbleUpon • Yahoo MyWeb
Subscribe to

Entries

Comments

All Comments

27 comments Sebastian | Internet Marketing, Usability, Analytics, Crap

Derek Powazek outed himself big-mouthed and ignorant, and why that’s a pity

Posted on 16 October, 2009

With childish attacks on his colleagues, Derek Powazek didn’t do professional Web development -as an industry- a favor. As a matter of fact, Derek Powazek insulted savvy Web developers, Web designers, even search engine staff, as well as useability experts and search engine specialists, who team-up in countless projects helping small and large Web sites succeed.

I seriously can’t understand how Derek Powazek “has survived 13 years in the web biz” (source) without detailled knowledge of how things get done in Web projects. I mean, if a developer really has worked 13 years in the Web biz, he should know that the task of optimizing a Web site’s findability, crawlability, and accessibility for all user agents out there (SEO) is usually not performed by “spammers evildoers and opportunists”, but by highly professional experts who just master Web development better than the average designer, copy-writer, publisher, developer or marketing guy.

Boy, what an ego. Derek Powazek truly believes that if “[all SEO/SEM techniques are] not obvious to you, and you make websites, you need to get informed” (source). That translates to “if you aren’t 100% perfect in all aspects of Web development and Internet marketing, don’t bother making Web sites — go get a life”.

Well, I consider very few folks capable of mastering everything in Web development and Internet marketing. Clearly, Derek Powazek is not a member of this elite. With one clueless, uninformed and way too offensive rant he has ruined his reputation in a single day. Shortly after his first thoughtless blog post libelling fellow developers and consultants, Google’s search result page for [Derek Powazek] is flooded with reasonable reactions revealing that Derek Powazek’s pathetic calls for ego food are factually wrong.

Of course calm and knowledgable experts in the field setting the records straight, like Danny Sullivan (search result #1 and #4 for [Derek Powazek] today) and Peter da Vanzo (SERP position #9), can outrank a widely unknown guy like Derek Powazek at all major search engines. Now, for the rest of his presence on this planet, Derek Powazek has to live with search results that tell the world what kind of an “expert” he really is (example …).

He should have read Susan Moskwa’s very informative article about reputation management on Google’s Blog a day earlier. Not that reputation management doesn’t count as an SEO skill … actually, that’s SEO basics (as well as URI canonicalization).

Dear Derek Powazek, guess what all the bright folks you’ve bashed so cleverly will do when you ask them to take down their responses to your uncalled-for dirty talk?

So what can we learn from this gratuitous debacle? Do not piss in someone’s roses when

you suffer from an oversized ego,
you’ve not the slightest clue what you’re talking about,
you can’t make a point with proven facts, so you’ve to use false pretences and clueless assumptions,
you tend to insult people when you’re out of valid arguments,
willy whacking is not for you, because your dick is, well, somewhat undersized.

Ok, it’s Friday evening, so I’m supposed to enjoy TGIF’s. Why the fuck am I wasting my valuable spare time writing this pamphlet? Here’s why:

Having worked in, led, and coached WebDev teams on crawlability and best practices with regard to search engine crawling and indexing for ages now, I was faced with brain amputated wannabe geniuses more than once. Such assclowns are able to shipwreck great projects. From my experience the one and only way to keep teams sane and productive is sacking troublemakers at the moment you realize they’re unconvinceable. This Powazek dude has perfectly proven that his ignorance is persistent, and that his anti-social attitude is irreversible. He’s the prime example of a guy I’d never hire (except if I’d work for my worst enemy). Go figure.

Update 2009-10-19: I consider this a lame excuse. Actually, it’s even more pathetic than the malicious slamming of many good folks in his previous posts. If Derek Powazek really didn’t know what “SEO” means in the first place, his brain farts attacking something he didn’t understand at the time of publishing his rants are indefensible, provided he was anything south of sane then. Danny Sullivan doesn’t agree, and he’s right when he says that every industry has some black sheep, but as much as I dislike comment spammers, I dislike bullshit and baseness.

Share/bookmark this: del.icio.us • Google • ma.gnolia • Mixx • Netscape • reddit • Sphinn • Squidoo • StumbleUpon • Yahoo MyWeb
Subscribe to

Entries

Comments

All Comments

7 comments Sebastian | Web development, Crap, SEO

Full disclosure @ FTC

Posted on 12 October, 2009

Trying to avoid an $11,000 fine in the Federal Trade Commission’s war on bloggers:

When I ~~write about~~ praise search engines, that’s totally paid-for because I’ve received free search results upfront.

Share/bookmark this: del.icio.us • Google • ma.gnolia • Mixx • Netscape • reddit • Sphinn • Squidoo • StumbleUpon • Yahoo MyWeb
Subscribe to

Entries

Comments

All Comments

1 comment Sebastian | MSN, Technorati, Blogging, Twitter, Yahoo, Crap, Google

Search engines should make shortened URIs somewhat persistent

Posted on 7 October, 2009

URI shorteners are crap. Each and every shortened URI expresses a design flaw. All -or at least most- public URI shorteners will shut down sooner or later, because shortened URIs are hard to monetize. Making use of 3rd party URI shorteners translates to “put traffic at risk”. Not to speak of link love (PageRank, Google juice, link popularity) lost forever.

SEs could rescue tiny URLs Search engines could provide a way out of the sURL dilemma that Twitter & Co created with their crappy, thoughtless and shortsighted software designs. Here’s how:

Most browsers support search queries in the address bar, as well as suggestions (aka search results) on DNS errors, and sometimes even 404s or other HTTP response codes other than 200/3x. That means browsers “ask a search engine” when an HTTP request fails.

When a TLD is out of service, search engines could have crawled a 301 or meta refresh from a page formerly living on a .yu domain for example. They know the new address and can lead the user to this (working) URI.

The same goes for shortened URIs created ages ago by URI shortening services that died in the meantime. Search engines have transferred all the link juice from the shortened URI to the destination page already, so why not point users that request a dead short URI to the right destination?

Search engines have all the data required for rescuing short URIs that are out of service in their datebases. Not de-indexing “outdated” URIs belonging to URI shorteners would be a minor tweak. At least Google has stored attributes and behavior of all links on the Web since the past century, and most probably other search engines are operated by data rats too.

URI shorteners can be identified by simple patterns. They gather tons of inbound links from foreign domains that get redirected (not always using a 301!) to URIs on other 3rd party domains. Of course that applies to some AdServers too, but rest assured search engines do know the differences.

So why the heck didn’t Google, ~~Yahoo/MSN~~ Bing, and Ask offer such a service yet? I thought it’s all about users, but I might have misread something. Sigh.

By the way, I’ve recorded search engine misbehavior with regard to shortened URIs that could arouse Jack The Ripper, but that’s a completely other story.

Share/bookmark this: del.icio.us • Google • ma.gnolia • Mixx • Netscape • reddit • Sphinn • Squidoo • StumbleUpon • Yahoo MyWeb
Subscribe to

Entries

Comments

All Comments

10 comments Sebastian | Social Web, Redirects, MSN, URI shortening, Twitter, Risky Linkage, Yahoo, SEO, Crap, Google

The “just create compelling and useful content” lie

Posted on 25 September, 2009

The compelling content lie I’m so sick of the universal answer to all SEO questions. Each and every search engine rep keeps telling me that “creating a great and useful site with compelling content will gain me all the rankings I deserve”. What a pile of bullshit. Nothing ranks without strong links. I deserve more, and certainly involving less work.

Honestly, why the heck should I invest any amount of time and even money to make others happy? It’s totally absurd to put up a great site with compelling content that’s easy to navigate and all that, just to please an ungrateful crowd of anonymous users! What a crappy concept. I don’t buy it.

I create websites for the sole purpose of making money, and lots of green. It’s all about ME! Here I said it. Now pull out the plastic.

And there’s another statement that really annoys me: “make sites for users, not for search engines”. Again, with self-serving commandments like this one search engine quality guidelines do insult my intelligence. Why should I make even a single Web page for search engines? Google, ~~Yahoo~~ Microsoft and Ask staff might all be wealthy folks, but from my experience they don’t purchase much porn on-line.

I create and publish Web contents to laugh all the way to the bank at the end of the day. For no other reason. I swear. To finally end the ridiculous discussion of utterly useless and totally misleading search engine guidelines, I’ll guide you step by step through a few elements of a successful website, explaining why and for whom I perform whatever, why it’s totally selfish, and what it’s worth.

In some cases that’ll be quite a bit geeky, so just skip the technical stuff if you’re a plain Internet marketer. Also, I don’t do everything by the book on Web development, so please read the invisible fine print carefully .

robots.txt

This gatekeeper prevents my sites from useless bot traffic. That goes for behaving bots at least. Others might meet a script that handles them with care. I’d rather serve human users bigger ads than wasting bandwidth for useless bots requesting shitloads of content.

.htaccess

I’m a big fan of the “one URI = one piece of content” principle. I consider gazillions of URI variations serving similar contents avoidable crap. That’s because I can’t remember complex URIs stuffed with tracking parameters and other superfluous clutter. Like the average bookmarking surfer, I prefer short and meaningful URIs. With a few simple .htaccess directives I make sure that everybody gets the requested ~~piece of content~~ advertising under the canonical URI.

ErrorDocuments

Error handling is important. Before I throw a 404-Not-found error to a human visitor, I analyze the request’s context (e.g. REQUEST_URI, HTTP_REFERER). If possible, I redirect the user to the page s/he should have requested then, or at least to a page with related ~~links~~ banner ads. Bouncing punters don’t make me any money.

HTTP headers

There’s more than HTTP response codes doable with creative headers. For example cost savings. In some cases a single line of text in an HTTP header tells the user agent more than a bunch of markup.

Sensible page titles and summaries

There’s nothing better to instantly catch the user’s interest than a prominent page title, followed by a short and nicely formatted summary that the average reader can skim in few seconds. Fast loading and eye-catching graphics can do wonders, too. Just in case someone’s ~~scraping~~ bookmarking my stuff, I duplicate my titles into usually invisible TITLE elements, and provide seducing calls for action in descriptive META elements. Keeping the visitor interested for more than a few seconds results in monetary opportunities.

Unique content

Writing unique, compelling, and maybe even witty product descriptions increases my sales. Those are even more attractive when I add neat stuff that’s not available from the vendor’s data feed (I don’t mean free shipping, that’s plain silly). A good (linkworthy) product page comes with practical use cases and/or otherwise well presented, not too longish outlined, USPs. Producers as well as distributors do suck at this task.

User generated content

Besides faked testimonials and the usual stuff, asking vistors questions like “Just in case you buy product X here, what will you actually do with it? How will you use it? Whom will you give it to?” leads to unique text snippets creating needs. Of course all user generated content gets moderated.

Ajax’ed “Buy now” widgets

Most probably a punter who has clicked the “add to shopping cart” link on a page nicely gathering quite a few products will not buy another one of them, if the mouse click invokes a POST request of the shopping cart script requiring a full round trip to the server. Out of sight, out of mind.

Sitemaps

Both static as well as dynamically themed sitemap pages funnel a fair amount of vistors to appropriate landing pages. Dumping major parts of a site’s structure in XML format attracts traffic, too. There’s no such thing as bad traffic, just weak upselling procedures.

…

Ok, that’s enough examples to bring my point home. Probably I’ve bored you to death anyway. You see, whatever I do as a site owner, I do it only for myself (inevitably accepting collateral damage like satisfied punters and compliance to search engine quality guidelines). Recap: I don’t need no stinkin’ ~~advice~~ restrictions from search engines.

Seriously, since you’re still awake and following my long-winded gobbledygook, here’s a goodie:

The Number One SEO Secret

Think like a search engine engineer. Why? Because search engines are as selfish as you are, or at least as you should be.

In order to make money from advertising, search engines need boatloads of traffic every second, 24/7/365. Since there are only so many searchers populating this planet, search engines rely on recurring traffic. Sounds like a pretty good reason to provide relevant search results, doesn’t it?

That’s why search engines develop high sophisticated algorithms that try to emulate human surfers, supposed to extract the most useful content from the Web’s vast litter boxes. Their engineers tweak those algos on a daily basis, factoring in judgements of more and more signals as communities, services, opportunities, behavior, and techniques used on the Web evolve.

They try really hard to provide their users with the cream of the crop. However, SE engineers are just humans like you and me, therefore their awesome algos -as well as the underlying concepts- can fail. So don’t analyze too many SERPs to figure out how a SE engineer might tick, just be pragmatic and imagine you’ve got enough SE shares to temporarily throw away your Internet marketer hat.

Anytime when you have to make a desicion on content, design, navigation or whatever, switch to search engine engineer mode and ask yourself questions like “does this enhance the visitor’s surfing experience?”, “would I as a user appreciate this?” or simply “would I buy this?”. (Of course, instead of emulating an imaginary SE engineer, you also could switch to plain user mode. The downside of the latter brain emulation is, unfortunately, that especially geeks tend to trust the former more easily.)

Bear in mind that even if search engines don’t cover particular (optimization) techniques today, they might do so tomorrow. The same goes for newish forms of content presentation etc. Eventually search engines will find a way to work with any signal. Most of our neat little tricks bypassing today’s spam filters will stop working some day.

After completing this sanity check, heal your schizophrenia and evaluate whether it will make you money or not. Eventually, that’s the goal.

By the way, the above said doean’t mean that only so-called ‘purely white hat’ traffic optimization techniques work with search engines. Actually, SEO = hex’53454F’, and that’s a pretty dark gray.

Related thoughts: Optimize for user experience, but keep the engines in mind. Misleading some folks just like this pamphlet.

Share/bookmark this: del.icio.us • Google • ma.gnolia • Mixx • Netscape • reddit • Sphinn • Squidoo • StumbleUpon • Yahoo MyWeb
Subscribe to

Entries

Comments

All Comments

19 comments Sebastian | Fun, Crap, SEO

Vaporize yourself before Google burns your linking power

Posted on 16 June, 2009

PIC-1: Google PageRank(tm) 2007 I couldn’t care less about PageRank™ sculpting, because a well thought out link architecture does the job with all search engines, not just Google. That’s where Google is right on the money.

They own PageRank™, hence they can burn, evaporate, nillify, and even divide by zero or multiply by -1 as much PageRank™ as they like; of course as long as they rank my stuff nicely above my competitors.

Picture 1 shows Google’s PageRank™ factory as of 2007 or so. Actually, it’s a pretty simplified model, but since they’ve changed the PageRank™ algo anyway, you don’t need to bother with all the geeky details.

As a side note: you might ask why I don’t link to Matt Cutts and Danny Sullivan discussing the whole mess on their blogs? Well, probably Matt can’t afford my advertising rates, and the whole SEO industry has linked to Danny anyway. If you’re nosy, check out my source code to learn more about state of the art linkage very compliant to Google’s newest guidelines for advanced SEOs (summary: “Don’t trust underlined blue text on Web pages any longer!”).

PIC-2: Google PageRank(tm) 2009 What really matters is picture 2, revealing Google’s new PageRank™ facilities, silently launched in 2008. Again, geeky details are of minor interest. If you really want to know everything, then search for [operation bendover] at !Yahoo (it’s still top secret, and therefore not searchable at Google).

Unfortunately, advanced SEO folks (whatever that means, I use this term just because it seems to be an essential property assigned to the participants of the current PageRank™ ~~uprising~~ discussion) always try to confuse you with overcomplicated graphics and formulas when it comes to PageRank™. Instead, I ask you to focus on the (important) hard core stuff. So go grab a magnifier, and work out the differences:

PageRank™ 2009 in comparision to PageRank™ 2007 comes with a pipeline supplying unlimited fuel. Also, it seems they’ve implemented the green new deal, switching from gas to natural gas. That means they can vaporize way more link juice than ever before.
PageRank™ 2009 produces more steam, and the clouds look slightly different. Whilst PageRank™ 2007 ignored nofollow crap as well as links put with client sided scripting, PageRank™ 2009 evaporates not only juice covered with link condoms, but also tons of other permutations of the standard A element.
To compensate the huge overall loss of PageRank™ caused by those changes, Google has decided to pass link juice from condomized links to their target URI hidden to Googlebot with JavaScript. Of course Google formerly has recommended the use of JavaScript-links to prevent the webmasters from penalties for so-called “questionable” outgoing links. Just as they’ve not only invented rel-nofollow, but heavily recommended the use of this microformat with all links disliked by Google, and now they take that back as if a gazillion links on the Web could magically change just because Google tweeks their algos. Doh! I really hope that the WebSpam-team checks the age of such links before they penalize everything implemented according to their guidelines before mid-2009 or the InterWeb’s downfall, whatever comes last.

I guess in the meantime you’ve figured out that I’m somewhat pissed. Not that the secretly changed flow of PageRank™ a year ago in 2008 had any impact on my rankings, or SERP traffic. I’ve always designed my stuff with PageRank™ flow in mind, but without any misuses of rel=”nofollow”, so I’m still fine with Google.

What I can’t stand is when a search engine tries to tell me how I’ve to link (out). Google engineers are really smart folks, they’re perfectly able to develop a PageRank™ algo that can decide how much Google-juice a particular link should pass. So dear Googlers, please -WRT to the implementation of hyperlinks- leave us webmasters alone, dump the rel-nofollow crap and rank our stuff in the best interest of your searchers. No longer bother us with linking guidelines that change yearly. It’s not our job nor responsibility to act as your ~~cannon fodder~~ slavish code monkeys when you spot a loophole in your ranking- or spam-detection-algos.

Of course the above said is based on common sense, so Google won’t listen (remember: I’m really upset, hence polemic statements are absolutely appropriate). To prevent webmasters from irrational actions by misleaded search engines, I hereby introduce the

Webmaster guidelines for search engine friendly links

What follows is pseudo-code, implement it with your preferred server sided scripting language.

if (getAttribute($link, 'rel') matches '*nofollow*' && $userAgent matches '*Googlebot*') { print '<strong rev="' + getAttribute(link, 'href') + '"' + ' style="color:blue; text-decoration:underlined;"' + ' onmousedown="window.location=document.getElementById(this.id).rev; "' + '>' + getAnchorText($link) + '</strong>'; } else { print $link; }

Probably it’s a good idea to snip both the onmousedown trigger code as well as the rev attribute, when the script gets executed by Googlebot. Just because today Google states that they’re going to pass link juice to URIs grabbed from the onclick trigger, that doesn’t mean they’ll never look at the onmousedown event or misused (X)HTML attributes.

This way you can deliver Googlebot exactly the same stuff that the ~~punter~~ surfer gets. You’re perfectly compliant to Google’s cloaking restrictions. There’s no need to bother with complicated stuff like iFrames or even disabled blog comments, forums or guestbooks.

Just feed the crawlers with all the crap the search engines require, then concentrate all your efforts on your UI for human vistors. Web robots (bots, crawlers, spiders, …) don’t supply your signup-forms w/ credit card details. Humans do. If you find the time to upsell them while search engines keep you busy with thoughtless change requests all day long.

Share/bookmark this: del.icio.us • Google • ma.gnolia • Mixx • Netscape • reddit • Sphinn • Squidoo • StumbleUpon • Yahoo MyWeb
Subscribe to

Entries

Comments

All Comments

18 comments Sebastian | Webspam, Paid Links, Search Quality, Web development, Internet Marketing, Crawler Directives, Crap, Google, Microformats, SEO, Cloaking, Anchor Text, Nofollow

Opting out: mailto://me is history

Posted on 27 May, 2009

Finally quitting email Today I’ve removed all instances of the thunderbird icon from my computers, and from my memory as well. I’m finally done with email. I’ve forwarded¹⁾ all my email accounts to [email protected], and here’s why:

Sebastian’s Pamphlets

Dear Sebastian,

I visited your web site earlier today and it seems you are also a seo company like us. As an SEO company we are in this field since 1998 in India(CHD). We have developed and maintained high quality websites.

We understand link building better than other because of our 11 year experience in linking industry and we follows the right manual link building approach in seeking, obtaining and attracting topic specific trusted inbound links. We have different themes related sites, directories and blogs and i would like to make a request to enter a mutual understanding by EXCHANGING LINKS with your website in order to get targeted visitors, higher ranking and link popularity.

We look forward to linking our site with yours, as exchanging links would Benefit both of us.

You\’ve received this email simply because you have been found while searching for related sites in Google, MSN and Yahoo If you do not wish to receive future emails, simply reply with this email and let us know.

Waiting for your positive and quick response.

PLEASE NOTE THAT THIS IS NOT A SPAM OR AUTOMATED EMAIL, IT\’S ONLY A REQUEST FOR A LINK EXCHANGE. YOUR EMAIL ADDRESS HAS NOT BEEN ADDED TO ANY LISTS, AND YOU WILL NOT BE CONTACTED AGAIN.

Regards:
Lara

Lara
Megrisoft
[email protected]

Direct message from Spamdiggalot

Hi, Sebastian.

You have a new direct message:

Spamdiggalot: hi!I think you should like my article “12 addons to get the most out of safer-sex”, here: digg.com/x010101 please RT!

Reply on the web at http://twitter.com/direct_messages/create/Spamdiggalot

Send me a direct message from your phone: D SPAMDIGGALOT

our company proposal

Dear Sebastian Pamphlets,

My name is Vincentas and I am member of board in multi-location hosting company - Host1Plus (http:// www . host1plus . com). Our servers are in U.S., U.K., the Netherlands, Germany, Lithuania and Singapore.

I just visited your website which I found interested and it provides excellent complementary content.
We would like to offer you free hosting for your site in Host1Plus hosting service the only thing we would ask you is to place our visitors counter to your website here is the link http:// www . count1plus . com or it could be any other feature.

So let me know if you are interested for my offer and I hope that offer is interested to you. Hope to hear you soon.

Kind Regards,
Vincentas Grinius

Host1Plus.com Team
part of Digital Energy Technologies Ltd.
26 York Street
London

W1U 6PZ
United Kingdom
T: +44 (0) 808 101 2277
E: [email protected]
W: http:// www . host1plus . com

Vincentas Grinius
Host1Plus.com
[email protected]

Link Exchange

Hi,

I think if I receive something like this I would pay more attention to that.
\”Dear Webmaster I am so happy to find your website and I like it so much! So I want to be a link partner of your site.

If you are interested to make us your link partner , please inform us and we will be glad to make our link partner within 24 hours.

Our Link Details :

Title: Social Network Development UK

URL: http:// www . dassnagar . co . uk/

Description: Web Development Company UK: Premier Interactive Agency, specializing in custom website design, Social network development, Sports betting portal development, Travel portal design, Flash gaming portal design and development.

Link\’s HTML Code:

<a href=\”http:// www . dassnagar . co . uk/\” target=\”new\”>Social Network Development UK
</a> Web Development Company UK: Premier Interactive Agency, specializing in custom website design, Social network development, Sports betting portal development, Travel portal design, Flash gaming portal design and development.

Please accept my apology if already partner or not interested.

Reasons to exchange link with us.

1. Our site is regularly crawled by google, so there are better chances googlebot visiting your website regularly.
2. We ask you to link back to only those pages where your url is present, indirectly you are increasing your own link value.
3. By linking to our articles and technology blog you can provide useful content to your visitors.

This is an advertisement and a promotional mail strictly on the guidelines of CAN-SPAM act of 2003 . We have clearly mentioned the source mail-id of this mail, also clearly mentioned the subject lines and they are in no way misleading in any form. We have found your mail address through our own efforts on the web search and not through any illegal way. If you find this mail unsolicited, please reply with \”Unsubscribe\” in the subject line and we will take care that you do not receive any further promotional mail.

Please feel free to contact me if you have any questions.

Kind regards,
Tom
Webmaster

John
dassnagar . co . uk
[email protected]

Trust me, quitting email is a time-saver. And yes, I’ve an idea how to waste the additional spare time: Tomorrow I’ll have paid me a beer for a link to myself. And I can think of way more link monkey business that doesn’t involve email.

I'm such a devil!

¹⁾ Actually, “forwarding” comes with a slighly shady downside:
If you continue to send me your (unsolicited) emails, you’ll find all your awkward secrets on literally tons of automatically generated Web pages -nicely plastered with very targeted ads and usually x-rated or otherwise NSFW banners-, hosted on throw-away domains.
I’m such a devil.

Share/bookmark this: del.icio.us • Google • ma.gnolia • Mixx • Netscape • reddit • Sphinn • Squidoo • StumbleUpon • Yahoo MyWeb
Subscribe to

Entries

Comments

All Comments

2 comments Sebastian | Spam, Link Building, Internet Marketing, Reciprocal Links, Fun, Paid Links, Spam Report, Risky Linkage, Crap

Avoid catch-22 situations - don’t try to store more than the current screen values

Posted on 19 May, 2009

Enough is enough. Folks following me at Twitter may have noticed that suffering from an unchangeable, seriously painful all-red-in-red Twitter color scheme over weeks and weeks results in a somewhat grumpy mood of yours truly.

I’ve learned that Twitter’s customer support dept. operates a FINO queue. If there’s a listener assigned to the queue at all, it’s mounted to /dev/null. For you non-geeks out there: the Twitter support is a black hole. You can stuff it with support requests to no avail. Its insert trigger assigns the “solved” status automatically, without notice. The life cycle of a Twitter support request is a tiny fraction of a snowball in hell. Apropos Twitter operator from hell. If the picture on the right (showing the Twitter employee responsible for this pamphlet at work) is representative, I might apply for a job. Wait … reality sucks.

Ok ok ok, I’ve ranted enough, back to the topic: avoiding catch-22 scenarios in Web development. For the following example it’s not relevant how the weird user settings were produced (profile hacked by Mikkey, plain dumbfucked user actions, Twitter bugs …), the problem is that the Twitter Web UI doesn’t offer a way out of the dilemma.

Say you’ve developed a user control panel like this one:

Each group of attributes is nicely gathered in its own tab. Each tab has a [save] button. The average user will assume that pressing the button will save exactly the values shown on the tab’s screen. Nothing more, nothing less.

When it comes to Twitter’s UI design, this assumption is way too optimistic — IOW based on common sense, not thoughtless Twitter architectural design decisions. Imagine one attribute of the current “account” tab has an invalid value, e.g. the email address was set equal to user name. Here is what happens when you, the user, try to correct the invalid value, providing your working email address:

The Twitter-save-~~bug~~ routine validates the whole user record, not just the fields enabled on the “account” frame. Of course the design settings are invalid too, so any storing of corrections is prohibited. This catch-22 situation gets even ~~laughable~~ worse. When you follow Twitter’s advice and edit the design settings, the error message is utterly meaningless. Instead of “Email address: You must provide a working email addy” it says:

“There was a problem saving your profile’s customization” easily translates to “You douchebag can’t provide an email addy, so why should I allow you to choose a design template? Go fuck yourself!”. Dear Twitter, can you imagine why I’m that pissed? Of course you can’t, because you don’t read support requests, nor forum posts, nor tweets. Would you keep calm when your Twitter UI looks like mine?

Not yet convinced? Here I’ve higlighted what you WebDev artists hide from me:

And during the frequent Twitter-hiccups you can make it even uglier:

So my dear Twitter developer … You might look quite classy, but your code isn’t sexy. You’ve messed-up the Web-UI. Go back to the white board. Either cache the attributes edited in all tabs per session in a cookie or elsewhere and validate the whole thingy on save-of-any-tab like you do it now (adding meaningful error messages!), or better split the validation into chunks as displayed per tab. Don’t try to validate values that aren’t changeable in the current form’s scope!

And don’t forget to send me a DM when you’ve fixed your buggy code, because -as you hopefully might remember from the screenshots above- the email addy of my account is screwed-up, as well as the design settings.

Share/bookmark this: del.icio.us • Google • ma.gnolia • Mixx • Netscape • reddit • Sphinn • Squidoo • StumbleUpon • Yahoo MyWeb
Subscribe to

Entries

Comments

All Comments

5 comments Sebastian | Web development, Twitter, Crap

Dump your self-banning CMS

Posted on 11 May, 2009

CMS developer's output: unusable dogshit When it comes to cluelessness [silliness, idiocy, stupidity … you name it], you can’t beat CMS developers. You really can’t. There’s literally no way to kill search engine traffic that the average content management system (CMS) developer doesn’t implement. Poor publishers, probably you suffer from the top 10 issues on my shitlist. Sigh.

Imagine you’re the proud owner of a Web site that enables logged-in users customizing the look & feel and whatnot. Here’s how your CMS does the trick:

Unusable user interface

The user control panel offers a gazillion of settings that can overwrite each and every CSS property out there. To keep the user-cp pages lean and fast loading, the properties are spread over 550 pages with 10 attributes each, all with very comfortable Previous|Next-Page navigation. Even when the user has choosen a predefined template, the CMS saves each property in the user table. Of course that’s necessary because the site admin could change a template in use.

Amateurish database design

Not only for this purpose each user tuple comes with 512 mandatory attributes. Unfortunately, the underlaying database doesn’t handle tables with more than 512 columns, so the overflow gets stored in an array, using the large text column #512.

Cookie hell

Since every database access is expensive, the login procedure creates a persistent cookie (today + 365 * 30) for each user property. Dynamic and user specific external CSS files as well as style-sheets served in the HEAD section could fail to apply, so all CMS scripts use a routine that converts the user settings into inline style directives like style="color:red; text-align:bolder; text-decoration:none; ...". The developer consults the W3C CSS guidelines to make sure that not a single CSS property is left out.

Excessive query strings

Actually, not all user agents handle cookies properly. Especially cached pages clicked from SERPs load with a rather weird design. The same goes for standard compliant browsers. Seems to depend on the user agent string, so the developer adds a if ($well_behaving_user_agent_string <> $HTTP_USER_AGENT) then [read the user record and add each property as GET variable to the URI’s querystring]) sanity check. Of course the $well_behaving_user_agent_string variable gets populated with a constant containing the developer’s ancient IE user agent, and the GET inputs overwrite the values gathered from cookies.

Even more sanitizing

Some unhappy campers still claim that the CMS ignores some user properties, so the developer adds a routine that reads the user table and populates all variables that previously were filled from GET inputs overwriting cookie inputs. All clients are happy now.

Covering robots

“Cached copy” links from SERPs still produce weird pages. The developer stumbles upon my blog and adds crawler detection. S/he creates a tuple for each known search engine crawler in the user table of her/his local database and codes if ($isSpider) then [select * from user where user.usrName = $spiderName, populating the current script's CSS property variables from the requesting crawler's user settings]. Testing the rendering with a user agent faker gives fine results: bug fixed. To make sure that all user agents get a nice page, the developer sets the output default to “printer”, which produces a printable page ignoring all user settings that assign style="display:none;" to superfluous HTML elements.

Results

Users are happy, they don’t spot the code bloat. But search engine crawlers do. They sneakily request a few pages as a crawler, and as a browser. Comparing the results they find the “poor” pages delivered to the feigned browser way too different from the “rich” pages serving as crawler fodder. The domain gets banned for poor-man’s-cloaking (as if cloaking in general could be a bad thing, but that’s a completely different story). The publisher spots decreasing search engine traffic and wonders why. No help avail from the CMS vendor. Must be unintentionally deceptive SEO copywritig or so. Crap. That’s self-banning by software design.

Ok, before you read on: get a calming tune.

How can I detect a shitty CMS?

Well, you can’t, at least not as a non-geeky publisher. Not really. Of course you can check the “cached copy” links from your SERPs all night long. If they show way too different results compared to your browser’s rendering you’re at risk. You can look at your browser’s address bar to check your URIs for query strings with overlength, and if you can’t find the end of the URI perhaps you’re toast, search engine wise. You can download tools to check a page’s cookies, then if there are more than 50 you’re potentially search-engine-dead. Probably you can’t do a code review yourself coz you can’t read source code natively, and your CMS vendor has delivered spaghetti code. Also, as a publisher, you can’t tell whether your crappy rankings depend on shitty code or on your skills as as a copywriter. When you ask your CMS vendor, usually the search engine algo is faulty (especially Google, Yahoo, Microsoft and Ask) but some exotic search engine from Togo or so sets the standards for state of the art search engine technology.

Last but not least, as a non-search-geek challenged by Web development techniques you won’t recognize most of the laughable -but very common- mistakes outlined above. Actually, most savvy developers will not be able to create a complete shitlist from my scenario. Also, there a tons of other common CMS issues that do resolve in different crawlability issues - each as bad as this one, or even worse.

Now what can you do? Well, my best advice is: don’t click on Google ads titled “CMS”, and don’t look at prices. The cheapest CMS will cost you the most at the end of the day. And if your budget exceeds a grand or two, then please hire an experienced search engine optimizer (SEO) or search savvy Web developer before you implement a CMS.

Share/bookmark this: del.icio.us • Google • ma.gnolia • Mixx • Netscape • reddit • Sphinn • Squidoo • StumbleUpon • Yahoo MyWeb
Subscribe to

Entries

Comments

All Comments

26 comments Sebastian | Blogging, Web development, Crap, Cloaking, SEO

« Previous Page 1 | 2 | 3 | 4 | 5 | 6 Next Page »

Archived posts from the 'Crap' Category

Why is it bad?

Spamming URIs with utm tracking variables breaks lots of things

What can I do to avoid URI spam on my site?

What can Google do?

Speak out!

Please nudge the developers!

robots.txt

.htaccess

ErrorDocuments

HTTP headers

Sensible page titles and summaries

Unique content

User generated content

Ajax’ed “Buy now” widgets

Sitemaps

The Number One SEO Secret

Webmaster guidelines for search engine friendly links

Sebastian’s Pamphlets

Direct message from Spamdiggalot

our company proposal

Link Exchange

Unusable user interface

Amateurish database design

Cookie hell

Excessive query strings

Even more sanitizing

Covering robots

Results

How can I detect a shitty CMS?

Categories

Monthly Archives

Links

RSS Feeds