Debugging robots.txt with Google Webmaster Tools

Although Google’s Webmaster Console is a really neat toolkit, it can mislead the not-that-savvy crowd every once in a while.

When you go to “Diagnostics::Crawl Errors::Restricted by robots.txt” and you find URIs that aren’t disallow’ed or even noindex’ed in your very own robots.txt, calm down.

Google’s cool robots.txt validator withholds its knowledge of redirects and approves your redirecting URIs, driving you nuts until you check each URI’s HTTP response code for redirects (HTTP response codes 301, 302 and 307, as well as undelayed meta refreshes).

Google obeys robots.txt even in a chain of redirects. If, for Google’s user agent(s), a URI given in an HTTP Location header is disallow’ed or noindex’ed, Googlebot doesn’t fetch it, regardless of its position in the current chain of redirects. Even a robots.txt block in the 5th hop stops the greedy Web robot. Those URIs are correctly reported back as “restricted by robots.txt”; Google just refuses to tell you that the blocking crawler directive originates from a foreign server.
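To find the hop that actually triggers the report, you can replay a redirect chain against each host's robots.txt yourself. What follows is a minimal sketch in Python, not Google's implementation: the function name, the example hosts and the robots.txt contents are made up for illustration, and the chain is passed in as a plain list instead of being fetched from the network.

```python
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser


def first_blocked_hop(chain, robots_by_host, user_agent="Googlebot"):
    """Return (hop_number, url) of the first hop the agent may not fetch, or None."""
    for hop, url in enumerate(chain, start=1):
        host = urlsplit(url).netloc
        parser = RobotFileParser()
        # A host without an entry is treated as having an empty robots.txt (allow all).
        parser.parse(robots_by_host.get(host, "").splitlines())
        if not parser.can_fetch(user_agent, url):
            return hop, url
    return None


# Hypothetical chain: two hops on your own server, the third on a foreign one.
chain = [
    "http://example.com/old",
    "http://example.com/new",
    "http://other.example/private/page",
]
robots_by_host = {
    "example.com": "User-agent: *\nDisallow: /admin/",
    "other.example": "User-agent: *\nDisallow: /private/",
}

blocked = first_blocked_hop(chain, robots_by_host)
print(blocked)
```

In this made-up chain your own robots.txt allows everything that gets requested, and the block comes from the foreign server in the third hop. That's exactly the case the Webmaster Console reports without naming the culprit.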


Getting new stuff crawled in real-time

Thanks to PubSubHubbub we’ve got another method to invite Googlebot.

Just add a link to FriendFeed, for example by twittering or bookmarking it. Once the URI hits your FriendFeed account, FriendFeed shares it with Googlebot, and both of them request it before the tweet appears on your timeline:

Date/Time Request URI IP User agent
2009-08-17 15:47:19 Your URI / Mozilla/5.0 (compatible; FriendFeedBot/0.1; +Http://
2009-08-17 15:47:19 Your robots.txt / Mozilla/5.0 (compatible; Googlebot/2.1; +
2009-08-17 15:47:19 Your URI / Mozilla/5.0 (compatible; Googlebot/2.1; +


Neat. Of course there are other methods to invite search engine crawlers at lightning speed, but this one comes in handy because it can run on autopilot […]. Even when the actual fetch is meant to feed Google Reader or something, you’ve got your stuff into Google’s crawling cache, from where other services like the Web indexer can pick it up without requesting a crawl.


Less is more. Google Chrome is my preferred browser. Here’s why:

Recently I’ve bitched a lot, especially tearing apart utterly useless microformats that the InterWeb really doesn’t need (rel-nofollow, common tag …). Naturally, those pamphlets get noticed as Google search engine bashing. Wait. Of course not everything a search engine company launches is crap. Actually, I do love –and use daily– lots of awesome services provided by Google, Yahoo & Co.

Google Chrome Browser

Whilst some SE engineers –probably due to my endless rants– have unsubscribed from my various you-porn social media streams, others have noticed that there’s also laudatory stuff in a grumpy old fart’s Twitter output, and asked for input. Thank you! Dear Johannes Müller, bypassing your WebForm, here is my greedy Google Chrome wish list (you do know the goodies yourself, hence I skip the cute stuff I should praise).

I’ll focus on functionality that I like or miss as a plain user, but I can’t resist mentioning a few geeky thingies upfront. As a developer I do love that Chrome doesn’t die on faulty scripts (or on .htpasswd protected pages during startup with session restore like the current FF … on evaling perfectly valid JavaScript code from a server’s response to an AJAX request that exceeds 50 or 65k like IE …). Also, the debugging facilities are awesome (although I still can’t throw away Firebug and a few more FireFox plug-ins). I very much appreciate Chrome’s partial HTML-5 support, but besides neat video controls I’d love to see it render plain HTML-5 stuff like CITE attributes in Q elements correctly (4.7.3. User agents should allow users to follow such citation links), even when DOCTYPE says HTML 4.x or XHTML. ;)

WebKit is great, but it comes with disadvantages. Try to put radio buttons in a SPAN or DIV element with CSS controlling horizontal/vertical appearance as well as special label formats –instead of a RADIO-GROUP– and you’re toast. FF can handle that. Or set the MULTIPLE attribute of a SELECT element to FALSE (instead of omitting it for combo-boxes) and you’ll suffer from select lists that you just can’t handle as a user, because WebKit (as well as other layout engines!) doesn’t render the element as a drop down list any more. Of course that’s non-standard coding, but stuff like that isn’t really uncommon on the Web. Just because other layout engines handle crap like this equally wrong, that doesn’t mean that the WebKit version used by Google Chrome must come with the same maladies, right?

What totally annoys me is that on the WordPress /wp-admin/post.php page the plus icons of “Post Slug” or “Post Status” just don’t work with Chrome. That means I’ve to fire up FF only to type in a value in a form field that Google Chrome sneakily hides from me. Nasty. Really nasty. Don’t tell me that I’m using an outdated WordPress version. I do know that, but I won’t upgrade because WP 0.87 (beta) perfectly fits my needs.

Ok, what do I like as a user? Google Chrome is lean and very easy to use, it eats less memory than any other browser I allow on my machines, and it executes JavaScript as well as nifty rounded corners amazingly fast. Because –at least with the naked version– I can’t install a gazillion of add-ons, I usually see complete landing pages rendered, instead of just the H1, an advertisement, and the very first P element along with a 1/6 clip of an image or video, because all the FF toolbars occupy nearly 3/4 of the browser window’s height. Try FireFox with a few plug-ins vs. Chrome on a machine running 1024*768 (not that unusual when traveling) and you’re convinced in a fraction of a nanosecond.

Now that I’ve completely switched to Chrome, at least at home (at work I have to test my stuff with everything except IE because that’s a not supportable user agent), I want the FireFox nuggets, preferably sooner rather than later. Dear Google Chrome developers, please find a way to extract the most wanted stuff from FF plug-ins. You can implement those as right-click popup menus, as a one-line toolbar (not stealing too much screen real estate), or both, or otherwise. It’s not too hard to detect that a user has a delicious or stumble-upon account (you read the cookies anyway …). You easily could show icons for the core functionality of such services, along with context sensitive menus enabling the whole functionality of a particular service as provided with overcrowded toolbars in other browsers. Examples:

Delicious  An icon “Remember this” to submit a page to delicious is enough, when “my delicious” and so on is available via context menu.
StumbleUpon  The same goes for StumbleUpon. Two icons, thumbs-up and thumbs-down, would provide 99% of the functionality I need quite often. Ok, my thumbs-down votes are rare, so you can even dump the second one.
TinyUrl  How cool would it be to create a tiny URI for the current tab with just one click?
PrefBar checkboxes  Next up, please feel somewhat challenged by PrefBar, an instrument I really can’t miss on the long haul.
PrefBar combo-boxes  Switching user agent strings, faking referrers, checking out Web pages without cookies, JavaScript and so on is a must have. Ok, I admit that’s geek stuff, so take it as an example transferable to some girlish stuff I refuse to recognize in my monster’s Web browsers.
Twitter  Also, let’s not forget Twitter, blogstuff and whatnot.
Imagine your preferred services, iconized in a configurable one-line toolbar compiled from single items of various 3rd party toolbars available on the InterWeb (of course you should enable Google Toolbar icons too). How cool would that be, in comparison to the bookmarklets I must live with now?

Google Chrome bookmarklets


Context-menu stuff like “image properties” et cetera –as well known from other browsers– would be very helpful too. “Inspect element” is really neat and informative (for geeks), but way too complicated for the average user.

Another issue is Chrome’s lack of “Babylon functionality”. I want to configure my native language as well as a preferred language (read that as “at least one“). Say I’ve set native language to de-DE and preferred language to en-US, then when hovering a word or phrase on any Web object, I want to see a tooltip displaying the English translation from whatever gibberish the Web page is written in (of course for English text I’d expect the German translation); and when I select a piece of text I want to read the German (English) translation on right-click:translate in a popup dialog that allows copying to the clipboard as well as changing languages. I know you’ve the technology at your hands.

Oh, and please disable the defaulted DNS caching, that’s a royal PITA when you mostly consume dynamic contents because lots of previously visited URIs get displayed as error messages. Also, “reload” should pull images again, replacing their cached copies; right-click:reload should reposition to the current viewpoint.

I’d like to have “project windows”, that is on-demand Chrome windows loaded with particular tabs with URIs I’ve previously saved from a window under a project name. Those shouldn’t come up when I’ve set “load previous session at start-up”, but only when I want to restore such a window.

After a quite longish test phase I’d say that Google Chrome’s advantages beat the lack of functionality with ease. Pretty often the snipping of a particular commonly supplied feature (like search boxes in toolbars) dramatically enhances Chrome’s usability. Chrome’s KISS approach kicks ass. And I see it evolve.

Now that you’ve read my appraisal and suggestions, please consider picking a few items from my t-shirt wish list. You know, I’ve promised to link out to everybody sending me a (geeky|pornographic|funny|) XX(X)L t-shirt that I really like. ;) Just in case you’re not the type of reader who buys the author of a pamphlet a t-shirt, please subscribe to my RSS feed. Thanks.


How to handle a machine-readable pandemic that search engines cannot control

R.I.P. rel-nofollow

When you’re familiar with my various rants on the ever morphing rel-nofollow microformat infectious link disease, don’t read further. This post is not polemic, ironic, insulting, or otherwise meant to entertain you. I’m just raving about a way to delay the downfall of the InterWeb.

Let’s recap: The World Wide Web is based on hyperlinks. Hyperlinks are supposed to lead humans to interesting stuff they want to consume. This simple and therefore brilliant concept worked great for years. The Internet grew up, bubbled a bit, but eventually it gained world domination. Internet traffic was counted, sold, bartered, purchased, and even exchanged for free in units called “hits”. (A “hit” means one human surfer landing on a sales pitch. That is a popup hell designed in a way that somebody involved just has to make a sale).

Then in the past century two smart guys discovered that links scraped from Web pages can be misused to provide humans with very accurate search results. They even created a new currency on the Web, and quickly assigned their price tags to Web pages. Naturally, folks began to trade green pixels instead of traffic. After a short while the Internet voluntarily transferred its world domination to the company founded by those two smart guys from Stanford.

Of course the huge amount of green pixel trades made the search results based on link popularity somewhat useless, because the webmasters gathering the most incoming links got the top 10 positions on the search result pages (SERPs). Search engines claimed that a few webmasters cheated on their way to the first SERPs, although lawyers say there’s no evidence of any illegal activities related to search engine optimization (SEO).

However, after suffering from heavy attacks from a whiny blogger, the Web’s dominating search engine got somewhat upset and required that all webmasters have to assign a machine-readable tag (link condom) to links sneakily inserted into their Web pages by other webmasters. “Sneakily inserted links” meant references to authors as well as links embedded in content supplied by users. All blogging platforms, CMS vendors and alike implemented the link condom, eliminating presumably 5.00% of the Web’s linkage at this time.

A couple of months later the world dominating search engine demanded that webmasters have to condomize their banner ads, intercompany linkage and other commercial links, as well as all hyperlinked references that do not count as pure academic citation (aka editorial links). The whole InterWeb complied, since this company controlled nearly all the free traffic available from Web search, as well as the Web’s purchasable traffic streams.

Roughly 3.00% of the Web’s links were condomized, as the search giant spotted that their users (searchers) missed out on lots and lots of valuable contents covered by link condoms. Ooops. Kinda dilemma. Taking back the link condom requirements was no option, because this would have flooded the search index with billions of unwanted links empowering commercial content to rank above boring academic stuff.

So the handling of link condoms in the search engine’s crawling engine as well as in its ranking algorithm was changed silently. Without telling anybody outside their campus, some condomized links gained power, whilst others were kept impotent. In fact they’ve developed a method to judge each and every link on the whole Web without a little help from their friends link condoms. In other words, the link condom became obsolete.

Of course that’s what they should have done in the first place, without asking the world’s webmasters for gazillions of free-of-charge man years producing shitloads of useless code bloat. Unfortunately, they didn’t have the balls to stand up and admit “sorry folks, we’ve failed miserably, link condoms are history”. Therefore the Web community still has to bother with an obsolete microformat. And if they –the link condoms– are not dead, then they live today. In your markup. Hurting your rankings.

If you, dear reader, are a Googler, then please don’t feel too annoyed. You may have thought that you didn’t do evil, but the above said reflects what webmasters outside the ‘Plex got from your actions. Don’t ignore it, please think about it from our point of view. Thanks.

Still here and attentive? Great. Now let’s talk about scenarios in WebDev where you still can’t avoid rel-nofollow. If there are any; we’ll see.

PageRank™ sculpting

Dude, PageRank™ sculpting with rel-nofollow doesn’t work for the average webmaster. It might even fail when applied as a highly sophisticated SEO tactic. So don’t even think about it. Simply remove the rel=nofollow from links to your TOS, imprint, and contact page. Cloak away your links to signup pages, login pages, shopping carts and stuff like that.

Link monkey business

I leave this paragraph empty, because when you know what you do, you don’t need advice.

Affiliate links

There’s no point in serving A elements to Googlebot at all. If you haven’t cloaked your aff links yet, go see an SEO doctor.

Advanced SEO purposes

See above.

So what’s left? User generated content. Let’s concentrate our extremely superfluous condomizing efforts on the one and only occasion that might justify applying rel-nofollow to a hyperlink on request of a major search engine, if there’s any good reason to paint shit brown at all.


If you link out in a blog post, then you vouch for the link’s destination. In case you disagree with the link destination’s content, just put the link as

<strong class="blue_underlined" title="" onclick="window.location=this.title;">My Worst Enemy</strong>

or so. The surfer can click the link and lands at the intended URI, but search engines don’t pass reputation. Also, they don’t evaporate link juice, because they don’t interpret the markup as a hyperlink.

Blog comments

My rule of thumb is: Moderate, DoFollow quality, DoDelete crap. Install a conditional do-follow plug-in, set everything on moderation, use captchas or something similar, then let the comment’s link juice flow. You can maintain a white list that allows instant appearance of comments from your buddies.

Forums, guestbooks and unmoderated stuff like that

Separate all Web site areas that handle user generated content. Serve “index,nofollow” meta tags or X-Robots-Tag headers for all those pages, and link them from a site map or so. If you gather index-worthy content from users, then feed crawlers the content in a parallel –crawlable– structure, without submit buttons, perhaps with links from trusted users, and redirect human visitors to the interactive pages. Vice versa, redirect crawlers requesting live pages to the spider fodder. All those redirects go with a 301 HTTP response code.

If you lack the technical skills to accomplish that, then edit your /robots.txt file as follows:

User-agent: Googlebot
# Dear Googlebot, drop me a line when you can handle forum pages
# w/o rel-nofollow crap. Then I'll allow crawling.
# Treat that as conditional disallow:
Disallow: /forum

As soon as Google can handle your user generated content naturally, they might send you a message in their Webmaster console.
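Before you upload that file, a quick sanity check doesn't hurt. The sketch below feeds the directives to Python's standard robots.txt parser; the helper name may_crawl, the example.com URLs and the second bot name are assumptions for illustration, and the standard library parser is not guaranteed to match Googlebot's own parsing in every corner case.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: Googlebot
# Dear Googlebot, drop me a line when you can handle forum pages
# w/o rel-nofollow crap. Then I'll allow crawling.
# Treat that as conditional disallow:
Disallow: /forum
"""


def may_crawl(user_agent, url, robots_txt=ROBOTS_TXT):
    """Check a URL against the given robots.txt text for one user agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical URLs on a placeholder host.
print(may_crawl("Googlebot", "http://example.com/forum/thread-1"))
print(may_crawl("Googlebot", "http://example.com/blog/"))
print(may_crawl("SomeOtherBot", "http://example.com/forum/thread-1"))
```

Note that Disallow: /forum is a prefix match, so it also covers paths like /forums or /forum-archive. Append a trailing slash if you only want to block the directory.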

Anything else

Judge yourself. Most probably you’ll find a way to avoid rel-nofollow.


Absolutely nobody needs the rel-nofollow microformat. Not even search engines for the sake of their index. Hence webmasters as well as search engines can stop wasting resources. Farewell rel="nofollow", rest in peace. We won’t miss you.


Vaporize yourself before Google burns your linking power

PIC-1: Google PageRank(tm) 2007

I couldn’t care less about PageRank™ sculpting, because a well thought out link architecture does the job with all search engines, not just Google. That’s where Google is right on the money.

They own PageRank™, hence they can burn, evaporate, nullify, and even divide by zero or multiply by -1 as much PageRank™ as they like; of course as long as they rank my stuff nicely above my competitors.

Picture 1 shows Google’s PageRank™ factory as of 2007 or so. Actually, it’s a pretty simplified model, but since they’ve changed the PageRank™ algo anyway, you don’t need to bother with all the geeky details.

As a side note: you might ask why I don’t link to Matt Cutts and Danny Sullivan discussing the whole mess on their blogs? Well, probably Matt can’t afford my advertising rates, and the whole SEO industry has linked to Danny anyway. If you’re nosy, check out my source code to learn more about state of the art linkage very compliant to Google’s newest guidelines for advanced SEOs (summary: “Don’t trust underlined blue text on Web pages any longer!”).

PIC-2: Google PageRank(tm) 2009

What really matters is picture 2, revealing Google’s new PageRank™ facilities, silently launched in 2008. Again, geeky details are of minor interest. If you really want to know everything, then search for [operation bendover] at !Yahoo (it’s still top secret, and therefore not searchable at Google).

Unfortunately, advanced SEO folks (whatever that means, I use this term just because it seems to be an essential property assigned to the participants of the current PageRank™ uprising discussion) always try to confuse you with overcomplicated graphics and formulas when it comes to PageRank™. Instead, I ask you to focus on the (important) hard core stuff. So go grab a magnifier, and work out the differences:

  • PageRank™ 2009 in comparison to PageRank™ 2007 comes with a pipeline supplying unlimited fuel. Also, it seems they’ve implemented the green new deal, switching from gas to natural gas. That means they can vaporize way more link juice than ever before.
  • PageRank™ 2009 produces more steam, and the clouds look slightly different. Whilst PageRank™ 2007 ignored nofollow crap as well as links put with client sided scripting, PageRank™ 2009 evaporates not only juice covered with link condoms, but also tons of other permutations of the standard A element.
  • To compensate for the huge overall loss of PageRank™ caused by those changes, Google has decided to pass link juice from condomized links to their target URI hidden to Googlebot with JavaScript. Of course Google formerly recommended the use of JavaScript-links to prevent the webmasters from penalties for so-called “questionable” outgoing links. Just as they’ve not only invented rel-nofollow, but heavily recommended the use of this microformat with all links disliked by Google, and now they take that back as if a gazillion links on the Web could magically change just because Google tweaks their algos. Doh! I really hope that the WebSpam-team checks the age of such links before they penalize everything implemented according to their guidelines before mid-2009 or the InterWeb’s downfall, whatever comes last.

I guess in the meantime you’ve figured out that I’m somewhat pissed. Not that the secretly changed flow of PageRank™ a year ago in 2008 had any impact on my rankings, or SERP traffic. I’ve always designed my stuff with PageRank™ flow in mind, but without any misuses of rel=”nofollow”, so I’m still fine with Google.

What I can’t stand is when a search engine tries to tell me how I’ve to link (out). Google engineers are really smart folks, they’re perfectly able to develop a PageRank™ algo that can decide how much Google-juice a particular link should pass. So dear Googlers, please –WRT to the implementation of hyperlinks– leave us webmasters alone, dump the rel-nofollow crap and rank our stuff in the best interest of your searchers. No longer bother us with linking guidelines that change yearly. It’s not our job nor responsibility to act as your cannon fodder slavish code monkeys when you spot a loophole in your ranking- or spam-detection-algos.

Of course the above said is based on common sense, so Google won’t listen (remember: I’m really upset, hence polemic statements are absolutely appropriate). To prevent webmasters from irrational actions by misled search engines, I hereby introduce the

Webmaster guidelines for search engine friendly links

What follows is pseudo-code, implement it with your preferred server sided scripting language.

if (getAttribute($link, 'rel') matches '*nofollow*' &&
    $userAgent matches '*Googlebot*') {
    // Replace the A element with styled text; the rev attribute carries the URI.
    print '<strong rev="' + getAttribute($link, 'href') + '"'
    + ' style="color:blue; text-decoration:underline;"'
    + ' onmousedown="window.location=this.getAttribute(\'rev\');"'
    + '>' + getAnchorText($link) + '</strong>';
} else {
    // Everything else gets the link as it is.
    print $link;
}
Probably it’s a good idea to snip both the onmousedown trigger code as well as the rev attribute, when the script gets executed by Googlebot. Just because today Google states that they’re going to pass link juice to URIs grabbed from the onclick trigger, that doesn’t mean they’ll never look at the onmousedown event or misused (X)HTML attributes.

This way you can deliver Googlebot exactly the same stuff that the punter surfer gets. You’re perfectly compliant to Google’s cloaking restrictions. There’s no need to bother with complicated stuff like iFrames or even disabled blog comments, forums or guestbooks.

Just feed the crawlers with all the crap the search engines require, then concentrate all your efforts on your UI for human visitors. Web robots (bots, crawlers, spiders, …) don’t supply your signup-forms w/ credit card details. Humans do. If you find the time to upsell them while search engines keep you busy with thoughtless change requests all day long.


Opting out: mailto://me is history

Finally quitting email

Today I’ve removed all instances of the thunderbird icon from my computers, and from my memory as well. I’m finally done with email. I’ve forwarded 1) all my email accounts to, and here’s why:

Sebastian’s Pamphlets

Dear Sebastian,

I visited your web site earlier today and it seems you are also a seo company like us. As an SEO company we are in this field since 1998 in India(CHD). We have developed and maintained high quality websites.

We understand link building better than other because of our 11 year experience in linking industry and we follows the right manual link building approach in seeking, obtaining and attracting topic specific trusted inbound links. We have different themes related sites, directories and blogs and i would like to make a request to enter a mutual understanding by EXCHANGING LINKS with your website in order to get targeted visitors, higher ranking and link popularity.

We look forward to linking our site with yours, as exchanging links would Benefit both of us.

You’ve received this email simply because you have been found while searching for related sites in Google, MSN and Yahoo If you do not wish to receive future emails, simply reply with this email and let us know.

Waiting for your positive and quick response.





Direct message from Spamdiggalot

Hi, Sebastian.

You have a new direct message:

Spamdiggalot: hi!I think you should like my article “12 addons to get the most out of safer-sex”, here: please RT!

Reply on the web at

Send me a direct message from your phone: D SPAMDIGGALOT

our company proposal

Dear Sebastian Pamphlets,

My name is Vincentas and I am member of board in multi-location hosting company - Host1Plus (http:// www . host1plus . com). Our servers are in U.S., U.K., the Netherlands, Germany, Lithuania and Singapore.

I just visited your website which I found interested and it provides excellent complementary content.
We would like to offer you free hosting for your site in Host1Plus hosting service the only thing we would ask you is to place our visitors counter to your website here is the link http:// www . count1plus . com or it could be any other feature.

So let me know if you are interested for my offer and I hope that offer is interested to you. Hope to hear you soon.

Kind Regards,
Vincentas Grinius Team
part of Digital Energy Technologies Ltd.
26 York Street

United Kingdom
T: +44 (0) 808 101 2277
W: http:// www . host1plus . com

Vincentas Grinius

Link Exchange


I think if I receive something like this I would pay more attention to that.
”Dear Webmaster I am so happy to find your website and I like it so much! So I want to be a link partner of your site.

If you are interested to make us your link partner , please inform us and we will be glad to make our link partner within 24 hours.

Our Link Details :

Title: Social Network Development UK

URL: http:// www . dassnagar . co . uk/

Description: Web Development Company UK: Premier Interactive Agency, specializing in custom website design, Social network development, Sports betting portal development, Travel portal design, Flash gaming portal design and development.

Link’s HTML Code:

<a href=”http:// www . dassnagar . co . uk/” target=”new”>Social Network Development UK
</a> Web Development Company UK: Premier Interactive Agency, specializing in custom website design, Social network development, Sports betting portal development, Travel portal design, Flash gaming portal design and development.

Please accept my apology if already partner or not interested.

Reasons to exchange link with us.

1. Our site is regularly crawled by google, so there are better chances googlebot visiting your website regularly.
2. We ask you to link back to only those pages where your url is present, indirectly you are increasing your own link value.
3. By linking to our articles and technology blog you can provide useful content to your visitors.

This is an advertisement and a promotional mail strictly on the guidelines of CAN-SPAM act of 2003 . We have clearly mentioned the source mail-id of this mail, also clearly mentioned the subject lines and they are in no way misleading in any form. We have found your mail address through our own efforts on the web search and not through any illegal way. If you find this mail unsolicited, please reply with ”Unsubscribe” in the subject line and we will take care that you do not receive any further promotional mail.

Please feel free to contact me if you have any questions.

Kind regards,

dassnagar . co . uk


Trust me, quitting email is a time-saver. And yes, I’ve an idea how to waste the additional spare time: Tomorrow I’ll have paid me a beer for a link to myself. And I can think of way more link monkey business that doesn’t involve email.


1) Actually, “forwarding” comes with a slightly shady downside:
If you continue to send me your (unsolicited) emails, you’ll find all your awkward secrets on literally tons of automatically generated Web pages –nicely plastered with very targeted ads and usually x-rated or otherwise NSFW banners–, hosted on throw-away domains.
I’m such a devil.




Avoid catch-22 situations - don’t try to store more than the current screen values

Sexy Twitter developer screwing the Web UI

Enough is enough. Folks following me at Twitter may have noticed that suffering from an unchangeable, seriously painful all-red-in-red Twitter color scheme over weeks and weeks results in a somewhat grumpy mood of yours truly.

I’ve learned that Twitter’s customer support dept. operates a FINO queue. If there’s a listener assigned to the queue at all, it’s mounted to /dev/null. For you non-geeks out there: the Twitter support is a black hole. You can stuff it with support requests to no avail. Its insert trigger assigns the “solved” status automatically, without notice. The life cycle of a Twitter support request is a tiny fraction of a snowball in hell. Apropos Twitter operator from hell. If the picture on the right (showing the Twitter employee responsible for this pamphlet at work) is representative, I might apply for a job. Wait … reality sucks.

Ok ok ok, I’ve ranted enough, back to the topic: avoiding catch-22 scenarios in Web development. For the following example it’s not relevant how the weird user settings were produced (profile hacked by Mikkey, plain dumbfucked user actions, Twitter bugs …), the problem is that the Twitter Web UI doesn’t offer a way out of the dilemma.

Say you’ve developed a user control panel like this one:

Twitter user account UI

Each group of attributes is nicely gathered in its own tab. Each tab has a [save] button. The average user will assume that pressing the button will save exactly the values shown on the tab’s screen. Nothing more, nothing less.

Invalid Twitter account setting

When it comes to Twitter’s UI design, this assumption is way too optimistic (IOW based on common sense, not thoughtless Twitter architectural design decisions). Imagine one attribute of the current “account” tab has an invalid value, e.g. the email address was set equal to user name. Here is what happens when you, the user, try to correct the invalid value, providing your working email address:

Error messages on save of Twitter user account settings

The Twitter-save-bug routine validates the whole user record, not just the fields enabled on the “account” frame. Of course the design settings are invalid too, so any storing of corrections is prohibited. This catch-22 situation gets even laughably worse. When you follow Twitter’s advice and edit the design settings, the error message is utterly meaningless. Instead of “Email address: You must provide a working email addy” it says:

Error messages on save of Twitter user design settings

“There was a problem saving your profile’s customization” easily translates to “You douchebag can’t provide an email addy, so why should I allow you to choose a design template? Go fuck yourself!”. Dear Twitter, can you imagine why I’m that pissed? Of course you can’t, because you don’t read support requests, nor forum posts, nor tweets. Would you keep calm when your Twitter UI looks like mine?

Ugly red-in-red Twitter color scheme

Not yet convinced? Here I’ve highlighted what you WebDev artists hide from me:

Ugly red-in-red Twitter color scheme: What I'm missing

And during the frequent Twitter-hiccups you can make it even uglier:

Ugly red-in-red Twitter color scheme with partially loaded CSS

So my dear Twitter developer … You might look quite classy, but your code isn’t sexy. You’ve messed up the Web UI. Go back to the whiteboard. Either cache the attributes edited in all tabs per session in a cookie or elsewhere and validate the whole thingy on save-of-any-tab as you do now (adding meaningful error messages!), or better, split the validation into chunks as displayed per tab. Don’t try to validate values that aren’t changeable in the current form’s scope!
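To make the second option concrete, here is a minimal sketch of per-tab validation. It is not Twitter’s code: the tab layout, the field names and the validators are all made up for illustration. The point is only that saving a tab checks the fields that tab can edit, and leaves the rest of the record alone.

```python
import re

# Hypothetical tab layout: the fields each settings tab can actually edit.
TAB_FIELDS = {
    "account": ("username", "email"),
    "design": ("background_color", "text_color"),
}


# Hypothetical validators; each returns an error message, or None if the value is fine.
def check_username(value):
    return None if value else "User name: must not be empty"


def check_email(value):
    if re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", value or ""):
        return None
    return "Email address: you must provide a working email address"


def check_color(value):
    if re.fullmatch(r"#[0-9a-fA-F]{6}", value or ""):
        return None
    return "Color: expected a value like #33ccff"


VALIDATORS = {
    "username": check_username,
    "email": check_email,
    "background_color": check_color,
    "text_color": check_color,
}


def save_tab(record, tab, submitted):
    """Validate and store only the fields that are editable on the given tab.

    Returns (record, errors). On errors the original record comes back unchanged.
    """
    errors = {}
    for field in TAB_FIELDS[tab]:
        if field in submitted:
            message = VALIDATORS[field](submitted[field])
            if message:
                errors[field] = message
    if errors:
        return record, errors
    updated = dict(record)
    updated.update({f: submitted[f] for f in TAB_FIELDS[tab] if f in submitted})
    return updated, {}
```

With a record whose email and design settings are both invalid, `save_tab(record, "account", {...})` accepts a corrected email address even though the colors are still garbage, and a failed save on the “design” tab complains about colors only, not about the email addy.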

And don’t forget to send me a DM when you’ve fixed your buggy code, because –as you hopefully might remember from the screenshots above– the email addy of my account is screwed-up, as well as the design settings.


Dump your self-banning CMS

CMS developer's output: unusable dogshit

When it comes to cluelessness [silliness, idiocy, stupidity … you name it], you can’t beat CMS developers. You really can’t. There’s literally no way to kill search engine traffic that the average content management system (CMS) developer doesn’t implement. Poor publishers, you probably suffer from the top 10 issues on my shitlist. Sigh.

Imagine you’re the proud owner of a Web site that lets logged-in users customize the look & feel and whatnot. Here’s how your CMS does the trick:

Unusable user interface

The user control panel offers a gazillion settings that can overwrite each and every CSS property out there. To keep the user-cp pages lean and fast loading, the properties are spread over 550 pages with 10 attributes each, all with very comfortable Previous|Next-Page navigation. Even when the user has chosen a predefined template, the CMS saves each property in the user table. Of course that’s necessary because the site admin could change a template in use.

Amateurish database design

Not only for this purpose, each user tuple comes with 512 mandatory attributes. Unfortunately, the underlying database doesn’t handle tables with more than 512 columns, so the overflow gets stored in an array, using the large text column #512.

Cookie hell

Since every database access is expensive, the login procedure creates a persistent cookie (today + 365 * 30) for each user property. Dynamic and user specific external CSS files as well as style-sheets served in the HEAD section could fail to apply, so all CMS scripts use a routine that converts the user settings into inline style directives like style="color:red; text-align:bolder; text-decoration:none; ...". The developer consults the W3C CSS guidelines to make sure that not a single CSS property is left out.

Excessive query strings

Actually, not all user agents handle cookies properly. Especially cached pages clicked from SERPs load with a rather weird design. The same goes for standards-compliant browsers. Seems to depend on the user agent string, so the developer adds an if ($well_behaving_user_agent_string <> $HTTP_USER_AGENT) then [read the user record and add each property as GET variable to the URI’s querystring] sanity check. Of course the $well_behaving_user_agent_string variable gets populated with a constant containing the developer’s ancient IE user agent, and the GET inputs overwrite the values gathered from cookies.

Even more sanitizing

Some unhappy campers still claim that the CMS ignores some user properties, so the developer adds a routine that reads the user table and populates all variables that previously were filled from GET inputs overwriting cookie inputs. All clients are happy now.

Covering robots

“Cached copy” links from SERPs still produce weird pages. The developer stumbles upon my blog and adds crawler detection. S/he creates a tuple for each known search engine crawler in the user table of her/his local database and codes if ($isSpider) then [select * from user where user.usrName = $spiderName, populating the current script's CSS property variables from the requesting crawler's user settings]. Testing the rendering with a user agent faker gives fine results: bug fixed. To make sure that all user agents get a nice page, the developer sets the output default to “printer”, which produces a printable page ignoring all user settings that assign style="display:none;" to superfluous HTML elements.


Users are happy, they don’t spot the code bloat. But search engine crawlers do. They sneakily request a few pages as a crawler, and as a browser. Comparing the results they find the “poor” pages delivered to the feigned browser way too different from the “rich” pages serving as crawler fodder. The domain gets banned for poor-man’s-cloaking (as if cloaking in general could be a bad thing, but that’s a completely different story). The publisher spots decreasing search engine traffic and wonders why. No help available from the CMS vendor. Must be unintentionally deceptive SEO copywriting or so. Crap. That’s self-banning by software design.

Ok, before you read on: get a calming tune.

How can I detect a shitty CMS?

Well, you can’t, at least not as a non-geeky publisher. Not really. Of course you can check the “cached copy” links from your SERPs all night long. If they show way too different results compared to your browser’s rendering you’re at risk. You can look at your browser’s address bar to check your URIs for overlong query strings, and if you can’t find the end of the URI perhaps you’re toast, search engine wise. You can download tools to check a page’s cookies, then if there are more than 50 you’re potentially search-engine-dead. Probably you can’t do a code review yourself coz you can’t read source code natively, and your CMS vendor has delivered spaghetti code. Also, as a publisher, you can’t tell whether your crappy rankings depend on shitty code or on your skills as a copywriter. When you ask your CMS vendor, usually the search engine algo is faulty (especially Google, Yahoo, Microsoft and Ask) but some exotic search engine from Togo or so sets the standards for state of the art search engine technology.
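If you know a geek who owes you a favor, the three manual checks above fit in a few lines of Python. Take this as a rough sketch, not a diagnosis tool: only the 50-cookie limit comes from the paragraph above, while the query string limit, the similarity threshold and the function name are my assumptions. Fetching the page once as a browser and once from the “cached copy” link is left to you.

```python
from difflib import SequenceMatcher
from urllib.parse import parse_qsl, urlsplit

MAX_COOKIES = 50         # the threshold named in the text
MAX_QUERY_LENGTH = 255   # assumption: longer query strings count as overlong
MIN_SIMILARITY = 0.8     # assumption: below this, the two renderings differ too much


def audit_page(uri, cookie_names, browser_html, crawler_html):
    """Return a list of warnings for one page; an empty list means nothing suspicious."""
    warnings = []

    # Check 1: can you still find the end of the URI?
    query = urlsplit(uri).query
    if len(query) > MAX_QUERY_LENGTH:
        parameters = parse_qsl(query, keep_blank_values=True)
        warnings.append(
            f"query string: {len(query)} characters, {len(parameters)} parameters"
        )

    # Check 2: how many distinct cookies does the page set?
    cookie_count = len(set(cookie_names))
    if cookie_count > MAX_COOKIES:
        warnings.append(f"cookies: {cookie_count} set by one page")

    # Check 3: does the crawler's copy look like what the browser gets?
    similarity = SequenceMatcher(
        None, browser_html, crawler_html, autojunk=False
    ).ratio()
    if similarity < MIN_SIMILARITY:
        warnings.append(f"cached copy vs. browser: similarity {similarity:.2f}")

    return warnings
```

An empty list doesn’t prove your CMS is sane, it only means that none of these three crude symptoms showed up on that particular page.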

Last but not least, as a non-search-geek challenged by Web development techniques you won’t recognize most of the laughable –but very common– mistakes outlined above. Actually, most savvy developers will not be able to create a complete shitlist from my scenario. Also, there are tons of other common CMS issues that result in different crawlability issues, each as bad as this one, or even worse.

Now what can you do? Well, my best advice is: don’t click on Google ads titled “CMS”, and don’t look at prices. The cheapest CMS will cost you the most at the end of the day. And if your budget exceeds a grand or two, then please hire an experienced search engine optimizer (SEO) or search savvy Web developer before you implement a CMS.


Professional Twitter-Stalking

Today Kelvin Newman asked me for a Twitter-tip. Well, I won’t reveal what he’s gathered so far until he publishes his collection, but I thought I could post a TwitterTip myself. I’m on a dead slow Internet connection, so here’s the KISS-guide to professional stalking on Twitter:

Collect RSS-Feed URIs

Every Twitter-user maintains an RSS feed, regardless of whether you can spot the RSS icon on her/his profile page or not. If there’s no public link to the feed, then click “view source”, scroll down to the RSS link element type="application/rss+xml" in the HEAD section, and scrape the URI from the HREF attribute. It should look like (that’s mine).
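If “view source” is too much clicking, the scraping step can be scripted. Here’s a small sketch using nothing but Python’s standard library; the class and function names are mine, and it expects the profile page’s HTML as a string, so fetching the page is up to you.

```python
from html.parser import HTMLParser


class FeedLinkFinder(HTMLParser):
    """Collect the HREF of every <link> element that advertises an RSS feed."""

    def __init__(self):
        super().__init__()
        self.feed_uris = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser hands over tag and attribute names in lowercase.
        if tag != "link":
            return
        attributes = dict(attrs)
        link_type = (attributes.get("type") or "").lower()
        if link_type == "application/rss+xml" and attributes.get("href"):
            self.feed_uris.append(attributes["href"])


def find_feed_uris(html):
    """Return all RSS feed URIs found in the given HTML source."""
    finder = FeedLinkFinder()
    finder.feed(html)
    return finder.feed_uris
```

Feed it the page source and you get a list of feed URIs back, an empty one if the page doesn’t advertise any RSS feed.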

Merge the Feeds

Actually, I hate this service coz they apply nofollow-toxin to my links, but it’s quite easy to use and reliable (awfully slow in design mode, though). So, (ouch) go to Yahoo Pipes, sign in with any Yahoo-ID you’ve not yet burned with spammy activities, and click on “Create New Pipe”.

Grab a “Fetch Feed” element and insert your collected RSS-URIs. You can have multiple feed-suckers in a pipe, for example one per stalked Twitter user, or organize your idols in groups. In addition to the Twitter-feed you could add her/his blog-feed, and/or you-porn stuff as well to get the big picture.

Create a “Union” element from the “operator” menu and connect all your feed-suckers to the merger.

Next create a “Sort” element and connect it to the merger. Sort by date of publication in descending order to get the latest tweets at the top. Bear in mind that feeds aren’t real time. When you subscribe later on, you’ll miss out on the latest tweets, but your feed reader will show you even deleted updates.

Finally connect the sorter to the outputter and save the whole thingy. Click on “Run Pipe” or the debugger to preview the results.

Here’s how such a stalker tool gets visualized in Yahoo Pipes:

Pipe: Twitter-Stalker-Feed
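For those who’d rather not feed Yahoo, the same union-and-sort logic is easy to sketch in Python. This is not what Yahoo Pipes runs internally, just an illustration of the pipe above. The function names are made up, the feeds are passed in as RSS 2.0 strings you’ve already downloaded, and I assume every pubDate carries a time zone (otherwise the sort will choke on mixed dates).

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime


def read_items(rss_xml):
    """Extract title, link and publication date from each <item> of one RSS 2.0 feed."""
    items = []
    for item in ET.fromstring(rss_xml).iter("item"):
        published = item.findtext("pubDate")
        if not published:
            continue  # without a date the item can't be sorted
        items.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "published": parsedate_to_datetime(published),
        })
    return items


def merge_feeds(feeds):
    """The "Union" plus "Sort" step: merge all items, latest first."""
    merged = [item for rss_xml in feeds for item in read_items(rss_xml)]
    return sorted(merged, key=lambda item: item["published"], reverse=True)
```

Calling `merge_feeds([feed_a, feed_b])` returns one list with the latest tweet at the top, ready to be written out as a feed or dumped to the console.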

Subscribe and Enjoy

On the “Run Feed” page Yahoo shows the pipe’s RSS-URI, e.g. You can prettify this rather ugly address if you prefer talking URIs.

Copy the pipe’s URI and subscribe with your preferred RSS reader. Enjoy.

Thou shalt not stalk me!


About the bad taste of shameless ego food

2009 SEMMY Finalist

Seems I’ve made it on the short-list at the SEMMY 2009 Awards in the Search Tech category. Great ego food. I’m honored. Thanks for nominating me that often! And thanks to John Andrews and Todd Mintz for the kind judgement!

Now that you’ve read the longish introduction, why not click here and vote for my pamphlet?

Ok Ok Ok, it’s somewhat technical and you perhaps even consider it plain geek food. However, it’s hopefully nevertheless useful for your daily work. BTW … I wish more search engine engineers would read it. ;) It could help them to tidy up their flawed REP support.

Does this post smell way too selfish? I guess it does, but I’ll post it nonetheless coz I’m fucking keen on your votes. ;) Thanks in advance!

2009 SEMMY Winner

Wow, I won! Thank you all!
