Archived posts from the 'SEO' Category

SEO-sanitizing a WordPress theme in 5 minutes

When you start a blog with WordPress, you get overall good crawlability, like with most blogging platforms. To get it ranked at search engines, your first priority should be to introduce it to your communities to acquire some initial link love. However, those natural links come with a disadvantage: canonicalization issues.

“Canonicalization”, what a geeky word. You won’t find it in your dictionary; you must ask Google for a definition. Do the search: the number one result leads to Matt’s blog. Please study both posts before you read on.

Most bloggers linking to you will copy your URLs from the browser’s address bar, or use Firefox’s neat “Copy link location” feature, which leads to canonical inbound links, so to say. Others will type in incomplete URLs, or “correct” pasted URLs by removing trailing slashes, “www” prefixes, or whatever. Unfortunately, both your Web server and WordPress are usually smart enough to find the right page, or so your browser tells you. What happens totally unseen in the background is that some of these page requests produce a 302-Found elsewhere response, and that search engine crawlers get fed various URLs all pointing to the same piece of content. That’s a bad thing with regard to search engine rankings (and enough stuff for a series of longish posts, so just trust me).

Let’s begin the WordPress SEO-sanitizing with a fix of the most popular canonicalization issues. Your first step is to tell WordPress that you prefer sane and meaningful URLs without gimmicks. Go to the permalink options, check custom, type in /%postname%/ and save. Later on, give each post a nice keyword rich title like “My get rich in a nanosecond scam” and a corresponding slug like “get-rich-in-a-nanosecond”. Next, create a plain text file with this code:

# Disallow directory browsing:
Options -Indexes
<IfModule mod_rewrite.c>
RewriteEngine On
# Fix www vs. non-www issues:
RewriteCond %{HTTP_HOST} !^your-blog\.com [NC]
RewriteRule (.*) http://your-blog.com/$1 [R=301,L]
# WordPress permalinks:
RewriteBase /
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
</IfModule>

and upload it in ASCII mode to your server’s root as “.htaccess” (if you don’t host your blog in the root, or prefer the “www” prefix, change the code accordingly). Replace your-blog with your domain name, respectively www\.your-domain-name.
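For illustration, if you go with the “www” prefix instead, the host-fixing part of the code above would change to something like this (a sketch, still assuming the placeholder domain your-blog.com):

```apache
# Fix www vs. non-www issues, preferring the "www" host:
RewriteCond %{HTTP_HOST} !^www\.your-blog\.com [NC]
RewriteRule (.*) http://www.your-blog.com/$1 [R=301,L]
```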

This setup will not only produce user friendly post URLs, it will also route all server errors to your theme’s error page. If you don’t blog in the root, learn here how you should handle HTTP errors outside the /blog/ directory (in any case you should use ErrorDocument directives to capture stuff WordPress can’t or shouldn’t handle, e.g. 401, 403, and 5xx errors). Load 404.php in an ASCII editor to check whether it actually sends a 404 response. If the very first lines of code don’t look like

<?php
@header("HTTP/1.1 404 Not Found", TRUE, 404);

then insert the code above and make absolutely sure that there’s not a single whitespace (space, tab, new line) or visible character before the <?php (grab the code). It doesn’t hurt to make the 404 page friendlier, by the way, and don’t forget to check the HTTP response code. Consider calling a 404 grabber before you send the 404 header; that’s a neat method to do page-by-page redirects, capturing outdated URLs before the visitor gets the error page.
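To illustrate the 404 grabber idea, here is a minimal sketch. The grab404() function name and the $redirects map are my inventions for this example, not a plugin API:

```php
<?php
// Sketch of a "404 grabber": look up known outdated URLs and
// 301-redirect them to their current location before falling
// through to the 404 header. grab404() and $redirects are
// made-up names for this example.
function grab404($requestUri, $redirects) {
    $path = parse_url($requestUri, PHP_URL_PATH);
    return isset($redirects[$path]) ? $redirects[$path] : null;
}

$redirects = array(
    "/old-post" => "/2007/08/new-post/", // assumed mapping
);

if (isset($_SERVER["REQUEST_URI"])) {
    $target = grab404($_SERVER["REQUEST_URI"], $redirects);
    if ($target) {
        @header("HTTP/1.1 301 Moved Permanently", TRUE, 301);
        @header("Location: $target");
        exit;
    }
    @header("HTTP/1.1 404 Not Found", TRUE, 404);
}
```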

Next you need to hack header.php to fix the canonicalization issues which the rewrite rule in the .htaccess file above doesn’t cover. By default WordPress delivers the same page for both the canonical URL and the crappy variant without the trailing slash. Unfortunately, many people unaware of Web standards, as well as scripts written by clueless assclowns, remove trailing slashes to save bandwidth (a lame excuse for this bullshit, by the way, although even teeny search engines suffer from brain dead code monkeys implementing crap like that).

At the very top of header.php add this PHP code:

<?php
$requestUri = $_SERVER["REQUEST_URI"];
$uriArr = explode("#", $requestUri);
$requestUriBase = $uriArr[0];
$requestUriFragment = isset($uriArr[1]) ? $uriArr[1] : "";
$uriArr = explode("?", $requestUriBase);
$requestUriBase = $uriArr[0];
$queryString = $_SERVER["QUERY_STRING"];
if (substr($requestUriBase, strlen($requestUriBase) - 1, 1) != "/") {
    $canonicalUrl = "http://your-blog.com"; // change this to your domain!
    $canonicalUrl .= $requestUriBase . "/";
    if ($queryString) {
        $canonicalUrl .= "?" . $queryString;
    }
    @header("HTTP/1.1 301 Moved Permanently", TRUE, 301);
    @header("Location: $canonicalUrl");
    exit;
}
?>
(Again, not a single whitespace (space, tab, new line) or visible character before the <?php! Grab this code.)

Of course you need to change the URL in the canonicalUrl variable to yours, but I don’t mind if you forget it. There’s no such thing as bad traffic. Beyond the canonicalization done above, at this point you can perform all sorts of URL checks and manipulations.
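As an example of such a manipulation, here’s a sketch that strips tracking parameters from the query string so they don’t spawn duplicate URLs. The function name and the parameter list are made up for illustration:

```php
<?php
// Drop known tracking parameters (an assumed list) from a query
// string before building the canonical URL, so that e.g.
// ?utm_source=feed doesn't create a duplicate of the clean URL.
function stripTrackingParams($queryString, $trackingParams) {
    parse_str($queryString, $params);
    foreach ($trackingParams as $name) {
        unset($params[$name]);
    }
    return http_build_query($params);
}
```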

Now you understand why you’ve added the trailing slash in the permalink settings. Not only does the URL look better as a directory link, the trailing slash also allows you to canonicalize your URLs with ease. This works with all kinds of WordPress URLs: archives (although they shall not be crawlable), category archives, pages, of course the main page, and whatnot. It can break links when your template has hard coded references to “index.php”, which is quite popular with the search form and needs a fix anyway, because it leads to at least two URLs serving identical content.

It’s possible to achieve that with a hack in the blog root’s index.php, respectively a PHP script called in .htaccess which handles canonicalization and then includes index.php. However, index.php might be overwritten when you update WordPress, so that’s a file you shouldn’t hack. Use the other variant only if you see serious performance issues from running through the whole WordPress logic before a possible 301 redirect is finally done in the template’s header.php.

While you’re at the header.php file, you should fix the crappy post titles WordPress generates by default. Prefixing title tags with your blog’s name is downright obscene: it irritates readers and kills your search engine rankings. Hence, replace the PHP statement in the TITLE tag with
<?php
$pageTitle = wp_title("", false);
if (empty($pageTitle)) {
    $pageTitle = get_bloginfo("name");
}
$pageTitle = trim($pageTitle);
print $pageTitle;
?>
(Grab code.)

Next delete the references to the archives in the HEAD section:
<?php // wp_get_archives('type=monthly&format=link'); ?>
The “//” tells PHP that the line contains legacy code, SEO-wise at least. If your template comes with “index,follow” robots meta tags and other useless meta crap, delete these unnecessary meta tags too.

Well, there are a few more hacks which make sense, for example level-1 page links in the footer and so on, but let’s stick with the mere SEO basics. Now we proceed with plugins you really need.

  • Install automated meta tags, activate and forget it.
  • Next grab Arne’s sitemaps plugin, activate it and uncheck the archives option in the settings. Don’t generate or submit a sitemap.xml before you’re done with the next steps!
  • Because you’re a nice gal/guy, you want to pass link juice to your commenters. Hence you install nofollow case by case or another dofollow plugin to save you from nofollow insanity.
  • Stephen Spencer’s SEO title tag plugin is worth a try too. I didn’t get it to work on this blog (that’s why I hacked the title tag’s PHP code in the first place), but that was probably caused by alzheimer light (= a lame excuse for either laziness or goofiness), because it works fine on other blogs I’m involved with. Also, I meanwhile have way more code in the title tag, for example to assign sane titles to URLs with query strings, so I can’t use it here.
  • To eliminate the built-in death by pagination flaw –also outlined here–, you install PagerFix from Richard’s buddy Jaimie Sirovich. Activate it, then hack your templates (category-[category ID].php, category.php, archive.php and index.php) at the very bottom:
    // posts_nav_link(' — ', __('« Previous Page'), __('Next Page »'));
    pager_fix();
    (Grab code.) The pager_fix() function replaces the single previous/next links with links pointing to all relevant pages, so that every post is at most two clicks away from the main page, respectively its categorized archive page. Clever.

Did I really promise that applying basic SEO to a WordPress blog is done in five minutes? Well, that’s a lie, but you knew that beforehand. I want you to change your sidebar too. First set the archives widget to display as a drop down list or remove it completely. Second, if you’ve more than a handful of categories, remove the categories widget and provide another topically organized navigation like category index pages. Linking to every category from every page dilutes relevancy with regard to search engine indexing, and is –with long lists of categories in tiny sidebar widgets– no longer helpful to visitors.

When you’re at your sidebar.php, go check whether the canonicalization recommended above broke the site search facility or not. If you find a line of HTML code like
<form id="searchform" method="get" action="./index.php">

then search is defunct. You should replace this line by
<form id="searchform" method="post" action="<?php
$searchUrl = get_bloginfo('url');
if (substr($searchUrl, -1, 1) != "/") {
    $searchUrl .= "/";
}
print $searchUrl; ?>">
(grab code)
and here is why: the URL canonicalization routine adds a trailing slash to ./index.php, which results in a 404 error. Next, if the method is “get” you really want to replace that with “post”, because firstly, with regard to Google’s guidelines, crawlable search results are a bad idea, and secondly, GET forms are a nice hook for negative SEO (meaning that folks not exactly on your buddy list can get you ranked for all sorts of naughty out-of-context search terms).

Finally fire up your plain text editor again and create a robots.txt file:

User-agent: *
# …
Disallow: /2005/
Disallow: /2006/
Disallow: /2007/
Disallow: /2008/
Disallow: /2009/
Disallow: /2010/
# …
(If you go for the “www” thingy then you must write “” in the sitemaps-autodiscovery statement! The robots.txt goes to the root directory, change the paths to /blog/2007/ etcetera if you don’t blog in the root.)

You may ask why I tell you to remove all references to the archives. The answer is that, firstly, nobody needs them, and secondly, they irritate search engines with senseless and superfluous content duplication. As long as you provide logical, topically organized, and short paths to your posts, none of your visitors will browse the archives. Would you use the white pages to look up a phone number if the entries weren’t ordered alphabetically but by date of birth instead? Nope. Only blogging software produces crap like that as its sole, or at least primary, navigation. There are only very few good reasons to browse a blog’s monthly archives, thus a selection list is the perfect navigational widget in this case.

Once you’ve written a welcome post, submit your sitemap to Google and Yahoo!, and you’re done with your basic WordPress SEOing. Bear in mind that you don’t receive shitloads of search engine traffic before you’ve acquired good inbound links. However, then you’ll do much better with the search engines when your crawlability is close to perfect.

Updated 09/03/2007 to add Richard’s pager_fix tip from the comments.

Updated 09/05/2007 Lucia made a very good point. When you copy the code from this page where WordPress “prettifies” even <code>PHP code</code>, you end up with crap. I had to learn that not every reader knows that code must be changed when copied from a page where WordPress replaces wonderful plain single as well as double quotes within code/pre tags with fancy symbols stolen from M$-Word. (Actually, I knew it but sometimes I’m bonelazy.) So here is the clean PHP code from above (don’t mess with the quotes, especially don’t replace double quotes with single quotes!).


How to feed old WordPress posts with link love

WordPress phases old posts out. That’s fine with timely contents, but bloggers publishing persistent stuff suffer from a loss of traffic to evergreens by design. If you’re able to hack your template, you can enhance your link structure in a way that funnels visitors and link love to old posts.

For the sake of this case study, let’s assume that the archives are uncrawlable, that is, blocked in robots.txt and not linked anywhere in a way that search engine crawlers follow the links. To understand the problem, let’s look at the standard link structure of WordPress blogs:
Standard WordPress link structure
Say your category archives display 10 posts per page. The first 10 posts gain a fair amount of visibility (for visitors and search engines), but the next 10 posts land on category archive page #2, which is linked solely from archive pages #1 and #3. And so on: the freshest 10 posts per category stay reachable, while older posts phase out.

Let’s count that in clicks from the main page. The freshest 10 posts are 2 clicks away. The next bunch of 10 posts is 3 clicks away. Posts on category archive page #3 are 4 clicks away. Consider a link level depth greater than 3 crap. Search engines may index these deeply buried posts on popular blogs, but most visitors don’t navigate that deep into the archives.
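The click math above can be expressed as a tiny helper (a sketch; clickDepth() is my name for it, not anything WordPress ships):

```php
<?php
// Click distance of a post from the main page under the standard
// structure: main page -> category archive page N -> post.
// $position is the post's 0-based rank within its category
// (0 = freshest), $perPage the archive page size.
function clickDepth($position, $perPage = 10) {
    return 2 + (int) floor($position / $perPage);
}
```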

Now let me confuse you with another picture. This enhanced link structure should feed each and every post on your blog with links:
Enhanced WordPress link structure
The structure outlined in the picture is an additional layer, it does not replace the standard architecture.

You get one navigational path connecting any page via one hop (category index page #2 in the image) to any post. That’s two clicks from any page on your blog to any post, but such a mega hub (example) comes with disadvantages when you have large archives.

Hence we create more paths to deeply buried posts. Both new category index pages provide links to lean categorized links pages which list all posts by category (“[category name] overview” in this example, corresponding to category index page #1 in the image above). If both category index pages are linked in the sidebar of all pages, you get a couple of two-hop paths to all posts from all pages. That means that via a category index page and a lean category links page (example), each and every post is 3 clicks away from any other page.
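A lean categorized links page can be as simple as a heading plus a flat list. Here’s a sketch; how you fetch the posts is up to your setup, so the input format (title => URL pairs) and the function name are assumptions:

```php
<?php
// Render a lean category links page: one heading, one flat list
// of every post in the category. $posts maps titles to URLs
// (an assumed input format for this sketch).
function leanCategoryLinks($categoryName, $posts) {
    $html = "<h2>" . htmlspecialchars($categoryName) . " overview</h2>\n<ul>\n";
    foreach ($posts as $title => $url) {
        $html .= "<li><a href=\"" . htmlspecialchars($url) . "\">"
               . htmlspecialchars($title) . "</a></li>\n";
    }
    return $html . "</ul>\n";
}
```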

Now we’ve got a few shorter paths to old posts, but that’s not enough. We want to make use of the lean category links pages to create topical one-hop-links to related posts too. With most templates every post links to one or more category pages. We can’t replace these links because blog savvy readers clicking these category links expect a standard category page. I’ve added my links to the lean categorized links pages below the comments, and there are many more ways to put them, not only on single post pages.

It’s possible to tweak this concept by flooding pages with navigational links to shave off a click level here and there, but that can dilute topical relevancy, because it leads to “every page links to every page” patterns which are neither effective nor useful. Also, ornate pages load awfully slowly, and that’s a sure fire way to lose visitors. By the way, that’s the reason why I don’t put a category links widget in my sidebar.

To implement this concept I hacked the template and wrote a PHP script to output the links lists, which is embedded in a standard page (/links/categories/). At the moment this experiment is just food for thought, because this blog is quite new (I registered the domain a few days ago). However, I expect it will work.


Google manifested the axe on reciprocal link exchanges

Yesterday Fantomaster, via Threadwatcher, pointed me to this page of Google’s Webmaster help system. The cache was a few days old and didn’t show a difference, and I don’t archive each and every change of the guidelines, so I asked, and a friendly and helpful Googler told me that this item has been around for a while now. Today this page made it onto Sphinn, and probably a few other Webmaster hangouts too.

So what the heck is the scandal all about? When you ask Google for help on “link exchange“, the help machine rattles for a second, sighs, coughs, clears its throat and then yells out the answer in bold letters: “Link schemes“, bah!

Ok, we already knew what Google thinks about artificial linkage: “Don’t participate in link schemes designed to increase your site’s ranking or PageRank”. Honestly, what is the intent when I suggest that you link to me and concurrently I link to you? Yup, it means I boost your PageRank and you boost mine, also we chose some nice anchor text and that makes the link deal perfect. In the eyes of Google even such a tiny deal is a link scheme, because both links weren’t put up for users but for search engines.

Pre-Google this kind of link deal was business as usual and considered natural, but frankly back then the links were exchanged for traffic and not for search engine love. We can rant and argue as much as we want, that will not revert the changed character of link swaps nor Google’s take on manipulative links.

Consequently, Google has devalued artificial reciprocal links for ages. Pretty much simplified, these links nullify each other in Google’s search index. That goes for tiny sins. Folks scaling the concept up to larger link networks got caught too, but were penalized or even banned for link farming.

Obviously all kinds of link swaps are easy to detect algorithmically, even triangular link deals, three way link exchanges, and whatnot. I called that plain vanilla link ’swindling’, but only just recently has Google caught up with a scalable solution, and it seems to detect and penalize most if not all variants across the whole search index, thanks to the search quality folks in Dublin and Zurich, and even overseas, in whatever languages.

The knowledge that the days of free link trading were numbered was out for years before the exodus. Artificial reciprocal links, as well as other linkage considered link spam by Google, were and are a pet peeve of Matt’s team. Google sent lots of warnings, and many sane SEOs and Webmasters heard their traffic master’s voice and acted accordingly. Successful link trading just went underground, leaving the great unwashed alone with their public obsession with exchanging reciprocal links.

It’s also old news that Google does not penalize reciprocal links in general. Google almost never penalizes a pattern or a technique. Instead they try to figure out the Webmaster’s intent and judge case by case based on their findings. And yes, that’s doable with algos, perhaps sometimes with a little help from humans to compile the seed, but we don’t know how perfect the algo is when it comes to evaluations of intent. Natural reciprocal links are perfectly fine with Google. That applies to well maintained blogrolls too, despite the often reciprocal character of these links. Reading the link schemes page completely should make that clear.

Google defines link scheme as “[…] Link exchange and reciprocal links schemes (’Link to me and I’ll link to you.’) […]”. The “I link to you and vice versa” part literally addresses link trading of any kind, not a situation where I link to your compelling contents because I like a particular page, and you return the favour later on because you find my stuff somewhat useful. As Perkiset puts it, “linking is now supposed to be like that well known sex act, ‘68’ - or, you do me and I’ll owe you one”, and there is truth in this analogy. Sometimes a favor will not be returned. That’s the way the cookie crumbles when you’re keen on Google traffic.

The fact that Google openly said that link exchange schemes designed “exclusively for the sake of cross-linking” of any kind violate their guidelines indicates, first, that they were sure to have invented the catchall algo, and second, that they felt safe to launch it without too much collateral damage. Not everybody agrees; I quote Fantomaster’s critique not only because I like his inimitable parlance:

This is essentially a theological debate: Attempting to determine any given action’s (and by inference: actor’s) “intention” (as in “sinning”) is always bound to open a can of worms or two.

It will always have to work by conjecture, however plausible, which makes it a fundamentally tacky, unreliable and arbitrary process.

The delusion that such a task, error prone as it is even when you set the most intelligent and well informed human experts to it (vide e.g. criminal law where “intention” can make all the difference between an indictment for second or first degree murder…) can be handled definitively by mechanistic computer algorithms is arguably the most scary aspect of this inane orgy of technological hubris and naivety the likes of Google are pressing onto us.

I’ve seen some collateral damage already, but pragmatic Webmasters will find –respectively have found long ago– their way to build inbound links under Google’s regime.

And here is the context of Google’s definition link exchanges = link schemes which makes clear that not each and every reciprocal link is evil:

[…] However, some webmasters engage in link exchange schemes and build partner pages exclusively for the sake of cross-linking, disregarding the quality of the links, the sources, and the long-term impact it will have on their sites. This is in violation of Google’s webmaster guidelines and can negatively impact your site’s ranking in search results. Examples of link schemes can include:

• Links intended to manipulate PageRank
• Links to web spammers or bad neighborhoods on the web
• Link exchange and reciprocal links schemes (’Link to me and I’ll link to you.’)
• Buying or selling links […]

Again, please read the whole page.

Bear in mind that all this is Internet history; it just boiled up yesterday when the help page was discovered.

Related article: Eric Ward on reciprocal links, why they do good, and where they do bad.


Handling Google’s neat X-Robots-Tag - Sending REP header tags with PHP

It’s a bad habit to tell the bad news first, and I’m guilty of that. Yesterday I linked to Dan Crow, telling Google that the unavailable_after tag is useless IMHO. So today’s post is about a great thing: REP header tags aka X-Robots-Tags, unfortunately mentioned only as second news, somewhat concealed in Google’s announcement.

The REP is not only a theatre; it also stands for Robots Exclusion Protocol (robots.txt and robots meta tag). Everything you can shove into a robots meta tag on an HTML page can now be delivered in the HTTP header for any file type:

  • INDEX|NOINDEX - Tells whether the page may be indexed or not
  • FOLLOW|NOFOLLOW - Tells whether crawlers may follow links provided on the page or not
  • NOODP - Tells search engines not to use page titles and descriptions from the ODP on their SERPs
  • NOYDIR - Tells Yahoo! search not to use page titles and descriptions from the Yahoo! directory on the SERPs
  • NOARCHIVE - Google specific, used to prevent archiving (cached page copy)
  • NOSNIPPET - Prevents Google from displaying text snippets for your page on the SERPs
  • UNAVAILABLE_AFTER: RFC 850 formatted timestamp - Removes a URL from Google’s search index a day after the given date/time

So how can you serve X-Robots-Tags in the HTTP header of PDF files for example? Here is one possible procedure to explain the basics, just adapt it for your needs:

Rewrite all requests of PDF documents to a PHP script that knows which files must be served with REP header tags. You could do an external redirect too, but that may confuse things. Put this code in your root’s .htaccess:

RewriteEngine On
RewriteBase /pdf
RewriteRule ^(.*)\.pdf$ serve_pdf.php [L]

In /pdf you store some PDF documents and serve_pdf.php:

<?php
$requestUri = $_SERVER['REQUEST_URI'];

if (stristr($requestUri, "my.pdf")) {
    header('X-Robots-Tag: index, noarchive, nosnippet', TRUE);
    header('Content-type: application/pdf', TRUE);
    readfile("my.pdf"); // output the PDF itself
}
?>

This setup routes all requests of *.pdf files to /pdf/serve_pdf.php which outputs something like this header when a user agent asks for /pdf/my.pdf:

Date: Tue, 31 Jul 2007 21:41:38 GMT
Server: Apache/1.3.37 (Unix) PHP/4.4.4
X-Powered-By: PHP/4.4.4
X-Robots-Tag: index, noarchive, nosnippet
Connection: close
Transfer-Encoding: chunked
Content-Type: application/pdf

You can do that with all kinds of file types. Have fun and say thanks to Google :)
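If you’d rather not route static files through a script at all, mod_headers (where your host provides it) can set the same header straight from .htaccess; a sketch:

```apache
<IfModule mod_headers.c>
  <FilesMatch "\.pdf$">
    Header set X-Robots-Tag "index, noarchive, nosnippet"
  </FilesMatch>
</IfModule>
```

The PHP route remains more flexible, though, since it can decide per file which REP tags to send.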


Unavailable_After is totally and utterly useless

I’ve a lot of respect for Dan Crow, but I’m struggling with my understanding, or possible support, of the unavailable_after tag. I don’t want to put my reputation for bashing such initiatives from search engines at risk, so sit back and grab your popcorn, here comes the roasting:

As a Webmaster, I did not find a single scenario where I could or even would use it. That’s because I’m a greedy traffic whore. A bazillion other Webmasters are greedy too. So how the heck is Google going to sell the newish tag to the greedy masses?

Ok, from a search engine’s perspective unavailable_after makes sound sense. Outdated pages bind resources, annoy searchers, and in a row of useless crap the next bad thing after an outdated page is intentional Webspam.

So convincing the great unwashed to put that thingy on their pages inviting friends and family to granny’s birthday party on 25-Aug-2007 15:00:00 EST would improve search quality. Not that family blog owners care about new meta tags, RFC 850-ish date formats, or search engine algos that rarely understand that the announced party is history on Aug/26/2007. Besides, there may be painful aftermaths worth a desperate call for aspirins in the comments, which would be news of the day after expiration. Kind of a dilemma, isn’t it?

Seriously, unless CMS vendors support the new tag, tiny sites and clique blogs aren’t Google’s target audience. This initiative addresses large sites which are responsible for a huge amount of outdated contents in Google’s search index.

So what is the large site Webmaster’s advantage in using the unavailable_after tag? A loss of search engine traffic. A loss of link juice gained by the expired page. And so on. Losses of any kind are not that helpful when it comes to an overdue raise, nor in salary negotiations. Hence the Webmaster asks for the sack when s/he implements Google’s traffic terminator.

Who cares about Google’s search quality problems when it leads to traffic losses? Nobody. Caring Webmasters do the right thing anyway. And they don’t need no more useless meta tags like unavailable_after. “We don’t need no stinking metas” from “Another Brick in the Wall Part Web 2.0″ expresses my thoughts perfectly.

So what separates the caring Webmaster from the ‘ruthless traffic junky’ whom Google wants to implement the unavailable_after tag? The traffic junkie lets his stuff expire without telling Google about its state, is happy that frustrated searchers click the URL from the SERPs even years after the event, and enjoys the earnings from tons of ads placed above the content minutes after the party was over. Dear Google, you can’t convince this guy.

[It seems this is a post about repetitive “so whats”. And I came to the point before the 4th paragraph … wow, that’s new … and I’ve put a message in the title which is not even meant as link bait. Keep on reading.]

So what does the caring Webmaster do without the newish unavailable_after tag? Business as usual. Examples:

Say I run a news site where the free contents go to the subscription area after a while. I’d closely watch which search terms generate traffic, write a search engine optimized summary containing those keywords, put that on the sales pitch, and move the original article to the archives, accessible to subscribers only. It’s not my fault that the engines think they point to the original article after the move. When they recrawl and reindex the page, my traffic will increase because my summary fits their needs better.

Say I run an auction site. Unfortunately particular auctions expire, but I’m sure that the offered products will return to my site. Hence I don’t close the page; instead I search my database for similar offerings and promote them under an H3 heading like <h3>[product] (stuffed keywords) is hot</h3>, followed by <p>buy [product] here:</p> and a list of identical products for sale or similar auctions.

Say I run a poll expiring in two weeks. With Google’s newish near real time indexing that’s enough time to collect keywords from my stats, so the textual summary under the poll’s results will attract the engines as well as visitors when the poll is closed. Also, many visitors will follow the links to related respectively new polls.

From Google’s POV there’s nothing wrong with my examples, because the visitor gets what s/he was searching for, and I didn’t cheat. Now tell me, why should I give up these valuable sources of nicely targeted search engine traffic just to make Google happy? Rather I’d make my employer happy. Dear Google, you didn’t convince me.

Update: Tanner Christensen posted a remarkable comment at Sphinn:

I’m sure there is some really great potential for the tag. It’s just none of us have a need for it right now.

Take, for example, when you buy your car without a cup holder. You didn’t think you would use it. But then, one day, you find yourself driving home with three cups of fruit punch and no cup holders. Doh!

I say we wait it out for a while before we really jump on any conclusions about the tag.

John Andrews was the first to report an evil use of unavailable_after.

Also, Dan Crow from Google announced a pretty neat thing in the same post: With the X-Robots-Tag you can now apply crawler directives valid in robots meta tags to non-HTML documents like PDF files or images.


Analyzing search engine rankings by human traffic

Recently I’ve discussed ranking checkers at several places, and I’m quite astonished that folks still see some value in ranking reports. Frankly, ranking reports are –in most cases– a useless waste of paper and/or disk space. That does not mean that SERP positions per keyword phrase aren’t interesting. They’re just useless without context, that is, traffic data. Converting traffic pays the bills, not rankings alone. The truth is in your traffic data.

That said, I’d like to outline a method to extract one particularly useful piece of information from raw traffic data: underestimated search terms. That’s not a new idea, and perhaps you have the reports already, but maybe you don’t look at the information which is somewhat hidden in stats ordered by success, not failure. And you should be, or employ, a programmer to implement it.

The first step is gathering data. Create a database table to record all hits; then, in a footer include or so, when the complete page has already been output, write all the data you have into that table. All data means URL, timestamp, and variables like referrer, user agent, IP, language, and so on. Be a data rat: log everything you can get hold of. With dynamic sites it’s easy to add page titles, (product) IDs, etcetera; with static sites, write a tool to capture these attributes separately.
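A minimal sketch of such a footer logger, assuming a PDO connection and a made-up raw_hits table (table name, columns, and the logHit() function are all assumptions for this example):

```php
<?php
// Log one hit into a raw log table. The raw_hits schema
// (url, referrer, user_agent, ip, hit_time) is an assumption
// for this sketch; extend it with whatever you can capture.
function logHit(PDO $db, array $hit) {
    $stmt = $db->prepare(
        "INSERT INTO raw_hits (url, referrer, user_agent, ip, hit_time)
         VALUES (:url, :referrer, :user_agent, :ip, :hit_time)"
    );
    $stmt->execute($hit);
}
```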

For performance reasons it makes sense to work with a raw data table, which has just a primary key, to log the requests, and normalized working tables which have lots of indexes to allow aggregations, ad hoc queries, and fast reports from different perspectives. Also think of regularly purging the raw log table, and of historization. While transferring raw log data to the working tables in low traffic hours, or on another machine, you can calculate interesting attributes and add data from other sources which were not available to the logging process.
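As a sketch, assuming SQLite and made-up column names (the real schema depends on what your logging process can capture), the two-stage design could look like this:

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # use a file (or a real RDBMS) in production
# Raw log table: just a primary key, no further indexes, so inserts stay cheap.
conn.execute("""CREATE TABLE raw_hits (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    url TEXT, referrer TEXT, user_agent TEXT, ip TEXT, language TEXT,
    ts TEXT DEFAULT CURRENT_TIMESTAMP)""")
# Working table: heavily indexed for aggregations and ad hoc queries.
conn.execute("""CREATE TABLE hits (
    url TEXT, referrer TEXT, search_engine TEXT, search_term TEXT,
    serp REAL, ts TEXT)""")
conn.execute("CREATE INDEX idx_hits_engine ON hits (search_engine, search_term, serp)")

def log_hit(url, referrer, user_agent, ip, language):
    """Called from the footer include, after the page has been output."""
    conn.execute(
        "INSERT INTO raw_hits (url, referrer, user_agent, ip, language) VALUES (?,?,?,?,?)",
        (url, referrer, user_agent, ip, language))
    conn.commit()
```

Transferring rows from the raw table into the working tables –and purging the raw table– would then run as a batch job in low traffic hours.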

You’ll need that traffic data collector anyway for a gazillion of purposes where your analytics software fails, is not precise enough, or just can’t deliver a particular evaluation perspective. It’s a prerequisite for the method discussed here, but don’t build a monster sized cannon to chase a fly. You can gather search engine referrer data from logfiles too.

One interesting piece of information, for example, is which SERP the user was on when clicking a link pointing to your site. Simplified, you need three attributes in your working tables to store this info: search engine, search term, and SERP number. You can extract these values from the HTTP_REFERER.

1. “google” in the server name tells you the search engine.
2. The “q” variable’s value tells you the search term “keyword1 keyword2”.
3. The lack of a “start” variable tells you that the result was placed on the first SERP. The lack of a “num” variable lets you assume that the user got 10 results per SERP, so it’s quite safe to say that you rank in the top 10 for this term. Actually, the number of results per page is not always extractable from the URL because it’s usually pulled from a cookie, but not so many surfers change their preferences (e.g. less than 0.5% surf with 100 results, according to JohnMu and my data as well). If you’ve got a “num” value, normalize it to 10-results-per-page SERP numbers –for example by averaging the first and last 10-result SERP the page covers– to make the data comparable. If that’s not precise enough you’ll spot it afterwards, and you can always recalculate SERP numbers from the canned referrer.

1. and 2. as above.
3. The “start” variable’s value 10 tells you that you got a hit from the second SERP. When start=10 and there is no “num” variable, most probably the searcher got 10 results per page.

1. and 2. as above.
3. The empty “startIndex” variable and startPage=1 are useless, but the lack of “start” and “num” tells you that you’ve got a hit from the 1st Spanish SERP.

1. and 2. as above.
3. num=20 tells you that the searcher views 20 results per page, and start=20 indicates the second SERP, so you rank between #21 and #40, thus the (averaged) SERP# is 3.5 (provided SERP# is not an integer in your database).

You got the idea; here is a cheat sheet and official documentation on Google’s URL parameters. Analyze the URLs in your referrer logs and call them with cookies off, which disables your personal search preferences, then play with the values. Do that with other search engines too.
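The extraction rules above can be sketched in a few lines. This is a simplified illustration covering Google only; the averaging of multi-SERP pages is my reading of the num/start arithmetic in the examples above:

```python
from urllib.parse import urlparse, parse_qs

def parse_search_referrer(referrer):
    """Extract search engine, search term and (averaged, 10-results-per-page)
    SERP number from an HTTP referrer. Simplified: Google only."""
    parts = urlparse(referrer)
    if "google" not in parts.netloc:
        return None  # not a Google SERP referrer
    params = parse_qs(parts.query)
    term = params.get("q", [""])[0]
    if not term:
        return None
    start = int(params.get("start", ["0"])[0])
    num = int(params.get("num", ["10"])[0])  # default: 10 results per SERP
    # The hit came from results start+1 .. start+num; normalize to
    # 10-result SERP numbers and average the first and last SERP covered.
    first_serp = start // 10 + 1
    last_serp = (start + num + 9) // 10
    serp = (first_serp + last_serp) / 2
    return {"engine": "google", "term": term, "serp": serp}
```

A referrer with start=20&num=20 yields SERP# 3.5, matching the last example above; one with neither parameter yields SERP# 1.0.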

Now a subset of your traffic data has a value in “search engine”. Aggregate the tuples where search engine is not NULL, then select, for example, the results where the SERP number is less than or equal to 3.99 (respectively 4), ordered by SERP number ascending, hits descending, and keyword phrase, with a break by search engine. (Why not ordered by traffic first? You have a report of your best performing keywords already.)
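Expressed as a query –a sketch over a hypothetical `hits` working table, fed with made-up sample referrals– the report could read:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical working table: one row per logged hit.
conn.execute("CREATE TABLE hits (search_engine TEXT, search_term TEXT, serp REAL)")
sample = [("google", "fruit punch", 1.0), ("google", "fruit punch", 1.0),
          ("google", "cup holder", 3.5), ("google", "scam report", 6.0),
          (None, None, None)]  # direct hit, no SERP referrer
conn.executemany("INSERT INTO hits VALUES (?,?,?)", sample)

# Terms ranking on the first 4 SERPs: SERP number ascending, hits descending.
report = conn.execute("""
    SELECT search_engine, search_term, serp, COUNT(*) AS hits
    FROM hits
    WHERE search_engine IS NOT NULL AND serp <= 3.99
    GROUP BY search_engine, search_term, serp
    ORDER BY search_engine, serp ASC, hits DESC, search_term
""").fetchall()
```

The term on SERP 6 and the direct hit drop out; what remains is the list of first-four-SERPs terms to review.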

The result is a list of search terms you rank for on the first 4 SERPs, beginning with keywords you’ve probably not optimized for. At the very least you didn’t optimize the snippet to improve CTR, so your ranking doesn’t generate a reasonable amount of traffic. Before you study the report, throw away your site owner hat and try to think like a consumer. Sometimes consumers use a vocabulary you didn’t think of before.

Research promising keywords, and decide whether you want to push, bury or ignore them. Why bury? Well, in some cases you just don’t want to rank for a particular search term, [your product sucks] being just one example. If the ranking is fine, the search term smells somewhat lucrative, and just the snippet sucks in a particular search query’s context, enhance your SERP listing.

Every once in a while you’ll discover a search term making a killing for your competitors whilst you never spotted it because your stats package reports only the best 500 monthly referrers or so. Also, you’ll get the most out of your rankings by optimizing their SERP CTRs.

Be creative; over time your traffic database becomes more and more valuable, allowing other unconventional and/or site specific reports which off-the-shelf analytics software usually does not deliver. Most probably your competitors use standard analytics software; individually developed algos and reports can make a difference. That does not mean you should throw away your analytics software to reinvent the wheel. However, once you’re used to self-developed analytic tools, you’ll think of more interesting methods –and not only to analyze and monitor rankings by human traffic– than you can implement in this century ;)

Bear in mind that the method outlined above does not and cannot replace serious keyword research.

Another –very popular– approach to get this info would be automated ranking checks mashed up with hits by keyword phrase. Unfortunately, Google and other engines do not permit automated queries for the purpose of ranking checks, and this method works with preselected keywords, which means you don’t find (all) search terms created by users. Even when you compile your ranking checker’s keyword lists via various keyword research tools, you’ll still miss out on some interesting keywords in your seed list.

Related thoughts: Why regular and automated ranking checks are necessary when you operate seasonal sites by Donna


Rediscover Google’s free ranking checker!

Nowadays we’re searching via toolbar, personalized homepage, or in the browser’s address bar: typing in “google” to get the search box, typing in a search query using “I feel lucky” functionality, or –my favorite– typing in

Old fashioned, uncluttered and nevertheless sexy user interfaces are forgotten, and pretty much disliked due to the lack of nifty rounded corners. Luckily Google still maintains them. Look at this beautiful SERP:
Google's free ranking checker
It’s free of personalized search, wonderfully uncluttered because the snippets appear as tooltips only, results are nicely numbered from 1 to 1,000 on just 10 awesomely fast loading pages, and when I’ve visited my URLs before I spot my purple rankings quickly. is an ideal free ranking checker. It supports &filter=0 and other URL parameters, so it’s a perfect tool when I need to look up particular search terms.

Mass ranking checks are totally and utterly useless, at least for the average site, and penalized by Google. Well, I can think of ways to semi-automate a couple of queries, but honestly, I almost never need that. Providing fully automated ranking reports to clients gave SEO services a more or less well deserved snake oil reputation, because nice rankings for preselected keywords may be great ego food, but they don’t pay the bills. I admit that with some setups automated mass ranking checks make sense, but those are off-topic here.

By the way, Google’s query stats are a pretty useful resource too.


Getting the most out of Google’s 404 stats

The 404 reports in Google’s Webmaster Central panel are great to debug your site, but they contain URLs generated by invalid –respectively truncated– URL drops or typos of other Webmasters too. Are you sick of wasting the link love from invalid inbound links, just because you lack a suitable procedure to 301-redirect all these 404 errors to canonical URLs?

Your pain ends here. At least when you’re on a *ix server running Apache with PHP 4+ or 5+ and .htaccess enabled. (If you suffer from IIS go search another hobby.)

I’ve developed a tool which grabs all 404 requests, letting you map a canonical URL to each 404 error. The tool captures and records 404s, and you can add invalid URLs from Google’s 404-reports, if these aren’t recorded (yet) from requests by Ms. Googlebot.

It’s kind of a layer between your standard 404 handling and your error page. If a request results in a 404 error, your .htaccess calls the tool instead of the error page. If you’ve assigned a canonical URL to an invalid URL, the tool 301-redirects the request to the canonical URL. Otherwise it sends a 404 header and outputs your standard 404 error page. Google’s 404-probe requests during the Webmaster Tools verification procedure are unredirectable (is this a word?).

Besides 1:1 mappings of invalid URLs to canonical URLs you can assign keywords to canonical URLs. For example you can define that all invalid requests go to /fruit when the requested URI or the HTTP referrer (usually a SERP) contain the strings “apple”, “orange”, “banana” or “strawberry”. If there’s no persistent mapping, these requests get 302-redirected to the guessed canonical URL, thus you should view the redirect log frequently to find invalid URLs which deserve a persistent 301-redirect.
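The tool itself is a PHP script, but its decision logic can be sketched in a few lines; Python here, with mapping data and URLs made up purely for illustration:

```python
# Hypothetical mapping data; the actual tool reads these from flat files.
CANONICAL = {          # persistent invalid-URL -> canonical-URL mappings: 301
    "/old-fruit-page": "/fruit",
}
KEYWORD_MAP = {        # keyword guesses: 302, promoted to 301 after review
    ("apple", "orange", "banana", "strawberry"): "/fruit",
}

def handle_404(request_uri, referrer=""):
    """Return (status, location) for an invalid URL: 301 for mapped URLs,
    302 for keyword-based guesses, plain 404 (logged) otherwise."""
    if request_uri in CANONICAL:
        return (301, CANONICAL[request_uri])
    # No persistent mapping: guess from keywords in the URI or the referrer.
    haystack = (request_uri + " " + referrer).lower()
    for keywords, target in KEYWORD_MAP.items():
        if any(kw in haystack for kw in keywords):
            return (302, target)
    return (404, None)  # log the URL, then serve the standard error page
```

Reviewing the 302 log and turning good guesses into entries of the persistent mapping is the “training” part.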

Next there are tons of bogus requests from spambots searching for exploits or whatever, or hotlinkers, resulting in 404 errors, where it makes no sense to maintain URL mappings. Just update an ignore list to make sure those get 301-redirected to or a cruel and scary image hosted on your domain or a free host of your choice.

Everything not matching a persistent redirect rule or an expression ends up in a 404 response, as before, but logged so that you can define a mapping to a canonical URL. Also, you can use this tool when you plan to change (a lot of) URLs, it can 301-redirect the old URL to the new one without adding those to your .htaccess file.

I’ve tested this tool for a while on a couple of smaller sites, and I think it can be trained to run smoothly without too many edits once the ignore lists etcetera are up to date, that is, matching the site’s requisites. A couple of friends got the script and they will provide useful input. Thanks! If you’d like to join the BETA test drop me a message.

Disclaimer: All data get stored in flat files. With large sites we’d need to change that to a database. The UI sucks, I mean it’s usable but it comes with the browser’s default fonts and all that. IOW the current version is still in the stage of “proof of concept”. But it works just fine ;)


Google helps those who help themselves

And if that’s not enough to survive on Google’s SERPs, try Google’s Webmaster Forum where you can study Adam Lasnik’s FAQ which covers even questions the Webmaster Help Center provides no comprehensive answer for (yet), and where Googlers working in Google’s Search Quality, Webspam, and Webmaster Central teams hang out. Google dumps all sorts of questioners to the forum, where a crowd of hardcore volunteers (aka regulars as Google calls them) invests a lot of time to help out Webmasters and site owners facing problems with the almighty Google.

Despite the sporadic posts by Googlers, the backbone of Google’s Webmaster support channel is this crew of regulars from all around the globe. Google monitors the forum for input and trends, and intervenes when the periodic scandal escalates every once in a while. Apropos scandals … although the list of top posters mentions a few of the regulars, bear in mind that trolls come with a disgustingly high posting cadence. Fortunately, currently the signal drowns out the noise (again), and I very much appreciate that the Googlers participate more and more.

Some of the regulars like seo101 don’t reveal their URLs and stay anonymous. So here is an incomplete list of folks giving good advice:

If I’ve missed anyone, please drop me a line (I stole the list above from JLH and Red Cardinal, so it’s all their fault!).

So when you’re a Webmaster or site owner, don’t hesitate to post your Google related question (but read the FAQ before posting, and search for your topic); chances are one of these regulars or even a Googler offers assistance. Otherwise, when you’re questionless but carrying a swag of valuable answers, join the group and share your knowledge. Finally, when you’re a Googler, donate the sites linked above a boost on the SERPs ;)

Micro-meme started by John Honeck, supported by Richard Hearne, Bert Vierstra


LZZR Linking™

LZZR Link Love
In why it is a good thing to link out loud LZZR explains a nicely designed method to accelerate the power of inbound links. Unfortunately this technique involves Yahoo! Pipes, which is evil. Certainly that’s a nice tool to compose feeds, but Yahoo! Pipes automatically inserts the evil nofollow crap. Hence using Pipes’ feed output to amplify links fails because of the auto-nofollow. I’m sure LZZR can replace this component with ease, if that’s not done already.

