Google and Yahoo accept undelayed meta refreshes as 301 redirects

Although the meta refresh often gets abused to trick visitors into popup hells by sneaky pages on low-life free hosts (poor man’s cloaking), search engines don’t treat every instance of the meta refresh as Webspam. Folks moving their free hosted stuff to their own domains rely on it to redirect to the new location:
<meta http-equiv=refresh content="0; url=http://example.com/newurl" />

Yahoo clearly states how they treat a zero meta refresh, that is a redirect with a delay of zero seconds:

META Refresh: <meta http-equiv="refresh" content=…> is recognized as a 301 if it specifies little or no delay or as a 302 if it specifies noticeable delay.

Google is in the process of rewriting their documentation; the current version of their help documents does not (yet!) mention the meta refresh. The Google Mini treats all meta refreshes as 302:

A META tag that specifies http-equiv="refresh" is handled as a 302 redirect.

but that’s handled differently on the Web. I’ve asked Google’s search evangelist Adam Lasnik and he said:

[The] best idea is to use 301/302s directly whenever possible; otherwise, next best is to do a metarefresh with 0 for a 301. I don’t believe we recommend or support any 302-alternative.

Thanks Adam! I’ll update the last meta refresh thread.

If you have the chance to do 301 redirects don’t mess with the meta refresh. Utilize this method only when there’s absolutely no other chance.

Full stop for search geeks. What follows is an explanation for less experienced Webmasters who need to move their stuff away from greedy Web content funeral services, aka free hosts of any sort.

Ok, now that we know the major search engines accept an undelayed meta refresh as a poor man’s 301 redirect, what should a page carrying this tag look like in order to act as a provisional permanent redirect? As plain and functional as possible:
<html>
<head>
<title>Moved to new URL: http://example.com/newurl</title>
<meta http-equiv=refresh content="0; url=http://example.com/newurl" />
<meta name="robots" content="noindex,follow" />
</head>
<body>
<h1>This page has been moved to http://example.com/newurl</h1>
<p>If your browser doesn't redirect you to the new location please <a href="http://example.com/newurl"><b>click here</b></a>, sorry for the hassles!</p>
</body>
</html>

As long as the server delivers the content above under the old URL with a 200-OK, Google’s crawl stats should not list the URL under 404 errors. If it does appear under “Not found”, something went badly wrong, probably on the free host’s side. As long as you have control over the account, you must not delete the page, because the search engines revisit it from time to time to check whether you still redirect from that URL.

[Excursus: When a search engine crawler fetches this page, the server returns a 200-OK because, well, it’s there. Acting as a 301/302 does not make it a standard redirect. That sounds confusing to some people, so here is the technical explanation. Server sided response codes like 200, 302, 301, 404 or 410 are sent by the Web server to the user agent in the HTTP header before the server delivers any page content to the user agent (Web browser, search engine crawler, …). The meta refresh OTOH is a client sided directive telling the user agent to disregard the page’s content and to fetch the given (new) URL to render it instead of the initially requested URL. The browser parses the redirect directive out of the file which was received with an HTTP response code 200 (OK). That’s why you don’t get a 302 or 301 when you use a server header checker.]
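
To make the excursus concrete, here is a minimal Python sketch of what a user agent has to do after it received the page with a 200 response: parse the body, find the refresh directive, and split it into delay and target URL. The names MetaRefreshParser and find_meta_refresh are made up for illustration, and real browsers and crawlers are certainly more elaborate.

```python
from html.parser import HTMLParser


class MetaRefreshParser(HTMLParser):
    """Collects the first <meta http-equiv="refresh"> directive of a page."""

    def __init__(self):
        super().__init__()
        self.refresh = None  # becomes (delay_seconds, target_url) once found

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.refresh is not None:
            return
        attributes = dict(attrs)
        if (attributes.get("http-equiv") or "").lower() != "refresh":
            return
        # The content attribute looks like "0; url=http://example.com/newurl".
        content = attributes.get("content") or ""
        delay_part, _, url_part = content.partition(";")
        try:
            delay = float(delay_part.strip() or 0)
        except ValueError:
            return  # malformed delay, ignore the directive
        url_part = url_part.strip()
        if url_part.lower().startswith("url="):
            url_part = url_part[4:].strip()
        self.refresh = (delay, url_part)


def find_meta_refresh(html):
    """Return (delay, url) of the page's meta refresh, or None if there is none."""
    parser = MetaRefreshParser()
    parser.feed(html)
    return parser.refresh
```

Fed with the sample page above, find_meta_refresh() yields a delay of zero and the new URL. The HTTP status code never enters the picture, which is exactly why a header checker reports a plain 200.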

When a search engine crawler fetches the redirecting page, that’s just the beginning of a pretty complex process. Search engines are large-scale systems which make use of asynchronous communication between tons of highly specialized programs. The crawler itself has nothing to do with indexing. Maybe it follows server sided redirects instantly, but that’s unlikely with meta refreshes, because crawlers just fetch Web contents for unprocessed delivery to a data pool from which all sorts of processes like (vertical) indexers pull their fodder. Deleting a redirecting page from the search index might be done by process A running hourly, whilst process B, which instructs the crawler to fetch the redirect’s destination, runs once a day. Then the crawler may be swamped, so that it delivers the new content a month later to process C, which ran just five minutes before the content delivery and won’t start again before next Monday, if that’s not a bank holiday…

That means the old page may get deindexed way before the new URL makes it into the search index. If you change anything during this period, you just confuse the pretty complex chain of processes, which means that perhaps the search engine starts over by rolling back all transactions and refetching the redirecting page. Not good. Keep all kinds of permanent redirects forever.

Actually, a zero meta refresh works like a 301 redirect because the engines (shall) treat it as a permanent redirect, but it’s not a native 301. In fact, due to so much abuse by spammers it might be considered less reliable than a server sided 301 sent in the HTTP header. Hence you want to express your intention clearly to the engines. You do that with several elements of the meta refresh’ing page:

  • The page title says that the resource was moved and tells the new location. Words like “moved” and “new URL” without surrounding gimmicks make the message clear.
  • The zero (second) delay parameter shows that you don’t deliver visible content to (most) human visitors but switch their user agent right to the new URL.
  • The “noindex” robots meta tag telling the engines not to index the actual page’s contents is a signal that you don’t cheat. The “follow” value (referring to links in BODY) is just a fallback mechanism to ensure that engines having trouble understanding the redirect at least follow and index the “click here” link.
  • The lack of indexable content and keywords makes clear that you don’t try to achieve SE rankings for anything except the new URL.
  • The H1 heading repeating the title tag’s content on the page, visible for users surfing with meta refresh = off, reinforces the message and helps the engines to figure out the seriousness of your intent.
  • The same goes for the text message with a clear call for action underlined with the URL introduced by other elements.

Meta refreshes, like other client sided redirects (e.g. window.location = "http://example.com/newurl"; in JavaScript), can be found in every spammer’s toolbox, so don’t leave the outdated content on the page, and add a JavaScript redirect only to contentless pages like the sample above. Actually, you don’t need to do that, because the users surfing with meta-refresh=off are only a tiny fraction of your visitors, and using JavaScript redirects is way more risky (WRT picky search engines) than a zero meta refresh. Also, JavaScript redirects, if captured by a search engine, should count as 302, and you really don’t want to deal with all the disadvantages of soft redirects.

Another interesting question is whether removing the content from the outdated page makes a difference or not. Doing a mass search+replace to insert the meta tags (refresh and robots) with no further changes to the HTML source might seem attractive from a Webmaster’s perspective. It’s error-prone, however. Creating a list that maps outdated pages to their new locations and feeding it to a quick+dirty desktop program which generates the simple HTML code above is actually easier and eliminates a couple of points of failure.
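
Such a quick+dirty generator could look roughly like the following Python sketch. It takes a mapping of old file names to new URLs and produces the HTML of the sample page for each entry. The function names and the mapping format are assumptions made for illustration; writing the results to disk and uploading them is left out.

```python
from html import escape

# Same markup as the sample redirect page; {url} is filled in per page.
PAGE_TEMPLATE = """<html>
<head>
<title>Moved to new URL: {url}</title>
<meta http-equiv="refresh" content="0; url={url}" />
<meta name="robots" content="noindex,follow" />
</head>
<body>
<h1>This page has been moved to {url}</h1>
<p>If your browser doesn't redirect you to the new location please <a href="{url}"><b>click here</b></a>, sorry for the hassles!</p>
</body>
</html>
"""


def build_redirect_page(new_url):
    """Return the HTML of a contentless redirect page pointing to new_url."""
    # Escape the URL so that characters like & or " can't break the markup.
    return PAGE_TEMPLATE.format(url=escape(new_url, quote=True))


def build_all(mapping):
    """Map each old file name to the HTML that should replace its content."""
    return {old_file: build_redirect_page(new_url)
            for old_file, new_url in mapping.items()}
```

Calling build_all({"old-page.html": "http://example.com/newurl"}) returns a dictionary with one ready-to-upload page. Because every page comes from the same template, a typo can only happen in the mapping list, not in hundreds of hand-edited files.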

Finally: Make use of meta refreshes on free hosts only. Professional hosting firms let you do server sided redirects!




SEO-sanitizing a WordPress theme in 5 minutes

When you start a blog with WordPress, you get good overall crawlability, as with most blogging platforms. To get it ranked at search engines your first priority should be to introduce it to your communities to acquire some initial link love. However, those natural links come with a disadvantage: canonicalization issues.

“Canonicalization”, what a geeky word. You won’t find it in your dictionary, you must ask Google for a definition. Do the search, the number one result leads to Matt’s blog. Please study both posts before you read on.

Most bloggers linking to you will copy your URLs from the browser’s address bar, or use the neat Firefox “Copy link location” thingy, which leads to canonical inbound links, so to say. Others will type in incomplete URLs, or “correct” pasted URLs by removing trailing slashes, “www” prefixes or whatever. Unfortunately, both your Web server and WordPress are usually smart enough to find the right page, or so says your browser at least. What happens totally unseen in the background is that some of these page requests produce a 302-Found elsewhere response, and that search engine crawlers get fed with various URLs all pointing to the same piece of content. That’s a bad thing with regard to search engine rankings (and enough stuff for a series of longish posts, so just trust me).

Let’s begin the WordPress SEO-sanitizing with a fix of the most popular canonicalization issues. Your first step is to tell WordPress that you prefer sane and meaningful URLs without gimmicks. Go to the permalink options, check custom, type in /%postname%/ and save. Later on give each post a nice keyword rich title like “My get rich in a nanosecond scam” and a corresponding slug like “get-rich-in-a-nanosecond”. Next create a plain text file with this code

# Disallow directory browsing:
Options -Indexes
<IfModule mod_rewrite.c>
RewriteEngine On
# Fix www vs. non-www issues:
RewriteCond %{HTTP_HOST} !^your-blog\.com [NC]
RewriteRule (.*) http://your-blog.com/$1 [R=301,L]
# WordPress permalinks:
RewriteBase /
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
</IfModule>

and upload it in ASCII mode to your server’s root as “.htaccess” (if you don’t host your blog in the root or prefer the “www” prefix, change the code accordingly). Replace your-blog with your domain name, or with www\.your-domain-name if you prefer the “www” prefix.

This setup will not only produce user friendly post URLs like http://your-blog.com/hello-world/, it will also route all server errors to your theme’s error page. If you don’t blog in the root, learn here how you should handle HTTP errors outside the /blog/ directory (in any case you should use ErrorDocument directives to capture stuff WordPress can’t/shouldn’t handle, e.g. 401, 403, 5xx errors). Load 404.php in an ASCII editor to check whether it will actually send a 404 response. If the very first lines of code don’t look like

<?php
@header("HTTP/1.1 404 Not found", TRUE, 404);
?>

then insert the code above and make absolutely sure that you’ve not a single whitespace (space, tab, new line) or visible character before the <?php (grab the code). It doesn’t hurt to make the 404 page friendlier by the way, and don’t forget to check the HTTP response code. Consider calling a 404grabber before you send the 404 header, this is a neat method to do page by page redirects capturing outdated URLs before the visitor gets the error page.

Next you need to hack header.php to fix canonicalization issues which the rewrite rule in the .htaccess file above doesn’t cover. By default WordPress delivers the same page for both the canonical URL http://your-blog.com/hello-world/ and the crappy variant http://your-blog.com/hello-world without the trailing slash. Unfortunately many people who are unaware of Web standards as well as scripts written by clueless assclowns remove trailing slashes to save bandwidth (lame excuse for this bullshit by the way, although even teeny search engines suffer from brain dead code monkeys implementing crap like that).

At the very top of header.php add this PHP code:

<?php
// REQUEST_URI normally carries no fragment, but strip one just in case.
$requestUri = $_SERVER["REQUEST_URI"];
$uriArr = explode("#", $requestUri);
$requestUriBase = $uriArr[0];
$requestUriFragment = isset($uriArr[1]) ? $uriArr[1] : "";
// Cut off the query string to get the bare path.
$uriArr = explode("?", $requestUriBase);
$requestUriBase = $uriArr[0];
$queryString = $_SERVER["QUERY_STRING"];
// Path without trailing slash: 301 to the canonical URL with the slash.
if ( substr($requestUriBase, -1) != "/" ) {
    $canonicalUrl = "http://sebastians-pamphlets.com";
    $canonicalUrl .= $requestUriBase ."/";
    if ($queryString) {
        $canonicalUrl .= "?" .$queryString;
    }
    @header("HTTP/1.1 301 Moved Permanently", TRUE, 301);
    @header("Location: $canonicalUrl");
    exit;
}
include("pagefunctions.php");
?>
(Again, not a single whitespace (space, tab, new line) or visible character before the <?php! Grab this code.)

Of course you need to change my URL in the canonicalUrl variable to yours but I don’t mind when you forget it. There’s no such thing as bad traffic. Beyond the canonicalization done above, at this point you can perform all sorts of URL checks and manipulations.

Now you understand why you’ve added the trailing slash in the permalink settings. Not only does the URL look better as a directory link, the trailing slash also allows you to canonicalize your URLs with ease. This works with all kinds of WordPress URLs, even archives (although they shall not be crawlable), category archives, pages, of course the main page and whatnot. It can break links when your template has hard coded references to “index.php”, which is quite popular with the search form and needs a fix anyway, because it leads to at least two URLs serving identical contents.
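
To see which requests the “www” rewrite rule and the trailing slash check would together answer with a 301, the logic can be mirrored offline. This is a minimal Python sketch, not part of WordPress: the function name canonical_url is hypothetical, and your-blog.com stands in for your own host.

```python
from urllib.parse import urlsplit, urlunsplit


def canonical_url(url, canonical_host="your-blog.com"):
    """Return the URL a request should be 301-redirected to,
    or None if the requested URL is already canonical."""
    parts = urlsplit(url)
    path = parts.path or "/"
    # Every path gets a trailing slash, regardless of what it points to.
    if not path.endswith("/"):
        path += "/"
    # Force the preferred host, keep the query string, drop any fragment.
    canonical = urlunsplit(("http", canonical_host, path, parts.query, ""))
    return None if canonical == url else canonical
```

For http://your-blog.com/hello-world the sketch returns the variant with the trailing slash, and for http://your-blog.com/index.php it returns http://your-blog.com/index.php/, which illustrates why hard coded references to index.php stop working once the canonicalization is in place.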

It’s possible to achieve that with a hack in the blog root’s index.php, or with a PHP script called in .htaccess that handles canonicalization and then includes index.php. The index.php might be touched when you update WordPress, so that’s a file you shouldn’t hack. Use the other variant if you see serious performance issues by running through the whole WordPress logic before a possible 301-redirect is finally done in the template’s header.php.

While you’re at the header.php file, you should fix the crappy post titles WordPress generates by default. Prefixing title tags with your blog’s name is downright obscene, irritates readers, and kills your search engine rankings. Hence replace the PHP statement in the TITLE tag with
<title>
<?php
$pageTitle = wp_title("", false);
if (empty($pageTitle)) {
    $pageTitle = get_bloginfo("name");
}
$pageTitle = trim($pageTitle);
print $pageTitle;
?>
</title>
(Grab code.)

Next delete the references to the archives in the HEAD section:
<?php // wp_get_archives('type=monthly&format=link'); ?>
The “//” tells PHP that the line contains legacy code, SEO wise at least. If your template comes with “index,follow” robots meta tags and other useless meta crap, delete these unnecessary meta tags too.

Well, there are a few more hacks which make sense, for example level-1 page links in the footer and so on, but let’s stick with the mere SEO basics. Now we proceed with plugins you really need.

  • Install automated meta tags, activate and forget it.
  • Next grab Arne’s sitemaps plugin, activate it and uncheck the archives option in the settings. Don’t generate or submit a sitemap.xml before you’re done with the next steps!
  • Because you’re a nice gal/guy, you want to pass link juice to your commenters. Hence you install nofollow case by case or another dofollow plugin that saves you from the nofollow insanity.
  • Stephen Spencer’s SEO title tag plugin is worth a try too. I didn’t get it to work on this blog (that’s why I hacked the title tag’s PHP code in the first place), but that was probably caused by alzheimer light (=lame excuse for either laziness or goofiness) because it works fine on other blogs I’m involved with. Also, meanwhile I’ve way more code in the title tag, for example to assign sane titles to URLs with query strings, so I can’t use it here.
  • To eliminate the built-in death by pagination flaw –also outlined here–, you install PagerFix from Richard’s buddy Jaimie Sirovich. Activate it, then hack your templates (category-[category ID].php, category.php, archive.php and index.php) at the very bottom:
    <?php
    // posts_nav_link(' — ', __('« Previous Page'), __('Next Page »'));
    pager_fix();
    ?>
    (Grab code.) The pager_fix() function replaces the single previous/next links with links pointing to all relevant pages, so that every post is at most two clicks away from the main page or its categorized archive page. Clever.

Did I really promise that applying basic SEO to a WordPress blog is done in five minutes? Well, that’s a lie, but you knew that beforehand. I want you to change your sidebar too. First set the archives widget to display as a drop down list or remove it completely. Second, if you’ve more than a handful of categories, remove the categories widget and provide another topically organized navigation like category index pages. Linking to every category from every page dilutes relevancy with regard to search engine indexing, and is –with long lists of categories in tiny sidebar widgets– no longer helpful to visitors.

When you’re at your sidebar.php, go check whether the canonicalization recommended above broke the site search facility or not. If you find a line of HTML code like
<form id="searchform" method="get" action="./index.php">

then search is defunct. You should replace this line by
<form id="searchform" method="post" action="<?php
$searchUrl = get_bloginfo('url');
if (substr($searchUrl, -1, 1) != "/") {
    $searchUrl .= "/";
}
print $searchUrl; ?>">
(grab code)
and here is why: The URL canonicalization routine adds a trailing slash to ./index.php, which results in a 404 error. Next, if the method is “get” you really want to replace that with “post”, because firstly, with regard to Google’s guidelines, crawlable search results are a bad idea, and secondly, GET-forms are a nice hook for negative SEO (that means that folks not exactly on your buddy list can get you ranked for all sorts of naughty out-of-context search terms).

Finally fire up your plain text editor again and create a robots.txt file:

User-agent: *
# …
Disallow: /2005/
Disallow: /2006/
Disallow: /2007/
Disallow: /2008/
Disallow: /2009/
Disallow: /2010/
# …
Sitemap: http://your-site.com/sitemap.xml
(If you go for the “www” thingy then you must write “www.your-site.com” in the sitemaps-autodiscovery statement! The robots.txt goes to the root directory, change the paths to /blog/2007/ etcetera if you don’t blog in the root.)
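
A quick way to double-check such a robots.txt before uploading it is to run it through a parser. The following is a minimal Python sketch using the standard library’s urllib.robotparser; the helper name is_crawlable and the sample URLs are made up for illustration, and a real search engine’s interpretation may differ in details.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt from above, with the elided lines left as comments.
ROBOTS_TXT = """\
User-agent: *
# ...
Disallow: /2005/
Disallow: /2006/
Disallow: /2007/
Disallow: /2008/
Disallow: /2009/
Disallow: /2010/
# ...
Sitemap: http://your-site.com/sitemap.xml
"""


def is_crawlable(robots_txt, url, user_agent="Googlebot"):
    """Tell whether the given robots.txt lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

With this file, is_crawlable(ROBOTS_TXT, "http://your-site.com/2007/09/") is False, while a post URL like http://your-site.com/hello-world/ remains crawlable. That is the intended outcome: the date archives are blocked, the posts are not.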

You may ask why I tell you to remove all references to the archives. The answer is that firstly nobody needs them, and secondly they irritate search engines with senseless and superfluous content duplication. As long as you provide logical, topically organized and short paths to your posts, none of your visitors will browse the archives. Would you use the white pages to lookup a phone number when entries aren’t ordered alphabetically but by date of birth instead? Nope, solely blogging software produces crap like that as sole or at least primary navigation. There are only very few good reasons to browse a blog’s monthly archives, thus a selection list is the perfect navigational widget in this case.

Once you’ve written a welcome post, submit your sitemap to Google and Yahoo!, and you’re done with your basic WordPress SEOing. Bear in mind that you don’t receive shitloads of search engine traffic before you’ve acquired good inbound links. However, then you’ll do much better with the search engines when your crawlability is close to perfect.

Updated 09/03/2007 to add Richard’s pager_fix tip from the comments.

Updated 09/05/2007 Lucia made a very good point. When you copy the code from this page where WordPress “prettifies” even <code>PHP code</code>, you end up with crap. I had to learn that not every reader knows that code must be changed when copied from a page where WordPress replaces wonderful plain single as well as double quotes within code/pre tags with fancy symbols stolen from M$-Word. (Actually, I knew it but sometimes I’m bonelazy.) So here is the clean PHP code from above (don’t mess with the quotes, especially don’t replace double quotes with single quotes!).




If you free-host your blog flee now!

Run away as fast as you can! Dear free hosted blogger, here is food for thought, err, my few good reasons to escape the free-hosted blogging hell.

If you don’t own the domain, you don’t own your content. In other words: all free hosts steal content, skim traffic, share your reputation, and whatnot indulging the evil side of life 2.0. You get what you pay for, and you pay with your contents, your reputation, and a share of your traffic.

Of course you’ve the copyrights, but not the full power of disposition. Sooner or later –rather sooner if your blogging experiment becomes a passion and your blog an asset– you want to leave the free host, at least if you don’t decide to abandon your blog. At this point you’ll spot that your ideal blogging world is a dungeon in reality, from which you can’t escape saving your bacon.

For the sake of this discussion I don’t need to argue with worst case scenarios like extremely crappy free hosts which skim a fair amount of your traffic to sell it or feed their cash cows with, plaster your pages with their ads, don’t offer content export interfaces, and more crap like that. I’m talking about a serious operation, the free blogging service from the very nice folks at Google.

Web content is more than a piece of text you wrote. A piece of text anywhere on the ‘net has absolutely no value without the votes which make it findable. Hence the links pointing to a blog post, the feed subscriptions, the comments, and your text build the oneness we refer to as content.

Your text lives in Blogger’s database, which you can tap through the API. Say you want to move your blog to WordPress, what can you pull using the WordPress Blogger importer? Posts and comments, but not all properties of the comments, and some comments are damaged.

  • Many of your commenters are signed in with Blogger, commenting under their blogger account, so you don’t get their email addresses and URLs.
  • Even comments where the author typed in an email addy and URL come in with the author’s name only. I admit that may be a flaw in the WordPress script, but it sucks.
  • Blogger castrates links on save, so links embedded in comments are nofollow’ed. Adding nofollow crap to moderated comments on the fly is evil enough, but storing and exporting condomized comments contributed to your posts counts as damage to property.

According to Google’s very own rules the canonical procedure to move a page to another URL is a permanent redirect. Blogger like most “free” platforms doesn’t allow server sided scripting, so you can’t 301-redirect your posts from blogspot.com to your new blog’s pages. Blogger’s technical flaws (the permalink variable is not yet populated in the HEAD section of the template, hence it can’t be used to redirect to the actual post’s new location with a zero meta refresh) dilute each post’s PageRank because it can’t be transferred to its new location directly. Every hop (internal link on the new blog pointing to the post from the meta redirect’s destination page) devours a portion of the post’s PageRank.

The missing capability to redirect properly, that is page by page, from blogspot to another blog hinders traffic management a blogger should be able to do, and results in direct as well as indirect traffic losses. It’s still your content, but you’ve not the full power of disposition, and that’s theft, err part of your payment for hosting and services.

PageRank is computed based on a model emulating surfing behavior. Following this theory a loss of PageRank equals a loss of human traffic. The reality is that you don’t lose traffic in theory. You lose a fair amount of visitors who have clicked a link to a particular post and land through the all pages to one URL redirect on a huge page carrying links pointing to all kinds of posts, exactly there, on the links page. A surfer not getting the expected content hits the back button faster than you can explain why this shabby redirect is not your fault. And yes, PageRank is a commodity. The post’s new location will suffer from a loss of search engine traffic, because PageRank is a ranking factor too.

As defined above, a post’s inbound links as well as the PageRank gained from them belong to the post, and Blogger steals, err, takes away a fair amount of that when you move away from blogspot. Blogger also steals, err, collects the fee (link love and, in case you move, click throughs from the author’s link) you owe your commenters for contributing content to your blog, regardless of whether you stay or go away.

Of course you can jump through overcomplicated hoops by first transferring the Blogger blog to its own domain, publishing it there for a while before you install WordPress over the Blogger blog. Blogger’s domain mapping will then do page by page redirects, but you’re stuck with the crappy URL structure (timestamp fragments in post URLs). I mean, when I want to cross a street, is it fair to tell me that I can do that anytime but if I’d like to arrive unhurt then I must take the long route, that is a detour for an orbit around the earth?

Having said that, there are a few more disadvantages with Blogger even before you move to another platform on your own domain.

  • Blogger inserts links seducing your visitors into leaving your blog, and links to itself (powered by Blogger images) sucking your PageRank.
  • If you have to change a post’s title, Blogger changes the URL too. You can’t avoid that, so all existing traffic lands on Blogger’s very own 404 page on blogger.com. The 404 page should be part of the template, hosted on yourblog.blogspot.com, so that you can keep your visitors.
  • Commenting on a Blogger blog is a nightmare with regard to usability, so you miss out on shitloads of user contributed contents.
  • Blogger throws nofollow crap on your comments like confetti, even when you’ve turned comment moderation and captchas on, which should prove that you’ve control over outgoing links in the comments.
  • There is a saboteur in Google’s Blogger team. Every now and then Blogger inserts “noindex” meta tags, even on Google’s very own blogs, or silently deindexes your stuff at all search engines in other ways.
  • Often the overcrowded servers of blogger.com and/or blogspot.com are so slow that you can neither post nor approve comments, and your visitors don’t get more than the hourglass for 30 minutes and then all of a sudden a fragment of broken XML. This unreliable behavior does not exactly support your goal of building a loyal readership and keeping recurring visitors happy. You suffer when by accident a few blogs on your shared box get slashdotted, digged, stumbled …, and Blogger can’t handle those spikes.
  • Ok, better don’t get me started on a Blogger rant … ;)

By the way we’re in the same boat. When I started my blogging experiment in 2005 I was lazy enough to choose Blogger, although after many years of webmastering, providing Webmaster support and rescuing contents from (respectively writing off contents on) free hosts I should have known that I was going to run into serious troubles. So do yourself a favor and flee now. Blogger is not meant as a platform for professional blogs. It’s Google’s content generator for AdSense. That’s a fair deal with personal blogs, but unacceptable for corporate blogs.




The technical side of moving a blog from Blogger to WordPress

Recently I had to manage an exodus of Blogger driven posts to this WordPress blog. During the move I learned a few new things, developed a few pieces of code, and thought it might be a good idea to share my experiences. Probably there are other bloggers who want to leave blogspot.com but don’t do it because they are afraid of the aftermath.

Such a move comes with pitfalls and unavoidable traffic losses, so here is my attempt to minimize the downsides. Please don’t read it as a kind of move-Blogger-blog-to-WordPress guru tutorial.

Loading WordPress with Blogger posts and comments

After installing WordPress on this brand new domain, one of my first steps was to feed it with content, and to announce this content to search engines. Search engines don’t mind indexing posts totally fucked up due to formatting issues, and every indexed URL is an asset I can fine tune later on. I figured that Google would need at least a week or so to index the whole blog and didn’t care much about the other engines, which never sent many visitors to my pamphlets. This week gave me enough time to find the broken pages and to remove PRE tags and HTML comments causing the mess.

I’ve imported my Blogger posts and comments into WordPress using the standard import functionality. Pretty neat script by the way (if it can’t access the Blogger database, look at this plugin). Without even looking at the imported stuff I created an XML sitemap and submitted it to Google (you should submit it to Yahoo! too). In the sitemaps settings I’ve disabled the archives because I really don’t want any engine to index them. Then I created a robots.txt, blocked the archives, and added the sitemap autodiscovery statement so that Yahoo!, MSN and Ask can pick up my new blog too. Replacing the uncrawlable archive pages I’ve created categorized links pages like this one later on.

Now I needed to connect my old blogger posts to the new canonical URLs in order to route the human traffic as well as crawlers to the right pages.

    Here are six technical bits which could be helpful:

  1. The first step was getting the mappings from the blogspot URLs to the new pages. Extracting this info from formatted sources was no option, so I connected to my WordPress database and submitted this query to get the raw data:


    SELECT wp_posts.ID,
    wp_posts.post_title,
    wp_posts.post_name,
    wp_posts.guid,
    wp_postmeta.meta_value
    FROM wp_posts
    LEFT OUTER JOIN wp_postmeta ON (wp_postmeta.post_id = wp_posts.ID)
    WHERE wp_posts.ID > 1 AND wp_posts.ID < 176
    AND wp_posts.post_status = 'publish'
    AND wp_posts.post_type = 'post'
    AND wp_postmeta.meta_key = 'blogger_permalink'
    ORDER BY wp_posts.ID
    LIMIT 176
    (ID #1 was the generated welcome post, and #175 was the ID of the last post imported from Blogger)

  2. Next I wanted a flexible and persistent data source to make use of the Blogger-URL relations for various purposes, so I created a routing table (joins are too expensive, VIEWs were introduced in MySQL 5.0 and I run an older version) to store the URL mappings (Blogger to WordPress) and populated it from the query above:


    CREATE TABLE IF NOT EXISTS `wp_blogger_url_maps` (
    `bum_id` BIGINT( 20 ) NOT NULL AUTO_INCREMENT PRIMARY KEY ,
    `post_id` BIGINT( 20 ) NOT NULL ,
    `post_title` VARCHAR( 255 ) NOT NULL ,
    `post_name` VARCHAR( 255 ) NOT NULL ,
    `guid` VARCHAR( 255 ) NOT NULL ,
    `bum_blogger_rel_url` VARCHAR( 255 ) NOT NULL ,
    UNIQUE ( `post_id` ) ,
    INDEX ( `post_title` ) ,
    INDEX ( `post_name` ) ,
    INDEX ( `guid` ) ,
    INDEX ( `bum_blogger_rel_url` )
    )
    CHARACTER SET utf8
    COLLATE utf8_general_ci

    Of course you should do that with a script written in a way that you can repeat this procedure. While you’re working on your new blog you’ll still post at blogspot, and visitors will comment. Also, the search engines need to pick up your new blog and that takes a while, so no rush at all.

    Please note that when you perform repeated Blogger imports into a WordPress database which stores imported Blogger posts already, new comments to old posts lose their connection to the post and get assigned to the blog’s main page. There’s no official way to move a comment from there to the post it belongs to. So you’d better copy these comments manually; that’s doable with Better Comments Manager, where you can reply from the comments list and edit the author. However, it may be a good idea to do the final import as one of the last steps to spare you too many manual tasks like that.

    Unfortunately the WordPress Blogger import does not change links pointing to blogspot URLs in your posts. I’ve parsed the post_content column for these links as well as image locations at blogger.com and created a list of posts to edit. It is possible to automate that further, at least for the links to other posts, but I had to edit many posts due to formatting issues anyway and didn’t link much to my other posts, so I did that manually (or rather, will do so over time). It makes sense to parse the imported posts for HTML comments and PRE tags which work fine with Blogger but can break the layout under WordPress.
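    Parsing post_content for leftover blogspot links and Blogger-hosted images could look roughly like this. It is a Python sketch under assumptions: the regular expression only catches quoted href and src attributes, the host pattern for blogger.com subdomains is a guess, and `posts_to_edit` is a hypothetical helper fed with (ID, post_content) rows.

```python
import re

# Assumption: the old blog lives at sebastianx.blogspot.com and images are
# served from blogger.com or one of its subdomains.
LEGACY_HOST = r"(?:sebastianx\.blogspot\.com|(?:[\w-]+\.)*blogger\.com)"
LEGACY_URL = re.compile(
    r"""(?:href|src)\s*=\s*["'](https?://""" + LEGACY_HOST + r"""(?:/[^"']*)?)["']""",
    re.IGNORECASE,
)


def posts_to_edit(posts):
    """Return {post_id: [legacy URLs]} for posts that still reference the old blog.

    `posts` is an iterable of (post_id, post_content) pairs, for example rows
    selected from wp_posts.
    """
    todo = {}
    for post_id, content in posts:
        urls = LEGACY_URL.findall(content)
        if urls:
            todo[post_id] = urls
    return todo
```

    The result is just a to-do list; rewriting the links automatically would additionally need the URL mappings from the routing table.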

  3. Here are a few ideas for what you can do with such a mapping table. To accomplish them you need a plugin that allows the execution of PHP code within the content of posts and pages.

    Looping over the wp_blogger_url_maps table you can create, for example, an index page of all imported blogspot posts and their new locations.

    Or you could write a mapping tool which delivers new URLs and ask your friends to use it to update their posts linking to you.

  4. What you really should do is write a redirect script to “link” from your old blogspot posts to the new URLs. Make sure that the redirect code is 301, not 302! If you only set the location you get an unwanted 302 redirect:


    @header("HTTP/1.1 301 Moved Permanently", TRUE, 301);
    @header("Location: $guid");
    exit;

    Say the script is http://sebastians-pamphlets.com/blogspot.php and accepts an input variable src used to locate the new canonical URL. In your blogger template’s post section you can add this link a while before you move, so that search engines become comfortable with the new locations:

    <em>posted by <a href="http://sebastians-pamphlets.com/blogspot.php?src=~
    <$BlogItemPermalinkUrl$>"><$BlogItemAuthorNickname$></a> @ <a href="<$BlogItemPermalinkUrl$>" title="permanent link"><$BlogItemDateTime$> · PERMANENT LINK</a></em>
    (Remove the “~\n” when you copy the code!)
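    The lookup behind such a redirect script could be sketched like this. It is a Python WSGI illustration, not the PHP script itself: `URL_MAP` is a hypothetical excerpt of the routing table with a made-up post, and falling back to the blog’s root for unknown URLs is an assumption.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical excerpt of the routing table: Blogger path -> new canonical URL.
URL_MAP = {
    "/2007/07/old-post.html": "http://sebastians-pamphlets.com/old-post/",
}
# Assumption: unknown or missing src values are sent to the blog's root.
FALLBACK = "http://sebastians-pamphlets.com/"


def resolve(src):
    """Map a blogspot permalink to the new URL, ignoring scheme and host."""
    path = urlsplit(src).path
    return URL_MAP.get(path, FALLBACK)


def app(environ, start_response):
    """WSGI entry point: read ?src=... and answer with a permanent redirect."""
    query = parse_qs(environ.get("QUERY_STRING", ""))
    src = query.get("src", [""])[0]
    # Setting the status explicitly is the point: 301, not a default 302.
    start_response("301 Moved Permanently", [("Location", resolve(src))])
    return [b""]
```

    Matching on the path only means the script does not care whether the link arrives with or without the blogspot host name.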

    Another benefit of these links is that they can transfer the source bonus in search engine indexes from the old posts to the new URLs. One indicator the engines use to figure out which of two pages with identical content is the source is a unidirectional link. The major search engines do that silently; Technorati even documents it with “NEW URL !” as the linked post title (of the old location) when you put a textual hint like “moved to” on the old pages.

  5. Later on when you actually move you should change the nickname link to your new blog’s root index page, and put the redirect script in the permalink’s href. Then insert a robots meta tag “noindex,follow” in the blogger template and add prominent links to the new URLs to each post.

    Monitor the traffic on blogspot (use any free blog stats package or tools like MBL). When most visitors populate your new blog and the search engines have indexed it completely, you can redirect all blogspot URLs to your new address by inserting a zero refresh redirect meta tag:

    <meta http-equiv=refresh content="0; url=http://sebastians-pamphlets.com/about/sebastianx-blogspot-com/" />

    That will transfer PageRank as well as human visitors to the URL above. It does not route directly to the posts the visitors expected to see when clicking a link on another blog or a SERP (in the HEAD section of the blogger template the variable <$BlogItemPermalinkUrl$> is not yet populated with the permalink, so you have to use a hard-coded URL in the meta refresh directive).

    The redirects from blogspot to my blog come with disadvantages because it’s not possible to map each and every URL to a corresponding page on this site. From a few pages like archives and so on I can link out to my exodus page or the root, but can’t map a post-URL or so. So I’ve not yet decided whether I’ll do the final redirect or not.

    Of course it would transfer the PageRank from my old blog to this site, but it certainly would confuse visitors who click a link to any post on blogspot and land on the exodus page or the main page here. I guess that’s a sure-fire procedure to lose visitors. I tend to leave the ugly blogspot thingy as it is now, plastered with links pointing here. I’d rather miss out on a few folks who read my old stuff at blogspot and don’t click through to this blog, than piss off way more visitors with a zero meta refresh. Also, the old blog’s main page only showed a toolbar PR 4 and I’m not afraid to write that off, especially because the very nice folks linking to me in their blogrolls have changed the URL already, and a few friends have even edited their posts — THANKS!– linking to me.

  6. Redirecting folks consuming my stuff in their feed readers was quite easy. I’ve burned this blog’s feed with Feedburner’s MyBrand (free, you get a feed URL like http://feeds.sebastians-pamphlets.com/SebastiansPamphlets). When I was ready to move my still buggy blog I wrote a farewell Blogger post, waited a day to reach most if not all readers, then I redirected my old blogger feed to feedburner resulting in a nice spike (from zero to 150 subscribers) in my Feedburner stats. In Blogger go to Settings/Feed, enter the new feed’s URL and you’re done. You can do that without burning your feed too, but then you miss out on the stats.

Well, a few days after the move everything runs smoothly and as expected. Google has indexed and ranked 60 pages and still counting, I spotted my very first SERP referrer, and the number of indexed pages from blogspot.com is slowly decreasing because of the noindex robots meta tag. The other engines are still crawling, only Yahoo has indexed 3 pages and 100 inbound links so far. StumbleUpon users liked my not that serious canonical SEO definitions and created more buzz than Sphinn so far. I feel lucky.

My to-do list is here and if you’re interested in my scripts drop me a message in the comments.



 

Google is neat

August/16/2007: I installed WordPress 8 days ago on this brand new domain.
August/17/2007: I submitted an XML sitemap to Google.
August/18/2007: I linked (somewhat hidden) to this domain from my old blog.
August/23/2007: Ms. Googlebot has crawled 749 pages from this blog, 9 pages have made it into the Web search index so far.
August/24/2007: I got the very first hit from a Google SERP for [Google is neat]:
[Image: Google is neat - my first SERP referrer]
Considering the number of results for this search term I think my #4 spot is not too bad, although it’s purely based on BlitzIndexing and certainly not to stay for long. The same post from my old blog ranks #22 for this search, probably caused by its link (via a 301 redirect script) to the new URL.

Interestingly the search query URL in my referrer stats is too clean, it lacks all the gimmicks Google adds when one searches with a browser. So who did that to alert me to the indexing? Thanks for choosing such a neat search term! :)



 

Welcome back at Sebastian’s Pamphlets!

Dear friends and loyal readers,

I didn’t post for a while because I was working on this blog in my spare time. Not a big loss by the way, since SES in San Jose floods the SEOsphere with enough tidbits from the Google Dance and other events like paid link sermons currently. Ok, back to the breaking news: a week ago I got sebastians-pamphlets.com (unfortunately the owner of sebastian.com was not willing to sell the domain, although he has no Web site and I promised to keep his email addresses intact and working) and started my Blogger exodus. Meanwhile I’ve imported the old stuff from sebastianx.blogspot.com, and hopefully I’ve solved all the various issues breaking my layout here.

Actually, I’m launching a buggy blog, but I can’t stand my ugly free hosted outlet any more. So please save your comments before you hit submit, because the AJAXed comment script erases them if you forget to type in the spam protection stuff, or the answer is wrong, or whatnot. There are more open items on my to-do list, but nothing that critical.

Please update your blogrolls: http://sebastians-pamphlets.com/ (without the world wide wait prefix)
and your feed readers: http://feeds.sebastians-pamphlets.com/SebastiansPamphlets (I’ll redirect the old feed when my Blogger farewell post has hit most if not all feed readers). If you’d like to update your links to particular posts too, here is a tool to find the new URLs. ;)

WordPress is kind of a plug ’n’ play thingy, at least that’s how I’ve used it before. Installing and configuring WordPress for my personal blog was a completely different story. I looked at more details, and found many things I wanted to change or optimize. Now I’m the proud owner of the most hacked template ever. Some things weren’t doable with plugins and template hacks, so I wrote a couple of scripts too, producing enough fodder for a blog post or two that I’ll write soon. And of course I had to enhance the WordPress link structure.

At least temporarily I’ll get rid of all my search engine traffic, because I can’t redirect properly from Blogger and must noindex the old blog to avoid dupe issues. It would be nice if you, dear readers, could feed me with links until the engines rediscover me! ;)

I’m eager to get feedback on this blog, so please don’t hesitate to leave your dofollowed comments despite the AJAX pitfall mentioned above. I really hope you’ll like it, but I do appreciate your critiques very much because they’ll help me to improve things for your convenience.

Thank you and happy commenting!
Sebastian



 

How to feed old WordPress posts with link love

WordPress phases old posts out. That’s fine with timely contents, but bloggers publishing persistent stuff suffer from a loss of traffic to evergreens by design. If you’re able to hack your template, you can enhance your link structure in a way that funnels visitors and link love to old posts.

For the sake of this case study, let’s assume that the archives are uncrawlable, that is, blocked in robots.txt and not linked anywhere in a way that search engine crawlers follow the links. To understand the problem, let’s look at the standard link structure of WordPress blogs:
[Image: Standard WordPress link structure]
Say your category archives display 10 posts per page. The first 10 posts gain a fair amount of visibility (for visitors and search engines), but the next 10 posts land on category archive page #2, which is linked solely from the archive pages #1 and #3. And so on, the freshest 10 posts per category are reachable, older posts phase out.

Let’s count that in clicks from the main page. The freshest 10 posts are 2 clicks away. The next bunch of 10 posts is 3 clicks away. Posts on category archive page #3 are 4 clicks away. Consider a link level depth greater than 3 crap. Search engines may index these deeply buried posts on popular blogs, but most visitors don’t navigate that deep into the archives.
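The click counting above boils down to a tiny formula, sketched here in Python for illustration. The function name `clicks_from_main_page` is made up, and the sketch assumes that the main page links to category archive page #1 and that archive pages are only chained to their neighbours, as described.

```python
def clicks_from_main_page(post_rank, posts_per_page=10):
    """Click depth of a post reached via its category archive.

    post_rank counts the posts of one category from the freshest (0) to the
    oldest. One click leads from the main page to archive page #1, every
    further archive page costs one more click, and a final click opens the post.
    """
    archive_page = post_rank // posts_per_page + 1
    return archive_page + 1
```

With 10 posts per page, `clicks_from_main_page(9)` is 2 and `clicks_from_main_page(20)` is 4, so everything older than the 20 freshest posts of a category sits deeper than level 3.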

Now let me confuse you with another picture. This enhanced link structure should feed each and every post on your blog with links:
[Image: Standard WordPress link structure]
The structure outlined in the picture is an additional layer, it does not replace the standard architecture.

You get one navigational path connecting any page via one hop (category index page #2 on the image) to any post. That’s two clicks from any page on your blog to any post, but such a mega hub (example) comes with disadvantages when you have large archives.

Hence we create more paths to deeply buried posts. Both new category index pages provide links to lean categorized links pages which list all posts by category (“[category name] overview” in this example, corresponding to category index page #1 on the image above). If both category index pages are linked in the sidebar of all pages, you get a couple of two-hop links to all posts from all pages. That means that via a category index page and a lean category links page (example) each and every post is 3 clicks away from any other page.

Now we’ve got a few shorter paths to old posts, but that’s not enough. We want to make use of the lean category links pages to create topical one-hop-links to related posts too. With most templates every post links to one or more category pages. We can’t replace these links because blog savvy readers clicking these category links expect a standard category page. I’ve added my links to the lean categorized links pages below the comments, and there are many more ways to put them, not only on single post pages.

It’s possible to tweak this concept by flooding pages with navigational links to swipe a click level here and there, but that can dilute topical relevancy because it leads to “every page links to every page” patterns which are not exactly effective or useful. Also, ornate pages load awfully slowly and that’s a sure-fire way to lose visitors. By the way that’s the reason why I don’t put a category links widget onto my sidebar.

To implement this concept I hacked the template and wrote a PHP script to output the links lists which is embedded in a standard page (/links/categories/). At the moment this experiment is just food for thought, because this blog is quite new (I registered the domain a few days ago). However, I expect it will work.
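The core of such a links list script could look roughly like this. It is a Python sketch for illustration, not the PHP script mentioned above; the function name `render_category_links`, the input format and the markup are assumptions.

```python
from collections import defaultdict
from html import escape


def render_category_links(posts):
    """Render one lean <ul> of post links per category.

    `posts` is an iterable of (category, title, url) tuples; a post filed
    under several categories simply appears once per category.
    """
    by_category = defaultdict(list)
    for category, title, url in posts:
        by_category[category].append((title, url))

    parts = []
    for category in sorted(by_category):
        parts.append(f"<h3>{escape(category)} overview</h3>")
        parts.append("<ul>")
        for title, url in by_category[category]:
            parts.append(f'<li><a href="{escape(url)}">{escape(title)}</a></li>')
        parts.append("</ul>")
    return "\n".join(parts)
```

The output is deliberately plain: nothing but headings and links, so the pages stay lean and every old post gets a crawlable link.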



 

Hello world!

Howdy! I’m in the process of moving my blog from blogspot to this domain. As long as you see a ToDo List here, I’m not yet ready. Ok, there’s no such thing as a blog that’s “ready”. Hence the ToDo list is here to stay. Probably forever. More on the bugs & flaws page.

    Unordered & unprioritized ToDo List

  • OK: dofollow plugin
  • OK: feedburner
  • OK: navigation in sidebar
  • OK: links + OK:blogroll
  • OK: remove “be the first …” on pages w/o comments
  • OK: Twitter and other gadgets
  • OK: configure titles (<title><?php wp_title("",true); ?></title> (actually that’s more code coz wp_title("",true) delivers nil for the home page)) and meta tags
  • tag old blogger posts
  • OK: 301 sebastian-x.com
  • OK: post background color too dark (blockquotes) and other CSS hacks
  • OK: XML sitemap
  • credits on links page
  • enhance site search when Google has indexed everything
  • links list blogspot commenters
  • OK: friendlier 404 page which actually sends a 404
  • contact page
  • edit blogspot links in imported posts and comments (when you get pings because I’ve edited an old post of mine, I’m sorry.)
  • OK: edit categories to enable nice categorized links lists feeding old posts with link juice
  • OK: Feed buttons
  • OK: Share/bookmark links
  • OK: change author links
  • OK: linking from blogspot posts to new URLs
  • OK: sitemaps/categories page with feed links/list of articles
  • fix the disabled button + text area after AJAXed comment submissions
  • advertising? Ynot.
  • OK: think about testing vs. launching early (Hi Marissa!)
  • OK: bait crawlers (Ms. Googlebot grabbed her milk & cookies a day after I’ve registered the domain, but is confused because I’ve changed the IP address a few days later)
  • OK: ask friends for href changes
  • wait for something unspecified
  • OK: push the red button, publish the moved-draft at blogger, redirect the blogger feeds to feedburner, and turn this thingy to public
  • FINALLY: redirect the blogspot url

I’ll write a post on a few technical aspects of the move in a few days.



 

Google’s 5 sure-fire steps to safer indexing

[Image: Nofollow plague]
Are you wondering why Gray Hat Search Engine News (GHN) is so quiet recently?

One reason may be that I’ve borrowed their Google savvy spy. I’ve sent him to Mountain View again to learn more about Google’s nofollow strategy.

He returned with a copy of Google’s recently revised mission statement, discovered in the wastebasket of a conference room near office 211 in building 43. Read the shocking and unbelievable head note printed in bold letters:

Google’s mission is to condomize the world’s information and make it universally uncrawlable and useless.

Read and reread it, then some weird facts begin to make sense. Now you’ll understand why:

  1. The rel-nofollow plague was designed to maximize collateral damage by devaluing all hyperlinked votes by honest users of nearly all platforms you’re using everyday, for example Twitter, Wikipedia, corporate blogs, GoogleGroups … ostensibly to nullify the efforts of a few spammers.
  2. Nobody bothers to comment on your nofollow’ed blog.
  3. Google invented the supplemental index (to store scraped resources suffering from too many condomized links) and why it grows faster than the main index.
  4. Google installed the Bigdaddy infrastructure (to prevent Ms. Googlebot from following nofollow’ed links).
  5. Google switched to BlitzCrawling (to list timely contents for a moment whilst fat resources from large archives get buried in the supplemental index). RIP deep crawler and freshbot.

Seriously, the deep crawler isn’t defunct, it’s called supplemental crawler nowadays, and the freshbot is still alive as Feedfetcher.

Disclaimer: All these hard facts were gathered by torturing sources close to Google, robbery and other unfair methods. If anyone bothers to debunk all that as a bad joke, one question still remains: Why does Google do next to nothing to stop the nofollow plague? I mean, ongoing mass abuse of rel-nofollow is obviously counterproductive with regard to their real mission.



 

Ego food from John’s barbecue

JohnMu grilled me ;)

Check out his folks bin frequently for readable Webmaster interviews.

Thanks John, it was fun :)



 
