SEO-sanitizing a WordPress theme in 5 minutes
When you start a blog with WordPress, you get an overall good crawlability like with most blogging platforms. To get it ranked at search engines your first priority should be to introduce it to your communities acquiring some initial link love. However, those natural links come with disadvantages: canonicalization issues.
“Canonicalization”, what a geeky word. You won’t find it in your dictionary, you must ask Google for a definition. Do the search, the number one result leads to Matt’s blog. Please study both posts before you read on.
Most bloggers linking to you will copy your URLs from the browser’s address bar, or use the neat FireFox “Copy link location” thingy, what leads to canonical inbound links so to say. Others will type in incomplete URLs, or “correct” pasted URLs by removing trailing slashes, “www” prefixes or whatever. Unfortunately, usually both your Web server as well as WordPress are smart enough to find the right page, says your browser at least. What happens totally unseen in the background is that some of these page requests produce a 302-Found elsewhere response, and that search engine crawlers get feeded with various URLs all pointing to the same piece of content. That’s a bad thing with regard to search engine rankings (and enough stuff for a series of longish posts, so just trust me).
Lets begin the WordPress SEO-sanitizing with a fix of the most popular canonicalization issues. Your first step is to tell WordPress that you prefer sane and meaningful URLs without gimmicks. Go to the permalink options, check custom, type in /%postname%/
and save. Later on give each post a nice keyword rich title like “My get rich in a nanosecond scam” and a corresponding slug like “get-rich-in-a-nanosecond”. Next create a plain text file with this code
# Disallow directory browsing:
Options -Indexes
<IfModule mod_rewrite.c>
RewriteEngine On
# Fix www vs. non-www issues:
RewriteCond %{HTTP_HOST} !^your-blog\.com [NC]
RewriteRule (.*) http://your-blog.com/$1 [R=301,L]
# WordPress permalinks:
RewriteBase /
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
</IfModule>
and upload it in ASCII mode to your server’s root as “.htaccess” (if you don’t host your blog in the root or prefer the “www” prefix change the code accordingly). Change your-blog
to your domain name respectively www\.your-domain-name
.
This setup will not only produce user friendly post URLs like http://your-blog.com/hello-world/, it will also route all server errors to your theme’s error page. If you don’t blog in the root, learn here how you should handle HTTP errors outside the /blog/ directory (in any case you should use ErrorDocument directives to capture stuff WordPress can’t/shouldn’t handle, e.g. 401, 403, 5xx errors). Load 404.php in an ASCII editor to check whether it will actually send a 404 response. If the very first lines of code don’t look like
<?php
@header("HTTP/1.1 404 Not found", TRUE, 404);
?>
then insert the code above and make absolutely sure that you’ve not a single whitespace (space, tab, new line) or visible character before the <?php
(grab the code). It doesn’t hurt to make the 404 page friendlier by the way, and don’t forget to check the HTTP response code. Consider calling a 404grabber before you send the 404 header, this is a neat method to do page by page redirects capturing outdated URLs before the visitor gets the error page.
Next you need to hack header.php to fix canonicalization issues which the rewrite rule in the .htaccess file above doesn’t cover. By default WordPress delivers the same page for both the canonical URL http://your-blog.com/hello-world/ and the crappy variant http://your-blog.com/hello-world without the trailing slash. Unfortunately many people who are unaware of Web standards as well as scripts written by clueless assclowns remove trailing slashes to save bandwidth (lame excuse for this bullshit by the way, although even teeny search engines suffer from brain dead code monkeys implementing crap like that).
At the very top of header.php add this PHP code:
(Again, not a single whitespace (space, tab, new line) or visible character before the
<?php
$requestUri = $_SERVER[”REQUEST_URI”];
$uriArr = explode(”#”, $requestUri);
$requestUriBase = $uriArr[0];
$requestUriFragment = $uriArr[1];
$uriArr = explode(”?”, $requestUriBase);
$requestUriBase = $uriArr[0];
$queryString = $_SERVER[”QUERY_STRING”];
if ( substr($requestUriBase, strlen($requestUriBase) -1, 1) != “/” ) {
$canonicalUrl = “http://sebastians-pamphlets.com”;
$canonicalUrl .= $requestUriBase .”/”;
if ($queryString) {
$canonicalUrl .= “?” .$queryString;
}
@header(”HTTP/1.1 301 Moved Permanently”, TRUE, 301);
@header(”Location: $canonicalUrl”);
exit;
}
include(”pagefunctions.php”);
?>
<?php
! Grab this code.)
Of course you need to change my URL in the canonicalUrl variable to yours but I don’t mind when you forget it. There’s no such thing as bad traffic. Beyond the canonicalization done above, at this point you can perform all sorts of URL checks and manipulations.
Now you understand why you’ve added the trailing slash in the permalink settings. Not only does the URL look better as a directory link, the trailing slash also allows you to canonicalize your URLs with ease. This works with all kind of WordPress URLs, even archives (although they shall not be crawlable), category archives, pages, of course the main page and whatnot. It can break links when your template has hard coded references to “index.php”, what is quite popular with the search form and needs a fix anyway, because it leads to at least two URLs serving identical contents.
It’s possible to achieve that with a hack in the blog root’s index.php, respectively a PHP script called in .htaccess to handle canonicalization and then including index.php. The index.php might be touched when you update WordPress, so that’s a file you shouldn’t hack. Use the other variant if you see serious performance issues by running through the whole WordPress logic before a possible 301-redirect is finally done in the template’s header.php.
While you’re at the header.php file, you should fix the crappy post titles WordPress generates by default. Prefixing title tags with your blog’s name is downright obscene, irritates readers, and kills your search engine rankings. Hence replace the PHP statement in the TITLE tag with
<title>
(Grab code.)
<?php
$pageTitle = wp_title(“”,false);
if (empty($pageTitle)) {
$pageTitle = get_bloginfo(”name”);
}
$pageTitle = trim($pageTitle);
print $pageTitle;
?>
</title>
Next delete the references to the archives in the HEAD section:
<?php // wp_get_archives(’type=monthly&format=link’); ?>
The “//” tells PHP that the line contains legacy code, SEO wise at least. If your template comes with “index,follow” robots meta tags and other useless meta crap, delete these unnecessary meta tags too.
Well, there are a few more hacks which make sense, for example level-1 page links in the footer and so on, but lets stick with the mere SEO basics. Now we proceed with plugins you really need.
- Install automated meta tags, activate and forget it.
- Next grab Arne’s sitemaps plugin, activate it and uncheck the archives option in the settings. Don’t generate or submit a sitemap.xml before you’re done with the next steps!
- Because you’re a nice gal/guy, you want to pass link juice to your commenters. Hence you install nofollow case by case or another dofollow plugin preventing you from nofollow insane.
- Stephen Spencer’s SEO title tag plugin is worth a try too. I didn’t get it to work on this blog (that’s why I hacked the title tag’s PHP code in the first place), but that was probably caused by alzheimer light (=lame excuse for either laziness or goofiness) because it works fine on other blogs I’m involved with. Also, meanwhile I’ve way more code in the title tag, for example to assign sane titles to URLs with query strings, so I can’t use it here.
- To eliminate the built-in death by pagination flaw –also outlined here–, you install PagerFix from Richard’s buddy Jaimie Sirovich. Activate it, then hack your templates (category-[category ID].php, category.php, archive.php and index.php) at the very bottom:
(Grab code.) The
<?php
// posts_nav_link(’ — ‘, __(’« Previous Page’), __(’Next Page »’));
pager_fix();
?>
pager_fix()
function replaces the single previous/next links with links pointing to all relevant pages, so that every post is maximal two clicks away from the main page respectively its categorized archive page. Clever.
Did I really promise that applying basic SEO to a WordPress blog is done in five minutes? Well, that’s a lie, but you knew that beforehand. I want you to change your sidebar too. First set the archives widget to display as a drop down list
or remove it completely. Second, if you’ve more than a handful of categories, remove the categories widget and provide another topically organized navigation like category index pages. Linking to every category from every page dilutes relevancy with regard to search engine indexing, and is –with long lists of categories in tiny sidebar widgets– no longer helpful to visitors.
When you’re at your sidebar.php, go check whether the canonicalization recommended above broke the site search facility or not. If you find a line of HTML code like
<form id="searchform" method="get" action="./index.php">
then search is defunct. You should replace this line by
(grab code)
<form id="searchform" method="post” action=”<?php
$searchUrl = get_bloginfo(’url’);
if (substr($searchUrl, -1, 1) != “/”) {
$searchUrl .= “/”;
}
print $searchUrl; ?>”>
and here is why: The URL canonicalization routine adds a trailing slash to ./index.php
what results in a 404 error. Next if the method is “get” you really want to replace that with “post” because firstly with regard to Google’s guildelines crawlable search results are a bad idea, and secondly GET-forms are a nice hook for negative SEO (that means that folks not exactly on your buddy list can get you ranked for all sorts of naughty out-of-context search terms).
Finally fire up your plain text editor again and create a robots.txt file:
(If you go for the “www” thingy then you must write “www.your-site.com” in the sitemaps-autodiscovery statement! The robots.txt goes to the root directory, change the paths to /blog/2007/ etcetera if you don’t blog in the root.)
User-agent: *
# …
Disallow: /2005/
Disallow: /2006/
Disallow: /2007/
Disallow: /2008/
Disallow: /2009/
Disallow: /2010/
# …
Sitemap: http://your-site.com/sitemap.xml
You may ask why I tell you to remove all references to the archives. The answer is that firstly nobody needs them, and secondly they irritate search engines with senseless and superfluous content duplication. As long as you provide logical, topically organized and short paths to your posts, none of your visitors will browse the archives. Would you use the white pages to lookup a phone number when entries aren’t ordered alphabetically but by date of birth instead? Nope, solely blogging software produces crap like that as sole or at least primary navigation. There are only very few good reasons to browse a blog’s monthly archives, thus a selection list is the perfect navigational widget in this case.
Once you’ve written a welcome post, submit your sitemap to Google and Yahoo!, and you’re done with your basic WordPress SEOing. Bear in mind that you don’t receive shitloads of search engine traffic before you’ve acquired good inbound links. However, then you’ll do much better with the search engines when your crawlability is close to perfect.
Updated 09/03/2007 to add Richard’s pager_fix tip from the comments.
Updated 09/05/2007 Lucia made a very good point. When you copy the code from this page where WordPress “prettifies” even <code>PHP code</code>, you end up with crap. I had to learn that not every reader knows that code must be changed when copied from a page where WordPress replaces wonderful plain single as well as double quotes within code/pre tags with fancy symbols stolen from M$-Word. (Actually, I knew it but sometimes I’m bonelazy.) So here is the clean PHP code from above (don’t mess with the quotes, especially don’t replace double quotes with single quotes!).
Share/bookmark this: del.icio.us • Google • ma.gnolia • Mixx • Netscape • reddit • Sphinn • Squidoo • StumbleUpon • Yahoo MyWeb Subscribe to ![]() ![]() ![]() |
47 comments Sebastian | Duplicate Content, Blogging, Redirects, Web development, XML-Sitemaps, robots.txt, .htaccess, SEO