SEO-sanitizing a WordPress theme in 5 minutes

When you start a blog with WordPress, you get good overall crawlability, as with most blogging platforms. To get it ranked at search engines, your first priority should be to introduce it to your communities and acquire some initial link love. However, those natural links come with a disadvantage: canonicalization issues.

“Canonicalization”, what a geeky word. You won’t find it in your dictionary, you must ask Google for a definition. Do the search, the number one result leads to Matt’s blog. Please study both posts before you read on.

Most bloggers linking to you will copy your URLs from the browser’s address bar, or use the neat Firefox “Copy link location” thingy, which leads to canonical inbound links, so to say. Others will type in incomplete URLs, or “correct” pasted URLs by removing trailing slashes, “www” prefixes or whatever. Unfortunately, both your Web server and WordPress are usually smart enough to find the right page, or so your browser says at least. What happens totally unseen in the background is that some of these page requests produce a 302 “Found elsewhere” response, and that search engine crawlers get fed various URLs all pointing to the same piece of content. That’s a bad thing with regard to search engine rankings (and enough stuff for a series of longish posts, so just trust me).

Let’s begin the WordPress SEO-sanitizing with a fix of the most popular canonicalization issues. Your first step is to tell WordPress that you prefer sane and meaningful URLs without gimmicks. Go to the permalink options, check custom, type in /%postname%/ and save. Later on, give each post a nice keyword-rich title like “My get rich in a nanosecond scam” and a corresponding slug like “get-rich-in-a-nanosecond”. Next, create a plain text file with this code:

# Disallow directory browsing:
Options -Indexes
<IfModule mod_rewrite.c>
RewriteEngine On
# Fix www vs. non-www issues:
RewriteCond %{HTTP_HOST} !^your-blog\.com [NC]
RewriteRule (.*) http://your-blog.com/$1 [R=301,L]
# WordPress permalinks:
RewriteBase /
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
</IfModule>

and upload it in ASCII mode to your server’s root as “.htaccess” (if you don’t host your blog in the root or prefer the “www” prefix, change the code accordingly). Replace your-blog with your domain name, or with www\.your-domain-name if you go for the “www” prefix.

This setup will not only produce user friendly post URLs like http://your-blog.com/hello-world/, it will also route all requests for non-existent URLs to your theme’s error page. If you don’t blog in the root, learn here how you should handle HTTP errors outside the /blog/ directory (in any case you should use ErrorDocument directives to capture stuff WordPress can’t/shouldn’t handle, e.g. 401, 403, 5xx errors). Load 404.php in an ASCII editor to check whether it will actually send a 404 response. If the very first lines of code don’t look like

<?php
@header("HTTP/1.1 404 Not found", TRUE, 404);
?>

then insert the code above and make absolutely sure that there’s not a single whitespace character (space, tab, new line) or visible character before the <?php (grab the code). It doesn’t hurt to make the 404 page friendlier, by the way, and don’t forget to check the HTTP response code. Consider calling a 404grabber before you send the 404 header; this is a neat method to do page-by-page redirects, capturing outdated URLs before the visitor gets the error page.
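If you want to check the response code without a browser plugin, a few lines of PHP run from the command line will do. Take the following as a rough sketch only: the helper name status_code_from_status_line(), the script name and the sample URL are made up for this example, whereas get_headers() is PHP's built-in function that returns a URL's response headers as an array with the status line first.

```php
<?php
// Hypothetical helper (not part of WordPress): extracts the numeric
// status code from a status line such as "HTTP/1.1 404 Not Found".
function status_code_from_status_line($statusLine) {
    if (preg_match('#^HTTP/\d+(?:\.\d+)?\s+(\d{3})#', $statusLine, $m)) {
        return (int) $m[1];
    }
    return null; // not a status line
}

// Usage from the command line (script name and URL are placeholders):
//   php check-status.php http://your-blog.com/no-such-page/
if (PHP_SAPI === 'cli' && isset($argv[1])) {
    $headers = @get_headers($argv[1]);
    if ($headers === false) {
        echo "Request failed\n";
    } else {
        // $headers[0] holds the status line of the first response.
        echo status_code_from_status_line($headers[0]), "\n";
    }
}
```

For a URL that doesn't exist on your blog this should print 404; if it prints 200, your error page lies to the crawlers.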

Next you need to hack header.php to fix canonicalization issues which the rewrite rule in the .htaccess file above doesn’t cover. By default WordPress delivers the same page for both the canonical URL http://your-blog.com/hello-world/ and the crappy variant http://your-blog.com/hello-world without the trailing slash. Unfortunately many people who are unaware of Web standards as well as scripts written by clueless assclowns remove trailing slashes to save bandwidth (lame excuse for this bullshit by the way, although even teeny search engines suffer from brain dead code monkeys implementing crap like that).

At the very top of header.php add this PHP code:

<?php
// Requested URI as sent by the client, e.g. /hello-world?foo=bar
$requestUri = $_SERVER["REQUEST_URI"];
// Split off a fragment, if any (browsers normally don't send one)
$uriArr = explode("#", $requestUri);
$requestUriBase = $uriArr[0];
$requestUriFragment = isset($uriArr[1]) ? $uriArr[1] : "";
// Split off the query string, keep the path only
$uriArr = explode("?", $requestUriBase);
$requestUriBase = $uriArr[0];
$queryString = $_SERVER["QUERY_STRING"];
// No trailing slash: 301-redirect to the canonical URL with the slash
if ( substr($requestUriBase, -1) != "/" ) {
    $canonicalUrl = "http://sebastians-pamphlets.com";
    $canonicalUrl .= $requestUriBase ."/";
    if ($queryString) {
        $canonicalUrl .= "?" .$queryString;
    }
    @header("HTTP/1.1 301 Moved Permanently", TRUE, 301);
    @header("Location: $canonicalUrl");
    exit;
}
include("pagefunctions.php");
?>
(Again, not a single whitespace (space, tab, new line) or visible character before the <?php! Grab this code.)

Of course you need to change my URL in the canonicalUrl variable to yours but I don’t mind when you forget it. There’s no such thing as bad traffic. Beyond the canonicalization done above, at this point you can perform all sorts of URL checks and manipulations.
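The same check can be written as a small function that only computes the redirect target, which makes it easy to try out before it goes into header.php. This is a sketch under assumptions, not the code running on this blog: the function name trailing_slash_target() is invented here, and the host name is passed in as a parameter (for example from $_SERVER['SERVER_NAME']) instead of being hard-coded.

```php
<?php
// Hypothetical helper: returns the canonical URL to redirect to, or null
// when the requested path already ends with a slash.
function trailing_slash_target($host, $requestUri) {
    // Split off the query string; fragments are not sent to the server.
    $parts = explode('?', $requestUri, 2);
    $path  = $parts[0];
    $query = isset($parts[1]) ? $parts[1] : '';

    if (substr($path, -1) === '/') {
        return null; // already canonical
    }
    $url = 'http://' . $host . $path . '/';
    if ($query !== '') {
        $url .= '?' . $query;
    }
    return $url;
}

// Possible use at the very top of header.php:
// $target = trailing_slash_target($_SERVER['SERVER_NAME'], $_SERVER['REQUEST_URI']);
// if ($target !== null) {
//     header('Location: ' . $target, true, 301);
//     exit;
// }
```

Like the code above, this appends a slash to any path, including file names such as /index.php, so hard-coded references to files need a fix of their own.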

Now you understand why you’ve added the trailing slash in the permalink settings. Not only does the URL look better as a directory link, the trailing slash also allows you to canonicalize your URLs with ease. This works with all kinds of WordPress URLs, even archives (although they should not be crawlable), category archives, pages, of course the main page and whatnot. It can break links when your template has hard-coded references to “index.php”, which is quite popular with the search form and needs a fix anyway, because it leads to at least two URLs serving identical content.

It’s also possible to achieve that with a hack in the blog root’s index.php, or with a PHP script called from .htaccess that handles canonicalization and then includes index.php. The index.php might be overwritten when you update WordPress, so that’s a file you shouldn’t hack. Use the other variant if you see serious performance issues from running through the whole WordPress logic before a possible 301 redirect is finally done in the template’s header.php.

While you’re at the header.php file, you should fix the crappy post titles WordPress generates by default. Prefixing title tags with your blog’s name is downright obscene, irritates readers, and kills your search engine rankings. Hence replace the PHP statement in the TITLE tag with
<title>
<?php
$pageTitle = wp_title("",false);
if (empty($pageTitle)) {
$pageTitle = get_bloginfo("name");
}
$pageTitle = trim($pageTitle);
print $pageTitle;
?>
</title>
(Grab code.)

Next delete the references to the archives in the HEAD section:
<?php // wp_get_archives('type=monthly&format=link'); ?>
The “//” tells PHP that the line contains legacy code, SEO wise at least. If your template comes with “index,follow” robots meta tags and other useless meta crap, delete these unnecessary meta tags too.

Well, there are a few more hacks which make sense, for example level-1 page links in the footer and so on, but let’s stick with the mere SEO basics. Now we proceed with plugins you really need.

  • Install automated meta tags, activate and forget it.
  • Next grab Arne’s sitemaps plugin, activate it and uncheck the archives option in the settings. Don’t generate or submit a sitemap.xml before you’re done with the next steps!
  • Because you’re a nice gal/guy, you want to pass link juice to your commenters. Hence you install nofollow case by case or another dofollow plugin that saves you from nofollow insanity.
  • Stephen Spencer’s SEO title tag plugin is worth a try too. I didn’t get it to work on this blog (that’s why I hacked the title tag’s PHP code in the first place), but that was probably caused by alzheimer light (=lame excuse for either laziness or goofiness) because it works fine on other blogs I’m involved with. Also, meanwhile I’ve way more code in the title tag, for example to assign sane titles to URLs with query strings, so I can’t use it here.
  • To eliminate the built-in death by pagination flaw (also outlined here), install PagerFix from Richard’s buddy Jaimie Sirovich. Activate it, then hack your templates (category-[category ID].php, category.php, archive.php and index.php) at the very bottom:
    <?php
    // posts_nav_link(' &#8212; ', __('&laquo; Previous Page'), __('Next Page &raquo;'));
    pager_fix();
    ?>
    (Grab code.) The pager_fix() function replaces the single previous/next links with links pointing to all relevant pages, so that every post is at most two clicks away from the main page or its categorized archive page. Clever.
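The idea behind linking to all pages can be sketched in a few lines. This is not PagerFix's code, just an illustration under assumptions: the function name all_page_links() is invented, and the /page/N/ pattern is the one WordPress uses for paged archives when pretty permalinks are switched on.

```php
<?php
// Illustration only, not the plugin's implementation: builds one link per
// archive page instead of a single previous/next pair.
function all_page_links($baseUrl, $totalPages, $currentPage) {
    $links = array();
    for ($page = 1; $page <= $totalPages; $page++) {
        // Page 1 lives at the base URL itself, later pages at /page/N/.
        $url = ($page === 1) ? $baseUrl : $baseUrl . 'page/' . $page . '/';
        if ($page === $currentPage) {
            $links[] = '<strong>' . $page . '</strong>'; // no self-link
        } else {
            $links[] = '<a href="' . htmlspecialchars($url) . '">' . $page . '</a>';
        }
    }
    return implode(' ', $links);
}

echo all_page_links('http://your-blog.com/', 3, 2), "\n";
```

With every archive page linked from every other one, a crawler reaches any post listing in one hop instead of clicking through a long chain of “next” links.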

Did I really promise that applying basic SEO to a WordPress blog is done in five minutes? Well, that’s a lie, but you knew that beforehand. I want you to change your sidebar too. First set the archives widget to display as a drop down list or remove it completely. Second, if you’ve more than a handful of categories, remove the categories widget and provide another topically organized navigation like category index pages. Linking to every category from every page dilutes relevancy with regard to search engine indexing, and is –with long lists of categories in tiny sidebar widgets– no longer helpful to visitors.

When you’re at your sidebar.php, go check whether the canonicalization recommended above broke the site search facility or not. If you find a line of HTML code like
<form id="searchform" method="get" action="./index.php">

then search is defunct. You should replace this line with
<form id="searchform" method="post" action="<?php
$searchUrl = get_bloginfo('url');
if (substr($searchUrl, -1, 1) != "/") {
$searchUrl .= "/";
}
print $searchUrl; ?>">
(grab code)
and here is why: The URL canonicalization routine adds a trailing slash to ./index.php, which results in a 404 error. Next, if the method is “get” you really want to replace that with “post”, because firstly, according to Google’s guidelines, crawlable search results are a bad idea, and secondly, GET forms are a nice hook for negative SEO (that means that folks not exactly on your buddy list can get you ranked for all sorts of naughty out-of-context search terms).

Finally fire up your plain text editor again and create a robots.txt file:

User-agent: *
# …
Disallow: /2005/
Disallow: /2006/
Disallow: /2007/
Disallow: /2008/
Disallow: /2009/
Disallow: /2010/
# …
Sitemap: http://your-site.com/sitemap.xml
(If you go for the “www” thingy then you must write “www.your-site.com” in the sitemaps-autodiscovery statement! The robots.txt goes to the root directory, change the paths to /blog/2007/ etcetera if you don’t blog in the root.)
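If you don't feel like typing one Disallow line per year, a tiny script can print the file for you. Again just a sketch with made-up names: archive_robots_txt() is a hypothetical helper, and the year range, path prefix and sitemap URL are placeholders you'd change to your own.

```php
<?php
// Hypothetical helper: builds robots.txt content that blocks the yearly
// archive paths for a range of years.
function archive_robots_txt($firstYear, $lastYear, $sitemapUrl, $prefix = '/') {
    $lines = array('User-agent: *');
    for ($year = $firstYear; $year <= $lastYear; $year++) {
        $lines[] = 'Disallow: ' . $prefix . $year . '/';
    }
    $lines[] = 'Sitemap: ' . $sitemapUrl;
    return implode("\n", $lines) . "\n";
}

// Blog in the root; pass '/blog/' as fourth argument otherwise.
echo archive_robots_txt(2005, 2010, 'http://your-site.com/sitemap.xml');
```

Save the output as robots.txt in the root directory and extend the year range as your blog gets older.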

You may ask why I tell you to remove all references to the archives. The answer is that firstly nobody needs them, and secondly they irritate search engines with senseless and superfluous content duplication. As long as you provide logical, topically organized and short paths to your posts, none of your visitors will browse the archives. Would you use the white pages to look up a phone number if entries weren’t ordered alphabetically but by date of birth instead? Nope, only blogging software produces crap like that as sole or at least primary navigation. There are only very few good reasons to browse a blog’s monthly archives, thus a selection list is the perfect navigational widget in this case.

Once you’ve written a welcome post, submit your sitemap to Google and Yahoo!, and you’re done with your basic WordPress SEOing. Bear in mind that you don’t receive shitloads of search engine traffic before you’ve acquired good inbound links. However, then you’ll do much better with the search engines when your crawlability is close to perfect.

Updated 09/03/2007 to add Richard’s pager_fix tip from the comments.

Updated 09/05/2007 Lucia made a very good point. When you copy the code from this page where WordPress “prettifies” even <code>PHP code</code>, you end up with crap. I had to learn that not every reader knows that code must be changed when copied from a page where WordPress replaces wonderful plain single as well as double quotes within code/pre tags with fancy symbols stolen from M$-Word. (Actually, I knew it but sometimes I’m bonelazy.) So here is the clean PHP code from above (don’t mess with the quotes, especially don’t replace double quotes with single quotes!).




47 Comments to "SEO-sanitizing a WordPress theme in 5 minutes"

  1. Arafela on 31 August, 2007  #link

    Thanks Sebastian for that post!
    2 things are exactly what i was looking for in this very moment..and boom -> your update in RSS:-)
    made my day!

  2. Sebastian on 31 August, 2007  #link

    Arafela, such a comment makes my day :)
    Sometimes I ask myself whether my dull geek tips get read + implemented or not. I’m glad this post was helpful.

  3. Tim on 31 August, 2007  #link

    I’m going to give this a try… may God have mercy on my soul :)

  4. john andrews on 31 August, 2007  #link

    Hmm…. I read this whole article, and my blog is still not optimized. What am I doing wrong? ;-)

  5. Richard Hearne on 1 September, 2007  #link

    Very good overview of setting up WP. One thing that I also find interesting is the ‘Death by Pagination’ issue. The next/previous default page navigation of WP is an issue, and there is a very nice plugin you can install to fix this issue. The same issue applies to all large sites, and one site I worked on recently we tried very hard to make sure that the navigation structure kept all silo sub-pages interlinked with very good results achieved.

    Rgds
    Richard

  6. lucia on 1 September, 2007  #link

    Before I do it, is there any reason not to substitute

    $canonicalUrl = 'http://'.$_SERVER['SERVER_NAME'];

    for the hardcoded:
    $canonicalUrl = "http://sebastians-pamphlets.com";

  7. Mark Fulton on 1 September, 2007  #link

    Awesome post, thanks very much for the tips! I have implemented your header hacks on my site.

    I didn’t know about the categories list either, I will have to work on replacing my long list.

  8. Tad Chef on 1 September, 2007  #link

    I’m too lazy for all that .htaccess stuff. In Google Webmaster Tools you can tell Google to use just one of the URLs, either www or non-www. It’s the “Preferred domain” link.

  9. Glenn on 1 September, 2007  #link

    If using Apache virtual hosts, another way of handling the 301 redirect is in the vhost definition … if available, maybe this is more efficient than a .htaccess redirect?

    ServerName www.yoursite.com

    RedirectPermanent / "http://yoursite.com/"

    I’m wondering whether you can fix the trailing slash in the vhost as well?

    Also, when you have configured a static page as your home page (say /home), WordPress will present duplicate pages for bots, yoursite.com/ and yoursite.com/home/. A robots.txt entry can suppress the /home/ version; do you think this is the best way to avoid this duplication?

  10. Sebastian on 2 September, 2007  #link

    Tim, you can’t do much wrong, mercy seems granted ;)

  11. Sebastian on 2 September, 2007  #link

    John, I did say this article comes with one blatant lie, didn’t I? ;)

  12. Sebastian on 2 September, 2007  #link

    Richard, can you post a link to this plugin? TIA

  13. Sebastian on 2 September, 2007  #link

    Lucia, actually no. For this article I edited my code changing a variable to the literal. SERVER_NAME will do the trick and allows distribution too.

  14. Sebastian on 2 September, 2007  #link

    Thanks Mark, perhaps you can borrow ideas from my variations of the category lists.

  15. Sebastian on 2 September, 2007  #link

    Tad, I agree that it’s a good idea to set this in Webmaster Central too, but it serves another purpose over there. The Google setting is not transparent to other crawlers, and comes without warranties for Googlebot itself. You really need to implement it on your server.

  16. Sebastian on 2 September, 2007  #link

    Glenn, usually server configurations aren’t accessible for the Webmaster, whilst on most hosts one can use .htaccess files. Also on most hosts a server will respond to any unconfigured subdomain and you can’t (shouldn’t) cover all possible typos like “ww”, “w3”, “blogg” and whatever. Just saying “if the server name is not the canonical server name then do the redirect” is way easier. The /home stuff can be covered with PHP where I’ve checked for trailing slashes, or with a redirect in .htaccess like RedirectPermanent /home/ http://yoursite.com/. I’d do it with PHP because then I’ve all the WordPress related redirects in one piece of code.

  17. Arafela on 2 September, 2007  #link

    Tad and Sebastian. I would like to add that it’s important to make a permanent redirect (301) from non-www to www (or www -> non-www). There could be a problem of floating PR. Better concentrate it on 1 single domain and not spread it over all your subdomains. That’s why “Preferred domain” in Google webmaster tools does only visual job. It’s still important to make 301 redirect to concentrate PR on a single FQDN (fully qualified domain name).

  18. Richard Hearne on 2 September, 2007  #link

    Sorry

    I was being lazy, and it’s quite difficult to spell Jamie’s second name (Sirovich):

    http://www.seoegghead.com/blog/seo/updated-pagerfix-plugin-code-for-wp-21-p202.html

    Plugin is cool, but learning about the issue is more important actually.

    Best rgds
    Richard

  19. […] talks about Seo Sanitizing a WordPress in 5 minutes.  Very useful post, especially for a lazy n00b like […]

  20. Sebastian on 3 September, 2007  #link

    Thanks Richard! Installed, tested and added to the list of must-have plugins.

  21. […] Sebastian’s pamphlets great tips at optimize Wordpress for search engines. I’ll be implementing most of these at my […]

  22. […] An interesting article about some recommended measures for securing WordPress before launching a blog and fixing SEO problems. For example, it solves the problem of the same content being found under several URLs, something search engines don’t like at all. To the article […]

  23. Reader Tips: 09 September 2007 on 9 September, 2007  #link

    […] Wordpress SEO Basics: The article explains how you can SEO-sanitize your Wordpress blog in 5 minutes. It covers most of the basic points like title tags, robots.txt and canonical URLs. […]

  24. […] implementing my WordPress-SEO tweaks to avoid unnecessary code changes. If your permalink structure is not set to custom /%postname%/ […]

  25. […] at Sebastian’s Pamphlets that sums it up better than I ever could of.   The post is titled SEO-sanitizing a WordPress theme in 5 minutes, where he provides a bunch of useful SEO tips that are easy and won’t take very long to […]

  26. […] all my fellow bloggers here’s how to SEO-sanitizing a WordPress theme in 5 minutes, here’s 25 Expert Techniques to Have a Successful Blog and Why Your Newsletter Content Should […]

  27. […] free to e-mail me or leave me a comment. If you’re interested in SEO for WordPress, I recommend this post. I don’t want to become another blog blogging about blogging, so I’m not going to talk about SEO […]

  28. […] SEO-sanitizing a WordPress theme in 5 minutes - Some good SEO tips for wordpress users. […]

  29. FiSh on 18 September, 2007  #link

    I recently implemented most of these policies in my own blog, and my Google hits have steadily increased over the past month or so as a result. Very good information :)

  30. Sebastian on 19 September, 2007  #link

    I’m glad it worked for you.

  31. […] about the plugin. Make sure you mention they can get the plugin here, and tell them the talented Sebastian identified the problem and came up with the fix. It’s just a simple plugin, and one line of code. […]

  32. A Monday’s topic conglomerate on 8 October, 2007  #link

    […] made a plugin from my WordPress URL canonicalization bugfix. […]

  33. Oliver Smith on 13 October, 2007  #link

    Hey Sebastian,
    Many thanks for taking the time to do an in depth guide like this. Its much appreciated and i will use it wisely. :-)

  34. […] Write a how to […]

  35. doctor_claw on 28 November, 2007  #link

    Hi Sebastian,

    Thanks for this great post, it has become a useful point of reference for me lately.

    Since WordPress now has tags enabled by default, and since people are clamoring at me to implement tagging on a certain blog I work on, two questions:

    1. Is there any advantage over using tags vs categories in WP? I am led to believe that tags are necessary for showing up in various blog search engines, etc, but I see no major differences whatsoever between tags and categories aside from the terminology. Please advise.

    2. Since I now have tags as well as categories, would it make any sense for me to rename the category url base “/tag/” (or the equivalent) and redirect anything with the string “/category/” to a sitemap page with links to all the “/tag/” URLs? It seems that by doing this I would gain the manageability of a WP category list and lose half of my duplicate content while keeping “tag” functionality, but are there any other factors I am overlooking?

    I realize this is not a support forum, and you are a busy guy, but if you could comment on these issues I think it may help me greatly.

  36. Sebastian on 28 November, 2007  #link

    I didn’t look at the new WP software because they screwed the database structure breaking all my scripts. Deleting tables without a well documented path to upgrades –respectively providing the old structures as views on the new tables– is a no-no, but it seems everybody’s darling WP can get away with BS like that. Having said that, I think that categories as well as tags make it into the feed categories, so that’s not a reason to upgrade. Also search engines like Technorati get the tags both from category assignments as well as tags. Renaming URIs is always a bad idea. It’s doable but you need to be a redirect expert or you’ll screw your SE listings. I’m not exactly a fan of link farms in tag clouds and such stuff, because all those cool widgets dilute your topical link juice way too much. If you’ve tags, make the cats uncrawlable and condomize links to cat pages, otherwise hide the tags in the same way. You’ll get away with some intended dupe content, don’t be too dupe-paranoid with a blog, but completely hide the monthly archives from search engines. I hope that helps a little.

  37. […] I moved my blog from blogspot to this domain, I’ve enhanced the faulty WordPress URL canonicalization. If any user agent requests http://sebastians-pamphlets.com it gets redirected to […]

  38. […] from Andy Beard’s excellent article “Ultimate WordPress Htaccess file” or from Sebastian’s Pamphlet on trailing slash canonicalization . Be careful with your htaccess file though - a mistake there can make a wretched mess of your […]

  39. GungaDin on 24 August, 2008  #link

    Fantastic. I was looking for info like this. Thx ! Idiotic wp theme directories got in the way of my search results.

  40. Sebastian on 29 September, 2008  #link

    That’s a Guinness related problem.

  41. search engine optimization tutorial on 12 June, 2009  #link

    Wow, never thought that natural linking could cause such serious problems. Perhaps will try your method.
    Regards,
    Andrew

  42. Devan on 17 August, 2009  #link

    Hi Guys… I really appreciate the great info here. I have a question: I want to delete about 30 different posts on my WordPress blog. Will deleting these pages hurt my SEO rankings? The pages that I want to delete are all indexed but they are duplicate content posts. I don’t want to take a chance and leave that content on my blog, but I am not sure if I should just delete them… and if so, should I delete them all at once?

    [I’d never delete an indexed URI. If you really want to wipe out the content then 301 redirect the URIs to related pages. Of course the “right thing to do” is serving a 410, but that kills the traffic.]

  43. Health blog on 19 September, 2009  #link

    One thing i’d love to know how to do is : Being able to redirect spammy traffic to a location of my choice.

    [Although I do know the answer, I’m busy redirecting blog-spammer-links to my comment policy. Sigh.]

  44. RH on 18 October, 2009  #link

    I recently did a WordPress site and found that it was having problems getting indexed by Google. I found that the problem was a plugin that I had used for SEF URLs. Be careful when using some of the plugins as they can affect the indexing and crawling time with Google. Hope this helps.

    [Yup, suffering from some forms of pluginmania can hurt.]

  45. Andreas on 30 October, 2009  #link

    Very nice Post. I have already implemented some of your tips. They are working fine!

    I wonder if search engines like google really can not differentiate between the two URL’s “without www and with www”. In my opinion this would be an indication of inability!

    [Actually, your attempt to linkspam my blog is “an indication of inability”. You ignorant krauts better leave the Interweb alone …]

  46. Søkemotoroptimalisering on 18 August, 2010  #link

    RH: I never had any problem getting my site crawled by Google or any other SE. Well, that’s not completely correct, as I was having problems after the upgrade to WP 3.0 from 2.92. Google kept informing me that they couldn’t reach my sitemap, even though it was easily available for everyone else!? Suddenly, after a few days, they notified me that they had managed to collect my sitemap again. The problem occurred after I made my blog unavailable to SEs while working on it; when I had moved it to the webserver and opened it up for SEs, Google continued to say that it was unavailable… Maybe a bug in WordPress 3.0, but I never had any problems regarding plugins.

  47. pozycjonowanie on 20 December, 2010  #link

    Hey!
    I was just having giant pleasure reading your site. It was great time for me indeed. If there would be more sites with so much usefull informations like this one, then my knowledge wouldn’t be so painful to get for me. I can assume that there would be no necessery to spare so much time on searching informations. So in conclusion i just wanted to show you how i am grateful for your effort to make this site.

    [Link removed #401]
