Archived posts from the 'Usability' Category

Why storing URLs with truncated trailing slashes is an utterly idiocy

Yahoo steals my trailing slashesWith some Web services URL canonicalization has a downside. What works great for major search engines like Google can fire back when a Web service like Yahoo thinks circumcising URLs is cool. Proper URL canonicalization might, for example, screw your blog’s reputation at Technorati.

In fact the problem is not your URL canonicalization, e.g. 301 redirects from http://example.com to http://example.com/ respectively http://example.com/directory to http://example.com/directory/, but crappy software that removes trailing forward slashes from your URLs.

Dear Web developers, if you really think that home page locations respectively directory URLs look way cooler without the trailing slash, then by all means manipulate the anchor text, but do not manipulate HREF values, and do not store truncated URLs in your databases (not that “http://example.com” as anchor text makes any sense when the URL in HREF points to “http://example.com/”). Spreading invalid URLs is not funny. People as well as Web robots take invalid URLs from your pages for various purposes. Many usages of invalid URLs are capable to damage the search engine rankings of the link destinations. You can’t control that, hence don’t screw our URLs. Never. Period.

Folks who don’t agree with the above said read on.

    TOC:

  • What is a trailing slash? About URLs, directory URIs, default documents, directory indexes, …
  • How to rescue stolen trailing slashes About Apache’s handling of directory requests, and rewriting respectively redirecting invalid directory URIs in .htaccess as well as in PHP scripts.
  • Why stealing trailing slashes is not cool Truncating slashes is not only plain robbery (bandwidth theft), it often causes malfunctions at the destination server and 3rd party services as well.
  • How URL canonicalization irritates Technorati 301 redirects that “add” a trailing slash to directory URLs, respectively virtual URIs that mimic directories, seem to irritate Technorati so much that it can’t compute reputation, recent post lists, and so on.

What is a trailing slash?

The Web’s standards say (links and full quotes): The trailing path segment delimiter “/” represents an empty last path segment. Normalization should not remove delimiters when their associated component is empty. (Read the polite “should” as “must”.)

To understand that, lets look at the most common URL components:
scheme:// server-name.tld /path ?query-string #fragment
The (red) path part begins with a forward slash “/” and must consist of at least one byte (the trailing slash itself in case of the home page URL http://example.com/).

If an URL ends with a slash, it points to a directory’s default document, or, if there’s no default document, to a list of objects stored in a directory. The home page link lacks a directory name, because “/” after the TLD (.com|net|org|…) stands for the root directory.

Automated directory indexes (a list of links to all files) should be forbidden, use Options -Indexes in .htaccess to send such requests to your 403-Forbidden page.

In order to set default file names and their search sequence for your directories use DirectoryIndex index.html index.htm index.php /error_handler/missing_directory_index_doc.php. In this example: on request of http://example.com/directory/ Apache will first look for /directory/index.html, then if that doesn’t exist for /directory/index.htm, then /directory/index.php, and if all that fails, it will serve an error page (that should log such requests so that the Webmaster can upload the missing default document to /directory/).

The URL http://example.com (without the trailing slash) is invalid, and there’s no specification telling a reason why a Web server should respond to it with meaningful contents. Actually, the location http://example.com points to Null  (nil, zilch, nada, zip, nothing), hence the correct response is “404 - we haven’t got ‘nothing to serve’ yet”.

The same goes for sub-directories. If there’s no file named “/dir”, the URL http://example.com/dir points to Null too. If you’ve a directory named “/dir”, the canonical URL http://example.com/dir/ either points to a directory index page (an autogenerated list of all files) or the directory’s default document “index.(html|htm|shtml|php|…)”. A request of http://example.com/dir –without the trailing slash that tells the Web server that the request is for a directory’s index– resolves to “not found”.

You must not reference a default document by its name! If you’ve links like http://example.com/index.html you can’t change the underlying technology without serious hassles. Say you’ve a static site with a file structure like /index.html, /contact/index.html, /about/index.html and so on. Tomorrow you’ll realize that static stuff sucks, hence you’ll develop a dynamic site with PHP. You’ll end up with new files: /index.php, /contact/index.php, /about/index.php and so on. If you’ve coded your internal links as http://example.com/contact/ etc. they’ll still work, without redirects from .html to .php. Just change the DirectoryIndex directive from “… index.html … index.php …” to “… index.php … index.html …”. (Of course you can configure Apache to parse .html files for PHP code, but that’s another story.)

It seems that truncating default document names can make sense for services that deal with URLs, but watch out for sites that serve different contents under various extensions of “index” files (intentionally or not). I’d say that folks submitting their ugly index.html files to directories, search engines, top lists and whatnot deserve all the hassles that come with later changes.

How to rescue stolen trailing slashes

Since Web servers know that users are faulty by design, they jump through a couple of resource burning hoops in order to either add the trailing slash so that relative references inside HTML documents (CSS/JS/feed links, image locations, HREF values …) work correctly, or apply voodoo to accomplish that without (visibly) changing the address bar.

With Apache, DirectorySlash On enables this behavior (check whether your Apache version does 301 or 302 redirects, in case of 302s find another solution). You can also rewrite invalid requests in .htaccess when you need special rules:
RewriteEngine on
RewriteBase /content/
RewriteRule ^dir1$ http://example.com/content/dir1/ [R=301,L]
RewriteRule ^dir2$ http://example.com/content/dir2/ [R=301,L]

With content management systems (CMS) that generate virtual URLs on the fly, often there’s no other chance than hacking the software to canonicalize invalid requests. To prevent search engines from indexing invalid URLs that are in fact duplicates of canonical URLs, you’ll perform permanent redirects (301).

Here is a WordPress (header.php) example:
$requestUri = $_SERVER["REQUEST_URI"];
$queryString = $_SERVER["QUERY_STRING"];
$doRedirect = FALSE;
$fileExtensions = array(".html", ".htm", ".php");
$serverName = $_SERVER["SERVER_NAME"];
$canonicalServerName = $serverName;
 
// if you prefer http://example.com/* URLs remove the "www.":
$srvArr = explode(".", $serverName);
$canonicalServerName = $srvArr[count($srvArr) - 2] ."." .$srvArr[count($srvArr) - 1];
 
$url = parse_url ("http://" .$canonicalServerName .$requestUri);
$requestUriPath = $url["path"];
if (substr($requestUriPath, -1, 1) != "/") {
$isFile = FALSE;
foreach($fileExtensions as $fileExtension) {
if ( strtolower(substr($requestUriPath, strlen($fileExtension) * -1, strlen($fileExtension))) == strtolower($fileExtension) ) {
$isFile = TRUE;
}
}
if (!$isFile) {
$requestUriPath .= "/";
$doRedirect = TRUE;
}
}
$canonicalUrl = "http://" .$canonicalServerName .$requestUriPath;
if ($queryString) {
$canonicalUrl .= "?" . $queryString;
}
if ($url["fragment"]) {
$canonicalUrl .= "#" . $url["fragment"];
}
if ($doRedirect) {
@header("HTTP/1.1 301 Moved Permanently", TRUE, 301);
@header("Location: $canonicalUrl");
exit;
}

Check your permalink settings and edit the values of $fileExtensions and $canonicalServerName accordingly. For other CMSs adapt the code, perhaps you need to change the handling of query strings and fragments. The code above will not run under IIS, because it has no REQUEST_URI variable.

Why stealing trailing slashes is not cool

This section expressed in one sentence: Cool URLs don’t change, hence changing other people’s URLs is not cool.

Folks should understand the “U” in URL as unique. Each URL addresses one and only one particular resource. Technically spoken, if you change one single character of an URL, the altered URL points to a different resource, or nowhere.

Think of URLs as phone numbers. When you call 555-0100 you reach the switchboard, 555-0101 is the fax, and 555-0109 is the phone extension of somebody. When you steal the last digit, dialing 555-010, you get nowhere.

Yahoo'ish fools steal our trailing slashesOnly a fool would assert that a phone number shortened by one digit is way cooler than the complete phone number that actually connects somewhere. Well, the last digit of a phone number and the trailing slash of a directory link aren’t much different. If somebody hands out an URL (with trailing slash), then use it as is, or don’t use it at all. Don’t “prettify” it, because any change destroys its serviceability.

If one requests a directory without the trailing slash, most Web servers will just reply to the user agent (brower, screen reader, bot) with a redirect header telling that one must use a trailing slash, then the user agent has to re-issue the request in the formally correct way. From a Webmaster’s perspective, burning resources that thoughtlessly is plain theft. From a user’s perspective, things will often work without the slash, but they’ll be quicker with it. “Often” doesn’t equal “always”:

  • Some Web servers will serve the 404 page.
  • Some Web servers will serve the wrong content, because /dir is a valid script, virtual URI, or page that has nothing to do with the index of /dir/.
  • Many Web servers will respond with a 302 HTTP response code (Found) instead of a correct 301-redirect, so that most search engines discovering the sneakily circumcised URL will index the contents of the canonical URL under the invalid URL. Now all search engine users will request the incomplete URL too, running into unnecessary redirects.
  • Some Web servers will serve identical contents for /dir and /dir/, that leads to duplicate content issues with search engines that index both URLs from links. Most Web services that rank URLs will assign different scorings to all known URL variants, instead of accumulated rankings to both URLs (which would be the right thing to do, but is technically, well, challenging).
  • Some user agents can’t handle (301) redirects properly. Exotic user agents might serve the user an empty page or the redirect’s “error message”, and Web robots like the crawlers sent out by Technorati or MSN-LiveSearch hang up respectively process garbage.

Does it really make sense to maliciously manipulate URLs just because some clueless developers say “dude, without the slash it looks way cooler”? Nope. Stealing trailing slashes in general as well as storing amputated URLs is a brain dead approach.

KISS (keep it simple, stupid) is a great principle. “Cosmetic corrections” like trimming URLs add unnecessary complexity that leads to erroneous behavior and requires even more code tweaks. GIGO (garbage in, garbage out) is another great principle that applies here. Smart algos don’t change their inputs. As long as the input is processible, they accept it, otherwise they skip it.

Exceptions

URLs in print, radio, and offline in general, should be truncated in a way that browsers can figure out the location - “domain.co.uk” in print and “domain dot co dot uk” on radio is enough. The necessary redirect is cheaper than a visitor who doesn’t type in the canonical URL including scheme, www-prefix, and trailing slash.

How URL canonicalization seems to irritate Technorati

Due to the not exactly responsively (respectively swamped) Technorati user support parts of this section should be interpreted as educated speculation. Also, I didn’t research enough cases to come to a working theory. So here is just the story “how Technorati fails to deal with my blog”.

When I moved my blog from blogspot to this domain, I’ve enhanced the faulty WordPress URL canonicalization. If any user agent requests http://sebastians-pamphlets.com it gets redirected to http://sebastians-pamphlets.com/. Invalid post/page URLs like http://sebastians-pamphlets.com/about redirect to http://sebastians-pamphlets.com/about/. All redirects are permanent, returning the HTTP response code “301″.

I’ve claimed my blog as http://sebastians-pamphlets.com/, but Technorati shows its URL without the trailing slash.
…<div class="url"><a href="http://sebastians-pamphlets.com">http://sebastians-pamphlets.com</a> </div> <a class="image-link" href="/blogs/sebastians-pamphlets.com"><img …

By the way, they forgot dozens of fans (folks who “fave’d” either my old blogspot outlet or this site) too.
Blogs claimed at Technorati

I’ve added a description and tons of tags, that both don’t show up on public pages. It seems my tags were deleted, at least they aren’t visible in edit mode any more.
Edit blog settings at Technorati

Shortly after the submission, Technorati stopped to adjust the reputation score from newly discovered inbound links. Furthermore, the list of my recent posts became stale, although I’ve pinged Technorati with every update, and technorati received my update notifications via ping services too. And yes, I’ve tried manual pings to no avail.

I’ve gained lots of fresh inbound links, but the authority score didn’t change. So I’ve asked Technorati’s support for help. A few weeks later, in December/2007, I’ve got an answer:

I’ve taken a look at the issue regarding picking up your pings for “sebastians-pamphlets.com”. After making a small adjustment, I’ve sent our spiders to revisit your page and your blog should be indexed successfully from now on.

Please let us know if you experience any problems in the future. Do not hesitate to contact us if you have any other questions.

Indeed, Technorati updated the reputation score from “56″ to “191″, and refreshed the list of posts including the most recent one.

Of course the “small adjustment” didn’t persist (I assume that a batch process stole the trailing slash that the friendly support person has added). I’ve sent a follow-up email asking whether that’s a slash issue or not, but didn’t receive a reply yet. I’m quite sure that Technorati doesn’t follow 301-redirects, so that’s a plausible cause for this bug at least.

Since December 2007 Technorati didn’t update my authority score (just the rank goes up and down depending on the number of inbound links Technorati shows on the reactions page - by the way these numbers are often unreal and change in the range of hundreds from day to day).
Blog reactions and authority scoring at Technorati

It seems Technorati didn’t index my posts since then (December/18/2007), so probably my outgoing links don’t count for their destinations.
Stale list of recent posts at Technorati

(All screenshots were taken on February/05/2008. When you click the Technorati links today, it could hopefully will look differently.)

I’m not amused. I’m curious what would happen when I add
if (!preg_match("/Technorati/i", "$userAgent")) {/* redirect code */}

to my canonicalization routine, but I can resist to handle particular Web robots. My URL canonicalization should be identical both for visitors and crawlers. Technorati should be able to fix this bug without code changes at my end or weeky support requests. Wishful thinking? Maybe.

Update 2008-03-06: Technorati crawls my blog again. The 301 redirects weren’t the issue. I’ll explain that in a follow-up post soon.



Share/bookmark this: del.icio.usGooglema.gnoliaMixxNetscaperedditSphinnSquidooStumbleUponYahoo MyWeb
Subscribe to      Entries Entries      Comments Comments      All Comments All Comments
 

Comment rating and filtering with SezWho

I’ve added SezWho to the comment area. SezWho enables rating and filtering of comments, and shows you even comments an author has left on other blogs. Neat.

Currently there are no ratings, so the existing comments are all rated 2.5 (quite good). Once you’ve rated a few comments, you can suppress all lower quality comments (rated below 3), or show high quality comments only (rated 4 or better).

Don’t freak out when you use CSS that highlights nofollow’ed links. SezWho manipulates the original (mostly dofollow’ed) author link with JavaScript, hence search engines still recognize that a link shall pass PageRank and anchor text. (I condomize some link drops, for example when I don’t know a site and can’t afford the time to check it out, see my comments policy.)

I’ll ask SezWho to change that when I’m more familiar with their system (I hate change requests based on a first peek). SezWho should look at the attributes of the original link in order to add rel=”nofollow” to the JS created version only when the blogger actually has condomized a particular link. Their software changes the comment author URL to a JS script that redirects visitors to the URL the commenter has submitted. It would be nice to show the original URL in the status bar on mouse over.

Also, it seems that when you sign up with SezWho, they remove the trailing slash from your blog’s URL. That’s not acceptable. I mean not every startup should do what clueless Yahoo developers still do although they know that it violates several Web standards. Removing trailing slashes from links is not cool, that’s a crappy manipulation that can harm search engine rankings, will lead to bandwidth theft when bots follow castrated links only to get redirected, … ok, ok, ok … that’s stuff for another post rant. Judging from their Web site, SezWho looks like a decent operation, so I’m sure they can change that too.

 

SezWho sidebar widget

I’ve not yet added the widgets, above is how they would appear in the sidebar.

 

I consider SezWho useful. All functionality lives in the blog and can access the blog’s database, so in theory it doesn’t slow down the page load time by pulling loads of data from 3rd party sources. Please let me know whether you like it or not. Thanks!



Share/bookmark this: del.icio.usGooglema.gnoliaMixxNetscaperedditSphinnSquidooStumbleUponYahoo MyWeb
Subscribe to      Entries Entries      Comments Comments      All Comments All Comments
 

One out of many sure-fire ways to avoid blog comments

ranting on idiotic comment form designsIf your name is John Doe and you don’t blog this rant is not for you, because you don’t suffer from truncated form field values. Otherwise check here whether you annoy comment authors on your blog or not. “Annoy” is the polite version by the way, I’m pissed on 99% of the blogs I read. It took me years to write about this issue eventually. Today I had enough.

Look at this form designed especially for John Doe (john@doe.com) at http://doe.com/, then duplicated onto all blogs out there, and imagine you’re me going to comment on a great post:

I can’t view what I’ve typed in, and even my browser’s suggested values are truncated because the input field is way too narrow. Sometimes I leave post-URLs with a comment, so when I type in the first characters of my URL, I get a long list of shortened entries from which I can’t select anything. When I’m in a bad mood I swear and surf on without commenting.

I’ve looked at a fair amount of WordPress templates recently, and I admit that crappy comment forms are a minor issue with regard to the amount of duplicated hogwash most theme designers steal from each other. However, I’m sick of crappy form usability, so I’ve changed my comment form today:

Now the input fields should display the complete input values in most cases. My content column is 500 pixels wide, so size="42" leaves enough space when a visitor surfs with bigger fonts enlarging the labels. If with very long email addresses or URLs that’s not enough, I’ve added title attributes and onchange triggers which display the new value as tooltip when the visitors navigates to the next input field. Also I’ve maxed out the width of the text area. I hope this 60 seconds hack improves the usability of my comment form.

When do you fire up your editor and FTP client to make your comment form convenient? Even tiny enhancements can make your visitors happier.



Share/bookmark this: del.icio.usGooglema.gnoliaMixxNetscaperedditSphinnSquidooStumbleUponYahoo MyWeb
Subscribe to      Entries Entries      Comments Comments      All Comments All Comments
 

Free WordPress Add-on: Categorized Sitemaps

In How to feed all posts on a WordPress blog with link love I’ve outlined a method to create short and topically related paths to each and every post even on a large blog. Since not every blogger is PHP savvy enough to implement the concept, some readers asked me to share the sitemaps script.

Ok, here it is. It wasn’t developed as a plugin, and I’m not sure that’s possible (actually, I didn’t think about it), but I’ll do my best to explain the template hacks necessary to get it running smoothly. Needless to say it’s a quick hack and not exactly elegant, however it works here with WordPress 2.2.2. Use it as is at your own risk, yada yada yada, the usual stuff.

I’m a link whore, so please note: If you implement my sitemap script, please link out to any page on my blog. The script inserts a tiny link at the bottom of the sitemap. If you link to my blog under credits, powered by, in the blogroll or whereever, you can remove it. If you don’t link, the search engines shall ban you. ;)

Prerequisites

You should be able to do guided template hacks.

You need a WordPress plugin that enables execution of PHP code within the content of posts and pages. Install one from the list below and test it with a private post or so. Don’t use the visual editor and deactivate the “WordPress should correct invalidly nested XHTML automatically” thingy in Options::Writing. In the post editor write something like
Q: Does my PHP plugin work?
<?php
print "A: Yep, It works.";
?>
and check “enable PHP on this page” (labels differ from plug-in to plug-in), save and click preview. If you see the answer it works. Otherwise try another plug-in:

(Maybe you need to quote your PHP code with special tags like <phpcode></phpcode>, RTFM.)

Consider implementing my WordPress-SEO tweaks to avoid unnecessary code changes. If your permalink structure is not set to custom /%postname%/ giving post/page URLs like http://sebastians-pamphlets.com/about/ you need to tweak my code a little. Not that there’s such a thing as a valid reason to use another permalink structure …

Download

Don’t copy and paste PHP code from this page, it might not work because WordPress prettifies quotes etcetera. Everything you need is on the download page.

Installation

Copy list_categories.php to your template directory /wp-content/themes/yourtemplatename/ on your local disk and upload it to your server.

Create a new page draft, title it “Category Index” or so, and in page content put
<?php @include(TEMPLATEPATH . "/list_categories.php"); ?>
then save and preview it. You should see a category links list like this one. Click the links, check whether the RSS icons show or not, etcetera.

If anything went wrong, load list_categories.php with your preferred editor (not word processor!). Scroll down to edit these variables:
// Customize if necessary:
//$blogLocaction = “sebastians-pamphlets.com”;
// “www.yourserver.com”, “www.yourserver.com/blog” …
// without “http://” and no trailing slash!
//$rssIconPath = “/img/feed-icon-16×16.gif”;
// get a 16*16px rss icon somewhere and upload it you your server,
// then change this path which is relative to the domain’s root.
$rssIconWidth = 16;
$rssIconHeight = 16;
If you edit a variable, remove its “//“. If you use the RSS icon delivered with WordPress, change width and height to 14 pixels. Save the file, upload it to your server, and test again.

If you use Feedburner then click the links to the category feeds, Feedburner shouldn’t redirect them to your blog’s entries feed. I’ve used feed URLs which the Feedburner plug-in doesn’t redirect, but if the shit hits the fan search for the variable $catFeedUrl and experiment with the category-feed URLs.

Your sitemap’s URL is http://your-blog.com/sitemap-page-slug/ (respectively your-blog.com/about/sitemap/ or so when the sitemap has a parent page).

In theory you’re done. You could put a link to the sitemap in your sidebar and move on. In reality you want to prettify it, and you want to max out the SEO effects. Here comes the step by step guide to optimized WordPress sitemaps / topical hubs.

Category descriptions

On your categorized sitemap click any “[category-name] overview” link. You land on a page listing all posts of [category-name] under the generic title “Category Index”, “Sitemap”, or whatever you’ve put in the page’s title. Donate at least a description. Your visitors will love that and when you install a meta tag plugin the search engines will send a little more targeted traffic because your SERP listings look better (sane meta tags don’t boost your rankings but should improve your SERP CTR).

On your dashboard click Manage::Categories and write a nice but keyword rich description for each category. When you reference other categories by name my script will interlink the categories automatically, so don’t put internal links. Now the category links lists (overview pages) look better and carry (lots of) keywords.

The sitemap URL above will not show the descriptions (respectively only as tooltip), but the topical mini-hubs linked as “overview” (category links lists) have it. Your sitemap’s URL with descriptions is http://your-blog.com/sitemap-page-slug/?definitions=TRUE (your-blog.com/about/sitemap/?definitions=TRUE or so when the sitemap has a parent page).

If you want to put a different introduction or footer depending on the appearance of descriptions you can replace the code in your page by:
<?php
// introduction:
if (strtoupper($_GET["definitions"]) == "TRUE") {
print "<p><strong>All categories with descriptions.</strong> (Example)</p>”;
}
else {
if (!isset($_GET[”cat”])) {
print “<p><strong>All categories without descriptions.</strong> (Example)</p>”;
}
}
@include(TEMPLATEPATH . “/list_categories.php”);
// footer as above
?>
(If you use quotes in the print statements then prefix them with a slash, for example: print "<em>yada \"yada\" <a href=\"url\" title=\"string\">yada</a></em>."; will output yada “yada” yada.)

Title tags

The title of the page listing all categories with links to the category pages and feeds is by design used for the category links pages too. WordPress ignores input parameters in URLs like http://your-blog.com/sitemap-page-slug/?cat=category-name.

To give each category links list its own title tag, replace the PHP code in the title tag. Edit header.php:
<title>
<?php
// 1. Everything:
$pageTitle = wp_title(“”,false);
if (empty($pageTitle)) {
$pageTitle = get_bloginfo(”name”);
}
$pageTitle = trim($pageTitle);
// 2. Dynamic category pages:
$input_catName = trim($_GET[”cat”]);
if ($input_catName) {
$input_catName = ucfirst($input_catName);
$pageTitle = $input_catName .” at ” .get_bloginfo(”name”);
}
// 3. If you need a title depending on the appearance of descriptions
$input_catDefs = trim($_GET[”definitions”]);
if ($input_catDefs) {
$pageTitle = “All tags explained by ” .get_bloginfo(”name”);
}
print $pageTitle;
?>
</title>

The first statements just fix the obscene prefix crap most template designers are obsessed about. The second block generates page titles with the category name in it for the topical hubs (if your category slugs and names are identical). You need 1. and 2.; 3. is optional.

Page headings

Now that you’ve neat title tags, what do you think about accurate headings on the category hub pages? To accomplish that you need to edit page.php. Search for a heading (h3 or so) displaying the_title(); and replace this function by:
<h3 class=”entrytitle” id=”post-<?php the_ID(); ?>”> <a href=”<?php the_permalink() ?>” rel=”bookmark”>
<?php
// 1. Dynamic category pages
$input_catName = trim($_GET[”cat”]);
if ($input_catName) {
$input_catName = ucfirst($input_catName);
$dynTitle = “All Posts Tagged ‘” .$input_catName .”‘”;
}
// 2. If you need a heading depending on the appearance of descriptions
$input_catDefs = trim($_GET[”definitions”]);
if ($input_catDefs) {
$dynTitle = “All tags explained”;
}
// 3. Output the heading
if ($dynTitle) print $dynTitle; else the_title();
?>
</a>
</h3>

(The surrounding XHTML code may look different in your template! Replace the PHP code leaving the HTML code as is.)

The first block generates headings with the category name in it for the topical hubs (if your category slugs and names are identical). The last statement outputs either the hub’s heading or the standard title if the actual page doesn’t belong to the script. You need 1. and 3.; 2. is optional.

Feeding the category hubs

With most templates each post links to the categories its tagged with. Besides the links to the category archive pages you want to feed your hubs linking to all posts of each category with a little traffic and topical link juice. One method to accomplish that is linking to the category hubs below the comments. If you don’t read this post on the main page or an archive page, click here for an example. Edit single.php, a line below the comments_template(); call insert something like that:
<br />
<p class="post-info" id="related-links-lists">
<em class="cat">Find related posts in
<?php
$catString = "";
foreach((get_the_category()) as $catItem) {
if (!empty($catString)) $catString .= ", ";
$catName = $catItem->cat_name;
$catSlug = $catItem->category_nicename;
$catUrl = "http://your-blog.com/sitemap-page-slug/?cat="
.strtolower($catSlug);
$catString .= "<a href=\"$catUrl\">$catName</a>";
} // foreach
print $catString;
?>
</em>
</p>
(Study your template’s “post-info” paragraph and ensure that you use the same class names!)

Also, if your descriptions are of glossary quality, then link to your category hubs in your posts. Since most of my posts are dull as dirt, I decided to make the category descriptions an even duller canonical SEO glossary. It’s up to you to become creative and throw together something better, funnier, more useful … you get the idea. If you blog in english and you honestly believe your WordPress sitemap is outstanding, why not post it in the comments? Links are dofollowed in most cases. ;)

Troubleshooting

Test everything before you publish the page and link to the sitemaps.

If you have category descriptions and on the sitemap pages links to other categories within the description are broken: Make sure that the sitemap page’s URL does not contain the name or slug of any of your categories. Say the page slug is “sitemaps” and “links” is the parent page of “sitemaps” (URL: /links/sitemaps/), then you must not have a category named “links” nor “sitemaps”. Since a “sitemap” category is somewhat unusual, I’d say serving the sitemaps on a first level page named “sitemap” is safe.

Disclaimer

I hope this post isn’t clear as mud and everybody can install my stuff without hassles. However, every change of code comes with pitfalls, and I can’t address each and every possibility, so please backup your code before you change it, or play with my script in a development system. I can’t provide support but I’ll try to reply to comments. Have fun at your own risk! ;)



Share/bookmark this: del.icio.usGooglema.gnoliaMixxNetscaperedditSphinnSquidooStumbleUponYahoo MyWeb
Subscribe to      Entries Entries      Comments Comments      All Comments All Comments
 

How to feed old WordPress posts with link love

WordPress phases old posts out. That’s fine with timely contents, but bloggers publishing persistent stuff suffer from a loss of traffic to evergreens by design. If you’re able to hack your template, you can enhance your link structure in a way that funnels visitors and link love to old posts.

For the sake of this case study, lets assume that the archives are uncrawlable, that is blocked in robots.txt and not linked anywhere in a way that search engine crawlers follow the links. To understand the problem, lets look at the standard link structure of WordPress blogs:
Standard WordPress link structure
Say your category archives display 10 posts per page. The first 10 posts gain a fair amount of visibility (for visitors and search engines), but the next 10 posts land on category archive page #2, which is linked solely from the archive pages #1 and #3. And so on, the freshest 10 posts per category are reachable, older posts phase out.

Lets count that in clicks from the main page. The freshest 10 posts are 2 clicks away. The next bunch of 10 posts is 3 clicks away. Posts on category archive page #3 are 4 clicks away. Consider a link level depth greater than 3 crap. Search engines may index these deeply buried posts on popular blogs, but most visitors don’t navigate that deep into the archives.

Now let me confuse you with another picture. This enhanced link structure should feed each and every post on your blog with links:
Standard WordPress link structure
The structure outlined in the picture is an additional layer, it does not replace the standard architecture.

You get one navigational path connecting any page via one hop (category index page #2 on the image) to any post. That’s two clicks from any page on your blog to any post, but such a mega hub (example) comes with disadvantages when you’ve large archives.

Hence we create more paths to deeply buried posts. Both new category index pages provide links to lean categorized links pages which list all post by category (”[category name] overview” in this example corresponding to category index page #1 on the image above). If both category index pages are linked in the sidebar of all pages, you get a couple two-hop-links to all posts from all pages. That means that via a category index page and a lean category links page (example) each and every post is 3 clicks away from any other page.

Now we’ve got a few shorter paths to old posts, but that’s not enough. We want to make use of the lean category links pages to create topical one-hop-links to related posts too. With most templates every post links to one or more category pages. We can’t replace these links because blog savvy readers clicking these category links expect a standard category page. I’ve added my links to the lean categorized links pages below the comments, and there are many more ways to put them, not only on single post pages.

It’s possible to tweak this concept by flooding pages with navigational links to swipe a click level here and there, but that can dilute topical relevancy because it leads to “every page links to every page” patterns which are not exactly effective nor useful. Also, ornate pages load awfully slow and that’s a sure fire way to lose visitors. By the way that’s the reason why I don’t put a category links widget onto my sidebar.

To implement this concept I hacked the template and wrote a PHP script to output the links lists which is embedded in a standard page (/links/categories/). At the moment this experiment is just food for thoughts, because this blog is quite new (I’ve registered the domain a few days ago). However, I expect it will work.



Share/bookmark this: del.icio.usGooglema.gnoliaMixxNetscaperedditSphinnSquidooStumbleUponYahoo MyWeb
Subscribe to      Entries Entries      Comments Comments      All Comments All Comments