Archived posts from the 'Web development' Category

How to fuck up click tracking with the JavaScript onclick trigger

There’s a somewhat heated debate over at Sphinn and many other places as well where folks calling each other guppy and dumbass try to figure out whether a particular directory’s click tracking sinks PageRank distribution or not. Besides interesting replies from Matt Cutts, an essential result of this debate is that Sphinn will implement a dumbass button.

Usually I wouldn’t write about desperate PageRank junkies going cold turkey, not even as a TGIF post, but the reason why this blog directory most probably doesn’t pass PageRank is interesting, because it has nothing to do with onclick myths. Of course the existence of an intrinsic event handler (aka onclick trigger) in an A element alone has nothing to do with Google’s take on the link’s intention, hence an onclick event itself doesn’t kill a link’s ability to pass Google-juice.

To fuck up your click tracking you really need to forget everything you’ve ever read in Google’s Webmaster Guidelines. Unfortunately, Web developers usually don’t bother reading dull stuff like that and code the desired functionality in a way that Google as well as other search engines puke on the generated code. However, ignorance is no excuse when Google talks best practices.

Let’s look at the code. Code reveals everything, and not every piece of code is poetry. This is crap:
.html: <a href="http://sebastians-pamphlets.com"
id="1234"
onclick="return o('sebastians-blog');">
http://sebastians-pamphlets.com</a>

.js: function o(lnk){ window.open('/out/'+lnk+'.html'); return false; }

The script /out/sebastians-blog.html counts the click and then performs a redirect to the HREF’s value.
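For clarity, the server sided part of such a setup typically boils down to something like this hypothetical sketch (not the directory’s actual script):

<?php
// Hypothetical sketch of a click counting script like /out/sebastians-blog.html:
// log the click, then hand the visitor over via a Location header.
$target = "http://sebastians-pamphlets.com/";
// a real directory would log to a database; a flat file does for the sketch
@file_put_contents("clicks.log", date("c") . " " . $target . "\n", FILE_APPEND);
@header("Location: " . $target);
exit;
?>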

Why can and most probably will Google consider the hapless code above deceptive? A human visitor using a JavaScript enabled user agent clicking the link will land exactly where expected. The same goes for humans using a browser that doesn’t understand JS, and users surfing with JS turned off. A search engine crawler ignoring JS code will follow the HREF’s value pointing to the same location. All final destinations are equal. Nothing wrong with that. Really?

Nope. The problem is that Google’s spam filters can analyze client sided scripting, but don’t execute JavaScript. Google’s algos don’t ignore JavaScript code, they parse it to figure out the intent of links (and other stuff as well). So what does the algo do, see, and how does it judge eventually?

It understands the URL in HREF as the definitive and ultimate destination. Then it reads the onclick trigger and fetches the external JS file to look up the o() function. It will notice that the function returns an unconditional FALSE. The algo knows that a return value of FALSE will not allow all user agents to load the URL provided in HREF. Even if o() did nothing else, a human visitor with a JS enabled browser will not land at the HREF’s URL when clicking the link. Not good.

Next, the window.open statement loads http://this-blog-directory.com/out/sebastians-blog.html, not http://sebastians-pamphlets.com (truncating the trailing slash is a BS practice as well, but that’s not the issue here). The URLs put in HREF and built in the JS code aren’t identical. That’s a full stop for the algo. It probably does not request the redirect script http://this-blog-directory.com/out/sebastians-blog.html to analyze its header, which sends a Location: http://sebastians-pamphlets.com line. (Actually, this request would tell Google that there’s no deceitful intent, just plain hapless and overcomplicated coding, which might result in a judgement like “unreliable construct, ignore this link” or so, depending on other signals available.)

From the algo’s perspective the JavaScript code performs a more or less sneaky redirect. It flags the link as shady and moves on. Guess what happens in Google’s indexing process with pages that carry tons of shady links … those links not passing PageRank sounds like a secondary problem. Perhaps Google is smart enough not to penalize legit sites for, well, hapless coding, but that’s sheer speculation.

However, shit happens, so every once in a while such a link will slip through and may even appear in reverse citation results like link: searches or Google Webmaster Central link reports. That’s enough to fool even experts like Andy Beard (maybe Google even shows bogus link data to mislead SEO researchers of any kind? Never mind).

Ok, now that we know how not to implement onclick click tracking, here’s an example of a bullet-proof method to track user clicks with the onclick event:
<a href="http://sebastians-pamphlets.com/"
id="link-1234"
onclick="return trackclick(this.href, this.id);">
Sebastian's Pamphlets</a>
trackclick() is a function that calls a server sided script to store the click and returns TRUE without doing a redirect or opening a new window.

Here is more information on search engine friendly click tracking using the onclick event. The article is from 2005, but not outdated. Of course you can add onclick triggers to all links with a few lines of JS code. That’s good practice because it avoids clutter in the A elements and makes sure that every (external) link is trackable. For this more elegant way to track clicks the warnings above apply too: don’t return false and don’t manipulate the HREF’s URL.
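To illustrate both approaches, here’s a minimal sketch. It assumes a hypothetical server sided counter at /track.php that merely logs the request; the handler fires an image beacon, returns TRUE, and never touches the HREF:

// Minimal sketch of a safe click tracker; /track.php is a hypothetical
// server sided counter that just logs the request and exits.
function trackclick(url, id) {
  var beacon = new Image();
  beacon.src = '/track.php?url=' + encodeURIComponent(url)
             + '&id=' + encodeURIComponent(id)
             + '&t=' + new Date().getTime(); // cache buster
  return true; // never return false, never change the destination
}

// Optional: attach the tracker to all external links unobtrusively,
// keeping the A elements free of onclick clutter.
window.onload = function() {
  var links = document.getElementsByTagName('a');
  for (var i = 0; i < links.length; i++) {
    if (links[i].hostname && links[i].hostname != window.location.hostname) {
      links[i].onclick = function() {
        return trackclick(this.href, this.id);
      };
    }
  }
};

The beacon request is fire and forget, so the browser follows the HREF as usual for humans, and crawlers see a plain link pointing at the real destination.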




Free WordPress Add-on: Categorized Sitemaps

In How to feed all posts on a WordPress blog with link love I’ve outlined a method to create short and topically related paths to each and every post even on a large blog. Since not every blogger is PHP savvy enough to implement the concept, some readers asked me to share the sitemaps script.

Ok, here it is. It wasn’t developed as a plugin, and I’m not sure that’s possible (actually, I didn’t think about it), but I’ll do my best to explain the template hacks necessary to get it running smoothly. Needless to say it’s a quick hack and not exactly elegant, however it works here with WordPress 2.2.2. Use it as is at your own risk, yada yada yada, the usual stuff.

I’m a link whore, so please note: If you implement my sitemap script, please link out to any page on my blog. The script inserts a tiny link at the bottom of the sitemap. If you link to my blog under credits, powered by, in the blogroll or wherever, you can remove it. If you don’t link, the search engines shall ban you. ;)

Prerequisites

You should be able to do guided template hacks.

You need a WordPress plugin that enables execution of PHP code within the content of posts and pages. Install one from the list below and test it with a private post or so. Don’t use the visual editor and deactivate the “WordPress should correct invalidly nested XHTML automatically” thingy in Options::Writing. In the post editor write something like
Q: Does my PHP plugin work?
<?php
print "A: Yep, It works.";
?>
and check “enable PHP on this page” (labels differ from plug-in to plug-in), save and click preview. If you see the answer it works. Otherwise try another plug-in:

(Maybe you need to quote your PHP code with special tags like <phpcode></phpcode>, RTFM.)

Consider implementing my WordPress-SEO tweaks to avoid unnecessary code changes. If your permalink structure is not set to custom /%postname%/ giving post/page URLs like http://sebastians-pamphlets.com/about/ you need to tweak my code a little. Not that there’s such a thing as a valid reason to use another permalink structure …

Download

Don’t copy and paste PHP code from this page, it might not work because WordPress prettifies quotes etcetera. Everything you need is on the download page.

Installation

Copy list_categories.php to your template directory /wp-content/themes/yourtemplatename/ on your local disk and upload it to your server.

Create a new page draft, title it “Category Index” or so, and in page content put
<?php @include(TEMPLATEPATH . "/list_categories.php"); ?>
then save and preview it. You should see a category links list like this one. Click the links, check whether the RSS icons show or not, etcetera.

If anything went wrong, load list_categories.php with your preferred editor (not word processor!). Scroll down to edit these variables:
// Customize if necessary:
//$blogLocaction = "sebastians-pamphlets.com";
// "www.yourserver.com", "www.yourserver.com/blog" …
// without "http://" and no trailing slash!
//$rssIconPath = "/img/feed-icon-16x16.gif";
// get a 16*16px RSS icon somewhere and upload it to your server,
// then change this path which is relative to the domain's root.
$rssIconWidth = 16;
$rssIconHeight = 16;
If you edit a variable, remove its "//". If you use the RSS icon delivered with WordPress, change width and height to 14 pixels. Save the file, upload it to your server, and test again.

If you use Feedburner then click the links to the category feeds, Feedburner shouldn’t redirect them to your blog’s entries feed. I’ve used feed URLs which the Feedburner plug-in doesn’t redirect, but if the shit hits the fan search for the variable $catFeedUrl and experiment with the category-feed URLs.

Your sitemap’s URL is http://your-blog.com/sitemap-page-slug/ (respectively your-blog.com/about/sitemap/ or so when the sitemap has a parent page).

In theory you’re done. You could put a link to the sitemap in your sidebar and move on. In reality you want to prettify it, and you want to max out the SEO effects. Here comes the step by step guide to optimized WordPress sitemaps / topical hubs.

Category descriptions

On your categorized sitemap click any “[category-name] overview” link. You land on a page listing all posts of [category-name] under the generic title “Category Index”, “Sitemap”, or whatever you’ve put in the page’s title. Donate at least a description. Your visitors will love that and when you install a meta tag plugin the search engines will send a little more targeted traffic because your SERP listings look better (sane meta tags don’t boost your rankings but should improve your SERP CTR).

On your dashboard click Manage::Categories and write a nice but keyword rich description for each category. When you reference other categories by name my script will interlink the categories automatically, so don’t put internal links. Now the category links lists (overview pages) look better and carry (lots of) keywords.

The sitemap URL above will not show the descriptions (respectively only as tooltip), but the topical mini-hubs linked as “overview” (category links lists) have it. Your sitemap’s URL with descriptions is http://your-blog.com/sitemap-page-slug/?definitions=TRUE (your-blog.com/about/sitemap/?definitions=TRUE or so when the sitemap has a parent page).

If you want to put a different introduction or footer depending on the appearance of descriptions you can replace the code in your page by:
<?php
// introduction:
if (strtoupper($_GET["definitions"]) == "TRUE") {
print "<p><strong>All categories with descriptions.</strong> (Example)</p>";
}
else {
if (!isset($_GET["cat"])) {
print "<p><strong>All categories without descriptions.</strong> (Example)</p>";
}
}
@include(TEMPLATEPATH . "/list_categories.php");
// footer as above
?>
(If you use quotes in the print statements then escape them with a backslash, for example: print "<em>yada \"yada\" <a href=\"url\" title=\"string\">yada</a></em>."; will output yada "yada" yada.)

Title tags

The title of the page listing all categories with links to the category pages and feeds is by design used for the category links pages too. WordPress ignores input parameters in URLs like http://your-blog.com/sitemap-page-slug/?cat=category-name.

To give each category links list its own title tag, replace the PHP code in the title tag. Edit header.php:
<title>
<?php
// 1. Everything:
$pageTitle = wp_title("",false);
if (empty($pageTitle)) {
$pageTitle = get_bloginfo("name");
}
$pageTitle = trim($pageTitle);
// 2. Dynamic category pages:
$input_catName = trim($_GET["cat"]);
if ($input_catName) {
$input_catName = ucfirst($input_catName);
$pageTitle = $input_catName ." at " .get_bloginfo("name");
}
// 3. If you need a title depending on the appearance of descriptions
$input_catDefs = trim($_GET["definitions"]);
if ($input_catDefs) {
$pageTitle = "All tags explained by " .get_bloginfo("name");
}
print $pageTitle;
?>
</title>

The first statements just fix the obscene prefix crap most template designers are obsessed with. The second block generates page titles with the category name in it for the topical hubs (if your category slugs and names are identical). You need 1. and 2.; 3. is optional.

Page headings

Now that you have neat title tags, what do you think about accurate headings on the category hub pages? To accomplish that you need to edit page.php. Search for a heading (h3 or so) displaying the_title(); and replace this function by:
<h3 class="entrytitle" id="post-<?php the_ID(); ?>"> <a href="<?php the_permalink() ?>" rel="bookmark">
<?php
$dynTitle = "";
// 1. Dynamic category pages
$input_catName = trim($_GET["cat"]);
if ($input_catName) {
$input_catName = ucfirst($input_catName);
$dynTitle = "All Posts Tagged '" .$input_catName ."'";
}
// 2. If you need a heading depending on the appearance of descriptions
$input_catDefs = trim($_GET["definitions"]);
if ($input_catDefs) {
$dynTitle = "All tags explained";
}
// 3. Output the heading
if ($dynTitle) print $dynTitle; else the_title();
?>
</a>
</h3>

(The surrounding XHTML code may look different in your template! Replace the PHP code leaving the HTML code as is.)

The first block generates headings with the category name in it for the topical hubs (if your category slugs and names are identical). The last statement outputs either the hub’s heading or the standard title if the actual page doesn’t belong to the script. You need 1. and 3.; 2. is optional.

Feeding the category hubs

With most templates each post links to the categories it’s tagged with. Besides the links to the category archive pages, you want to feed your hubs linking to all posts of each category with a little traffic and topical link juice. One method to accomplish that is linking to the category hubs below the comments. If you don’t read this post on the main page or an archive page, click here for an example. Edit single.php; a line below the comments_template(); call, insert something like this:
<br />
<p class="post-info" id="related-links-lists">
<em class="cat">Find related posts in
<?php
$catString = "";
foreach((get_the_category()) as $catItem) {
if (!empty($catString)) $catString .= ", ";
$catName = $catItem->cat_name;
$catSlug = $catItem->category_nicename;
$catUrl = "http://your-blog.com/sitemap-page-slug/?cat="
.strtolower($catSlug);
$catString .= "<a href=\"$catUrl\">$catName</a>";
} // foreach
print $catString;
?>
</em>
</p>
(Study your template’s “post-info” paragraph and ensure that you use the same class names!)

Also, if your descriptions are of glossary quality, then link to your category hubs in your posts. Since most of my posts are dull as dirt, I decided to make the category descriptions an even duller canonical SEO glossary. It’s up to you to get creative and throw together something better, funnier, more useful … you get the idea. If you blog in English and you honestly believe your WordPress sitemap is outstanding, why not post it in the comments? Links are dofollowed in most cases. ;)

Troubleshooting

Test everything before you publish the page and link to the sitemaps.

If you have category descriptions and the links to other categories within a description are broken on the sitemap pages: Make sure that the sitemap page’s URL does not contain the name or slug of any of your categories. Say the page slug is “sitemaps” and “links” is the parent page of “sitemaps” (URL: /links/sitemaps/), then you must not have a category named “links” or “sitemaps”. Since a “sitemap” category is somewhat unusual, I’d say serving the sitemaps on a first level page named “sitemap” is safe.

Disclaimer

I hope this post isn’t clear as mud and everybody can install my stuff without hassles. However, every change of code comes with pitfalls, and I can’t address each and every possibility, so please backup your code before you change it, or play with my script in a development system. I can’t provide support but I’ll try to reply to comments. Have fun at your own risk! ;)




Google and Yahoo accept undelayed meta refreshes as 301 redirects

Although the meta refresh often gets abused to trick visitors into popup hells by sneaky pages on low-life free hosts (poor man’s cloaking), search engines don’t treat every instance of the meta refresh as Webspam. Folks moving their free hosted stuff to their own domains rely on it to redirect to the new location:
<meta http-equiv=refresh content="0; url=http://example.com/newurl" />

Yahoo clearly states how they treat a zero meta refresh, that is a redirect with a delay of zero seconds:

META Refresh: <meta http-equiv="refresh" content=…> is recognized as a 301 if it specifies little or no delay or as a 302 if it specifies noticeable delay.

Google is in the process of rewriting their documentation; in the current version of their help documents the meta refresh is not (yet!) mentioned. The Google Mini treats all meta refreshes as 302:

A META tag that specifies http-equiv="refresh" is handled as a 302 redirect.

but that’s handled differently on the Web. I’ve asked Google’s search evangelist Adam Lasnik and he said:

[The] best idea is to use 301/302s directly whenever possible; otherwise, next best is to do a metarefresh with 0 for a 301. I don’t believe we recommend or support any 302-alternative.

Thanks Adam! I’ll update the last meta refresh thread.

If you have the chance to do 301 redirects don’t mess with the meta refresh. Utilize this method only when there’s absolutely no other option.
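For comparison, a native server sided 301 boils down to two header lines sent before any output; a minimal PHP sketch (example.com/newurl is a placeholder):

<?php
// Plain server sided 301: the HTTP header carries the redirect,
// no meta refresh needed.
@header("HTTP/1.1 301 Moved Permanently", TRUE, 301);
@header("Location: http://example.com/newurl");
exit;
?>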

Full stop for search geeks. What follows is an explanation for less experienced Webmasters who need to move their stuff away from greedy Web content funeral services, aka free hosts of any sort.

Ok, now that we know the major search engines accept an undelayed meta refresh as a poor man’s 301 redirect, what should a page carrying this tag look like in order to act as a provisional permanent redirect? As plain and functional as possible:
<html>
<head>
<title>Moved to new URL: http://example.com/newurl</title>
<meta http-equiv=refresh content="0; url=http://example.com/newurl" />
<meta name="robots" content="noindex,follow" />
</head>
<body>
<h1>This page has been moved to http://example.com/newurl</h1>
<p>If your browser doesn't redirect you to the new location please <a href="http://example.com/newurl"><b>click here</b></a>, sorry for the hassles!</p>
</body>
</html>

As long as the server delivers the content above under the old URL sending a 200-OK, Google’s crawl stats should not list the URL under 404 errors. If it does appear under “Not found”, something went awfully bad, probably on the free host’s side. As long as you’ve control over the account, you must not delete the page because the search engines revisit it from time to time checking whether you still redirect with that URL or not.

[Excursus: When a search engine crawler fetches this page, the server returns a 200-OK because, well, it’s there. Acting as a 301/302 does not make it a standard redirect. That sounds confusing to some people, so here is the technical explanation. Server sided response codes like 200, 302, 301, 404 or 410 are sent by the Web server to the user agent in the HTTP header before the server delivers any page content to the user agent (Web browser, search engine crawler, …). The meta refresh OTOH is a client sided directive telling the user agent to disregard the page’s content and to fetch the given (new) URL to render it instead of the initially requested URL. The browser parses the redirect directive out of the file which was received with a HTTP response code 200 (OK). That’s why you don’t get a 302 or 301 when you use a server header checker.]
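You can verify that yourself; here’s a minimal sketch using PHP’s get_headers() against a placeholder URL:

<?php
// A page that "redirects" via meta refresh still answers with 200 OK;
// the redirect is invisible at the HTTP header level.
$headers = get_headers("http://example.com/oldurl"); // placeholder URL
print $headers[0]; // prints something like "HTTP/1.1 200 OK"
?>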

When a search engine crawler fetches the page above, that’s just the beginning of a pretty complex process. Search engines are large scaled systems which make use of asynchronous communication between tons of highly specialized programs. The crawler itself has nothing to do with indexing. Maybe it follows server sided redirects instantly, but that’s unlikely with meta refreshes because crawlers just fetch Web contents for unprocessed delivery to a data pool from where all sorts of processes like (vertical) indexers pull their fodder. Deleting a redirecting page from the search index might be done by process A running hourly, whilst process B instructing the crawler to fetch the redirect’s destination runs once a day, then the crawler may be swamped so that it delivers the new content a month later to process C which ran just five minutes before the content delivery and starts again not before next Monday if that’s not a bank holiday…

That means the old page may get deindexed way before the new URL makes it into the search index. If you change anything during this period, you just confuse the pretty complex chain of processes, which means that perhaps the search engine starts over by rolling back all transactions and refetching the redirecting page. Not good. Keep all kinds of permanent redirects in place forever.

Actually, a zero meta refresh works like a 301 redirect because the engines (shall) treat it as a permanent redirect, but it’s not a native 301. In fact, due to so much abuse by spammers it might be considered less reliable than a server sided 301 sent in the HTTP header. Hence you want to express your intention clearly to the engines. You do that with several elements of the meta refreshing page:

  • The page title says that the resource was moved and tells the new location. Words like “moved” and “new URL” without surrounding gimmicks clear the message.
  • The zero (second) delay parameter shows that you don’t deliver visible content to (most) human visitors but switch their user agent right to the new URL.
  • The “noindex” robots meta tag telling the engines not to index the actual page’s contents is a signal that you don’t cheat. The “follow” value (referring to links in BODY) is just a fallback mechanism to ensure that engines having trouble understanding the redirect at least follow and index the “click here” link.
  • The lack of indexable content and keywords makes clear that you don’t try to achieve SE rankings for anything except the new URL.
  • The H1 heading repeating the title tag’s content on the page, visible for users surfing with meta refresh = off, reinforces the message and helps the engines figure out the seriousness of your intent.
  • The same goes for the text message with a clear call for action underlined with the URL introduced by other elements.

Meta refreshes, like other client sided redirects (e.g. window.location = "http://example.com/newurl"; in JavaScript), can be found in every spammer’s toolbox, so don’t leave the outdated content on the page, and add a JavaScript redirect only to contentless pages like the sample above. Actually, you don’t need to do that, because the number of users surfing with meta-refresh=off is only a tiny fraction of your visitors, and using JavaScript redirects is way more risky (WRT picky search engines) than a zero meta refresh. Also, JavaScript redirects –if captured by a search engine– should count as 302, and you really don’t want to deal with all the disadvantages of soft redirects.

Another interesting question is whether removing the content from the outdated page makes a difference or not. Doing a mass search+replace to insert the meta tags (refresh and robots) with no further changes to the HTML source might seem attractive from a Webmaster’s perspective. It’s fault-prone, however. Creating a list mapping outdated pages to their new locations to feed a quick+dirty desktop program generating the simple HTML code above is actually easier and eliminates a couple of points of failure.
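Here’s a minimal sketch of such a generator, assuming a hypothetical plain text mapping file redirects.txt with one “old-filename new-URL” pair per line:

<?php
// Quick+dirty generator: reads a mapping of old filenames to new URLs and
// writes one meta refresh page per outdated URL.
// Assumes a hypothetical redirects.txt with lines like:
// oldpage.html http://example.com/newurl
$lines = file("redirects.txt");
foreach ($lines as $line) {
  $parts = preg_split('/\s+/', trim($line));
  if (count($parts) < 2) continue;
  $oldFile = $parts[0];
  $newUrl  = $parts[1];
  $html = "<html>\n<head>\n"
        . "<title>Moved to new URL: $newUrl</title>\n"
        . "<meta http-equiv=refresh content=\"0; url=$newUrl\" />\n"
        . "<meta name=\"robots\" content=\"noindex,follow\" />\n"
        . "</head>\n<body>\n"
        . "<h1>This page has been moved to $newUrl</h1>\n"
        . "<p>If your browser doesn't redirect you to the new location please "
        . "<a href=\"$newUrl\"><b>click here</b></a>, sorry for the hassles!</p>\n"
        . "</body>\n</html>";
  file_put_contents($oldFile, $html);
  print "Wrote $oldFile -> $newUrl\n";
}
?>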

Finally: Make use of meta refreshes on free hosts only. Professional hosting firms let you do server sided redirects!




SEO-sanitizing a WordPress theme in 5 minutes

When you start a blog with WordPress, you get an overall good crawlability, like with most blogging platforms. To get it ranked at search engines, your first priority should be to introduce it to your communities, acquiring some initial link love. However, those natural links come with a disadvantage: canonicalization issues.

“Canonicalization”, what a geeky word. You won’t find it in your dictionary; you must ask Google for a definition. Do the search: the number one result leads to Matt’s blog. Please study both posts before you read on.

Most bloggers linking to you will copy your URLs from the browser’s address bar, or use the neat FireFox “Copy link location” thingy, which leads to canonical inbound links, so to say. Others will type in incomplete URLs, or “correct” pasted URLs by removing trailing slashes, “www” prefixes or whatever. Unfortunately, both your Web server as well as WordPress are usually smart enough to find the right page, says your browser at least. What happens totally unseen in the background is that some of these page requests produce a 302-Found elsewhere response, and that search engine crawlers get fed with various URLs all pointing to the same piece of content. That’s a bad thing with regard to search engine rankings (and enough stuff for a series of longish posts, so just trust me).

Let’s begin the WordPress SEO-sanitizing with a fix of the most popular canonicalization issues. Your first step is to tell WordPress that you prefer sane and meaningful URLs without gimmicks. Go to the permalink options, check custom, type in /%postname%/ and save. Later on give each post a nice keyword rich title like “My get rich in a nanosecond scam” and a corresponding slug like “get-rich-in-a-nanosecond”. Next create a plain text file with this code

# Disallow directory browsing:
Options -Indexes
<IfModule mod_rewrite.c>
RewriteEngine On
# Fix www vs. non-www issues:
RewriteCond %{HTTP_HOST} !^your-blog\.com [NC]
RewriteRule (.*) http://your-blog.com/$1 [R=301,L]
# WordPress permalinks:
RewriteBase /
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
</IfModule>

and upload it in ASCII mode to your server’s root as “.htaccess” (if you don’t host your blog in the root or prefer the “www” prefix change the code accordingly). Change your-blog to your domain name respectively www\.your-domain-name.

This setup will not only produce user friendly post URLs like http://your-blog.com/hello-world/, it will also route all server errors to your theme’s error page. If you don’t blog in the root, learn here how you should handle HTTP errors outside the /blog/ directory (in any case you should use ErrorDocument directives to capture stuff WordPress can’t/shouldn’t handle, e.g. 401, 403, 5xx errors). Load 404.php in an ASCII editor to check whether it will actually send a 404 response. If the very first lines of code don’t look like

<?php
@header("HTTP/1.1 404 Not found", TRUE, 404);
?>

then insert the code above and make absolutely sure that there’s not a single whitespace character (space, tab, new line) or visible character before the <?php (grab the code). It doesn’t hurt to make the 404 page friendlier, by the way, and don’t forget to check the HTTP response code. Consider calling a 404 grabber before you send the 404 header; this is a neat method to do page by page redirects, capturing outdated URLs before the visitor gets the error page.
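As an illustration, here’s a minimal sketch of such a grabber, assuming a hypothetical 404grabber.php in the template directory and a hand maintained map of outdated URIs; include it in 404.php right before the 404 header:

<?php
// 404grabber.php (hypothetical file name): 301-redirect known outdated URLs
// page by page before 404.php sends its "Not found" header.
$redirectMap = array(
  "/old-post-slug/"     => "http://your-blog.com/new-post-slug/",
  "/outdated-page.html" => "http://your-blog.com/replacement-post/"
);
$requestUri = $_SERVER["REQUEST_URI"];
if (isset($redirectMap[$requestUri])) {
  @header("HTTP/1.1 301 Moved Permanently", TRUE, 301);
  @header("Location: " . $redirectMap[$requestUri]);
  exit;
}
// no match: fall through, 404.php sends the 404 header and the error page
?>

In 404.php that boils down to @include(TEMPLATEPATH . "/404grabber.php"); right before the @header("HTTP/1.1 404 Not found", TRUE, 404); line.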

Next you need to hack header.php to fix canonicalization issues which the rewrite rule in the .htaccess file above doesn’t cover. By default WordPress delivers the same page for both the canonical URL http://your-blog.com/hello-world/ and the crappy variant http://your-blog.com/hello-world without the trailing slash. Unfortunately many people who are unaware of Web standards, as well as scripts written by clueless assclowns, remove trailing slashes to save bandwidth (a lame excuse for this bullshit by the way, although even teeny search engines suffer from brain dead code monkeys implementing crap like that).

At the very top of header.php add this PHP code:

<?php
$requestUri = $_SERVER["REQUEST_URI"];
$uriArr = explode("#", $requestUri);
$requestUriBase = $uriArr[0];
$requestUriFragment = $uriArr[1];
$uriArr = explode("?", $requestUriBase);
$requestUriBase = $uriArr[0];
$queryString = $_SERVER["QUERY_STRING"];
if ( substr($requestUriBase, strlen($requestUriBase) -1, 1) != "/" ) {
$canonicalUrl = "http://sebastians-pamphlets.com";
$canonicalUrl .= $requestUriBase ."/";
if ($queryString) {
$canonicalUrl .= "?" .$queryString;
}
@header("HTTP/1.1 301 Moved Permanently", TRUE, 301);
@header("Location: $canonicalUrl");
exit;
}
include("pagefunctions.php");
?>
(Again, not a single whitespace (space, tab, new line) or visible character before the <?php! Grab this code.)

Of course you need to change my URL in the canonicalUrl variable to yours but I don’t mind when you forget it. There’s no such thing as bad traffic. Beyond the canonicalization done above, at this point you can perform all sorts of URL checks and manipulations.

Now you understand why you’ve added the trailing slash in the permalink settings. Not only does the URL look better as a directory link, the trailing slash also allows you to canonicalize your URLs with ease. This works with all kinds of WordPress URLs, even archives (although they shall not be crawlable), category archives, pages, of course the main page and whatnot. It can break links when your template has hard coded references to “index.php”, which is quite popular with the search form and needs a fix anyway, because it leads to at least two URLs serving identical contents.

It’s possible to achieve that with a hack in the blog root’s index.php, respectively a PHP script called in .htaccess to handle canonicalization and then including index.php. The index.php might be touched when you update WordPress, so that’s a file you shouldn’t hack. Use the other variant if you see serious performance issues by running through the whole WordPress logic before a possible 301-redirect is finally done in the template’s header.php.
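Here’s a minimal sketch of that alternative, assuming a hypothetical canonicalize.php in the blog root which .htaccess calls instead of index.php (same logic as the header.php hack above, just earlier in the request):

<?php
// canonicalize.php (hypothetical): change the last rewrite rule in .htaccess to
// RewriteRule . /canonicalize.php [L]
// and do the trailing slash 301 before any WordPress code runs.
$requestUriBase = $_SERVER["REQUEST_URI"];
$queryString = "";
if (strpos($requestUriBase, "?") !== false) {
  list($requestUriBase, $queryString) = explode("?", $requestUriBase, 2);
}
if (substr($requestUriBase, -1, 1) != "/") {
  $canonicalUrl = "http://your-blog.com" . $requestUriBase . "/";
  if ($queryString) $canonicalUrl .= "?" . $queryString;
  @header("HTTP/1.1 301 Moved Permanently", TRUE, 301);
  @header("Location: $canonicalUrl");
  exit;
}
// URL is canonical: hand over to WordPress
include("index.php");
?>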

While you’re at the header.php file, you should fix the crappy post titles WordPress generates by default. Prefixing title tags with your blog’s name is downright obscene, irritates readers, and kills your search engine rankings. Hence replace the PHP statement in the TITLE tag with
<title>
<?php
$pageTitle = wp_title("",false);
if (empty($pageTitle)) {
$pageTitle = get_bloginfo("name");
}
$pageTitle = trim($pageTitle);
print $pageTitle;
?>
</title>
(Grab code.)

Next delete the references to the archives in the HEAD section:
<?php // wp_get_archives('type=monthly&format=link'); ?>
The “//” tells PHP that the line contains legacy code, SEO wise at least. If your template comes with “index,follow” robots meta tags and other useless meta crap, delete these unnecessary meta tags too.

Well, there are a few more hacks which make sense, for example level-1 page links in the footer and so on, but lets stick with the mere SEO basics. Now we proceed with plugins you really need.

  • Install automated meta tags, activate and forget it.
  • Next grab Arne’s sitemaps plugin, activate it and uncheck the archives option in the settings. Don’t generate or submit a sitemap.xml before you’re done with the next steps!
  • Because you’re a nice gal/guy, you want to pass link juice to your commenters. Hence you install nofollow case by case or another dofollow plugin that saves you from the nofollow insanity.
  • Stephen Spencer’s SEO title tag plugin is worth a try too. I didn’t get it to work on this blog (that’s why I hacked the title tag’s PHP code in the first place), but that was probably caused by alzheimer light (=lame excuse for either laziness or goofiness) because it works fine on other blogs I’m involved with. Also, meanwhile I’ve way more code in the title tag, for example to assign sane titles to URLs with query strings, so I can’t use it here.
  • To eliminate the built-in death by pagination flaw –also outlined here–, you install PagerFix from Richard’s buddy Jaimie Sirovich. Activate it, then hack your templates (category-[category ID].php, category.php, archive.php and index.php) at the very bottom:
    <?php
    // posts_nav_link(' — ', __('« Previous Page'), __('Next Page »'));
    pager_fix();
    ?>
    (Grab code.) The pager_fix() function replaces the single previous/next links with links pointing to all relevant pages, so that every post is maximal two clicks away from the main page respectively its categorized archive page. Clever.

Did I really promise that applying basic SEO to a WordPress blog is done in five minutes? Well, that’s a lie, but you knew that beforehand. I want you to change your sidebar too. First set the archives widget to display as a drop down list or remove it completely. Second, if you’ve more than a handful of categories, remove the categories widget and provide another topically organized navigation like category index pages. Linking to every category from every page dilutes relevancy with regard to search engine indexing, and is –with long lists of categories in tiny sidebar widgets– no longer helpful to visitors.

When you’re at your sidebar.php, go check whether the canonicalization recommended above broke the site search facility or not. If you find a line of HTML code like
<form id="searchform" method="get" action="./index.php">

then search is defunct. You should replace this line by
<form id="searchform" method="post" action="<?php
$searchUrl = get_bloginfo('url');
if (substr($searchUrl, -1, 1) != "/") {
$searchUrl .= "/";
}
print $searchUrl; ?>">
(grab code)
and here is why: The URL canonicalization routine adds a trailing slash to ./index.php, which results in a 404 error. Next, if the method is “get” you really want to replace that with “post”, because firstly, with regard to Google’s guidelines, crawlable search results are a bad idea, and secondly, GET forms are a nice hook for negative SEO (that means that folks not exactly on your buddy list can get you ranked for all sorts of naughty out-of-context search terms).

Finally fire up your plain text editor again and create a robots.txt file:

User-agent: *
# …
Disallow: /2005/
Disallow: /2006/
Disallow: /2007/
Disallow: /2008/
Disallow: /2009/
Disallow: /2010/
# …
Sitemap: http://your-site.com/sitemap.xml
(If you go for the “www” thingy then you must write “www.your-site.com” in the sitemaps-autodiscovery statement! The robots.txt goes to the root directory, change the paths to /blog/2007/ etcetera if you don’t blog in the root.)

You may ask why I tell you to remove all references to the archives. The answer is that firstly nobody needs them, and secondly they irritate search engines with senseless and superfluous content duplication. As long as you provide logical, topically organized and short paths to your posts, none of your visitors will browse the archives. Would you use the white pages to look up a phone number if entries weren’t ordered alphabetically but by date of birth instead? Nope. Only blogging software produces crap like that as its sole, or at least primary, navigation. There are only very few good reasons to browse a blog’s monthly archives, thus a selection list is the perfect navigational widget in this case.

Once you’ve written a welcome post, submit your sitemap to Google and Yahoo!, and you’re done with your basic WordPress SEOing. Bear in mind that you don’t receive shitloads of search engine traffic before you’ve acquired good inbound links. However, then you’ll do much better with the search engines when your crawlability is close to perfect.

Updated 09/03/2007 to add Richard’s pager_fix tip from the comments.

Updated 09/05/2007 Lucia made a very good point. When you copy the code from this page where WordPress “prettifies” even <code>PHP code</code>, you end up with crap. I had to learn that not every reader knows that code must be changed when copied from a page where WordPress replaces wonderful plain single as well as double quotes within code/pre tags with fancy symbols stolen from M$-Word. (Actually, I knew it but sometimes I’m bonelazy.) So here is the clean PHP code from above (don’t mess with the quotes, especially don’t replace double quotes with single quotes!).




How to feed old WordPress posts with link love

WordPress phases old posts out. That’s fine with timely contents, but bloggers publishing persistent stuff suffer from a loss of traffic to evergreens by design. If you’re able to hack your template, you can enhance your link structure in a way that funnels visitors and link love to old posts.

For the sake of this case study, let’s assume that the archives are uncrawlable, that is blocked in robots.txt and not linked anywhere in a way that search engine crawlers follow the links. To understand the problem, let’s look at the standard link structure of WordPress blogs:
[Image: Standard WordPress link structure]
Say your category archives display 10 posts per page. The first 10 posts gain a fair amount of visibility (for visitors and search engines), but the next 10 posts land on category archive page #2, which is linked solely from archive pages #1 and #3. And so on: the freshest 10 posts per category are reachable, older posts phase out.

Let’s count that in clicks from the main page. The freshest 10 posts are 2 clicks away. The next bunch of 10 posts is 3 clicks away. Posts on category archive page #3 are 4 clicks away. Consider a link level depth greater than 3 crap. Search engines may index these deeply buried posts on popular blogs, but most visitors don’t navigate that deep into the archives.

Now let me confuse you with another picture. This enhanced link structure should feed each and every post on your blog with links:
[Image: Enhanced WordPress link structure]
The structure outlined in the picture is an additional layer, it does not replace the standard architecture.

You get one navigational path connecting any page via one hop (category index page #2 on the image) to any post. That’s two clicks from any page on your blog to any post, but such a mega hub (example) comes with disadvantages when you have large archives.

Hence we create more paths to deeply buried posts. Both new category index pages provide links to lean categorized links pages which list all posts by category (“[category name] overview” in this example, corresponding to category index page #1 on the image above). If both category index pages are linked in the sidebar of all pages, you get a couple of two-hop links to all posts from all pages. That means that via a category index page and a lean category links page (example) each and every post is 3 clicks away from any other page.

Now we’ve got a few shorter paths to old posts, but that’s not enough. We want to make use of the lean category links pages to create topical one-hop-links to related posts too. With most templates every post links to one or more category pages. We can’t replace these links because blog savvy readers clicking these category links expect a standard category page. I’ve added my links to the lean categorized links pages below the comments, and there are many more ways to put them, not only on single post pages.

It’s possible to tweak this concept by flooding pages with navigational links to swipe a click level here and there, but that can dilute topical relevancy because it leads to “every page links to every page” patterns which are not exactly effective nor useful. Also, ornate pages load awfully slow and that’s a sure fire way to lose visitors. By the way that’s the reason why I don’t put a category links widget onto my sidebar.

To implement this concept I hacked the template and wrote a PHP script to output the links lists, which is embedded in a standard page (/links/categories/). At the moment this experiment is just food for thought, because this blog is quite new (I registered the domain a few days ago). However, I expect it will work.
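For the curious, here’s a stripped down sketch of what such a links list script can boil down to (this is not the actual script, just the idea, built on standard WordPress template tags):

<?php
// Minimal sketch of a lean categorized links page: a heading plus a plain
// list of all posts per category.
$categories = get_categories("hide_empty=1");
foreach ($categories as $cat) {
  print "<h3>" . $cat->cat_name . " overview</h3>\n<ul>\n";
  $catPosts = get_posts("category=" . $cat->cat_ID . "&numberposts=-1");
  foreach ($catPosts as $catPost) {
    print "<li><a href=\"" . get_permalink($catPost->ID) . "\">"
        . $catPost->post_title . "</a></li>\n";
  }
  print "</ul>\n";
}
?>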




Google helps those who help themselves

And if that’s not enough to survive on Google’s SERPs, try Google’s Webmaster Forum where you can study Adam Lasnik’s FAQ which covers even questions the Webmaster Help Center provides no comprehensive answer for (yet), and where Googlers working in Google’s Search Quality, Webspam, and Webmaster Central teams hang out. Google dumps all sorts of questioners to the forum, where a crowd of hardcore volunteers (aka regulars as Google calls them) invests a lot of time to help out Webmasters and site owners facing problems with the almighty Google.

Despite the sporadic posts by Googlers, the backbone of Google’s Webmaster support channel is this crew of regulars from all around the globe. Google monitors the forum for input and trends, and intervenes when the periodic scandal escalates every once in a while. Apropos scandal … although the list of top posters mentions a few of the regulars, bear in mind that trolls come with a disgustingly high posting cadence. Fortunately, currently the signal drowns out the noise (again), and I appreciate very much that the Googlers participate more and more.

Some of the regulars like seo101 don’t reveal their URLs and stay anonymous. So here is an incomplete list of folks giving good advice:

If I’ve missed anyone, please drop me a line (I stole the list above from JLH and Red Cardinal, so it’s all their fault!).

So when you’re a Webmaster or site owner, don’t hesitate to post your Google related question (but read the FAQ before posting, and search for your topics); chances are one of these regulars or even a Googler offers assistance. Otherwise, when you have no questions but carry a swag of valuable answers, join the group and share your knowledge. Finally, when you’re a Googler, donate the sites linked above a boost on the SERPs ;)

Micro-meme started by John Honeck, supported by Richard Hearne, Bert Vierstra




Referrer spoofing with PrefBar 3.4.1

Testing browser optimization, search engine friendly user-agent cloaking, referrer based navigation or dynamic landing pages with scripts or by changing the user agent name in the browser’s settings is no fun.

I love PrefBar, a neat FireFox plug-in, which provides me with a pretty useful customizable toolbar. With PrefBar you can switch JavaScript, Flash, colors, images, cookies… on and off with one mouse click, and you can enter a list of user agent names to choose the user agent while browsing.

So I’ve asked Manuel Reimer to create a referrer spoofer widget, and kindly he created it with PrefBar 3.4.1. Thank you Manuel!

To activate referrer spoofing in your PrefBar toolbar, install or update PrefBar to 3.4.1, then download the Referer Spoof Menulist 1.0, click “Customize” on the toolbar and import the file. Then click on “Edit” to add all the referrer URLs you need for testing purposes, and enjoy. It works great.




Code Monkey Very Simple Man/Woman

Rcjordan over at Threadwatch pointed me to a nice song perfectly explaining rumors like “Google’s verification tags get you into supplemental hell” and thoughtless SEO theories like “self-closing meta tags in HTML 4.x documents and uppercase element/attribute names in XHTML documents prevent search engine crawlers from indexing”. You don’t believe such crappy “advice” can make it to the headlines? Just wait for an appropriate thread at your preferred SEO forum picked up by a popular but technically challenged blogger. This wacky hogwash is the most popular lame excuse for MSSA issues (aka “Google is broke coz my site sitting at top10 positions since the stone age disappeared all of a sudden”) at Google’s very own Webmaster Central.

Here is a quote:

“The robot [search engine crawler] HAS to read syntactically … And I opt for this explanation exactly because it makes sense to me [the code monkley] that robots have to be dilligent in crawling syntactically in order to do a good job of indexing … The old robots [Googlebot 2.x] did not actually parse syntactically - they sucked in all characters and sifted them into keywords - text but also tags and JS content if the syntax was broken, they didn’t discrimnate. Most websites were originally indexed that way. The new robots [Mozilla compatible Googlebot] parse with syntax in mind. If it’s badly broken (and improper closing of a tag in the head section of a non-xhtml dtd is badly broken), they stop or skip over everything else until they find their bearings again. With a broken head that happens the </html> tag or thereabouts”.

Basically this means that the crawler ignores the remaining code in HEAD or even hops to the end of the document not reading the page’s contents.

In reality search engine crawlers are pretty robust and fault tolerant, designed to eat and digest the worst code one can provide. Theories like the one quoted above belong in the same bucket as myths like “Google’s Sandbox”.

Just hire code monkeys for code monkey tasks, and SEOs for everything else ;)





Hapless Structures and Weak Linkage

Michael Martinez over at SEO-Theory (moved!) has a nice write-up on how to get crawled and indexed. The post, titled “Search engine love: now they crawl me, now they don’t”, discusses the importance of internal linkage, PageRank distribution, and Google’s recent architectural changes — topics which are “hot” in Google’s Webmaster Help Center, where I hang out every now and then. I thought I’d blog Michael’s nice essay as a sort of multi-link bookmark making link drops easier, so here is some of my stuff related to crawling and indexing:

About Google’s Toolbar-PageRank
High PageRank leads to frequent crawling, but nonetheless ignore green pixels.

The Top-5 Methods to Attract Search Engine Spiders
Get deep links to great content.

Supporting search engine crawling
The syntax of a search engine friendly Web site.

Web Site Structuring
Do’s and don’ts on information architectures.

Optimizing Web Site Navigation
Tweak your UI for users to make it crawler friendly.

Linking is All About Popularity and Authority
LOL: Link out loud.

Related information





Smart Web Site Architects Provide Meaningful URLs

From a typical forum thread on user/search engine friendly Web site design:

Question: Should I provide meaningful URLs carrying keywords and navigational information?

Answer 1: Absolutely, if your information architecture and its technical implementation allow the use of keyword rich hyphened URLs.

Answer 2: Bear in mind that URLs are unchangeable, thus first consider developing a suitable information architecture and a flexible Web site structure. You’ll learn that folders and URLs are the last thing to think of.

Question: WTF do you mean?

Answer: Here you go, it makes no sense to paint a house before the architect has finished the blueprints.




