Shit happens, your redirects hit the fan!

Although robust search engine crawlers are rather fault-tolerant creatures, there is an often overlooked but sure-fire way to piss off the spiders. Playing redirect ping pong mostly results in unindexed content. Google reports chained redirects under the initially requested URL as URLs not followed due to redirect errors, and recommends:

Minimize the number of redirects needed to follow a link from one page to another.

The same goes for other search engines: they can’t handle longish chains of redirecting URLs. In other words, all search engines consider URLs involved in longish redirect chains unreliable, untrustworthy, low quality …

What’s that to you? Well, you might play redirect ping pong with search engine crawlers unknowingly. If you’ve ever redesigned a site, chances are you’ve built chained redirects. In most cases those chains aren’t too complex, but it’s worth checking. Bear in mind that Apache, .htaccess, scripts or CMS software and whatnot can perform redirects, often without notice and undetectable with a browser.

I made up this example, but I’ve seen worse redirect chains. Here is the transcript of Ms. Googlebot’s chat with your Web server:
[Image: a crappy redirect chain]

Googlebot: Now that’s a nice link I’ve discovered on this old and trusted page. I can’t wait to fetch it. Hey port 80 at yourstuff.com, would you please be so kind to serve me /some-page?

.htaccess: Oh silly Googlebot, don’t you read Matt’s blog? He told me that a 301 redirect is the canonical answer when someone requests my stuff without the www-prefix. I didn’t bother to look up the resource you’ve asked for, and why should I since your request is wrong, wrong, wrong! Here is the canonical URL: 301-Moved permanently, Location: http://www.yourstuff.com/some-page.

Googlebot: Dear Web server, of course that’s the best thing to do. I apologize for my silly request, but please note that I’m following a link where the Webmaster of the authority site forgot the dup-dup-dup stuff in the HREF attribute’s value. Can you please serve me /some-page now?

Apache: Doh! Why does this silly bot bother me with an incomplete URI? Since /some-page is not an existing file, probably it’s a directory and a brain dead Webmaster forgot the trailing slash. Let’s see … yep, that’s it, and it even has a default document index.php. Dear Ms. Googlebot, here is the correct URL: 302-Found elsewhere, Location: http://www.yourstuff.com/some-page/index.php.

[Most servers will respond with 301-Moved permanently, Location: http://www.yourstuff.com/some-page/, but however the server is configured, it will be a redirect.]

Googlebot: Oh I see, I’ve got a really malformed URI. Sorry for bothering you again, could you please deliver the content of http://www.yourstuff.com/some-page/index.php? Thanks in advance!

/some-page/index.php: Uuaahhh … who’s knocking? I slept sooo long … not fully awake … please be patient while Apache hands me over to the PHP parser. Actually, I’m totally outdated because my Webmaster discovered some blogging software and moved my contents to a WordPress database. Please wait. Where is the darn post-it with the new address? Ahh, just found it under my pillow. Please don’t disturb me again, bye bye, and here you go: 301-Moved permanently, Location: http://www.yourstuff.com/blog/some-post/.

Googlebot: You’re kidding! Why didn’t you give me this URI in the first place? Never mind, I want http://www.yourstuff.com/blog/some-post/ now.

Hack in /blog/index.php: Oh my god, another bot following outdated links. My Webmaster is clever, he read somewhere that category names in URLs boost search engine rankings and changed the permalink structure to category/postname. Wait a moment, I’m looking up the category for this post, be right back. Hmmm, this category name is not exactly a money term … the heck with it! Here is the keyword stuffed URL I just made for you: 301-Moved permanently, Location: http://www.yourstuff.com/blog/cat/some-post/.

Googlebot: Four crappy redirects in a row, what a bunch of bull! This is my ultimately last try: gimme http://www.yourstuff.com/blog/cat/some-post/!

WordPress: See, that didn’t hurt, did it? Here you are: 200-OK, the content

Googlebot: Here’s your shabby page. Don’t send me to these assclowns again! I’ve got better things to do than running five times to this misconfigured server only to fetch a keyword laden sales pitch with 34 grammar errors, 67 typos, and a ton of affiliate links. Grrrr!

Crawl scheduler: Sorry dear. I can’t blacklist them for stuff like that, but I’ve set the crawl frequency for yourstuff.com to once a year, and I’ve red-flagged the document ID so that the indexer can downrank it accordingly.

Do you really want to treat Ms. Googlebot so badly? Not to speak of the minus points you gain for playing redirect ping pong with a search engine. Maybe most search engines index a page served after four redirects, but I won’t rely on such a redirect chain. It’s quite easy to shorten it. Just delete outdated stuff so that all requests run into a 404-Not found, then write up a list in a format like

Old URI 1 Delimiter New URI 1 \n
Old URI 2 Delimiter New URI 2 \n
  … Delimiter   … \n

and write a simple redirect script which reads this file and performs a 301 redirect to New URI when REQUEST_URI == Old URI. If REQUEST_URI doesn’t match any entry, then send a 404 header and include your actual error page. If you need to change the final URLs later on, you can easily do that in the text file’s right column with search and replace.
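For illustration, here’s a minimal sketch of such a script in PHP (my own rough take, not a tested drop-in; the file name redirect-map.txt, the error page error-404.html, and a whitespace delimiter are assumptions):

<?php
// 404 handler sketch: look up the requested URI in the mapping file and
// 301-redirect to the new URI; otherwise send a 404 and the real error page.
// Note: REQUEST_URI includes the query string; strip it if your old URLs carry parameters.
$requestUri = $_SERVER["REQUEST_URI"];
$lines = @file($_SERVER["DOCUMENT_ROOT"] . "/redirect-map.txt");
if ($lines) {
  foreach ($lines as $line) {
    $parts = preg_split('/\s+/', trim($line));
    if (count($parts) < 2) continue; // skip empty or malformed lines
    list($oldUri, $newUri) = $parts;
    if ($oldUri == $requestUri) {
      header("HTTP/1.1 301 Moved Permanently");
      header("Location: http://www.yourstuff.com" . $newUri);
      exit;
    }
  }
}
header("HTTP/1.1 404 Not Found");
include($_SERVER["DOCUMENT_ROOT"] . "/error-404.html");
?>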

Next, point the ErrorDocument 404 directive in your root’s .htaccess file to this script. Done. Leaving aside possible www/non-www canonicalization redirects, you’ve shortened the chain to a single redirect, regardless of how often you’ve moved your pages. Don’t forget to add all outdated URLs to the list when you redesign your stuff again, and cover common 3rd party sins like truncated trailing slashes too. The flat file from the example above would look like:

/some-page Delimiter /blog/cat/some-post/ \n
/some-page/ Delimiter /blog/cat/some-post/ \n
/some-page/index.php Delimiter /blog/cat/some-post/ \n
/blog/some-post Delimiter /blog/cat/some-post/ \n
/blog/some-post/ Delimiter /blog/cat/some-post/ \n
  … Delimiter   … \n

With a large site, consider a database table; processing a huge flat file with every 404 error comes with disadvantages. Also, if you have patterns like /blog/post-name/ ==> /blog/cat/post-name/, then don’t generate and process longish mapping tables but cover these redirects algorithmically.
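Here’s a rough sketch of what such an algorithmic rule could look like, placed at the top of the 404 script above (the lookup_category() helper is hypothetical; you’d pull the category from your CMS database):

<?php
// Pattern: /blog/post-name/ has moved to /blog/category-name/post-name/
if (preg_match('#^/blog/([^/]+)/$#', $_SERVER["REQUEST_URI"], $match)) {
  $postSlug = $match[1];
  $category = lookup_category($postSlug); // hypothetical helper, e.g. a database lookup
  if ($category) {
    header("HTTP/1.1 301 Moved Permanently");
    header("Location: http://www.yourstuff.com/blog/" . $category . "/" . $postSlug . "/");
    exit;
  }
}
?>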

To gather URLs worth a 301 redirect use these sources:

  • Your server logs.
  • 404/301/302/… reports from your server stats.
  • Google’s Web crawl error reports.
  • Tools like XENU’s Link Sleuth which crawl your site and output broken links as well as all sorts of redirects, and can even check your complete Web space for orphans.
  • Sitemaps of outdated structures/site areas.
  • Server header checkers which follow all redirects to the final destination.

Disclaimer: If you suffer from IIS/ASP, free hosts, restrictive hosts like Yahoo or other serious maladies, this post is not for you.

I’m curious, did your site ever play redirect ping pong with search engine crawlers?




One out of many sure-fire ways to avoid blog comments

If your name is John Doe and you don’t blog, this rant is not for you, because you don’t suffer from truncated form field values. Otherwise check here whether you annoy comment authors on your blog or not. “Annoy” is the polite version by the way; I’m pissed off on 99% of the blogs I read. It took me years to finally write about this issue. Today I had enough.

Look at this form designed especially for John Doe (john@doe.com) at http://doe.com/, then duplicated onto all blogs out there, and imagine you’re me going to comment on a great post:

I can’t view what I’ve typed in, and even my browser’s suggested values are truncated because the input field is way too narrow. Sometimes I leave post-URLs with a comment, so when I type in the first characters of my URL, I get a long list of shortened entries from which I can’t select anything. When I’m in a bad mood I swear and surf on without commenting.

I’ve looked at a fair amount of WordPress templates recently, and I admit that crappy comment forms are a minor issue with regard to the amount of duplicated hogwash most theme designers steal from each other. However, I’m sick of crappy form usability, so I’ve changed my comment form today:

Now the input fields should display the complete input values in most cases. My content column is 500 pixels wide, so size="42" leaves enough space even when a visitor surfs with bigger fonts that enlarge the labels. If that’s not enough for very long email addresses or URLs, I’ve added title attributes and onchange triggers which display the new value as a tooltip when the visitor navigates to the next input field. Also I’ve maxed out the width of the textarea. I hope this 60-second hack improves the usability of my comment form.
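For what it’s worth, here’s roughly what such an input field can look like (a sketch of the idea, not my template’s exact markup):

<label for="url">Website</label>
<input type="text" name="url" id="url" size="42" value=""
 title="Your blog's URL"
 onchange="this.title = this.value;" />
<!-- onchange copies the typed value into the title attribute, so the full
 URL is available as a tooltip once the visitor moves on to the next field -->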

When do you fire up your editor and FTP client to make your comment form convenient? Even tiny enhancements can make your visitors happier.




How to get the perfect logo for your blog

When I moved my blog from blogspot to this domain, using my avatar image (which I’ve dropped at tons of places) in the blog’s logo was a natural thing to do. It’s somewhat unique in my circles, it helps folks remember my first name, and branding with an image in a signal color like red fit my marketing instincts.

Now the bad news. A few days later my 5yo daughter taught me that it was not exactly clever. She got a Disney video she wanted to watch with me, and (not so much to my surprise) my blog’s logo was one of its kingpins. Way back when I got the image from a freelance designer, I didn’t think about copyright issues, because I planned to use the image only as a thumbnail respectively icon connecting my name to a rememberable picture (of the little mermaid’s crab Sebastian). The bigger version on top of all pages here, however, had way too many similarities with the Disney character.

Kinda dilemma. Reverting to a text logo was no option. Fortunately I click the home page links on all new comments here, so I remembered that not long ago a blogging cartoonist had submitted a note to one of my posts. I wrote an email telling him that I needed a new red crab, he replied with a reasonable quote, and I ordered the logo design. Long story short, I was impressed by his professional attitude, and now you can admire his drawing skills in my blog’s header and footer as well.

Before I bore you with the longish story of my red crab, think of your (blog’s) branding. Do you have a unique logo? Is it compelling and rememberable? If you put it on a page along with 100+ icons gathered from your usual hangouts, will its thumbnail stick out? Does it represent you, your business, your niche, or whatever you blog for? Do you brand yourself at all? Why not? Do it.

Look at a few very popular marketing blogs and follow the authors to their hangouts. You’ll spot that they make consistent use of their individual logos respectively avatars. That’s not coincidence, that’s professionalism. For the record, you can become a rockstar without a logo. If you’re Vanessa Fox or John Andrews you get away with frequently changing icons and even NSFW domain names. A conspicuous and witty logo makes branding easier, but a logo is not your brand.

Become creative, but please don’t use a red Disney character known as Sebastian as avatar, or red crabs at all, because I’ve trademarked that. Ok, if you can imagine a cartoonized logo might fit your blog, then read on.

As order confirmation, Steven from Clangnuts sent me his first ideas, asking whether he was on the right track or not. Indeed he was, and I liked it. Actually, I liked it a lot.

Shortly after my reply I got the next previews. Steven had nicely worked in my few wishes. He even showed me, with an edited screenshot, how the new red crab would look in my blog’s header in variations (manually colored vs. Photoshop colored). It looked great.

Finally he sent me four poses to choose from. Bummer. I liked all of them:
[Image: my red crab, four versions to choose from]
I picked the first, and got the colored version today. Thanks Steven, you did a great job! I hereby declare that when you need an outstanding logo for your blog, you’d better contact Steven at Clangnuts dot com before you fire up Photoshop yourself.

What do you think, is #1 the best choice? Feel free to vote in the comments!




Share Your Sphinn Love!

Donna started a meme with a very much appreciated compliment - thanks Donna! Like me, she discovered a lot of “new” folks at Sphinn and enjoyed their interesting blogs.

Savoring Sphinn comes with a duty, Donna thinks, so she appeals to share the love. She’s right. All of us benefit from Sphinn love, so it’s only fair to spread it a little. However, picking only three people I’d never have come across without Danny’s newest donation to the Internet marketing community is a tough task. Hence I wrote a long numbered list and diced. Alea iacta est. Here are three of the many nice people I met at Sphinn:

  • Hamlet Batista: Blog | Feed | A post I like
  • Tadeusz Szewczyk: Blog | Feed | A post I like
  • Tinu Abayomi-Paul: Blog | Feed | A post I like

To those who didn’t make it on this list: That’s just kismet, not bad karma! I bet you’ll appear in someone’s share the sphinn love post in no time.

To you three: Get out your Sphinn love post and choose three sphinners writing a feed-worthy blog, preferably people not yet featured elsewhere. I’ve subscribed to a couple of feeds of blogs discovered at Sphinn, and so have you. There’s so much great stuff at Sphinn that you’re spoilt for choice.




If you’re not an Amway millionaire avoid BlogRush like the plague!

Do not click BlogRush affiliate links before you’re fully awake. Oh no, you did it … now praise me because I’ve sneakily disabled the link and read on.

    My BlogRush Summary:

  1. You won’t get free targeted traffic to your niche blog.
  2. You’ll make other people rich.
  3. You’ll piss off your readers.
  4. You’ll promote BlogRush and get nothing in return.
  5. You shouldn’t trust a site fucking up the very first HTTP request.
  6. Pyramid schemes just don’t work for you.

You won’t get free targeted traffic to your niche blog

The niches you can choose from are way too broad. When you operate a niche blog like mine, you can choose “Marketing” or “Computers & Internet”. Guess what great traffic you gain with headlines about elegant click tracking or debunking meta refresh myths from blogs selling MySpace templates to teens or RFID chips to wholesalers? In reality you get hits via blogs selling diet pills to desperate housewives (from my referrer stats!) or viagra to old age pensioners, if you see a single BlogRush referrer in your stats at all. (I’ve read a fair amount of the hype about actually targeted headline delivery in BlogRush widgets. I just don’t buy it from what I see on blogs I visit.)

You’ll make other people rich

Look at the BlogRush widget in your or my sidebar, then visit lots of other niche blogs which are focused more or less on marketing related topics. All these widgets carry ads for generic marketing blogs pitching just another make-me-rich-on-the-Internet-while-I-sleep scheme or their very own affiliate programs. These blogs, all early adopters, will hoard BlogRush’s traffic potential. Even if you could sign up at the root, placing yourself at the top of the pyramid’s referral structure, you can’t avoid that the big boys with gazillions of owed impressions in BlogRush’s “marketing” queue dominate all widgets out there, yours included. (I heard that John Reese will try to throw a few impressions at tiny blogs before niche bloggers get upset. I doubt that will be enough to keep his widgets up.)

You’ll piss off your readers

Even if some of your readers recognize your BlogRush widget, they’ll wonder why you recommend totally unrelated generic marketing gibberish on your nicely focused blog. Yes, every link you put on your site is a recommendation. You vouch for this stuff when you link out, even when you don’t control the widget’s content. Read Tamar’s Why the Fuss about BlogRush? to learn why this clutter is useless for your visitors. Finally, the widget slows your site down and your visitors hate long loading times.

You’ll promote BlogRush and get nothing in return

When you follow the advice handed out by BlogRush and pitch their service with posts and promotional links on your blog, you help BlogRush skyrocket at the search engines. That will bring them a lot of buzz, but you get absolutely nothing for your promotional efforts because your referral link doesn’t land on the SERPs.

You shouldn’t trust a site fucking up the very first HTTP request

Ok, that’s a geeky issue and you don’t need to take it very seriously. Request your BlogRush affiliate link with a plain user agent not accepting cookies or executing client sided scripting, then read the headers. BlogRush does a 302 redirect to their home page rescuing your affiliate ID in an unclosed base href directive. Chances are you’ll never get the promised credits from upsold visitors using uncommon user agents respectively browser settings, because they don’t manage their affiliate traffic properly.

Pyramid schemes just don’t work for you

Unfortunately, common sense is not as common as you might think. I’m guilty of that too, but I’ll leave my widget up for a while to monitor what it brings in. The promise of free traffic is just too alluring, and in fact you can’t lose much. If you want, experiment with it and waste some ad space, but pull it once you’ve realized that it’s not worth it.

Disclaimer

This post was inspired by common sense, life experience, and a shitload of hyped crap posts on Sphinn’s upcoming list where folks even created multiple accounts to vote their BlogRush sales pitches to the home page. If anything I’ve said here is not accurate or at least plausible, please submit a comment to set the record straight.




The anatomy of a debunking post

Currently blogging about blogging is not exactly my agenda here. That does not mean I don’t think about it, so perhaps I’ll add this category some day.

Meanwhile please read my guest post the anatomy of a debunking post at Tanner Christensen’s blog Internet Hunger. I hope you’ll enjoy it, and stay tuned for an article by Tanner here.

 




How to fuck up click tracking with the JavaScript onclick trigger

There’s a somewhat heated debate over at Sphinn and many other places as well, where folks call each other guppy and dumbass while trying to figure out whether a particular directory’s click tracking sinks PageRank distribution or not. Besides interesting replies from Matt Cutts, an essential result of this debate is that Sphinn will implement a dumbass button.

Usually I wouldn’t write about desperate PageRank junkies going cold turkey, not even as a TGIF post, but the reason why this blog directory most probably doesn’t pass PageRank is interesting, because it has nothing to do with onclick myths. Of course the existence of an intrinsic event handler (aka onclick trigger) in an A element alone has nothing to do with Google’s take on the link’s intention, hence an onclick event itself doesn’t pull a link’s ability to pass Google-juice.

To fuck up your click tracking you really need to forget everything you’ve ever read in Google’s Webmaster Guidelines. Unfortunately, Web developers usually don’t bother reading dull stuff like that and code the desired functionality in a way that Google as well as other search engines puke on the generated code. However, ignorance is no excuse when Google talks best practices.

Let’s look at the code. Code reveals everything, and not every piece of code is poetry. That’s crap:
.html: <a href="http://sebastians-pamphlets.com"
id="1234"
onclick="return o('sebastians-blog');">
http://sebastians-pamphlets.com</a>

.js: function o(lnk){ window.open('/out/'+lnk+'.html'); return false; }

The script /out/sebastians-blog.html counts the click and then performs a redirect to the HREF’s value.

Why can and most probably will Google consider the hapless code above deceptive? A human visitor using a JavaScript enabled user agent clicking the link will land exactly where expected. The same goes for humans using a browser that doesn’t understand JS, and users surfing with JS turned off. A search engine crawler ignoring JS code will follow the HREF’s value pointing to the same location. All final destinations are equal. Nothing wrong with that. Really?

Nope. The problem is that Google’s spam filters can analyze client sided scripting, but don’t execute JavaScript. Google’s algos don’t ignore JavaScript code, they parse it to figure out the intent of links (and other stuff as well). So what does the algo do, see, and how does it judge eventually?

It understands the URL in HREF as the definitive and ultimate destination. Then it reads the onclick trigger and fetches the external JS files to look up the o() function. It will notice that the function returns an unconditional FALSE. The algo knows that the return value FALSE will not allow all user agents to load the URL provided in HREF. Even if o() did nothing else, a human visitor with a JS enabled browser will not land at the HREF’s URL when clicking the link. Not good.

Next, the window.open statement loads http://this-blog-directory.com/out/sebastians-blog.html, not http://sebastians-pamphlets.com (truncating the trailing slash is a BS practice as well, but that’s not the issue here). The URLs put in HREF and built in the JS code aren’t identical. That’s a full stop for the algo. Probably it does not request the redirect script http://this-blog-directory.com/out/sebastians-blog.html to analyze its header, which sends a Location: http://sebastians-pamphlets.com line. (Actually, this request would tell Google that there’s no deceitful intent, just plain hapless and overcomplicated coding, which might result in a judgment like “unreliable construct, ignore this link” or so, depending on other signals available.)

From the algo’s perspective the JavaScript code performs a more or less sneaky redirect. It flags the link as shady and moves on. Guess what happens in Google’s indexing process with pages that carry tons of shady links … those links not passing PageRank sounds like a secondary problem. Perhaps Google is smart enough not to penalize legit sites for, well, hapless coding, but that’s sheer speculation.

However, shit happens, so every once in a while such a link will slip thru and may even appear in reverse citation results like link: searches or Google Webmaster Central link reports. That’s enough to fool even experts like Andy Beard (maybe Google even shows bogus link data to mislead SEO researchers of any kind? Never mind).

Ok, now that we know how not to implement onclick click tracking, here’s an example of a bullet-proof method to track user clicks with the onclick event:
<a href="http://sebastians-pamphlets.com/"
id="link-1234"
onclick="return trackclick(this.href, this.name);">
Sebastian's Pamphlets</a>
trackclick() is a function that calls a server sided script to store the click and returns TRUE without doing a redirect or opening a new window.
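To illustrate, here’s a minimal sketch of what such a function could look like (my guess at an implementation, not anyone’s actual code; /count-click.php is a made-up logging script):

function trackclick(url, linkId) {
  // fire-and-forget request to a server sided counter script
  var beacon = new Image();
  beacon.src = "/count-click.php?url=" + encodeURIComponent(url)
             + "&id=" + encodeURIComponent(linkId);
  // never return false here, otherwise the browser won't load the HREF
  return true;
}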

Here is more information on search engine friendly click tracking using the onclick event. The article is from 2005, but not outdated. Of course you can add onclick triggers to all links with a few lines of JS code. That’s good practice because it avoids clutter in the A elements and makes sure that every (external) link is trackable. For this more elegant way to track clicks the warnings above apply too: don’t return false and don’t manipulate the HREF’s URL.
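A quick sketch of that unobtrusive approach, reusing the trackclick() function from above (untested; adapt the external-link test to your needs):

window.onload = function() {
  var links = document.getElementsByTagName("a");
  for (var i = 0; i < links.length; i++) {
    // only track links pointing to other hosts
    if (links[i].hostname && links[i].hostname != window.location.hostname) {
      links[i].onclick = function() {
        return trackclick(this.href, this.id); // returns TRUE, so the link still works
      };
    }
  }
};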




Cat post: Life’s getting better!

Since I’ve moved this blog here, my traffic has nicely improved. Skip the bragging, hop into the grub. During the past three weeks my feed subscriptions went up by 17%, and daily uniques by 460% (I never got stumbled at blogspot).

Thanks to Google’s newish BlitzIndexing my SERP referrers went up from 1 (August/24/2007) to 40 yesterday, naturally all long tail searches by the way. I didn’t count the bogus MSN referrer spam suggesting I rank for “yahoo”, “pontiac” and whatnot.

The search engine crawlers are quite busy too. When Ms. Googlebot was done fetching all URLs 3-5 times each, Slurp and MSNbot wholeheartedly joined the game. Google has indexed roughly 500 pages; my Webmaster Central account counts 2,000 inbound links, shows PageRank, anchor text stats and all the other neat stuff. Yahoo has indexed 90 pages and 1,000 inbound links. Although MSN has crawled a lot, they’ve indexed a whopping 2 pages. Wow.

I’d be quite happy, since my blog’s life is getting better, if there wasn’t that info from Google via Matt Cutts’s blog telling me that 80% of my pages are considered useless crap, at least from Google’s perspective. That’s not a joke. Google dislikes 80% of this wonderful blog, although it contains only 20% Google bashing. Weird.

I repeated the searches below multiple times, so what I’ve spotted is not an isolated phenomenon, nor coincidence. Here’s what the standard site search query shows, 494 indexed pages:
[Screenshot: Google site search for Sebastian’s Pamphlets]

Next I used Matt’s power toy limiting the results to pages Google discovered within the past 30 days (&as_qdr=d30). Please note that 30 days ago this domain didn’t exist. I installed WordPress on August/16/2007, the day I registered the domain; that means 29 days ago I created the very first indexable page. The rather astonishing result is 89 indexed pages:
[Screenshot: Google site search for Sebastian’s Pamphlets, limited to the past 30 days]

Either Matt’s time tunnel for power searchers is only 20% accurate, or 80% of my stuff went straight into the supplemental index from where advanced search can’t pull it.

The latter presumption is plausible, because the site is new, 99% or so of my deep links came in 2-3 weeks ago via 301 redirects so that the pages have no PageRank yet, and for most of the URLs Google noticed near duplicates with source bonus from my old blogspot outlet, not to speak of scraped stuff on low-life servers. Only roughly 90 pages can have gained noticeable PageRank yet, judging from my understanding of my fresh inbound links and my internal linkage. Interestingly, those 90 pages on my blog have a real world timestamp after the funeral of the blogspot thingy, and content that didn’t exist over there. That could lead to interesting theories; however, I guess that indeed <speculation>time restricted searches don’t pull pages from the supplemental hell</speculation>. Reminds me of the fact that Google’s link stats show nofollow’ed links and all that, but not a single link from a page buried in the supp index.

Did Matt by accident reveal a sure-fire procedure to identify supplemental results? I mean, they can’t make time restricted searches defunct like /& and undocumented stuff like that. I’ve tested the method with two sites where I know the supp ratio and the results were kinda plausible, but that’s no proof. Of course I couldn’t resist posting this vague piece of speculation before doing solid research. Maybe I’m dead wrong.

What do you guys think? Flame me in the comments. :)




Google says you must manage your affiliate links in order to get indexed

I’ve worked hard to overtake the SERP positions of a couple of merchants allowing me to link to them with an affiliate ID, and now the almighty Google tells the sponsors they must screw me with internal 301 redirects to rescue their rankings. Bugger. Since I read the shocking news on Google’s official Webmaster blog this morning, I’ve worked on a counter strategy, with success. Affiliate programs will not screw me, not even with Google’s help. They’ll be hoist by their own petard. I’ll strike back with nofollow and I’ll take no prisoners.

Seriously, the story reads a little different and is not breaking news at all. Maile Ohye from Google just endorsed best practices I’ve recommended for ages. Here is my recap.

The problem

Actually, there are problems on both sides of an affiliate link. The affiliate needs to hide these links from Google to avoid a so called “thin affiliate site penalty”, and the affiliate program suffers from duplicate content issues, link juice dilution, and often even URL hijacking by affiliate links.

Diligent affiliates gathering tons of PageRank on their pages can “unintentionally” overtake URLs on the SERPs by fooling the canonicalization algos. When Google discovers lots of links from strong pages on different hosts pointing to http://sponsor.com/?affid=me and this page adds ?affid=me to its internal links, my URL on the sponsor’s site can “outrank” the official home page, or landing page, http://sponsor.com/. When I choose the right anchor text, Google will feed my affiliate page with free traffic, whilst the affiliate program’s very own pages don’t exist on the SERPs.

Managing incoming affiliate links (merchants)

The best procedure is capturing all incoming traffic before a single byte of content is sent to the user agent, extracting the affiliate ID from the URL, storing it in a cookie, then 301-redirecting the user agent to the canonical version of the landing page, that is a page without affiliate or user specific parameters in the URL. That goes for all user agents (humans accepting the cookie and Web robots which don’t accept cookies and start a new session with every request).

Users not accepting cookies are redirected to a version of the landing page blocked by robots.txt, the affiliate ID sticks with the URLs in this case. Search engine crawlers, identified by their user agent name or whatever, are treated as users and shall never see (internal) links to URLs with tracking parameters in the query string.

This 301 redirect passes all the link juice, that is PageRank & Co. as well as anchor text, to the canonical URL. Search engines can no longer index page versions owned by affiliates. (This procedure doesn’t prevent you from 302 hijacking where your content gets indexed under the affiliate’s URL.)
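A bare-bones sketch of that capture-and-redirect step might look like this (the parameter and cookie names are made up; detecting visitors who refuse cookies needs a second request and is left out of this sketch):

<?php
// Run this before a single byte of content is sent to the user agent.
if (isset($_GET["affid"])) {
  // remember the referring affiliate for 30 days
  setcookie("affid", $_GET["affid"], time() + 30 * 24 * 3600, "/");
  // send humans and robots alike to the canonical URL without tracking parameters
  header("HTTP/1.1 301 Moved Permanently");
  header("Location: http://sponsor.com/");
  exit;
}
?>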

Putting safe affiliate links (online marketers)

Honestly, there’s no such thing as a safe affiliate link, at least not safe with regard to picky search engines. Masking complex URLs with redirect services like tinyurl.com or so doesn’t help, because the crawlers get the real URL from the redirect header and will leave a note in the record of the original link on the page carrying the affiliate link. Anyways, the tiny URL will fool most visitors, and if you own the redirect service it makes managing affiliate links easier.

Of course you can cloak the hell out of your thin affiliate pages by showing the engines links to authority pages whilst humans get the ads, but then better forget the Google traffic (I know, I know … cloaking still works if you can handle it properly, but not everybody can handle the risks so better leave that to the experts).

There’s only one official approach to making a page plastered with affiliate links safe with search engines: replace it with a content rich page (of course Google wants unique and compelling content, and checks its uniqueness), then sensibly work in the commercial links. It’s best to link to the merchants within the content, apply rel-nofollow to all affiliate links, and avoid banner farms in the sidebars and above the fold.
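For example, an affiliate link worked into the copy (reusing the example URL from above) could look like this:

<p>I've tested the widget for a few weeks; if you want one, grab it from
<a href="http://sponsor.com/?affid=me" rel="nofollow">the manufacturer</a>.</p>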

Update: I’ve sanitized the title, “Google says you must screw your affiliates in order to get indexed” was not one of my best title baits.




Free WordPress Add-on: Categorized Sitemaps

In How to feed all posts on a WordPress blog with link love I’ve outlined a method to create short and topically related paths to each and every post even on a large blog. Since not every blogger is PHP savvy enough to implement the concept, some readers asked me to share the sitemaps script.

Ok, here it is. It wasn’t developed as a plugin, and I’m not sure that’s possible (actually, I didn’t think about it), but I’ll do my best to explain the template hacks necessary to get it running smoothly. Needless to say it’s a quick hack and not exactly elegant, however it works here with WordPress 2.2.2. Use it as is at your own risk, yada yada yada, the usual stuff.

I’m a link whore, so please note: If you implement my sitemap script, please link out to any page on my blog. The script inserts a tiny link at the bottom of the sitemap. If you link to my blog under credits, powered by, in the blogroll or wherever, you can remove it. If you don’t link, the search engines shall ban you. ;)

Prerequisites

You should be able to do guided template hacks.

You need a WordPress plugin that enables execution of PHP code within the content of posts and pages. Install one from the list below and test it with a private post or so. Don’t use the visual editor and deactivate the “WordPress should correct invalidly nested XHTML automatically” thingy in Options::Writing. In the post editor write something like
Q: Does my PHP plugin work?
<?php
print "A: Yep, It works.";
?>
and check “enable PHP on this page” (labels differ from plug-in to plug-in), save and click preview. If you see the answer it works. Otherwise try another plug-in:

(Maybe you need to quote your PHP code with special tags like <phpcode></phpcode>, RTFM.)

Consider implementing my WordPress-SEO tweaks to avoid unnecessary code changes. If your permalink structure is not set to custom /%postname%/ giving post/page URLs like http://sebastians-pamphlets.com/about/ you need to tweak my code a little. Not that there’s such a thing as a valid reason to use another permalink structure …

Download

Don’t copy and paste PHP code from this page, it might not work because WordPress prettifies quotes etcetera. Everything you need is on the download page.

Installation

Copy list_categories.php to your template directory /wp-content/themes/yourtemplatename/ on your local disk and upload it to your server.

Create a new page draft, title it “Category Index” or so, and in page content put
<?php @include(TEMPLATEPATH . "/list_categories.php"); ?>
then save and preview it. You should see a category links list like this one. Click the links, check whether the RSS icons show or not, etcetera.

If anything went wrong, load list_categories.php with your preferred editor (not word processor!). Scroll down to edit these variables:
// Customize if necessary:
//$blogLocaction = "sebastians-pamphlets.com";
// "www.yourserver.com", "www.yourserver.com/blog" …
// without "http://" and no trailing slash!
//$rssIconPath = "/img/feed-icon-16x16.gif";
// get a 16*16px rss icon somewhere and upload it to your server,
// then change this path which is relative to the domain's root.
$rssIconWidth = 16;
$rssIconHeight = 16;
If you edit a variable, remove its "//". If you use the RSS icon delivered with WordPress, change width and height to 14 pixels. Save the file, upload it to your server, and test again.

If you use Feedburner, click the links to the category feeds; Feedburner shouldn’t redirect them to your blog’s entries feed. I’ve used feed URLs which the Feedburner plug-in doesn’t redirect, but if the shit hits the fan, search for the variable $catFeedUrl and experiment with the category-feed URLs.

Your sitemap’s URL is http://your-blog.com/sitemap-page-slug/ (respectively your-blog.com/about/sitemap/ or so when the sitemap has a parent page).

In theory you’re done. You could put a link to the sitemap in your sidebar and move on. In reality you want to prettify it, and you want to max out the SEO effects. Here comes the step by step guide to optimized WordPress sitemaps / topical hubs.

Category descriptions

On your categorized sitemap click any “[category-name] overview” link. You land on a page listing all posts of [category-name] under the generic title “Category Index”, “Sitemap”, or whatever you’ve put in the page’s title. Give it at least a description. Your visitors will love that, and when you install a meta tag plugin the search engines will send a little more targeted traffic because your SERP listings look better (sane meta tags don’t boost your rankings but should improve your SERP CTR).

On your dashboard click Manage::Categories and write a nice but keyword rich description for each category. When you reference other categories by name my script will interlink the categories automatically, so don’t put internal links. Now the category links lists (overview pages) look better and carry (lots of) keywords.

The sitemap URL above will not show the descriptions (respectively only as tooltip), but the topical mini-hubs linked as “overview” (category links lists) have it. Your sitemap’s URL with descriptions is http://your-blog.com/sitemap-page-slug/?definitions=TRUE (your-blog.com/about/sitemap/?definitions=TRUE or so when the sitemap has a parent page).

If you want to put a different introduction or footer depending on the appearance of descriptions you can replace the code in your page by:
<?php
// introduction:
if (strtoupper($_GET["definitions"]) == "TRUE") {
print "<p><strong>All categories with descriptions.</strong> (Example)</p>";
}
else {
if (!isset($_GET["cat"])) {
print "<p><strong>All categories without descriptions.</strong> (Example)</p>";
}
}
@include(TEMPLATEPATH . "/list_categories.php");
// footer as above
?>
(If you use quotes in the print statements, escape them with a backslash, for example: print "<em>yada \"yada\" <a href=\"url\" title=\"string\">yada</a></em>."; will output yada “yada” yada.)

Title tags

The title of the page listing all categories with links to the category pages and feeds is by design used for the category links pages too. WordPress ignores input parameters in URLs like http://your-blog.com/sitemap-page-slug/?cat=category-name.

To give each category links list its own title tag, replace the PHP code in the title tag. Edit header.php:
<title>
<?php
// 1. Everything:
$pageTitle = wp_title("", false);
if (empty($pageTitle)) {
$pageTitle = get_bloginfo("name");
}
$pageTitle = trim($pageTitle);
// 2. Dynamic category pages:
$input_catName = trim($_GET["cat"]);
if ($input_catName) {
$input_catName = ucfirst($input_catName);
$pageTitle = $input_catName . " at " . get_bloginfo("name");
}
// 3. If you need a title depending on the appearance of descriptions
$input_catDefs = trim($_GET["definitions"]);
if ($input_catDefs) {
$pageTitle = "All tags explained by " . get_bloginfo("name");
}
print $pageTitle;
?>
</title>

The first statements just fix the obscene prefix crap most template designers are obsessed with. The second block generates page titles with the category name in it for the topical hubs (if your category slugs and names are identical). You need 1. and 2.; 3. is optional.

Page headings

Now that you’ve got neat title tags, what do you think about accurate headings on the category hub pages? To accomplish that you need to edit page.php. Search for a heading (h3 or so) displaying the_title(); and replace this function with:
<h3 class="entrytitle" id="post-<?php the_ID(); ?>"> <a href="<?php the_permalink() ?>" rel="bookmark">
<?php
// 1. Dynamic category pages
$input_catName = trim($_GET["cat"]);
if ($input_catName) {
$input_catName = ucfirst($input_catName);
$dynTitle = "All Posts Tagged '" . $input_catName . "'";
}
// 2. If you need a heading depending on the appearance of descriptions
$input_catDefs = trim($_GET["definitions"]);
if ($input_catDefs) {
$dynTitle = "All tags explained";
}
// 3. Output the heading
if ($dynTitle) print $dynTitle; else the_title();
?>
</a>
</h3>

(The surrounding XHTML code may look different in your template! Replace the PHP code leaving the HTML code as is.)

The first block generates headings with the category name in it for the topical hubs (if your category slugs and names are identical). The last statement outputs either the hub’s heading or the standard title if the actual page doesn’t belong to the script. You need 1. and 3.; 2. is optional.

Feeding the category hubs

With most templates each post links to the categories it’s tagged with. Besides the links to the category archive pages, you want to feed your hubs (the pages linking to all posts of each category) with a little traffic and topical link juice. One method to accomplish that is linking to the category hubs below the comments. If you don’t read this post on the main page or an archive page, click here for an example. Edit single.php; one line below the comments_template() call, insert something like this:
<br />
<p class="post-info" id="related-links-lists">
<em class="cat">Find related posts in
<?php
$catString = "";
foreach((get_the_category()) as $catItem) {
if (!empty($catString)) $catString .= ", ";
$catName = $catItem->cat_name;
$catSlug = $catItem->category_nicename;
$catUrl = "http://your-blog.com/sitemap-page-slug/?cat="
.strtolower($catSlug);
$catString .= "<a href=\"$catUrl\">$catName</a>";
} // foreach
print $catString;
?>
</em>
</p>
(Study your template’s “post-info” paragraph and ensure that you use the same class names!)

Also, if your descriptions are of glossary quality, then link to your category hubs in your posts. Since most of my posts are dull as dirt, I decided to make the category descriptions an even duller canonical SEO glossary. It’s up to you to get creative and throw together something better, funnier, more useful … you get the idea. If you blog in English and you honestly believe your WordPress sitemap is outstanding, why not post it in the comments? Links are dofollowed in most cases. ;)

Troubleshooting

Test everything before you publish the page and link to the sitemaps.

If you have category descriptions and on the sitemap pages links to other categories within the description are broken: Make sure that the sitemap page’s URL does not contain the name or slug of any of your categories. Say the page slug is “sitemaps” and “links” is the parent page of “sitemaps” (URL: /links/sitemaps/), then you must not have a category named “links” nor “sitemaps”. Since a “sitemap” category is somewhat unusual, I’d say serving the sitemaps on a first level page named “sitemap” is safe.

Disclaimer

I hope this post isn’t clear as mud and everybody can install my stuff without hassles. However, every change of code comes with pitfalls, and I can’t address each and every possibility, so please backup your code before you change it, or play with my script in a development system. I can’t provide support but I’ll try to reply to comments. Have fun at your own risk! ;)



