Archived posts from the 'Blogging' Category

Why storing URLs with truncated trailing slashes is utter idiocy

With some Web services, URL canonicalization has a downside. What works great for major search engines like Google can backfire when a Web service like Yahoo thinks circumcising URLs is cool. Proper URL canonicalization might, for example, screw your blog’s reputation at Technorati.

In fact the problem is not your URL canonicalization, e.g. 301 redirects from http://example.com to http://example.com/ or from http://example.com/directory to http://example.com/directory/, but crappy software that removes trailing forward slashes from your URLs.

Dear Web developers, if you really think that home page locations or directory URLs look way cooler without the trailing slash, then by all means manipulate the anchor text, but do not manipulate HREF values, and do not store truncated URLs in your databases (not that “http://example.com” as anchor text makes any sense when the URL in HREF points to “http://example.com/”). Spreading invalid URLs is not funny. People as well as Web robots take invalid URLs from your pages for various purposes. Many uses of invalid URLs can damage the search engine rankings of the link destinations. You can’t control that, hence don’t screw our URLs. Never. Period.

Folks who don’t agree with the above, read on.

    TOC:

  • What is a trailing slash?
    About URLs, directory URIs, default documents, directory indexes, …
  • How to rescue stolen trailing slashes
    About Apache’s handling of directory requests, and rewriting or redirecting invalid directory URIs in .htaccess as well as in PHP scripts.
  • Why stealing trailing slashes is not cool
    Truncating slashes is not only plain robbery (bandwidth theft), it often causes malfunctions at the destination server and 3rd party services as well.
  • How URL canonicalization irritates Technorati
    301 redirects that “add” a trailing slash to directory URLs, or to virtual URIs that mimic directories, seem to irritate Technorati so much that it can’t compute reputation, recent post lists, and so on.

What is a trailing slash?

The Web’s standards say (links and full quotes): The trailing path segment delimiter “/” represents an empty last path segment. Normalization should not remove delimiters when their associated component is empty. (Read the polite “should” as “must”.)

To understand that, let’s look at the most common URL components:
scheme:// server-name.tld /path ?query-string #fragment
The path part begins with a forward slash “/” and must consist of at least one byte (the trailing slash itself in case of the home page URL http://example.com/).
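A quick way to see these components is to let a URL parser split them. The sketch below uses Python’s standard urllib.parse module (my choice for illustration, nothing in this post depends on Python), and the helper name describe is made up. Note that the parser reports an empty path for the home page URL without the slash.

```python
from urllib.parse import urlsplit


def describe(url):
    """Split a URL into the components named above (hypothetical helper)."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "server": parts.netloc,
        "path": parts.path,
        "query": parts.query,
        "fragment": parts.fragment,
    }


# The home page URL with and without its trailing slash
with_slash = describe("http://example.com/")
without_slash = describe("http://example.com")

print(repr(with_slash["path"]))     # '/'
print(repr(without_slash["path"]))  # ''
```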

If a URL ends with a slash, it points to a directory’s default document, or, if there’s no default document, to a list of objects stored in a directory. The home page link lacks a directory name, because “/” after the TLD (.com|net|org|…) stands for the root directory.

Automated directory indexes (a list of links to all files) should be forbidden; use Options -Indexes in .htaccess to send such requests to your 403-Forbidden page.

In order to set default file names and their search sequence for your directories use DirectoryIndex index.html index.htm index.php /error_handler/missing_directory_index_doc.php. In this example: on request of http://example.com/directory/ Apache will first look for /directory/index.html, then if that doesn’t exist for /directory/index.htm, then /directory/index.php, and if all that fails, it will serve an error page (that should log such requests so that the Webmaster can upload the missing default document to /directory/).
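The lookup sequence can be sketched in a few lines. This is only a model of the behavior described above, not Apache’s code: the function name, the set of existing files, and the way paths are joined are assumptions made for illustration.

```python
def pick_default_document(directory, existing_files, directory_index):
    """Return the first DirectoryIndex candidate that exists, or None.

    directory       -- requested directory URL path, e.g. "/directory/"
    existing_files  -- set of URL paths that exist on the (imaginary) server
    directory_index -- candidates in the order given to DirectoryIndex
    """
    for candidate in directory_index:
        # A candidate starting with "/" is an absolute local path (like the
        # error handler above); all others are relative to the directory.
        path = candidate if candidate.startswith("/") else directory + candidate
        if path in existing_files:
            return path
    return None


index_order = [
    "index.html",
    "index.htm",
    "index.php",
    "/error_handler/missing_directory_index_doc.php",
]
files = {
    "/directory/index.php",
    "/error_handler/missing_directory_index_doc.php",
}

print(pick_default_document("/directory/", files, index_order))
print(pick_default_document("/empty/", files, index_order))
```

The first call finds /directory/index.php after two misses; the second falls through to the error handler, which is the catch-all at the end of the list.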

The URL http://example.com (without the trailing slash) is invalid, and no specification gives a reason why a Web server should respond to it with meaningful contents. Actually, the location http://example.com points to Null (nil, zilch, nada, zip, nothing), hence the correct response is “404 - we haven’t got ‘nothing to serve’ yet”.

The same goes for sub-directories. If there’s no file named “/dir”, the URL http://example.com/dir points to Null too. If you’ve a directory named “/dir”, the canonical URL http://example.com/dir/ either points to a directory index page (an autogenerated list of all files) or the directory’s default document “index.(html|htm|shtml|php|…)”. A request of http://example.com/dir –without the trailing slash that tells the Web server that the request is for a directory’s index– resolves to “not found”.

You must not reference a default document by its name! If you’ve links like http://example.com/index.html you can’t change the underlying technology without serious hassles. Say you’ve a static site with a file structure like /index.html, /contact/index.html, /about/index.html and so on. Tomorrow you’ll realize that static stuff sucks, hence you’ll develop a dynamic site with PHP. You’ll end up with new files: /index.php, /contact/index.php, /about/index.php and so on. If you’ve coded your internal links as http://example.com/contact/ etc. they’ll still work, without redirects from .html to .php. Just change the DirectoryIndex directive from “… index.html … index.php …” to “… index.php … index.html …”. (Of course you can configure Apache to parse .html files for PHP code, but that’s another story.)

It seems that truncating default document names can make sense for services that deal with URLs, but watch out for sites that serve different contents under various extensions of “index” files (intentionally or not). I’d say that folks submitting their ugly index.html files to directories, search engines, top lists and whatnot deserve all the hassles that come with later changes.

How to rescue stolen trailing slashes

Since Web servers know that users are faulty by design, they jump through a couple of resource burning hoops in order to either add the trailing slash so that relative references inside HTML documents (CSS/JS/feed links, image locations, HREF values …) work correctly, or apply voodoo to accomplish that without (visibly) changing the address bar.
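Why relative references break without the slash is easy to demonstrate. The sketch below uses urljoin from Python’s standard library, which follows the usual relative reference resolution rules; the wrapper name resolve and the file name style.css are just examples.

```python
from urllib.parse import urljoin


def resolve(page_url, reference):
    """Resolve a relative reference against the URL of the page containing it."""
    return urljoin(page_url, reference)


# With the trailing slash the last path segment is empty, so the
# reference lands inside the directory.
print(resolve("http://example.com/dir/", "style.css"))  # http://example.com/dir/style.css

# Without it, "dir" is treated like a file name and gets replaced.
print(resolve("http://example.com/dir", "style.css"))   # http://example.com/style.css
```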

With Apache, DirectorySlash On enables this behavior (check whether your Apache version does 301 or 302 redirects; in case of 302s, find another solution). You can also rewrite invalid requests in .htaccess when you need special rules:
RewriteEngine on
RewriteBase /content/
RewriteRule ^dir1$ http://example.com/content/dir1/ [R=301,L]
RewriteRule ^dir2$ http://example.com/content/dir2/ [R=301,L]

With content management systems (CMS) that generate virtual URLs on the fly, there’s often no other option than hacking the software to canonicalize invalid requests. To prevent search engines from indexing invalid URLs that are in fact duplicates of canonical URLs, perform permanent redirects (301).

Here is a WordPress (header.php) example:
$requestUri = $_SERVER["REQUEST_URI"];
$queryString = $_SERVER["QUERY_STRING"];
$doRedirect = FALSE;
$fileExtensions = array(".html", ".htm", ".php");
$serverName = $_SERVER["SERVER_NAME"];
$canonicalServerName = $serverName;

// if you prefer http://example.com/* URLs remove the "www.":
// (this keeps only the last two labels, so skip it for domains like example.co.uk)
$srvArr = explode(".", $serverName);
if (count($srvArr) >= 2) {
    $canonicalServerName = $srvArr[count($srvArr) - 2] ."." .$srvArr[count($srvArr) - 1];
}

// REQUEST_URI holds the path plus the query string; extract the path
$url = parse_url("http://" .$canonicalServerName .$requestUri);
$requestUriPath = isset($url["path"]) ? $url["path"] : "";
if (substr($requestUriPath, -1, 1) != "/") {
    // no trailing slash: leave real files alone, treat everything else as a directory
    $isFile = FALSE;
    foreach ($fileExtensions as $fileExtension) {
        if (strtolower(substr($requestUriPath, strlen($fileExtension) * -1)) == strtolower($fileExtension)) {
            $isFile = TRUE;
            break;
        }
    }
    if (!$isFile) {
        $requestUriPath .= "/";
        $doRedirect = TRUE;
    }
}
$canonicalUrl = "http://" .$canonicalServerName .$requestUriPath;
if ($queryString) {
    $canonicalUrl .= "?" . $queryString;
}
// !empty() avoids an "undefined index" notice when there is no fragment
if (!empty($url["fragment"])) {
    $canonicalUrl .= "#" . $url["fragment"];
}
if ($doRedirect) {
    @header("HTTP/1.1 301 Moved Permanently", TRUE, 301);
    @header("Location: $canonicalUrl");
    exit;
}

Check your permalink settings and edit the values of $fileExtensions and $canonicalServerName accordingly. For other CMSs, adapt the code; you may need to change the handling of query strings and fragments. The code above will not run under IIS, because it has no REQUEST_URI variable.
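If you adapt the routine for another system, it helps to isolate the slash decision in a pure function that can be tested without a Web server. Here is a minimal sketch of that decision in Python. It is not part of the WordPress hack above: the function name canonicalize is made up, the canonical host name is passed in rather than derived, and fragments are ignored.

```python
from urllib.parse import urlsplit

# Same list as $fileExtensions above; edit to match your permalink settings.
FILE_EXTENSIONS = (".html", ".htm", ".php")


def canonicalize(host, request_uri):
    """Return (canonical_url, needs_redirect) for a requested URI.

    host        -- the canonical server name, e.g. "example.com"
    request_uri -- path plus optional query string, e.g. "/about?x=1"
    """
    parts = urlsplit("http://" + host + request_uri)
    path = parts.path
    needs_redirect = False
    # Paths ending in a known file extension are left alone; everything
    # else without a trailing slash is treated as a directory.
    if not path.endswith("/") and not path.lower().endswith(FILE_EXTENSIONS):
        path += "/"
        needs_redirect = True
    url = "http://" + host + path
    if parts.query:
        url += "?" + parts.query
    return url, needs_redirect


print(canonicalize("example.com", "/about"))
print(canonicalize("example.com", "/page.html?x=1"))
```

When needs_redirect is true, the caller would answer with a 301 and the canonical URL in the Location header, as the PHP code does.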

Why stealing trailing slashes is not cool

This section expressed in one sentence: Cool URLs don’t change, hence changing other people’s URLs is not cool.

Folks should understand the “U” in URL as unique. Each URL addresses one and only one particular resource. Technically speaking, if you change one single character of a URL, the altered URL points to a different resource, or nowhere.

Think of URLs as phone numbers. When you call 555-0100 you reach the switchboard, 555-0101 is the fax, and 555-0109 is the phone extension of somebody. When you steal the last digit, dialing 555-010, you get nowhere.

Only a fool would assert that a phone number shortened by one digit is way cooler than the complete phone number that actually connects somewhere. Well, the last digit of a phone number and the trailing slash of a directory link aren’t much different. If somebody hands out a URL (with trailing slash), then use it as is, or don’t use it at all. Don’t “prettify” it, because any change destroys its serviceability.

If one requests a directory without the trailing slash, most Web servers will just reply to the user agent (browser, screen reader, bot) with a redirect header telling it to use a trailing slash, and then the user agent has to re-issue the request in the formally correct way. From a Webmaster’s perspective, burning resources that thoughtlessly is plain theft. From a user’s perspective, things will often work without the slash, but they’ll be quicker with it. “Often” doesn’t equal “always”:

  • Some Web servers will serve the 404 page.
  • Some Web servers will serve the wrong content, because /dir is a valid script, virtual URI, or page that has nothing to do with the index of /dir/.
  • Many Web servers will respond with a 302 HTTP response code (Found) instead of a correct 301-redirect, so that most search engines discovering the sneakily circumcised URL will index the contents of the canonical URL under the invalid URL. Now all search engine users will request the incomplete URL too, running into unnecessary redirects.
  • Some Web servers will serve identical contents for /dir and /dir/, which leads to duplicate content issues with search engines that index both URLs from links. Most Web services that rank URLs will assign separate scores to each known URL variant, instead of one accumulated ranking for both URLs (which would be the right thing to do, but is technically, well, challenging).
  • Some user agents can’t handle (301) redirects properly. Exotic user agents might serve the user an empty page or the redirect’s “error message”, and Web robots like the crawlers sent out by Technorati or MSN-LiveSearch hang or process garbage.
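To check which of these cases applies to a given server, you can request a directory URL without its slash and look at the status code and Location header. The sketch below only covers the classification step, with no network code; the function name, the labels, and the set of status codes treated as temporary are my own shorthand for the bullets above.

```python
def classify_slashless_response(requested_url, status, location=None):
    """Label a server's answer to a directory URL requested without its slash.

    requested_url -- the truncated URL, e.g. "http://example.com/dir"
    status        -- HTTP status code of the first response
    location      -- value of the Location header, if any
    """
    canonical = requested_url + "/"
    if status == 301 and location == canonical:
        return "permanent redirect: correct, but costs an extra request"
    if status in (302, 303, 307) and location == canonical:
        return "temporary redirect: the truncated URL may get indexed"
    if status == 404:
        return "not found: the truncated URL is a dead link"
    if status == 200:
        return "served directly: wrong content or a duplicate of the canonical URL"
    return "other: inspect manually"


print(classify_slashless_response("http://example.com/dir", 302, "http://example.com/dir/"))
```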

Does it really make sense to maliciously manipulate URLs just because some clueless developers say “dude, without the slash it looks way cooler”? Nope. Stealing trailing slashes in general as well as storing amputated URLs is a brain dead approach.

KISS (keep it simple, stupid) is a great principle. “Cosmetic corrections” like trimming URLs add unnecessary complexity that leads to erroneous behavior and requires even more code tweaks. GIGO (garbage in, garbage out) is another great principle that applies here. Smart algos don’t change their inputs. As long as the input is processible, they accept it, otherwise they skip it.

Exceptions

URLs in print, radio, and offline in general, should be truncated in a way that browsers can figure out the location - “domain.co.uk” in print and “domain dot co dot uk” on radio is enough. The necessary redirect is cheaper than a visitor who doesn’t type in the canonical URL including scheme, www-prefix, and trailing slash.

How URL canonicalization seems to irritate Technorati

Because Technorati’s user support is not exactly responsive (or rather, swamped), parts of this section should be interpreted as educated speculation. Also, I didn’t research enough cases to come up with a working theory. So here is just the story of “how Technorati fails to deal with my blog”.

When I moved my blog from blogspot to this domain, I enhanced the faulty WordPress URL canonicalization. If any user agent requests http://sebastians-pamphlets.com it gets redirected to http://sebastians-pamphlets.com/. Invalid post/page URLs like http://sebastians-pamphlets.com/about redirect to http://sebastians-pamphlets.com/about/. All redirects are permanent, returning the HTTP response code “301”.

I’ve claimed my blog as http://sebastians-pamphlets.com/, but Technorati shows its URL without the trailing slash.
…<div class="url"><a href="http://sebastians-pamphlets.com">http://sebastians-pamphlets.com</a> </div> <a class="image-link" href="/blogs/sebastians-pamphlets.com"><img …

By the way, they forgot dozens of fans (folks who “fave’d” either my old blogspot outlet or this site) too.
Blogs claimed at Technorati

I’ve added a description and tons of tags, neither of which shows up on public pages. It seems my tags were deleted; at least they aren’t visible in edit mode any more.
Edit blog settings at Technorati

Shortly after the submission, Technorati stopped adjusting the reputation score for newly discovered inbound links. Furthermore, the list of my recent posts became stale, although I’ve pinged Technorati with every update, and Technorati received my update notifications via ping services too. And yes, I’ve tried manual pings, to no avail.

I’ve gained lots of fresh inbound links, but the authority score didn’t change. So I asked Technorati’s support for help. A few weeks later, in December/2007, I got an answer:

I’ve taken a look at the issue regarding picking up your pings for “sebastians-pamphlets.com”. After making a small adjustment, I’ve sent our spiders to revisit your page and your blog should be indexed successfully from now on.

Please let us know if you experience any problems in the future. Do not hesitate to contact us if you have any other questions.

Indeed, Technorati updated the reputation score from “56” to “191”, and refreshed the list of posts including the most recent one.

Of course the “small adjustment” didn’t persist (I assume that a batch process stole the trailing slash that the friendly support person had added). I’ve sent a follow-up email asking whether that’s a slash issue or not, but haven’t received a reply yet. I’m quite sure that Technorati doesn’t follow 301-redirects, so that’s at least a plausible cause for this bug.

Since December 2007 Technorati hasn’t updated my authority score (just the rank goes up and down depending on the number of inbound links Technorati shows on the reactions page; by the way, these numbers are often unreal and change in the range of hundreds from day to day).
Blog reactions and authority scoring at Technorati

It seems Technorati hasn’t indexed my posts since then (December/18/2007), so probably my outgoing links don’t count for their destinations.
Stale list of recent posts at Technorati

(All screenshots were taken on February/05/2008. When you click the Technorati links today, things will hopefully look different.)

I’m not amused. I’m curious what would happen when I add
if (!preg_match("/Technorati/i", "$userAgent")) {/* redirect code */}

to my canonicalization routine, but I can resist the temptation to special-case particular Web robots. My URL canonicalization should be identical both for visitors and crawlers. Technorati should be able to fix this bug without code changes at my end or weekly support requests. Wishful thinking? Maybe.

Update 2008-03-06: Technorati crawls my blog again. The 301 redirects weren’t the issue. I’ll explain that in a follow-up post soon.




Comment rating and filtering with SezWho

I’ve added SezWho to the comment area. SezWho enables rating and filtering of comments, and even shows you comments an author has left on other blogs. Neat.

Currently there are no ratings, so the existing comments are all rated 2.5 (quite good). Once you’ve rated a few comments, you can suppress all lower quality comments (rated below 3), or show high quality comments only (rated 4 or better).

Don’t freak out when you use CSS that highlights nofollow’ed links. SezWho manipulates the original (mostly dofollow’ed) author link with JavaScript, hence search engines still recognize that a link shall pass PageRank and anchor text. (I condomize some link drops, for example when I don’t know a site and can’t afford the time to check it out, see my comments policy.)

I’ll ask SezWho to change that when I’m more familiar with their system (I hate change requests based on a first peek). SezWho should look at the attributes of the original link in order to add rel=”nofollow” to the JS created version only when the blogger actually has condomized a particular link. Their software changes the comment author URL to a JS script that redirects visitors to the URL the commenter has submitted. It would be nice to show the original URL in the status bar on mouse over.

Also, it seems that when you sign up with SezWho, they remove the trailing slash from your blog’s URL. That’s not acceptable. I mean, not every startup should do what clueless Yahoo developers still do although they know that it violates several Web standards. Removing trailing slashes from links is not cool, that’s a crappy manipulation that can harm search engine rankings, will lead to bandwidth theft when bots follow castrated links only to get redirected, … ok, ok, ok … that’s stuff for another rant. Judging from their Web site, SezWho looks like a decent operation, so I’m sure they can change that too.

 

SezWho sidebar widget

I’ve not yet added the widgets, above is how they would appear in the sidebar.

 

I consider SezWho useful. All functionality lives in the blog and can access the blog’s database, so in theory it doesn’t slow down the page load time by pulling loads of data from 3rd party sources. Please let me know whether you like it or not. Thanks!




Dealing with spamming content thieves / plagiarists (oylinki.com)

When it comes to crap like plagiarism you shouldn’t consider me a gentleman.

If assclowns like Veronica Domb steal my content and publish it along with likewise stolen comments on their blatantly spamming site oylinki.com, I’m somewhat upset.

Then when I leave a polite note asking the thief Veronica Domb from EmeryVille to remove my stuff asap, see my comment marked as “in moderation”, but neither my content gets removed nor my comment is published within 24 hours, I stay annoyed.

When I’m annoyed, I write blog posts like this one. I’m sure it will rank high enough for [Veronica Domb] when the assclown’s banker or taxman searches for her name. I’m sure it’ll be visible on any SERP that any other (potential) business partner submits at a major search engine.

Content Thieves Veronica Domb et al, P.O.BOX 99800, EmeryVille, 94662, CA are blatant spammers

Hey, outing content thieves is way more fun than filing boring DMCA complaints, and way more effective. Plagiarists do ego searches too, and from now on Veronica Domb from EmeryVille will find the footsteps of her criminal activities on the Web with each and every ego search. Isn’t that nice?

Not. Of course Veronica Domb is a pseudonym of Slade Kitchens, Jamil Akhtar, … However, some plagiarists and scam artists aren’t smart enough to hide their identity, so watch out.

Maybe I’ve done some companies a little favor, because they certainly don’t need to send out money sneakily “earned” with Web spam and criminal activities that violate the TOS of most affiliate programs.

AdBrite will love to cancel the account for these affiliate links:
http://ads.adbrite.com/mb/text_group.php?sid=448245&br=1 &dk=736d616c6c20627573696e6573735f355f315f776562
http://www.adbrite.com/mb/commerce/purchase_form.php?opid=448245&afsid=1

Google’s webspam team as well as other search engines will most likely delist oylinki.com that comes with 100% stolen text and links and faked whois info as well.

Spamcop and the like will happily blacklist oylinki.com (IP: 66.199.174.80, cwh2.canadianwebhosting.com) because the assclown’s blog software sends out email spam masked as trackbacks.

If anybody is interested, here’s a track of the real “Veronica Domb” from Canada clicking the link to this post from her WP admin panel:
74.14.107.36 - - [21/Jan/2008:07:50:40 -0500] "GET /outing-plagiarist-2008-01-21/ HTTP/1.1" 200 9921 "http://oylinki.com/blog/wp-admin/edit-comments.php" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; SU 3.005; .NET CLR 1.1.4322; InfoPath.1; Alexa Toolbar; .NET CLR 2.0.50727)"

Common sense is not as common as you think.

Disclaimer: I’ve outed plagiarists in the past, because it works. Whether you do that on ego-SERPs or not depends on your ethics. Some folks think that’s even worse than theft and spamming. I say that publishing plagiarisms in the first place deserves bad publicity.




Thanks for all the ego food!

Define Ego Food: Healthy, organic food for Sebastian’s ego, so it can grow up big and strong.

[Please note that only organic ego-food-burgers are healthy, so please refrain from any blackhatted tactics when praising or flaming me. Also, don’t even think that the greedy guy on the right will answer to the name of Sebastian. Rednecks, crabby old farts, and insurrectionists are bald and wear a black hat.]

 

I’m not yet sure whether the old year ended with an asc(33) or the new year started with a U+0021. However, I want to shout out a loud Thank You! to you, my dear readers. Thanks to you my pamphlets prosper.

I’m not only talking about your very much appreciated kind mentions[1] on your blogs. What gets my lazy butt out of my bed to write more pamphlets is another highlight of my day: checking this blog’s MBL and Feedburner stats. In other words: I write because you read, sphinn and stumble my articles.

The 2007 Search Blog Awards

Despite my attempt to cheat my way to a search blog award with a single-candidate-category, Loren over at SEJ decided to accept a nomination of my pamphlets in the Best SEO Blog category. It was an honor to play in that league, and it means a lot to me.

Congrats to Barry, and thanks to the 150 people who voted for me!

Yep, I’ve counted even the 1/2/3-votes, in fact as constructive criticism. I’ve no clue whether the folks who gave me low ratings just didn’t know me or considered my blog that worthless. Anyway, I take that very seriously and will try to polish up Sebastian’s Pamphlets for the next round.

The 2007 Rubber Chicken Awards (SEM version)

In related good news, I, Google’s nightmare, have almost won the 2007 Rubber Chicken Award for the dullest, most bizarre SEO blog post.

Ranked in row two I’m in good company with Geraldine, Jeff and David. Another post of mine made it in row three.

Congrats to Matt and Sandra who won the most wanted award on the Web!

More Ego Food

While inserting my daily load of blatant comment-author-link spam on several blogs, last night I stumbled upon a neat piece of linkbait from Shaun and couldn’t resist slapping and discrediting him. Eventually he banned me, but I can spam via email too. Read more ego food tonight: Sebastian’s sauced idiot version of robots.txt pulled by Shaun from Scotland’s great Hobo SEO Blog.

What can I improve?

I’m really proud of such a great readership. What do you want to see here this year? I’m blogging in my spare time, but I’ll try to fulfill as many wishes as possible. Please don’t hesitate to post your requests here. Consider the comments my to-do list for 2008. Thank you again, and have a great year!


[1] It seems I’m suffering from an inbound link penalty: Technorati recently discovered my new URL but refuses to update my reputation, despite all my pings, so I’m stuck with a daily link count.




Vote Now: Rubber Chicken Award 2007 for the dullest and most tedious search blog post

I’m truly excited. Two of my pamphlets made it into The Rubber Chicken Award’s Top 10! That’s 50% success (2/4 nominated pamphlets), so please help me to make that 100%: vote for #3 and #4!

Just in case you, dear reader, are not a hardcore SEM addict who reads search blogs even during the holiday season, let me explain why a Rubber Chicken Award Top 10 nomination is an honor.

The Rubber Chicken Award honors the year’s most serious SEO research. Extra brownie points are given to the dullest draft and the most tedious wording.

Rumors are swirling that Google’s search quality spam task force has developed the complex RCAFHITSI©™ algo (patent pending®), which compiles and ranks search blog posts presented to Mike Blumenthal’s Rubber Chicken Award Jury.

Here is the cream of the crop of the search world, the 2007 Top 10 search blog posts nominated in the Rubber Chicken Award for the dullest and most boring/serious SEO/SEM article:

  1. Want traffic? Rank for High Traffic Keywords…
  2. We Add Words to AdWords… Google Subtracts them
  3. Why eBay and Wikipedia rule Google’s SERPs
  4. SEOs home alone - Google’s nightmare
  5. 13 Things to Do When Your Loved One is Away at Conferences
  6. SEO High School Confidential - Premiere Edition!
  7. The Sphinn Awards - Part I & -Part II.
  8. Top 21 Signs You Need a Break From SEO (2007 version)
  9. 10 Signs That You May Be a Blog Addict
  10. The SEO’s Guide to Beginners
  11. The Internet Marketer’s Nightmare
  12. Mission Accomplished—Top Ranking in Google
  13. Google Interiors - the day my house became searchable

I’ve selfishly marked the two posts you want to vote for. Because all nominations are truly awesome, just vote for everything but make sure to check “5” for #3 and #4:
VOTE NOW

Thank You, Dear Reader!

Update: I can’t post another voting whore call to action today, but of course I’d very much appreciate your vote in the Best SEO Blog of 2007 category at SEJ’s 2007 Search Blog Awards.




Ping the hell out of Technorati’s reputation algo

If your Technorati reputation factor sucks ass then read on, otherwise happily skip this post.

Technorati calculates a blog’s authority/reputation based on its link popularity, counting blogroll links from the linking blogs’ main pages as well as links within the contents of their posts. Links stop counting six months after their very first discovery.
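As a rough model of that rule, here is a sketch that counts linking blogs whose links were first discovered within the last six months. Everything specific in it is an assumption on my part and not Technorati’s code: the function name, approximating six months as 183 days, and counting each linking blog only once.

```python
from datetime import date, timedelta

# Assumption: "six months" approximated as 183 days.
SIX_MONTHS = timedelta(days=183)


def authority(links, today):
    """Count linking blogs with at least one link still inside the window.

    links -- iterable of (linking_blog, first_discovered) pairs
    today -- the date the score is computed for
    """
    cutoff = today - SIX_MONTHS
    # Assumption: several links from the same blog count once.
    return len({blog for blog, first_seen in links if first_seen >= cutoff})


sample_links = [
    ("blog-a.example", date(2008, 1, 10)),
    ("blog-a.example", date(2008, 1, 20)),  # second link from the same blog
    ("blog-b.example", date(2007, 3, 1)),   # discovered too long ago
    ("blog-c.example", date(2007, 12, 1)),
]

print(authority(sample_links, date(2008, 2, 5)))  # 2
```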

Unfortunately, Technorati is not always able to find all your inbound links, usually because clueless bloggers forget to ping them, hence your blog might be undervalued. You can change that.

Compile a list of blogs that link to you and are unknown at Technorati, then introduce them below to a cluster ping orgy. Technorati will increase your authority rating after indexing those blogs.

Enter one blog home page URL per line, all lines properly delimited with a “\n” (new line, just hit [RETURN]; “\r” crap doesn’t work). And make sure that all these blogs have an auto-discovery link pointing to a valid feed in their HEAD section. Do NOT ping Technorati with post-URIs! Invest the time to click through to the blog’s main page and submit the blog-URI instead. Post-URI pings get mistaken for noise and trigger spam traps, which means their links will not increase your Technorati authority/rank.
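For the curious, here is roughly what such a ping tool has to do with its input. This is a sketch, not the code behind this tool: it assumes the conventional weblogUpdates.ping XML-RPC method that blog ping services commonly accept, it only builds the request body without sending anything, and both function names are made up. Unlike the tool above, it also tolerates stray “\r” characters.

```python
import xmlrpc.client


def parse_blog_urls(text):
    """Split pasted input into one URL per line, skipping blank lines."""
    normalized = text.replace("\r\n", "\n").replace("\r", "\n")
    return [line.strip() for line in normalized.split("\n") if line.strip()]


def build_ping(blog_name, blog_url):
    """Build the XML-RPC request body for a weblogUpdates.ping call."""
    return xmlrpc.client.dumps((blog_name, blog_url), methodname="weblogUpdates.ping")


pasted = "http://blog-a.example/\r\nhttp://blog-b.example/\n\n"
for url in parse_blog_urls(pasted):
    body = build_ping("Some blog", url)
    print(len(body), "bytes to send for", url)
```

Each body would then be POSTed to the ping service’s XML-RPC endpoint, one request per blog home page.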

 


Actually, this tool pings other services than Technorati too. Pingable contents make it on the SERPs, not only at Technorati.

If you make use of URL canonicalization routines that add a trailing slash to invalid URLs like http://example.com then make sure that you claim your blog at Technorati with the trailing slash.

Please note that this tool is experimental and expects a Web standard friendly browser. It might not work for you, and I’ll remove it if it gets abused.




No more RSS feeds in Google’s search results

Folks try all sorts of naughty things when by accident a blog’s feed outranks the HTML version of a post. That usually happened to less popular blogs, or with very old posts and categorized feeds that contain ancient articles.

The problem seems to be that Google’s Web search doesn’t understand the XML structure of feeds, so that a feed’s textual contents get indexed like stuff from text files. Due to “subscribe” buttons and other links, feeds can gather more PageRank than some HTML pages. Interestingly .xml is considered an unknown file type, and advanced search doesn’t provide a way to search within XML files.

Now that has changed[1]. Googler Bogdan Stănescu posted on the German Webmaster blog[2] under the title “We remove feeds from our search results”:

As Webmasters many of you were probably worried that your RSS or Atom feeds could outrank the accompanying HTML pages in Google’s search results. The emergence of feeds in our search results could be a poor user experience:

1. Feeds increase the probability that the user gets the same search result twice.

2. Users who click on the feed link on a SERP may miss out on valuable content, which is only available on the HTML page referenced in the XML file.

For these reasons, we have removed feeds from our Web search results - with the exception of podcasts (feeds with media files).

[…] We are aware that in addition to the podcasts out there some feeds exist that are not linked with an HTML page, and that is why it is not quite ideal to remove all feeds from the search results. We’re still open for feedback and suggestions for improvements to the handling of feeds. We look forward to your comments and questions in the crawling, indexing and ranking section of our discussion forum for Webmasters. [Translation mine]

I’m not yet sure whether that will end in a ban of all/most XML documents. I hope they suppress RSS/Atom feeds only, and provide improved ways to search for and within other XML resources.

So what does that mean for blog SEO? Unless Google provides a procedure to prevent feeds from accumulating PageRank whilst allowing access for blog search crawlers that request feeds (I believe something like that is in the works), it’s still a good idea to nofollow all feed links, but there’s absolutely no reason to block them in robots.txt any more.

I think that’s a great move in the right direction, though only a preliminary solution. The XML structure of feeds isn’t that hard to parse, and there are only so many ways to extract the URL of the HTML page. Then, when a relevant feed lands in a raw result set, Google should display a link to the HTML version on the SERP. What do you think?
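To back up the “not that hard to parse” claim, here is a minimal sketch that pulls the HTML page URL out of a feed with Python’s standard XML parser. It handles only the two common cases (the channel link of an RSS 2.0 feed, and the alternate link of an Atom feed), ignores per-item links, and the function name is made up. It says nothing about how Google actually processes feeds.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"


def html_url_of_feed(feed_xml):
    """Return the URL of the HTML page a feed belongs to, or None."""
    root = ET.fromstring(feed_xml)
    if root.tag == "rss":
        # RSS 2.0: <rss><channel><link>…</link></channel></rss>
        return root.findtext("channel/link")
    if root.tag == ATOM + "feed":
        # Atom: a <link> without rel, or with rel="alternate", points to the page
        for link in root.findall(ATOM + "link"):
            if link.get("rel", "alternate") == "alternate":
                return link.get("href")
    return None


rss = (
    "<rss version='2.0'><channel><title>Example</title>"
    "<link>http://example.com/</link></channel></rss>"
)
atom = (
    "<feed xmlns='http://www.w3.org/2005/Atom'>"
    "<link rel='self' href='http://example.com/feed/'/>"
    "<link href='http://example.com/'/></feed>"
)

print(html_url_of_feed(rss))   # http://example.com/
print(html_url_of_feed(atom))  # http://example.com/
```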


[1] Danny reminded me that, according to Matt Cutts, that has been going on for a few months now.

[2] 24 hours later Google published the announcement in English too.




Nominate a red crab in the 2007 Search Blog Awards!

Today Loren asked for selfish nominations, thus everybody posts a call for action.
So did I:

• Best search related pamphlets
I hereby selfishly submit my blog.

To no avail:

Sebastian, we’re not going to have a category for Best Pamphlets, but good try :)

There’s no such thing as a Best Crabby Search Pamphlets category just because my blog would be the sole candidate? Ok, I understand that. Really. I didn’t even swear. Yet.

So here’s my call for action. Nominate your favorite blog (that’s mine of course!) in any of the following categories that match:

  • Best SEO Blog
    You’d expect more marketing stuff from an SEO blog.
  • Best SEM Blog
    You’d expect even more marketing stuff, as well as PPC and whatnot. I suck at both.
  • Best SEO Plugin for WordPress
    I never wrote a WordPress plugin. Actually, this year I hate WordPress because they messed up the database structure in version 2.3 without providing any documentation or at least a reasonable migration procedure. Also their coding standards suck ass and make me puke whenever I see WordPress code.
  • Best Search Agency Resource Blog
    My employers don’t blog.
  • Best Link Building Blog
    Link building pamphlets are rare nowadays.
  • Best Social Media Marketing or Optimization Blog
    I don’t game social media.
  • Best Local Search Blog
    I’m happy when I find my shoes before I leave the house, hence I can’t give any advice on local search.
  • Best Video Search Blog
    I watch x-rated videos only. Probably posting geeky clips doesn’t qualify me.
  • Best Mobile Search Blog
    When I’m on the road I usually search until I give up and ask a cabby for an escort. Cheating this way makes sure I’m not always too late, but doesn’t qualify me for mobile search consultancy.
  • Best Google Blog Not Owned by Google
    I’m not in Google news.
  • Best Search Engine Corporate Blog (owned by the search engines)
    Although I’ve developed a tiny search engine years ago, I fear that smutty results don’t count.
  • Best Contextual Advertising Blog
    My organic traffic is cheaper, and probably as reliable as PPC campaigns.
  • Best Affiliate Marketing Blog
    I sold two Seobook subscriptions recently, does that count?
  • Best Search Engine Community/Forum
    I visit Sphinn and the Google Webmaster forum and will never launch a new forum again.
  • Best New Search Engine of 2007
    See above.
  • Best Search Engine Research Blog
    I revealed that Microsoft plans to relaunch Live Search as porn affiliate program, why eBay and Wikipedia rule Google’s SERPs, and more SEO research like that.
  • Best Search Linkbait of 2007
    When I try it, folks bury it.
  • Breakout Blog of 2007
    I’ve been blogging since 2005 but moved my blog away from blogspot this year.
  • Best Search Conference Coverage of 2007
    I don’t even attend conferences.
  • Best Search Conference Coverage in Photos
    See above.
  • Best Search Marketing Facebook Group
    Facebook killed my account for spamming or so.
  • Most Giving Search Blogger
    I can’t give away a fraction of Bill Slawski’s great insights.
  • Best Independent Search Blog (not owned by media company or marketing agency)
    What does that mean? Ok, I’m in.
  • Best Search Blog Post of 2007
    I wrote a dull book on redirects, and more.

Oh well. Instead of nominating my stuff, better convince Search Engine Journal that they really need a Crabby Pamphlets Category. Or try Category #16 at Performancing.

Update December/28/2007: YAY! Thank you all! Now you can vote for my pamphlets in the “Best SEO Blog of 2007” category at the SEJ Search Blog Award 2007 contest. Here are the candidates:

It truly is an honor just to be nominated together with these great SEO bloggers.




BlogCatalog needs professional help

A while ago I helped BlogCatalog to fix an issue with their JavaScript click tracking that Google considered somewhat crappy. The friendly BlogCatalog guys said thanks, and since then joining BC was on my ToDo-list because it seemed to be a decent service.

Recently I missed my cute red crab icon in a blog’s sidebar widget, realized that it’s powered by BlogCatalog and not MyBlogLog, so I finally signed up.

Roughly 24 hours later I was quite astonished as I received this email:

BlogCatalog - Submission Declined: Sebastian´s Pamphlets

Dear Sebastian,

Thank you for submitting your blog Sebastian`s Pamphlets (http://sebastians-pamphlets.com/) to BlogCatalog.

Unfortunately upon reviewing your blog we are unable to grant it access to the directory.

Your blog was declined for the following reason:

* You did not add a link back to Blog Catalog from your website.
To add a link visit: http://www.blogcatalog.com/buttons.php

If you believe this to be a mistake, you can login to Blog Catalog ( http://www.blogcatalog.com/blogs/manage_blog.html ) and change anything which may have caused it to get declined. After updating your blog, it will be put back into the submission queue.

If you have any questions/comments/suggestions/ideas please feel free to contact us.

Thanks,
BlogCatalog

Crap on, I followed the instructions on http://www.blogcatalog.com/buttons.php:

Meta Tag Verification

If you’d rather not add a link back to BlogCatalog you can alternatively copy the meta tag listed below and paste it in your site’s home page in the first <head> section of the page, before the first <body> section.

<meta name="blogcatalog" content="9BC8674180" />

It’s laughable to talk about the “first HEAD section” because an HTML file can have only one. Also, having more than one BODY section is certainly not compliant with any standard. But bullshit aside, they clearly state that they’re fine with a meta tag if a blogger refuses to add a reciprocal link or even a pile of server-side code that slows down each and every page.

If I remember correctly, BC folks accused of hoarding PageRank defended their policy with statements like

I should quickly clear up that we provide also widgets and meta tags to verify ownership for anyone who doesn’t want to link back to us. We understand PageRank is sacred to many of our bloggers and give them the options to preserve their PR. [emphasis mine, also I’ve removed typos]

Not that I care much about PageRank leaks, but I never link to directories. And why should I when they can verify my submission in other ways?

Obviously, BlogCatalog staff can’t be bothered to view my home page’s source code, and they have no script capable of finding the meta tag
<meta name="blogcatalog" content="9BC8674180" />

in my one and only and therefore first HEAD section.
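
Such a script isn’t rocket science. Here’s a minimal Python sketch of a check like that, using only the standard library’s html.parser. To be clear, this is my own illustration under stated assumptions, not BlogCatalog’s actual verification code: the names MetaTagFinder and has_verification_tag are made up, and the sketch treats every meta element that appears before the opening body tag as part of the HEAD section.

```python
from html.parser import HTMLParser


class MetaTagFinder(HTMLParser):
    """Collect the content of the first <meta name="..."> found before <body>."""

    def __init__(self, name):
        super().__init__()
        self.name = name.lower()
        self.content = None
        self.in_body = False

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us. Self-closing
        # tags like <meta ... /> also end up here via handle_startendtag.
        if tag == "body":
            self.in_body = True
        elif tag == "meta" and not self.in_body and self.content is None:
            attributes = dict(attrs)
            if (attributes.get("name") or "").lower() == self.name:
                self.content = attributes.get("content")


def has_verification_tag(html, name, expected):
    """Return True if the page's head carries <meta name=name content=expected>."""
    finder = MetaTagFinder(name)
    finder.feed(html)
    return finder.content is not None and finder.content == expected
```

Fed with the source of a home page (fetched with urllib.request.urlopen, for example), has_verification_tag(html, "blogcatalog", "9BC8674180") would tell a reviewer whether the verification tag is in place, without anybody having to read source code by hand.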

The meta tag verification is somewhat buried on the policy page, so it looks like BlogCatalog chases inbound links no matter what it costs. Dear BlogCatalog, in my case it costs reputation. You guys don’t really think that I’ll send you a private message so that you can silently approve the declined sign-up, do you? I’m pretty sure that you treat others the same way. Either dump the meta tag verification, or play by your very own rules.

It seems to me that BlogCatalog needs more professional advice from bright consultants (scroll down to Andy’s full disclosure).

Update: A few hours after publishing this post my submission got approved.




BlogRush amoebas ban high quality blogs in favor of crap

Whilst blogs like The Tampon Blog are considered “high quality” by clueless amoebas hired by BlogRush, many great blogs like Tamar’s were banned by the Reese gang.

In my book that qualifies BlogRush as a full blown scam. If it’s not a scam, it’s at the very least an amateurish operation intended to hoodwink bloggers. Hiring low-life surfers for 12 bucks per hour to judge the quality of blogs talking about topics the average assclown on BlogRush’s payroll cannot understand is ridiculous, if not a sign of criminal intent. Here is how they hire their amoebas:

We’re looking to hire a bunch of people that would like to earn some extra cash. If you or someone you know might be interested, please forward this message to them. This would be perfect for a stay-at-home mom, college student, or anyone else looking to make some extra money.

All that’s required is sitting in front of their computer and doing the following…

Login to our Review System with an account we will setup for them. There will be a top “frame” control strip that has a few buttons:

“Approve” “Reject” and “Not Sure.”

The bottom frame will automatically load a blog that needs to be reviewed. After reviewing the blog, just press the appropriate button. That’s it.

* We have created a little training video to teach reviewers what to look for and how to decide what gets approved or rejected. It’s very simple.

After pushing one of the buttons the next blog to be reviewed automatically loads in that bottom frame. It’s as simple as that.

Here’s The Deal…

We’re paying USD $12.00/hour for this review work. It’s not a fortune, but it’s a pretty simple task. Heck, just put on some music and sit back and review some blogs. Pretty easy work. :-)

I’m not pissed because they rejected me and lots of other great blogs. I’m not even pissed because they sent emails like

Congratulations! You are receiving this update because your blog has passed our strict Quality Guidelines and criteria — we believe you have a high-quality blog and we are happy you’re a member of our network!

to blogs which didn’t even bother to put up their crappy widget. I’m pissed because they constantly lie and cheat:

We’ve just completed a massive SWEEP of our entire network. We’ve removed over *10,000* blogs (Yes, ten thousand) that did not meet our new Quality Guidelines.

We have done a huge “quality control audit” of our network and have reviewed all the blogs one-at-a-time. We will continue to review each NEW blog that is ever submitted to our network.

You will notice the HUGE DIFFERENCE in the quality of blogs that now appear in your widget. This major *sweep* of our network will also increase the click-rates across the entire network and you will start to receive more traffic.

They still do not send any (or at least not much) traffic to niche blogs, they still get cheated, and they still have tons of crap in their network. They still overpromise and underdeliver. There’s no such thing as a “massive amount of targeted traffic” sent by BlogRush.

The whole BlogRush operation is a scam. Avoid BlogRush like the plague.

Update: Here is one of John Reese’s lame excuses, posted in reply to a “reviewed and dumped by BlogRush idiots” post on John Cow’s blog. A laughable pile of bullcrap, politely put.

John Reese from BlogRush here.

I am not sure why your blog wasn’t approved by the reviewer that reviewed your blog. (We have a team of reviewers.) From what I can tell, your blog passes our guidelines. I’m not sure if the reviewer loaded your blog on a day where your primary post(s) were heavy on the promotional side or not — that’s just a guess of what might have influenced them.

You have my email address from this comment. Please contact me directly (if you wish) and I will investigate the issue for you and see about reactivating your account.

AND FOR THE RECORD…

No one is being BANNED from BlogRush. If any account doesn’t have any approved blogs, the account is moved to an “inactive” status until changes are made or until another blog that meets our guidelines gets approved. Nothing happens to referrals or an account’s referral network; they are left completely intact and as soon as the account is “active” again everything returns to the way it was.

* I just found out that your pingback message was deleted by one of our blog moderators because we don’t want any comments (or pingbacks) showing up for that main post. A few childish users started posting profanity and other garbage that was getting past our filters and we needed to shut it off for now.

There’s no “conspiracy theory” happening. In fact, we’ve been incredibly transparent and honest ever since we launched — openly admitting to mistakes that we’ve made and what we planned to do about them.

~John


