Archived posts from the 'Crap' Category

Why storing URLs with truncated trailing slashes is utter idiocy

With some Web services URL canonicalization has a downside. What works great for major search engines like Google can backfire when a Web service like Yahoo thinks circumcising URLs is cool. Proper URL canonicalization might, for example, screw your blog’s reputation at Technorati.

In fact the problem is not your URL canonicalization, e.g. 301 redirects from http://example.com to http://example.com/ or from http://example.com/directory to http://example.com/directory/, but crappy software that removes trailing forward slashes from your URLs.

Dear Web developers, if you really think that home page locations and directory URLs look way cooler without the trailing slash, then by all means manipulate the anchor text, but do not manipulate HREF values, and do not store truncated URLs in your databases (not that “http://example.com” as anchor text makes any sense when the URL in HREF points to “http://example.com/”). Spreading invalid URLs is not funny. People as well as Web robots take invalid URLs from your pages for various purposes. Many uses of invalid URLs can damage the search engine rankings of the link destinations. You can’t control that, hence don’t screw our URLs. Never. Period.

Folks who don’t agree with the above should read on.

    TOC:

  • What is a trailing slash? About URLs, directory URIs, default documents, directory indexes, …
  • How to rescue stolen trailing slashes About Apache’s handling of directory requests, and rewriting or redirecting invalid directory URIs in .htaccess as well as in PHP scripts.
  • Why stealing trailing slashes is not cool Truncating slashes is not only plain robbery (bandwidth theft), it often causes malfunctions at the destination server and 3rd party services as well.
  • How URL canonicalization irritates Technorati 301 redirects that “add” a trailing slash to directory URLs, or to virtual URIs that mimic directories, seem to irritate Technorati so much that it can’t compute reputation, recent post lists, and so on.

What is a trailing slash?

The Web’s standards say (links and full quotes): The trailing path segment delimiter “/” represents an empty last path segment. Normalization should not remove delimiters when their associated component is empty. (Read the polite “should” as “must”.)

To understand that, let’s look at the most common URL components:
scheme:// server-name.tld /path ?query-string #fragment
The path part begins with a forward slash “/” and must consist of at least one byte (the trailing slash itself in the case of the home page URL http://example.com/).

If an URL ends with a slash, it points to a directory’s default document, or, if there’s no default document, to a list of objects stored in a directory. The home page link lacks a directory name, because “/” after the TLD (.com|net|org|…) stands for the root directory.

Automated directory indexes (a list of links to all files) should be forbidden; use Options -Indexes in .htaccess to send such requests to your 403-Forbidden page.

In order to set default file names and their search sequence for your directories use DirectoryIndex index.html index.htm index.php /error_handler/missing_directory_index_doc.php. In this example: on request of http://example.com/directory/ Apache will first look for /directory/index.html, then if that doesn’t exist for /directory/index.htm, then /directory/index.php, and if all that fails, it will serve an error page (that should log such requests so that the Webmaster can upload the missing default document to /directory/).
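Putting both directives together, a minimal .htaccess sketch could look like this (the error-handler path is just the example from above, not a must-have file):

```apache
# Forbid auto-generated directory listings (requests get a 403).
Options -Indexes

# Default document search order, with a catch-all handler
# that fires when no default document exists in the directory.
DirectoryIndex index.html index.htm index.php /error_handler/missing_directory_index_doc.php
```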

The URL http://example.com (without the trailing slash) is invalid, and no specification gives a reason why a Web server should respond to it with meaningful contents. Actually, the location http://example.com points to Null (nil, zilch, nada, zip, nothing), hence the correct response is “404 - we haven’t got ‘nothing to serve’ yet”.

The same goes for sub-directories. If there’s no file named “/dir”, the URL http://example.com/dir points to Null too. If you’ve a directory named “/dir”, the canonical URL http://example.com/dir/ either points to a directory index page (an autogenerated list of all files) or the directory’s default document “index.(html|htm|shtml|php|…)”. A request for http://example.com/dir, without the trailing slash that tells the Web server the request is for a directory’s index, resolves to “not found”.

You must not reference a default document by its name! If you’ve links like http://example.com/index.html you can’t change the underlying technology without serious hassles. Say you’ve a static site with a file structure like /index.html, /contact/index.html, /about/index.html and so on. Tomorrow you’ll realize that static stuff sucks, hence you’ll develop a dynamic site with PHP. You’ll end up with new files: /index.php, /contact/index.php, /about/index.php and so on. If you’ve coded your internal links as http://example.com/contact/ etc. they’ll still work, without redirects from .html to .php. Just change the DirectoryIndex directive from “… index.html … index.php …” to “… index.php … index.html …”. (Of course you can configure Apache to parse .html files for PHP code, but that’s another story.)

It seems that truncating default document names can make sense for services that deal with URLs, but watch out for sites that serve different contents under various extensions of “index” files (intentionally or not). I’d say that folks submitting their ugly index.html files to directories, search engines, top lists and whatnot deserve all the hassles that come with later changes.

How to rescue stolen trailing slashes

Since Web servers know that users are faulty by design, they jump through a couple of resource-burning hoops in order to either add the trailing slash so that relative references inside HTML documents (CSS/JS/feed links, image locations, HREF values …) work correctly, or apply voodoo to accomplish that without (visibly) changing the address bar.

With Apache, DirectorySlash On enables this behavior (check whether your Apache version does 301 or 302 redirects, in case of 302s find another solution). You can also rewrite invalid requests in .htaccess when you need special rules:
RewriteEngine on
RewriteBase /content/
RewriteRule ^dir1$ http://example.com/content/dir1/ [R=301,L]
RewriteRule ^dir2$ http://example.com/content/dir2/ [R=301,L]
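If you’d rather not enumerate every directory, a generic variant of these rules appends the slash to any slashless request that doesn’t resolve to an existing file and doesn’t look like a file name. This is a sketch only; test it against your own URL scheme before going live:

```apache
RewriteEngine on
# Not an existing file, no file extension, no trailing slash yet ...
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_URI} !\.[a-z0-9]+$ [NC]
RewriteCond %{REQUEST_URI} !/$
# ... so 301-redirect to the same URI with the slash appended.
RewriteRule ^(.*)$ http://example.com/$1/ [R=301,L]
```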

With content management systems (CMS) that generate virtual URLs on the fly, often there’s no other chance than hacking the software to canonicalize invalid requests. To prevent search engines from indexing invalid URLs that are in fact duplicates of canonical URLs, you’ll perform permanent redirects (301).

Here is a WordPress (header.php) example:
$requestUri = $_SERVER["REQUEST_URI"];
$queryString = isset($_SERVER["QUERY_STRING"]) ? $_SERVER["QUERY_STRING"] : "";
$doRedirect = FALSE;
$fileExtensions = array(".html", ".htm", ".php");
$serverName = $_SERVER["SERVER_NAME"];
$canonicalServerName = $serverName;

// If you prefer http://example.com/* URLs, remove the "www.".
// Caveat: keeping only the last two labels breaks two-level TLDs like .co.uk.
$srvArr = explode(".", $serverName);
$canonicalServerName = $srvArr[count($srvArr) - 2] . "." . $srvArr[count($srvArr) - 1];

$url = parse_url("http://" . $canonicalServerName . $requestUri);
$requestUriPath = $url["path"];
if (substr($requestUriPath, -1, 1) != "/") {
    $isFile = FALSE;
    foreach ($fileExtensions as $fileExtension) {
        $extLength = strlen($fileExtension);
        if (strtolower(substr($requestUriPath, -$extLength)) == strtolower($fileExtension)) {
            $isFile = TRUE;
        }
    }
    if (!$isFile) {
        $requestUriPath .= "/";
        $doRedirect = TRUE;
    }
}
$canonicalUrl = "http://" . $canonicalServerName . $requestUriPath;
if ($queryString) {
    $canonicalUrl .= "?" . $queryString;
}
// Note: browsers never send the fragment to the server, so this branch
// only fires if a fragment was injected somewhere upstream.
if (!empty($url["fragment"])) {
    $canonicalUrl .= "#" . $url["fragment"];
}
if ($doRedirect) {
    @header("HTTP/1.1 301 Moved Permanently", TRUE, 301);
    @header("Location: $canonicalUrl");
    exit;
}

Check your permalink settings and edit the values of $fileExtensions and $canonicalServerName accordingly. For other CMSs, adapt the code; perhaps you need to change the handling of query strings and fragments. The code above will not run under IIS, because IIS doesn’t populate a REQUEST_URI variable.
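If you must run something like this under IIS, a common workaround is to rebuild REQUEST_URI from server variables IIS does populate. The helper below is a sketch (the function name is mine, and I haven’t tested it on IIS itself):

```php
<?php
// Sketch: rebuild a REQUEST_URI equivalent from SCRIPT_NAME and
// QUERY_STRING on servers that don't provide REQUEST_URI natively.
function buildRequestUri(array $server) {
    if (isset($server["REQUEST_URI"])) {
        return $server["REQUEST_URI"];
    }
    $uri = $server["SCRIPT_NAME"];
    if (!empty($server["QUERY_STRING"])) {
        $uri .= "?" . $server["QUERY_STRING"];
    }
    return $uri;
}
```

Call it as $requestUri = buildRequestUri($_SERVER); in place of the direct $_SERVER["REQUEST_URI"] read.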

Why stealing trailing slashes is not cool

This section expressed in one sentence: Cool URLs don’t change, hence changing other people’s URLs is not cool.

Folks should understand the “U” in URL as unique. Each URL addresses one and only one particular resource. Technically speaking, if you change one single character of an URL, the altered URL points to a different resource, or nowhere.

Think of URLs as phone numbers. When you call 555-0100 you reach the switchboard, 555-0101 is the fax, and 555-0109 is the phone extension of somebody. When you steal the last digit, dialing 555-010, you get nowhere.

Only a fool would assert that a phone number shortened by one digit is way cooler than the complete phone number that actually connects somewhere. Well, the last digit of a phone number and the trailing slash of a directory link aren’t much different. If somebody hands out an URL (with trailing slash), then use it as is, or don’t use it at all. Don’t “prettify” it, because any change destroys its serviceability.

If one requests a directory without the trailing slash, most Web servers will just reply to the user agent (browser, screen reader, bot) with a redirect header telling it that it must use a trailing slash; then the user agent has to re-issue the request in the formally correct way. From a Webmaster’s perspective, burning resources that thoughtlessly is plain theft. From a user’s perspective, things will often work without the slash, but they’ll be quicker with it. “Often” doesn’t equal “always”:

  • Some Web servers will serve the 404 page.
  • Some Web servers will serve the wrong content, because /dir is a valid script, virtual URI, or page that has nothing to do with the index of /dir/.
  • Many Web servers will respond with a 302 HTTP response code (Found) instead of a correct 301-redirect, so that most search engines discovering the sneakily circumcised URL will index the contents of the canonical URL under the invalid URL. Now all search engine users will request the incomplete URL too, running into unnecessary redirects.
  • Some Web servers will serve identical contents for /dir and /dir/, which leads to duplicate content issues with search engines that index both URLs from links. Most Web services that rank URLs will assign different scorings to all known URL variants, instead of accumulating the rankings of both URLs (which would be the right thing to do, but is technically, well, challenging).
  • Some user agents can’t handle (301) redirects properly. Exotic user agents might serve the user an empty page or the redirect’s “error message”, and Web robots like the crawlers sent out by Technorati or MSN-LiveSearch hang up or process garbage.

Does it really make sense to maliciously manipulate URLs just because some clueless developers say “dude, without the slash it looks way cooler”? Nope. Stealing trailing slashes in general as well as storing amputated URLs is a brain dead approach.

KISS (keep it simple, stupid) is a great principle. “Cosmetic corrections” like trimming URLs add unnecessary complexity that leads to erroneous behavior and requires even more code tweaks. GIGO (garbage in, garbage out) is another great principle that applies here. Smart algos don’t change their inputs. As long as the input is processable, they accept it; otherwise they skip it.

Exceptions

URLs in print, radio, and offline in general, should be truncated in a way that browsers can figure out the location - “domain.co.uk” in print and “domain dot co dot uk” on radio is enough. The necessary redirect is cheaper than a visitor who doesn’t type in the canonical URL including scheme, www-prefix, and trailing slash.

How URL canonicalization seems to irritate Technorati

Due to the not exactly responsive (or rather swamped) Technorati user support, parts of this section should be interpreted as educated speculation. Also, I didn’t research enough cases to come up with a working theory. So here is just the story of “how Technorati fails to deal with my blog”.

When I moved my blog from blogspot to this domain, I enhanced the faulty WordPress URL canonicalization. If any user agent requests http://sebastians-pamphlets.com it gets redirected to http://sebastians-pamphlets.com/. Invalid post/page URLs like http://sebastians-pamphlets.com/about redirect to http://sebastians-pamphlets.com/about/. All redirects are permanent, returning the HTTP response code “301”.

I’ve claimed my blog as http://sebastians-pamphlets.com/, but Technorati shows its URL without the trailing slash.
…<div class="url"><a href="http://sebastians-pamphlets.com">http://sebastians-pamphlets.com</a> </div> <a class="image-link" href="/blogs/sebastians-pamphlets.com"><img …

By the way, they forgot dozens of fans (folks who “fave’d” either my old blogspot outlet or this site) too.
Blogs claimed at Technorati

I’ve added a description and tons of tags, neither of which shows up on public pages. It seems my tags were deleted; at least they aren’t visible in edit mode any more.
Edit blog settings at Technorati

Shortly after the submission, Technorati stopped adjusting the reputation score for newly discovered inbound links. Furthermore, the list of my recent posts became stale, although I pinged Technorati with every update, and Technorati received my update notifications via ping services too. And yes, I tried manual pings to no avail.

I gained lots of fresh inbound links, but the authority score didn’t change. So I asked Technorati’s support for help. A few weeks later, in December 2007, I got an answer:

I’ve taken a look at the issue regarding picking up your pings for “sebastians-pamphlets.com”. After making a small adjustment, I’ve sent our spiders to revisit your page and your blog should be indexed successfully from now on.

Please let us know if you experience any problems in the future. Do not hesitate to contact us if you have any other questions.

Indeed, Technorati updated the reputation score from “56” to “191”, and refreshed the list of posts including the most recent one.

Of course the “small adjustment” didn’t persist (I assume that a batch process stole the trailing slash that the friendly support person had added). I sent a follow-up email asking whether it’s a slash issue or not, but haven’t received a reply yet. I’m quite sure that Technorati doesn’t follow 301-redirects, so that’s a plausible cause for this bug at least.

Since December 2007 Technorati hasn’t updated my authority score (just the rank goes up and down depending on the number of inbound links Technorati shows on the reactions page - by the way, these numbers are often unreal and change in the range of hundreds from day to day).
Blog reactions and authority scoring at Technorati

It seems Technorati hasn’t indexed my posts since then (December/18/2007), so probably my outgoing links don’t count for their destinations.
Stale list of recent posts at Technorati

(All screenshots were taken on February/05/2008. When you click the Technorati links today, hopefully things will look different.)

I’m not amused. I’m curious what would happen if I added
if (!preg_match("/Technorati/i", $_SERVER["HTTP_USER_AGENT"])) { /* redirect code */ }

to my canonicalization routine, but I can resist the temptation to special-case particular Web robots. My URL canonicalization should be identical for both visitors and crawlers. Technorati should be able to fix this bug without code changes at my end or weekly support requests. Wishful thinking? Maybe.

Update 2008-03-06: Technorati crawls my blog again. The 301 redirects weren’t the issue. I’ll explain that in a follow-up post soon.




The hacker tool MSN-LiveSearch is responsible for brute force attacks

A while ago I staged a public SEO contest, asking whether the 401 HTTP response code prevents search engine indexing or not.

Password protected site areas should be safe from indexing, because legit search engine crawlers do not submit user/password combos. Hence their attempts to fetch a password protected URL bounce with a 401 HTTP response code that translates to a polite “Authorization Required”, meaning “Forbidden unless you provide valid authorization”.
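For reference, such a protected area is typically configured with HTTP Basic authentication in .htaccess; the paths and realm name below are examples:

```apache
# Protect this directory with HTTP Basic auth; requests without
# valid credentials get the 401 response discussed above.
AuthType Basic
AuthName "Members only"
AuthUserFile /home/example/.htpasswd
Require valid-user
```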

Experience of life and common sense tell search engines that when a Webmaster protects content with a user/password prompt, this content is not available to the public. Search engines that respect Webmasters/site owners do not point their users to protected content.

Also, indexing such URLs makes no sense for the search engine. Searchers submitting a query with keywords that match a protected URL would be pissed when they click the promising search result on the SERP, only to have the linked site respond with an unfriendly “Enter user and password in order to access [title of the protected area]”, which resolves to a harsh error message because the searcher can’t provide such information, and usually can’t even sign up from the 401 error page1.

Unfortunately, search results that contain URLs of password protected content are valuable tools for hackers. Many content management systems and payment processors that Webmasters use to protect and monetize their contents leave footprints in URLs, for example /members/. Even when those systems can handle individual URLs, many Webmasters leave default URLs in place that are either guessable or well known on the Web.

Developing a script that searches for a string like /members/ in URLs and then “tests” the search results with brute force attacks is a breeze. Also, such scripts are available (for a few bucks or even free) at various places. Without the help of a search engine that provides the lists of protected URLs, the hacker’s job is way more complicated. In other words, search engines that list protected URLs on their SERPs willingly support and encourage hacking, content theft, and DOS-like server attacks.

Ok, let’s look at the test results. All search engines have cast their votes now. Here are the winners:

Google :)

Once my test was out, Matt Cutts from Google researched the question and told me:

My belief from talking to folks at Google is that 401/forbidden URLs that we crawl won’t be indexed even as a reference, so .htacess password-protected directories shouldn’t get indexed as long as we crawl enough to discover the 401. Of course, if we discover an URL but didn’t crawl it to see the 401/Forbidden status, that URL reference could still show up in Google.

Well, that’s exactly the expected behavior, and I wasn’t surprised that my test results confirm Matt’s statement. Thanks to Google’s BlitzIndexing™ Ms. Googlebot spotted the 401 so fast, that the URL never showed up on Google’s SERPs. Google reports the protected URL in my Webmaster Console account for this blog as not indexable.

Yahoo :)

Yahoo’s crawler Slurp also fetched the protected URL in no time, and Yahoo did the right thing too. I wonder whether or not that’s going to change if M$ buys Yahoo.

Ask :)

Ask’s crawler isn’t the most diligent Web robot out there. However, somehow Ask has managed not to index a reference to my password protected URL.

And here is the ultimate loser:

MSN LiveSearch :(

Oh well. Obviously MSN LiveSearch is a must have in a deceitful cracker’s toolbox:

MSN LiveSearch indexes password protected URLs

As if indexing references to password protected URLs wasn’t crappy enough, MSN even indexes sitemap files that are referenced in robots.txt only. Sitemaps are machine readable URL submission files that have absolutely no value for humans. Webmasters make use of sitemap files to mass submit their URLs to search engines. The sitemap protocol, which MSN officially supports, defines a communication channel between Webmasters and search engines - not searchers, and especially not scrapers that can use indexed sitemaps to steal Web contents more easily. Here is a screen shot of an MSN SERP:

MSN LiveSearch indexes unlinked sitemaps files (MSN SERP)
MSN LiveSearch indexes unlinked sitemaps files (MSN Webmaster Tools)

All the other search engines got the sitemap submission of the test URL too, but none of them fell for it. Neither Google, Yahoo, nor Ask have indexed the sitemap file (they never index submitted sitemaps that have no inbound links by the way) or its protected URL.
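For context: the sitemap autodiscovery line in robots.txt that crawlers pick up is a single directive (the URL is an example):

```
# robots.txt - the Sitemap line is meant for crawlers only,
# not for indexing or for human visitors.
Sitemap: http://example.com/sitemap.xml
```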

Summary

All major search engines except MSN respect the 401 barrier.

Since MSN LiveSearch is well known for spamming, it’s not a big surprise that they support hackers, scrapers and other content thieves.

Of course MSN search is still an experiment, operating in a not-yet-ready-to-launch stage, and the big players made their mistakes in the beginning too. But MSN has a history of ignoring Web standards as well as Webmaster concerns. It took them two years to implement the pretty simple sitemaps protocol, they still can’t handle 301 redirects, their sneaky stealth bots spam the referrer logs of Web sites everywhere in order to fake human traffic from MSN SERPs (MSN traffic doesn’t exist in most niches), and so on. Once pointed to such crap, they don’t even fix the simplest bugs in a timely manner. I mean, not complying with the HTTP 1.1 protocol from the last century is evidence of incapacity, and that’s just one example.

 

Update Feb/06/2008: Last night I received an email from Microsoft confirming the 401 issue. The MSN Live Search engineer said they are currently working on a fix, and he provided me with an email address for reporting possible further issues. Thank you, Nathan Buggia! I’m still curious how MSN Live Search will handle sitemap files in the future.

 


1 Smart Webmasters provide sign-up as well as login functionality on the page referenced as ErrorDocument 401, but the majority of all failed logins leave the user alone with the short hard-coded 401 message that Apache outputs if there’s no 401 error document. Please note that you shouldn’t use a PHP script as the 401 error page, because this might disable the user/password prompt (due to a PHP bug). With a static 401 error page that fires up on invalid user/pass entries or a hit on the cancel button, you can perform a meta refresh to redirect the visitor to a signup page. Bear in mind that in .htaccess you must not use absolute URLs (http://… or https://…) in the ErrorDocument 401 directive, and that on the error page you must use absolute URLs for CSS, images, links and whatnot, because relative URIs don’t work there!
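In .htaccess that boils down to a single directive; the file name below is an example:

```apache
# Must be a relative URL for 401: with an absolute URL Apache
# issues a redirect instead, and the user agent never sees the
# 401 status (nor the user/password prompt).
ErrorDocument 401 /errors/401.html
```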




Sorry Aaron Wall - I fucked up

My somewhat sarcastic post “Avoiding the well known #4 penalty“, where I joked about a possible Google #6 filter and criticized the SEO/Webmaster community for invalid methods of dealing with SERP anomalies, reads like “Aaron Wall is a clueless douche-bag”. Of course that’s not true, I never thought that, and I apologize for damaging Aaron’s reputation so thoughtlessly.

To express that I believe Aaron is a smart and very nice guy, I link to his related great post about things SEOs can learn from search engine bugs and glitches:

Do You Care About Google Glitches? Excerpt:

Glitches reveal engineer intent. And they do it early enough that you have time to change your strategy before your site is permanently filtered or banned. When you get to Google’s size, market share, and have that much data, glitches usually mean something.

To make my point clear: calling a SERP anomaly a filter or penalty before its intents and causes are properly analyzed, and before that analysis is backed up with a reasonable data set, is as thoughtless as damaging a fellow SEO’s reputation in such a way that someone new to the field, reading my post and/or comments at Sphinn, must think I’m poking Aaron, although I’m just sick of the almost daily WMW penalty inventions (WMW members –not Aaron!– invented the “Google position #6 penalty / filter” term). The sole reason for mentioning Aaron in my post was that his post (also read this one) triggered a great discussion at Sphinn that I’ve cited in parts.




Avoiding the well known #4 SERP-hero-penalty …

… I just have to link to North South Media’s neat collection of Search Action Figures.

Paul pretty much dislikes folks who don’t link to him, so Danny Sullivan and Rand Fishkin are well advised to drop a link every now and then, and David Naylor had better give him an interview slot asap. ;)

Google’s numbered “penalties”, esp. #6

As for numeric penalties in general … repeat("Sigh", ) … enjoy this brains trust moderated by Marty Weintraub (unauthorized):

Marty: Folks, please welcome Aaron Wall, who recently got his #6 penalty removed!

Audience: clap(26) sphinn(26)

The Gypsy: Sorry Marty but come on… this is complete BS and there is NO freakin #6 filter just like the magical minus 90…900 bla bla bla. These anomalies NEVER have any real consensus on a large enough data set to even be considered a viable theory.

A Red Crab: As long as Bill can’t find a plus|minus-n-raise|penalty patent, or at least a white paper or so leaked out from Google, or for all I care a study that provides proof instead of weird assumptions based on claims of webmasters jumping on today’s popular WMW band wagon that are neither plausible nor verifiable, such beasts don’t exist. There are unexplained effects that might look like a pattern, but in most cases it makes no sense to gather a few examples coming with similarities, because we’ll never reach the critical mass of anomalies to discuss a theory worth more than a thumbs-down click.

Marty: Maybe Aaron is joking. Maybe he thinks he has invented the next light bulb.

Gamermk: Aaron is grasping at straws on this one.

Barry Welford: I would like this topic to be seen by many.

Audience: clap(29) sphinn(29)

The Gypsy: It is just some people that have DECIDED on an end result and trying to make various hypothesis fit the situation (you know, like tobacco lobby scientists)… this is simply bad form IMO.

Danny Sullivan: Well, I’ve personally seen this weirdness. Pages that I absolutely thought “what on earth is that doing at six” rather than at the top of the page. Not four, not seven — six. It was freaking weird for several different searches. Nothing competitive, either.

I don’t know that sixth was actually some magic number. Personally, I’ve felt like there’s some glitch or problem with Google’s ranking that has prevented the most authoritative page in some instances from being at the top. But something was going on.

Remember, there’s no sandbox, either. We got that for months and months, until eventually it was acknowledged that there were a range of filters that might produce a “sandbox like” effect.

The biggest problem I find with these types of theories is they often start with a specific example, sometimes one that can be replicated, then they become a catch-all. Not ranking? Oh, it’s the sandbox. Well no — not if you were an established site, it wasn’t. The sandbox was typically something that hit brand new sites. But it became a common excuse for anything, producing confusion.

Jim Boykin: I’ll jump in and say I truly believe in the 6 filter. I’ve seen it. I wouldn’t have believed it if I hadn’t seen it happen to a few sites.

Audience: clap(31) sphinn(31)

A Red Crab: Such terms tend to take on a life of their own, IOW an excuse for nearly every way a Webmaster can fuck up rankings. Of course Google’s query engine has thresholds (yellow cards or whatever they call them) that don’t allow some sites to rank above a particular position, but that’s a symptom that doesn’t allow back-references to a particular cause, or causes. It’s speculation as long as we don’t know more.

IncrediBill: I definitely believe it’s some sort of filter or algo tweak, but it’s certainly not a penalty, which is why I scoff at calling it such. One morning you wake up and Matt has turned all the dials to the left, and suddenly some criteria bumps you UP or DOWN. Sites have been going up and down in Google SERPs for years, nothing new or shocking about that, and this too will have some obvious cause and effect that could probably be identified if people weren’t using the shotgun approach at changing their sites.

G1smd: By the time anyone works anything out with Google, they will already be in the process of moving the goalposts to another country.

Slightly Shady SEO: The #6 filter is a fallacy.

Old School: It certainly occurred but only affected certain sites.

Danny Sullivan: Perhaps it would have been better called a -5 penalty. Consider. Say Google for some reason sees a domain but decides good, but not sure if I trust it. Assign a -5 to it, and that might knock some things off the first page of results, right?

Look — it could all be coincidence, and it certainly might not necessarily be a penalty. But it was weird to see pages that for the life of me, I couldn’t understand why they wouldn’t be at 1, showing up at 6.

Slightly Shady SEO: That seems like a completely bizarre penalty. Not Google’s style. When they’ve penalized anything in the past, it hasn’t been a “well, I guess you can stay on the frontpage” penalty. It’s been a smackdown to prove a point.

Matt Cutts: Hmm. I’m not aware of anything that would exhibit that sort of behavior.

Audience: Ugh … oohhhh … you weren’t aware of the sandbox, either!

Danny Sullivan: Remember, there’s no sandbox, either. We got that for months and months, until eventually it was acknowledged that there were a range of filters that might produce a “sandbox like” effect.

Audience: Bah, humbug! We so want to believe in our lame excuses …

Tedster: I’m not happy with the current level of analysis, however, and definitely looking for more ideas.

Audience: clap(40) sphinn(40)


Of course the panel above is fictional, assembled from snippets which in some cases read differently in their original context. So please follow the links.

I wouldn’t go so far as to say there’s no such thing as a fair amount of Web pages that deserve a #1 spot on Google’s SERPs, but rank #6 for unknown reasons (perhaps link monkey business, staleness, PageRank flow in disarray, anchor text repetitions, …). There’s something worth investigating.

However, I think that labelling a discussion of glitches, or of filters that misbehave, based on a way too tiny dataset, a “#6 penalty” leads to the lame-excuse-for-literally-anything phenomenon.

Folks who don’t follow the various threads closely enough to spot the highly speculative character of the beast will take it as fact and switch to winter sleep mode instead of enhancing their stuff like Aaron did. I can’t wait for the first “How to escape the Google -5 penalty” SEO tutorial telling the great unwashed that a “+5” revisit-after meta tag will heal it.




Dealing with spamming content thieves / plagiarists (oylinki.com)

When it comes to crap like plagiarism you shouldn’t consider me a gentleman.

If assclowns like Veronica Domb steal my content and publish it along with likewise stolen comments on their blatantly spamming site oylinki.com, I’m somewhat upset.

Then, when I leave a polite note asking the thief Veronica Domb from EmeryVille to remove my stuff asap, see my comment marked as “in moderation”, and within 24 hours neither my content is removed nor my comment published, I stay annoyed.

When I’m annoyed, I write blog posts like this one. I’m sure it will rank high enough for [Veronica Domb] when the assclown’s banker or taxman searches for her name. I’m sure it’ll be visible on any SERP that any other (potential) business partner pulls up at a major search engine.

Content Thieves Veronica Domb et al, P.O.BOX 99800, EmeryVille, 94662, CA are blatant spammers

Hey, outing content thieves is way more fun than filing boring DMCA complaints, and way more effective. Plagiarists do ego searches too, and from now on Veronica Domb from EmeryVille will find the footsteps of her criminal activities on the Web with each and every ego search. Isn’t that nice?

Not. Of course Veronica Domb is a pseudonym of Slade Kitchens, Jamil Akhtar, … However, some plagiarists and scam artists aren’t smart enough to hide their identity, so watch out.

Maybe I’ve done some companies a little favor, because they certainly don’t need to send out money sneakily “earned” with Web spam and criminal activities that violate the TOS of most affiliate programs.

AdBrite will love to cancel the account for these affiliate links:
http://ads.adbrite.com/mb/text_group.php?sid=448245&br=1&dk=736d616c6c20627573696e6573735f355f315f776562
http://www.adbrite.com/mb/commerce/purchase_form.php?opid=448245&afsid=1

Google’s webspam team, as well as other search engines, will most likely delist oylinki.com, which comes with 100% stolen text and links, and faked whois info as well.

Spamcop and the like will happily blacklist oylinki.com (IP: 66.199.174.80, cwh2.canadianwebhosting.com) because the assclown’s blog software sends out email spam masked as trackbacks.

If anybody is interested, here’s the log entry of the real “Veronica Domb” from Canada clicking the link to this post from her WP admin panel:
74.14.107.36 - - [21/Jan/2008:07:50:40 -0500] "GET /outing-plagiarist-2008-01-21/ HTTP/1.1" 200 9921 "http://oylinki.com/blog/wp-admin/edit-comments.php" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; SU 3.005; .NET CLR 1.1.4322; InfoPath.1; Alexa Toolbar; .NET CLR 2.0.50727)"
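For what it’s worth, pulling the interesting fields out of such a combined-format access log line takes only a short script. A sketch in Python (my illustration, not the tool I used):

```python
import re

# Apache "combined" log format: host, identity, user, [time],
# "request", status, bytes, "referrer", "user-agent".
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_log_line(line):
    """Split one combined-format access log line into its fields."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None

hit = parse_log_line(
    '74.14.107.36 - - [21/Jan/2008:07:50:40 -0500] '
    '"GET /outing-plagiarist-2008-01-21/ HTTP/1.1" 200 9921 '
    '"http://oylinki.com/blog/wp-admin/edit-comments.php" '
    '"Mozilla/4.0 (compatible; MSIE 7.0)"'
)
# hit['referrer'] exposes the WP admin panel the click came from
```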

Common sense is not as common as you think.

Disclaimer: I’ve outed plagiarists in the past, because it works. Whether you do that on ego-SERPs or not depends on your ethics. Some folks think that’s even worse than theft and spamming. I say that publishing plagiarisms in the first place deserves bad publicity.




BlogCatalog needs professional help

BlogCatalog DevilA while ago I helped BlogCatalog to fix an issue with their JavaScript click tracking that Google considered somewhat crappy. The friendly BlogCatalog guys said thanks, and since then joining BC was on my ToDo-list because it seemed to be a decent service.

Recently I missed my cute red crab icon in a blog’s sidebar widget, realized that it’s powered by BlogCatalog and not MyBlogLog, so I finally signed up.

Roughly 24 hours later I was quite astonished as I received this email:

BlogCatalog - Submission Declined: Sebastian´s Pamphlets

Dear Sebastian,

Thank you for submitting your blog Sebastian`s Pamphlets (http://sebastians-pamphlets.com/) to BlogCatalog.

Unfortunately upon reviewing your blog we are unable to grant it access to the directory.

Your blog was declined for the following reason:

* You did not add a link back to Blog Catalog from your website.
To add a link visit: http://www.blogcatalog.com/buttons.php

If you believe this to be a mistake, you can login to Blog Catalog ( http://www.blogcatalog.com/blogs/manage_blog.html ) and change anything which may have caused it to get declined. After updating your blog, it will be put back into the submission queue.

If you have any questions/comments/suggestions/ideas please feel free to contact us.

Thanks,
BlogCatalog

Crap on, I followed the instructions on http://www.blogcatalog.com/buttons.php:

Meta Tag Verification

If you’d rather not add a link back to BlogCatalog you can alternatively copy the meta tag listed below and paste it in your site’s home page in the first <head> section of the page, before the first <body> section.

<meta name="blogcatalog" content="9BC8674180" />

It’s laughable to talk about the “first HEAD section” because an HTML file can have only one. Also, having more than one BODY section is certainly not compliant with any standard. But bullshit aside, they clearly state that they’re fine with a meta tag if a blogger refuses to add a reciprocal link or even a pile of server-side code that slows down each and every page.

If I remember correctly, BC folks accused of hoarding PageRank defended their policy with statements like

I should quickly clear up that we provide also widgets and meta tags to verify ownership for anyone who doesn’t want to link back to us. We understand PageRank is sacred to many of our bloggers and give them the options to preserve their PR. [emphasis mine, also I’ve removed typos]

Not that I care much about PageRank leaks, but I never link to directories. And why should I when they can verify my submission in other ways?

Obviously, BlogCatalog staff can’t be bothered to view my home page’s source code, and they have no scripts capable of finding the meta tag
<meta name="blogcatalog" content="9BC8674180" />

in my one and only, and therefore first, HEAD section.
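For the record, the check BlogCatalog’s reviewers apparently skipped is a few lines of code. A hedged Python sketch (the tag name and value come from this post; the function is my illustration, not BlogCatalog’s actual verifier, which would fetch the submitted home page first and then run a test like this on the HTML):

```python
import re

def meta_tag_present(html, name, content):
    """True if the page source carries <meta name=... content=...>.
    Tolerates whitespace, quoting, and case differences -- roughly
    what a directory's verification script would need to accept.
    Assumes the name attribute precedes the content attribute."""
    pattern = re.compile(
        r'<meta\s[^>]*name\s*=\s*["\']?%s["\']?[^>]*'
        r'content\s*=\s*["\']?%s["\']?' % (re.escape(name), re.escape(content)),
        re.IGNORECASE,
    )
    return bool(pattern.search(html))

page = ('<html><head><meta name="blogcatalog" content="9BC8674180" />'
        '</head><body></body></html>')
found = meta_tag_present(page, 'blogcatalog', '9BC8674180')
```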

The meta tag verification is somewhat buried on the policy page; it looks like BlogCatalog chases inbound links no matter what it costs. Dear BlogCatalog, in my case it costs reputation. You guys don’t really think that I’ll send you a private message so that you can silently approve the declined sign-up, do you? I’m pretty sure that you treat others the same way. Either dump the meta tag verification, or play by your very own rules.

It seems to me that BlogCatalog needs more professional advice from bright consultants (scroll down to Andy’s full disclosure).

Update: A few hours after publishing this post my submission got approved.




MSN spam to continue says the Live Search Blog

MSN Live Search clueless webspam detectionIt seems MSN/LiveSearch has tweaked their rogue bots and continues to spam innocent Web sites just in case they could cloak. I see a rant coming, but first the facts and news.

Since August 2007 MSN has been running a bogus bot that follows their crawler, faking a human visitor coming from a search results page. This spambot downloads everything from a page: images and other objects, external CSS/JS files, and ad blocks, rendering even contextual advertising from Google and Yahoo. It fakes MSN SERP referrers, diluting the search term stats with generic and unrelated keywords. Webmasters running non-adult sites wondered why a database tutorial suddenly ranks for [oral sex] and why MSN sends visitors searching for [MILF pix] to a teenager’s diary. Webmasters assumed that MSN was after deceitful cloaking, and laughed out loud because this webspam detection method was so primitive and easy to fool.

Now MSN admits all their sins (except the launch of a porn affiliate program) and posted a vague excuse on their Webmaster Blog telling the world that they discovered the evil cloakers and their index is somewhat spam-free now. Donna has chatted with the MSN spam team about their spambot and reports that blocking its IP addresses is a bad idea, even for sites that don’t cloak. Vanessa Fox summarized MSN’s poor man’s cloaking detection at Search Engine Land:

And one has to wonder how effective methods like this really are. Those savvy enough to cloak may be able to cloak for this new cloaker detection bot as well.

They say that they no longer spam sites that don’t cloak, but reverse this statement telling Donna

we need to be able to identify the legitimate and illegitimate content

and Vanessa

sites that are cloaking may continue to see some amount of traffic from this bot. This tool crawls sites throughout the web — both those that cloak and those that don’t — but those not found to be cloaking won’t continue to see traffic.

Here is an excerpt from yesterday’s referrer log of a site that does not cloak, and never did:
http://search.live.com/results.aspx?q=webmaster&mrt=en-us&FORM=LIVSOP
http://search.live.com/results.aspx?q=smart&mrt=en-us&FORM=LIVSOP
http://search.live.com/results.aspx?q=search&mrt=en-us&FORM=LIVSOP
http://search.live.com/results.aspx?q=progress&mrt=en-us&FORM=LIVSOP
http://search.live.com/results.aspx?q=google&mrt=en-us&FORM=LIVSOP
http://search.live.com/results.aspx?q=google&mrt=en-us&FORM=LIVSOP
http://search.live.com/results.aspx?q=domain&mrt=en-us&FORM=LIVSOP
http://search.live.com/results.aspx?q=database&mrt=en-us&FORM=LIVSOP
http://search.live.com/results.aspx?q=content&mrt=en-us&FORM=LIVSOP
http://search.live.com/results.aspx?q=business&mrt=en-us&FORM=LIVSOP
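Referrer hits like these are easy to flag in a log post-processor. A hedged Python sketch (the FORM=LIVSOP fingerprint is taken from the excerpt above; real searchers may arrive with the same parameter, so this marks suspects, not proof):

```python
from urllib.parse import urlparse, parse_qs

def is_livsop_referrer(referrer):
    """Flag referrers matching the bogus pattern above: a
    search.live.com results URL whose FORM parameter is LIVSOP."""
    parsed = urlparse(referrer)
    if parsed.netloc != 'search.live.com':
        return False
    return parse_qs(parsed.query).get('FORM') == ['LIVSOP']

referrers = [
    'http://search.live.com/results.aspx?q=webmaster&mrt=en-us&FORM=LIVSOP',
    'http://www.google.com/search?q=webmaster',
]
suspects = [r for r in referrers if is_livsop_referrer(r)]
```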

Why can’t the MSN dudes tell the truth, not even when they apologize?

Another lie is “we obey robots.txt”. Of course the spambot doesn’t request it, thereby bypassing bot traps, but according to MSN it uses a copy served to the LiveSearch crawler “msnbot”:

Yes, this robot does follow the robots.txt file. The reason you don’t see it download it, is that we use a fresh copy from our index. The tool does respect the robots.txt the same way that MSNBot does with a caveat; the tool behaves like a browser and some files that a crawler would ignore will be viewed just like real user would.

In reality, it doesn’t help to block CSS/JS files or images in robots.txt, because MSN’s spambot will download them anyway. The long-winded statement above translates to “We promise to obey robots.txt, but if it fits our needs we’ll ignore it”.

Well, MSN is not the only search engine running stealthy bots to detect cloaking, but they aren’t clever enough to do it in a less abusive and detectable way.

Their insane spambot tipped off all the cloaking specialists out there to their not-so-obvious spam detection methods. They may have caught a few cloaking sites, but considering the short life cycle of Webspam on throwaway domains they shot themselves in both feet. What they have really achieved is that the cloaking scripts are now immune to MSN’s spam detection.

Was it really necessary to annoy and defraud the whole Webmaster community and to burn huge amounts of bandwidth just to catch a few cloakers who launched new scripts on new throwaway domains hours after the first appearance of the MSN spam bot?

Can cosmetic changes with regard to their useless spam activities restore MSN’s lost reputation? I doubt it. They’ve admitted their miserable failure five months too late. Instead of dumping the spambot, they announce that they’ll spam away for the foreseeable future. How silly is that? I thought Microsoft was somewhat profit-oriented; why do they burn their and our money on such amateurish projects?

Besides all this crap, MSN has good news too. Microsoft Live Search told Search Engine Roundtable that they’ll spam our sites with keywords related to our content from now on; at least they’ll try. And they have a forum and a contact form to gather complaints. Crap on, so much bureaucratic effort to administer their ridiculous spam-fighting funeral. They’d better build a search engine that actually sends human traffic.




Microsoft funding bankrupt Live Search experiment with porn spam

If only this headline would be linkbait … of course it’s not sarcastic.

M$ PORN CASHRumors are out that Microsoft will launch a porn affiliate program soon. The top secret code name for this project is “pornbucks”, but analysts say that it will be launched as “M$ SMUT CASH” next year or so.

Since Microsoft just can’t ship anything in time, and the usual delays aren’t communicated internally, their search dept. began to promote it to Webmasters this summer.

Surprisingly, Webmasters across the globe weren’t that excited to find promotional messages from Live Search in their log files, so a somewhat confused MSN dude posted a lame excuse to a large Webmaster forum.

Meanwhile we found out that Microsoft Live Search does not only target the adult entertainment industry, they’re testing the waters with other money terms like travel or pharmaceutical products too.

Anytime soon the Live Search menu bar will be updated to something like this:
Live Search Porn Spam Menu

Here is the sad –but true– story of a search engine’s downfall.

A few months ago Microsoft Live Search discovered that x-rated referrer spam is a must-have technique in a sneaky smut peddler’s marketing toolbox.

Since August 2007 a bogus Web robot follows Microsoft’s search engine crawler “MSNbot” to spam the referrer logs of all Web sites out there with URLs pointing to MSN search result pages featuring porn.

Read your referrer logs and you’ll find spam from Microsoft too, but perhaps they peeve you with viagra spam, offer you unwanted but cheap payday loans, or try to enlarge your penis. Of course they know every trick in the book on spam, so check for harmless catchwords too. Here is an example URL:
http://search.live.com/results.aspx?q= spammy-keyword &mrt=en-us&FORM=LIVSOP

Microsoft’s spam bot not only leaves bogus URLs in log files, hoping that Webmasters will click them on their referrer stats pages and maybe sign up for something like “M$ Porn Bucks”. It even downloads and renders adverts powered by their rival Google, lowering their CTR, obviously to make programs like AdSense less attractive in comparison with Microsoft’s own ads (sorry, no link love from here).

Let’s look at Microsoft’s misleading statement:

The traffic you are seeing is part of a quality check we run on selected pages. While we work on addressing your conerns, we would request that you do not actively block the IP addreses used by this quality check; blocking these IP addresses could prevent your site from being included in the Live Search index.

  • That’s not traffic, that’s bot activity: These hits come within seconds of being indexed by MSNBot. The pattern is like this: the page is requested by MSNBot (which is authenticated, so it’s genuine) and within a few seconds, the very same page is requested with a live.com search result URL as referer by the MSN spam bot faking a human visitor.
  • If that’s really a quality check to detect cloaking, that’s more than just lame. The IP addresses don’t change, the bogus bot uses a static user agent name, and there are other footprints which allow every cloaking script out there to serve this sneaky bot the exact same spider fodder that MSNbot got seconds before. This flawed technique might catch poor man’s cloaking every once in a while, but it can’t fool savvy search marketers.
  • The FUD “could prevent your site from being included in the Live Search index” is laughable, because in most niches MSN search traffic is nonexistent.

All major search engines, including MSN, promise that they obey the robots exclusion standard. Obeying robots.txt is the holy grail of search engine crawling. A search engine that ignores robots.txt and other standardized crawler directives cannot be trusted. The crappy MSN bot doesn’t even bother to read robots.txt, so there’s no chance to block it with standardized methods. Only IP blocking can keep it out, but then it still seems to download ads from Google’s AdSense servers by executing the JavaScript code that the MSN crawler gathered before (ignoring Google’s AdSense robots.txt as well).
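For those determined to try IP blocking anyway (MSN warns it may affect indexing, see their statement above), an Apache 2.2-era .htaccess fragment would look roughly like this. The address range below is a documentation placeholder, not the bot’s actual IPs; substitute whatever range you see paired with authenticated MSNBot crawls in your own logs:

```apache
# Hypothetical .htaccess fragment: deny requests from the IP range
# observed following MSNBot within seconds of each crawl.
# 192.0.2.0/24 is a placeholder -- substitute real data from your logs.
<Limit GET POST>
  Order Allow,Deny
  Allow from all
  Deny from 192.0.2.0/24
</Limit>
```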

This unethical spam bot downloading all images, external CSS and JS files, and whatnot also burns bandwidth. That’s plain theft.

Since this method cannot detect (most) cloaking, and the so-called “search quality control bot” doesn’t stop visiting sites which obviously do not cloak, it is a sneaky marketing tool. Whether or not Microsoft Live Search tries to promote cyberspace porn and on-line viagra shops plays no role. Even spamming with safe-at-work keywords is evil. Do these assclowns really believe that such unethical activities will increase the usage of their tiny and pretty unpopular search engine? Of course they do, otherwise they would have shut down the spam bot months ago.

Dear reader, please tell me: what do you think of a search engine that steals (bandwidth and AdSense revenue), lies, spams away, and is not clever enough to stop their criminal activities when they’re caught?

Recently a Live Search rep whined in an interview because so many robots.txt files out there block their crawler:

One thing that we noticed for example while mining our logs is that there are still a fair number of sites that specifically only allow Googlebot and do not allow MSNBot.

There’s a suitable answer, though. Update your robots.txt:

User-agent: MSNbot
Disallow: /




Gaming Sphinn is not worth it

Thou shalt not spam Sphinn!OMFG, yet another post on Sphinn? Yup. I’ll tell you why gaming Sphinn is counterproductive, because I just don’t want to read another whiny rant along the lines of “why do you ignore my stuff whilst A listers [whatever this undefined term means] get their crap sphunn hot in no time”. Also, discussions assuming that success equals bad behavior, like this or this one, aren’t exactly funny or useful. As for the whiners: grow the fuck up and produce outstanding content, then network politely but unobtrusively to promote it. As for the gamers: think before you ruin your reputation!

What motivates a wannabe Internet marketer to game Sphinn?

Traffic of course, but that’s a myth. Sphinn sends very targeted traffic but also very few visitors (see my stats below).

Free uncondomized links. Ok, that works: one can gain enough link love to get a page indexed by the search engines, but for this purpose it’s not necessary to push the submission to the home page.

Attention is up next. Yep, Sphinn is an eldorado for attention whores, but not everybody is an experienced high-class call girl. Most are amateurs giving it a (first) try, or wrecked hookers pushing too hard to attract positive attention.

The keyword is positive attention. Sphinners are smart, they know every trick in the book. Many of them make a living with the creative use (read: gaming) of social media. Trying to cheat professional gamblers is a waste of time, and will not produce positive attention. Even worse, the shit sticks to the handle of the unsuccessful cheater (and in many cases the real name). So if you want to burn your reputation, go found a voting club to feed your crap.

Fortunately, getting caught for artificial voting at Sphinn comes with devalued links too. The submitted stories are taken off the list, which means not a single link at Sphinn (besides profile pages) feeds them any more, hence search engines forget them. Instead of a good link from an unpopular submission you get zilch when you try to cheat your way onto the popular links pages.

Although Sphinn doesn’t send shitloads of traffic, this traffic is extremely valuable. Many sphinners operate or control blogs and tend to link to outstanding articles they found at Sphinn. Many sphinners have accounts on other SM sites too, and bookmark/cross-submit good content. It’s not unusual that 10 visits from Sphinn result in hundreds or even thousands of hits from StumbleUpon & Co. — but sphinners don’t bookmark/blog/cross-submit/stumble crap.

So either write great content and play by the rules, or get nowhere with your crappy submission. The first “10 reasons why 10 tricks posts about 10 great tips to write 10 numbered lists” submission was fun. The 10,000 plagiarisms following were just boring noise. Nobody except your buddies or vote bots sphinn crap like that, so don’t bother to provide the community with footprints of your lousy gaming.

If you’re playing number games, here is why ruining a reputation by gaming Sphinn is not worth it. Look at my visitor stats from July to today: I got 3.6k referrals in 4 months from Sphinn because a few of my posts went hot. When a post sticks with 1-5 votes, you won’t attract many more click-throughs than from those 1-5 folks who sphunn it (that would give 100-200 hits or so with the same amount of submissions). When you cheat, the story gets buried and you get nothing but flames. Think about that. Thanks.

Rank Last Date/Time Referral Site Count
1 Oct 09, 2007 @ 23:29 http://sphinn.com/story/1622 504
2 Oct 23, 2007 @ 14:53 http://sphinn.com/story/2764 419
3 Nov 01, 2007 @ 03:42 http://sphinn.com 293
4 Oct 08, 2007 @ 04:21 http://sphinn.com/story/5469 288
5 Nov 02, 2007 @ 13:35 http://sphinn.com/story/8883 192
6 Oct 09, 2007 @ 23:38 http://sphinn.com/story/4335 185
7 Oct 22, 2007 @ 23:55 http://sphinn.com/story/5362 139
8 Oct 29, 2007 @ 15:02 http://sphinn.com/upcoming 131
9 Nov 02, 2007 @ 13:34 http://sphinn.com/story/7170 131
10 Sep 10, 2007 @ 09:09 http://sphinn.com/story/1976 116
11 Oct 15, 2007 @ 22:40 http://sphinn.com/story/6122 113
12 Sep 22, 2007 @ 13:39 http://sphinn.com/story/3593 90
13 Oct 05, 2007 @ 21:56 http://sphinn.com/story/5648 87
14 Sep 22, 2007 @ 13:25 http://sphinn.com/story/4072 80
15 Oct 14, 2007 @ 17:24 http://sphinn.com/story/5973 77
16 Aug 30, 2007 @ 04:17 http://sphinn.com/story/1796 72
17 Oct 16, 2007 @ 05:46 http://sphinn.com/story/6761 61
18 Oct 11, 2007 @ 05:56 http://sphinn.com/story/1447 60
19 Sep 13, 2007 @ 12:27 http://sphinn.com/story/4548 54
20 Nov 02, 2007 @ 22:14 http://sphinn.com/story/11547 53
21 Sep 03, 2007 @ 09:34 http://sphinn.com/story/4068 44
22 Oct 09, 2007 @ 23:40 http://sphinn.com/story/5093 42
23 Nov 02, 2007 @ 01:46 http://sphinn.com/story/248 41
24 Sep 14, 2007 @ 05:58 http://sphinn.com/story/2287 36
25 Oct 31, 2007 @ 06:17 http://sphinn.com/story/11205 35
26 Oct 07, 2007 @ 12:07 http://sphinn.com/story/6124 25
27 Nov 01, 2007 @ 09:41 http://sphinn.com/user/view/profile/Sebastian 22
28 Aug 08, 2007 @ 10:52 http://sphinn.com/story/245 21
29 Sep 02, 2007 @ 19:17 http://sphinn.com/story/3877 17
30 Sep 22, 2007 @ 00:42 http://sphinn.com/story/4968 17
31 Oct 01, 2007 @ 12:49 http://sphinn.com/story/5310 17
32 Aug 30, 2007 @ 08:20 http://sphinn.com/story/4143 14
33 Sep 11, 2007 @ 21:38 http://sphinn.com/story/3783 13
34 Nov 01, 2007 @ 15:50 http://sphinn.com/published/page/2 11
35 Sep 01, 2007 @ 23:03 http://sphinn.com/story/597 10
36 Oct 24, 2007 @ 18:17 http://sphinn.com/story/1767 10
37 Sep 15, 2007 @ 08:26 http://sphinn.com/story.php?id=5469 8
38 Oct 30, 2007 @ 09:42 http://sphinn.com/upcoming/mostpopular 7
39 Oct 24, 2007 @ 18:38 http://sphinn.com/story/10881 7
40 Oct 30, 2007 @ 01:19 http://sphinn.com/upcoming/page/2 6
41 Sep 20, 2007 @ 07:09 http://sphinn.com/user/view/profile/login/Sebastian 5
42 Jul 22, 2007 @ 09:39 http://sphinn.com/story/1017 5
43 Oct 13, 2007 @ 08:34 http://sphinn.com/published/week 5
44 Sep 08, 2007 @ 04:17 http://sphinn.com/story/4653 5
45 Oct 31, 2007 @ 06:55 http://sphinn.com/story/11614 5
46 Aug 13, 2007 @ 03:06 http://sphinn.com/story/2764/editcomment/4018 4
47 Aug 23, 2007 @ 07:52 http://sphinn.com/story.php?id=3593 4
48 Sep 20, 2007 @ 06:21 http://sphinn.com/published/page/1 4
49 Oct 23, 2007 @ 15:01 http://sphinn.com/story/748 3
50 Jul 29, 2007 @ 10:47 http://sphinn.com/story/title/Google-launched-a-free-ranking-checker 3
51 Sep 30, 2007 @ 21:13 http://sphinn.com/category/Google/parent_name/Google 3
52 Aug 25, 2007 @ 04:47 http://sphinn.com/story.php?id=3735 3
53 Sep 15, 2007 @ 11:28 http://sphinn.com/story.php?id=5648 3
54 Sep 29, 2007 @ 01:35 http://sphinn.com/story/7058 3
55 Oct 28, 2007 @ 22:56 http://sphinn.com/greatesthits 3
56 Oct 23, 2007 @ 04:44 http://sphinn.com/story/10380 3
57 Oct 27, 2007 @ 04:10 http://sphinn.com/story/11233 3
58 Jul 13, 2007 @ 04:23 Google Search: http://sphinn.com 2
59 Jul 21, 2007 @ 03:19 http://sphinn.com/story.php?id=849 2
60 Jul 27, 2007 @ 10:06 http://sphinn.com/story.php?id=1447 2
61 Jul 30, 2007 @ 20:09 http://sphinn.com/story.php?id=1796 2
62 Aug 07, 2007 @ 10:01 http://sphinn.com/published/page/3 2
63 Aug 13, 2007 @ 11:20 http://sphinn.com/story.php?id=2764 2
64 Sep 05, 2007 @ 05:23 http://sphinn.com/story/3735 2
65 Aug 28, 2007 @ 01:56 http://sphinn.com/story.php?id=3877 2
66 Aug 27, 2007 @ 10:01 http://sphinn.com/submit.php?url=http://sebastians-pamphlets.com/links/categories 2
67 Aug 31, 2007 @ 14:13 http://sphinn.com/story.php?id=4335 2
68 Sep 02, 2007 @ 14:29 http://sphinn.com/story.php?id=1622 2
69 Sep 08, 2007 @ 19:48 http://sphinn.com/story.php?id=4548 2
70 Sep 05, 2007 @ 01:07 http://sphinn.com/submit.php?url=http://sebastians-pamphlets.com/why-ebay-and-wikipedia-rule-googles-serps 2
71 Sep 06, 2007 @ 13:22 http://sphinn.com/published/page/4 2
72 Sep 16, 2007 @ 13:30 http://sphinn.com/story.php?id=3783 2
73 Sep 18, 2007 @ 11:55 http://sphinn.com/story.php?id=5973 2
74 Sep 19, 2007 @ 08:15 http://sphinn.com/story.php?id=6122 2
75 Sep 19, 2007 @ 14:37 http://sphinn.com/story.php?id=6124 2
76 Oct 23, 2007 @ 00:07 http://sphinn.com/story/10387 2
77 Jul 16, 2007 @ 18:21 http://sphinn.com/upcoming/category/AllCategories/parent_name/All Categories 1
78 Jul 19, 2007 @ 20:19 http://sphinn.com/story/864 1
79 Jul 20, 2007 @ 15:57 http://sphinn.com/story/title/Buy-Viagra-from-Reddit 1
80 Jul 27, 2007 @ 10:48 http://sphinn.com/story/title/Blogger-to-rule-search-engine-visibility 1
81 Jul 31, 2007 @ 06:07 http://sphinn.com/story/title/The-Unavailable-After-tag-is-totally-and-utterly-useless 1
82 Aug 02, 2007 @ 14:45 http://sphinn.com/user/view/history/login/Sebastian 1
83 Aug 03, 2007 @ 10:59 http://sphinn.com/story.php?id=1976 1
84 Aug 06, 2007 @ 03:59 http://sphinn.com/user/view/commented/login/Sebastian 1
85 Aug 15, 2007 @ 08:27 http://sphinn.com/category/LinkBuilding 1
86 Aug 15, 2007 @ 14:17 http://sphinn.com/story/2764/editcomment/4362 1
87 Aug 28, 2007 @ 13:42 http://sphinn.com/story/849 1
88 Sep 09, 2007 @ 15:15 http://sphinn.com/user/view/commented/login/flyingrose 1
89 Sep 10, 2007 @ 05:15 http://sphinn.com/published/page/20 1
90 Sep 10, 2007 @ 05:55 http://sphinn.com/published/page/19 1
91 Sep 11, 2007 @ 12:22 http://sphinn.com/published/page/8 1
92 Sep 11, 2007 @ 23:13 http://sphinn.com/category/Blogging 1
93 Sep 12, 2007 @ 09:04 http://sphinn.com/story.php?id=5362 1
94 Sep 13, 2007 @ 06:36 http://sphinn.com/category/GoogleSEO/parent_name/Google 1
95 Sep 14, 2007 @ 08:21 http://hwww.sphinn.com 1
96 Sep 16, 2007 @ 14:52 http://sphinn.com/GoogleSEO/Did-Matt-Cutts-by-accident-reveal-a-sure-fire-procedure-to-identify-supplemental-results 1
97 Sep 18, 2007 @ 08:05 http://sphinn.com/story/5721 1
98 Sep 18, 2007 @ 09:08 http://sphinn.com/story/title/If-yoursquore-not-an-Amway-millionaire-avoid-BlogRush-like-the-plague 1
99 Sep 18, 2007 @ 10:02 http://sphinn.com/story/5973#wholecomment8559 1
100 Sep 19, 2007 @ 11:48 http://sphinn.com/user/view/voted/login/bhancock 1
101 Sep 19, 2007 @ 20:27 http://sphinn.com/published/page/5 1
102 Sep 20, 2007 @ 00:39 http://blogmarks.net/my/marks,new?title=How to get the perfect logo for your blog&url=http://sebastians-pamphlets.com/how-to-get-the-perfect-logo-for-your-blog/&summary=&via=http://sphinn.com/story/6122 1
103 Sep 20, 2007 @ 01:34 http://sphinn.com/user/page/3/voted/Wiep 1
104 Sep 24, 2007 @ 15:49 http://sphinn.com/greatesthits/page/3 1
105 Sep 24, 2007 @ 19:51 http://sphinn.com/story.php?id=6761 1
106 Sep 24, 2007 @ 22:32 http://sphinn.com/greatesthits/page/2 1
107 Sep 26, 2007 @ 15:13 http://sphinn.com/story.php?id=7170 1
108 Sep 29, 2007 @ 05:27 http://sphinn.com/category/SphinnZone 1
109 Oct 09, 2007 @ 11:44 http://sphinn.com/story.php?id=8883 1
110 Oct 10, 2007 @ 10:04 http://sphinn.com/published/month 1
111 Oct 24, 2007 @ 15:07 http://sphinn.com/story.php?id=10881 1
112 Oct 26, 2007 @ 09:53 http://sphinn.com/story.php?id=11205 1
113 Oct 30, 2007 @ 08:58 http://sphinn.com/upcoming/page/3 1
114 Oct 30, 2007 @ 12:31 http://sphinn.com/upcoming/most 1
Total 3,688
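A ranking like the table above is a simple aggregation job over the raw referrer column of an access log. A Python sketch (illustration only, not the stats package that produced these numbers):

```python
from collections import Counter

def rank_referrers(referrers):
    """Aggregate raw referrer URLs into (url, count) rows,
    highest count first -- the shape of the table above."""
    return Counter(referrers).most_common()

sample = [
    'http://sphinn.com/story/1622',
    'http://sphinn.com/story/2764',
    'http://sphinn.com/story/1622',
]
rows = rank_referrers(sample)
# rows[0] is ('http://sphinn.com/story/1622', 2)
```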



BlogRush amoebas ban high quality blogs in favor of crap

Whilst blogs like The Tampon Blog are considered “high quality” by the clueless amoebas hired by BlogRush, many great blogs like Tamar’s were banned by the Reese gang.

In my book that qualifies BlogRush as a full-blown scam. If it’s not a scam, it’s at least an amateurish operation intended to hoodwink bloggers. Hiring low-life surfers for 12 bucks per hour to judge the quality of blogs talking about topics the average assclown on BlogRush’s payroll cannot understand is ridiculous, if not a sign of criminal intent. Here is how they hire their amoebas:

We’re looking to hire a bunch of people that would like to earn some extra cash. If you or someone you know might be interested, please forward this message to them. This would be perfect for a stay-at-home mom, college student, or anyone else looking to make some extra money.

All that’s required is sitting in front of their computer and doing the following…

Login to our Review System with an account we will setup for them. There will be a top “frame” control strip that has a few buttons:

“Approve” “Reject” and “Not Sure.”

The bottom frame will automatically load a blog that needs to be reviewed. After reviewing the blog, just press the appropriate button. That’s it.

* We have created a little training video to teach reviewers what to look for and how to decide what gets approved or rejected. It’s very simple.

After pushing one of the buttons the next blog to be reviewed automatically loads in that bottom frame. It’s as simple as that.

Here’s The Deal…

We’re paying USD $12.00/hour for this review work. It’s not a fortune, but it’s a pretty simple task. Heck, just put on some music and sit back and review some blogs. Pretty easy work. :-)

I’m not pissed because they rejected me and lots of other great blogs. I’m not even pissed because they sent emails like

Congratulations! You are receiving this update because your blog has passed our strict Quality Guidelines and criteria — we believe you have a high-quality blog and we are happy you’re a member of our network!

to blogs which didn’t even bother to put up their crappy widget. I’m pissed because they constantly lie and cheat:

We’ve just completed a massive SWEEP of our entire network. We’ve removed over *10,000* blogs (Yes, ten thousand) that did not meet our new Quality Guidelines.

We have done a huge “quality control audit” of our network and have
reviewed all the blogs one-at-a-time. We will continue to review each
NEW blog that is ever submitted to our network.

You will notice the HUGE DIFFERENCE in the quality of blogs that now
appear in your widget. This major *sweep* of our network will also
increase the click-rates across the entire network and you will start
to receive more traffic.

They still send little or no traffic to niche blogs, those blogs still get cheated, and they still have tons of crap in their network. They still overpromise and underdeliver. There's no such thing as a "massive amount of targeted traffic" sent by BlogRush.

The whole BlogRush operation is a scam. Avoid BlogRush like the plague.

Update: Here is one of John Reese's lame excuses, posted in reply to a "reviewed and dumped by BlogRush idiots" post on John Cow's blog. A laughable pile of bullcrap, politely put.

John Reese from BlogRush here.

I am not sure why your blog wasn’t approved by the reviewer that reviewed your blog. (We have a team of reviewers.) From what I can tell, your blog passes our guidelines. I’m not sure if the reviewer loaded your blog on a day where your primary post(s) were heavy on the promotional side or not — that’s just a guess of what might have influenced them.

You have my email address from this comment. Please contact me directly (if you wish) and I will investigate the issue for you and see about reactivating your account.

AND FOR THE RECORD…

No one is being BANNED from BlogRush. If any account doesn’t have any approved blogs, the account is moved to an “inactive” status until changes are made or until another blog that meets our guidelines gets approved. Nothing happens to referrals or an account’s referral network; they are left completely intact and as soon as the account is “active” again everything returns to the way it was.

* I just found out that your pingback message was deleted by one of our blog moderators because we don’t want any comments (or pingbacks) showing up for that main post. A few childish users started posting profanity and other garbage that was getting past our filters and we needed to shut it off for now.

There’s no “conspiracy theory” happening. In fact, we’ve been incredibly transparent and honest ever since we launched — openly admitting to mistakes that we’ve made and what we planned to do about them.

~John


