Microsoft funding bankrupt Live Search experiment with porn spam

If only this headline were linkbait … of course it's not sarcastic.

Rumors are out that Microsoft will launch a porn affiliate program soon. The top secret code name for this project is “pornbucks”, but analysts say that it will be launched as “M$ SMUT CASH” next year or so.

Since Microsoft just can't ship anything on time, and the usual delays aren't communicated internally, their search dept. began to promote it to Webmasters this summer.

Surprisingly, Webmasters across the globe weren't that excited to find promotional messages from Live Search in their log files, so a somewhat confused MSN dude posted a lame excuse to a large Webmaster forum.

Meanwhile we found out that Microsoft Live Search doesn't only target the adult entertainment industry; they're testing the waters with other money terms like travel or pharmaceutical products too.

Sometime soon the Live Search menu bar will be updated to something like this:
Live Search Porn Spam Menu

Here is the sad –but true– story of a search engine’s downfall.

A few months ago Microsoft Live Search discovered that x-rated referrer spam is a must-have technique in a sneaky smut peddler's marketing toolbox.

Since August 2007 a bogus Web robot has been following Microsoft's search engine crawler “MSNbot” around, spamming the referrer logs of Web sites all over the place with URLs pointing to MSN search result pages featuring porn.

Read your referrer logs and you’ll find spam from Microsoft too, but perhaps they peeve you with viagra spam, offer you unwanted but cheap payday loans, or try to enlarge your penis. Of course they know every trick in the book on spam, so check for harmless catchwords too. Here is an example URL:
http://search.live.com/results.aspx?q= spammy-keyword &mrt=en-us&FORM=LIVSOP

Microsoft's spam bot not only leaves bogus URLs in log files, hoping that Webmasters will click them on their referrer stats pages and maybe sign up for something like “M$ Porn Bucks” or so. It even downloads and renders adverts powered by their rival Google, lowering their CTR, obviously to make programs like AdSense less attractive in comparison with Microsoft's own ads (sorry, no link love from here).

Let’s look at Microsoft’s misleading statement:

The traffic you are seeing is part of a quality check we run on selected pages. While we work on addressing your concerns, we would request that you do not actively block the IP addresses used by this quality check; blocking these IP addresses could prevent your site from being included in the Live Search index.

  • That's not traffic, that's bot activity: These hits come within seconds of being indexed by MSNBot. The pattern is like this: the page is requested by MSNBot (which is authenticated, so it's genuine) and within a few seconds, the very same page is requested with a live.com search result URL as referer by the MSN spam bot faking a human visitor (see the sketch below this list).
  • If that’s really a quality check to detect cloaking, that’s more than just lame. The IP addresses don’t change, the bogus bot uses a static user agent name, and there are other footprints which allow every cloaking script out there to serve this sneaky bot the exact same spider fodder that MSNbot got seconds before. This flawed technique might catch poor man’s cloaking every once in a while, but it can’t fool savvy search marketers.
  • The FUD “could prevent your site from being included in the Live Search index” is laughable, because in most niches MSN search traffic is non-existent.
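If you want to flag those hits programmatically, a crude request-time check does the trick: remember when MSNbot fetched a URL, and mark any "visitor" who requests the same URL with a Live Search results referrer a few seconds later. Here's a minimal sketch; the flat file, the 10 second window and the matched substrings are just examples, not the bot's documented footprint:

<?php
// Include this early in your page scripts. It marks suspected Live Search
// referrer spam in the error log; file name and threshold are assumptions.
$trackFile = '/tmp/msnbot-fetches.txt'; // prune it via cron, omitted here
$window    = 10;                        // seconds between bot fetch and fake visit
$url       = $_SERVER['REQUEST_URI'];
$ua        = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
$referrer  = isset($_SERVER['HTTP_REFERER']) ? $_SERVER['HTTP_REFERER'] : '';

// Remember when MSNbot fetched which URL.
if (stripos($ua, 'msnbot') !== false) {
    file_put_contents($trackFile, time() . "\t" . $url . "\n", FILE_APPEND);
    return;
}

// A "human" hit carrying a Live Search results referrer right after an
// MSNbot fetch of the very same URL smells like the referrer spam bot.
if (stripos($referrer, 'search.live.com/results.aspx') !== false && is_readable($trackFile)) {
    foreach (file($trackFile) as $line) {
        list($ts, $botUrl) = explode("\t", rtrim($line, "\n"), 2);
        if ($botUrl === $url && (time() - (int) $ts) <= $window) {
            error_log("Suspected Live Search referrer spam: $url <- $referrer");
            break;
        }
    }
}
?>

Grep your error log for "Suspected Live Search referrer spam" and compare the counts with your referrer stats.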

All major search engines, including MSN, promise that they obey the robots exclusion standard. Obeying robots.txt is the holy grail of search engine crawling. A search engine that ignores robots.txt and other standardized crawler directives cannot be trusted. The crappy MSN bot doesn't even bother to read robots.txt, so there's no chance to block it with standardized methods. Only IP blocking can keep it out, but even then it still seems to download ads from Google's AdSense servers by executing the JavaScript code that the MSN crawler gathered before (ignoring Google's AdSense robots.txt as well).

This unethical spam bot also burns bandwidth by downloading all images, external CSS and JS files, and whatnot. That's plain theft.

Since this method cannot detect (most) cloaking, and the so-called “search quality control bot” doesn't stop visiting sites which obviously do not cloak, it is a sneaky marketing tool. Whether or not Microsoft Live Search tries to promote cyberspace porn and on-line viagra shops plays no role. Even spamming with safe-at-work keywords is evil. Do these assclowns really believe that such unethical activities will increase the usage of their tiny and pretty unpopular search engine? Of course they do, otherwise they would have shut down the spam bot months ago.

Dear reader, please tell me: what do you think of a search engine that steals (bandwidth and AdSense revenue), lies, spams away, and is not clever enough to stop their criminal activities when they’re caught?

Recently a Live Search rep whined in an interview about how many robots.txt files out there block their crawler:

One thing that we noticed for example while mining our logs is that there are still a fair number of sites that specifically only allow Googlebot and do not allow MSNBot.

There’s a suitable answer, though. Update your robots.txt:

User-agent: MSNbot
Disallow: /




Q&A: An undocumented robots.txt crawler directive from Google

Blogging should be fun every now and then. Today I don't tell you anything new about Google's secret experiments with the robots exclusion protocol. I ask you instead, because I'm sure you know your stuff. Unfortunately, the Q&A on undocumented robots.txt syntax from Google's labs utilizes JavaScript, so perhaps it looks somewhat weird in your feed reader.

Q: Please look at this robots.txt file and figure out why it’s worth a Q&A with you, my dear reader:


User-Agent: *
Disallow: /
Noindex: /

Ok, click here to show the first hint.

I know, this one was a breeze, so here comes your challenge.
Q: Which crawler directive used in the robots.txt above was introduced in 1996 in the Robots Exclusion Protocol (REP), but was not defined in its very first version from 1994?

Ok, click here to show the second hint.

Congrats, you are smart. I'm sure you don't need to look up the next answers.
Q: Which major search engine has a team permanently working on REP extensions and releases those quite frequently, and who is the engineer in charge?

Ok, click here to show the third hint.

Exactly. Now we’ve gathered all the pieces of this robots.txt puzzle.
Q: Could you please summarize your findings and conclusions?

Ok, click here to show the fourth hint.

Thank you, dear reader! Now let's see what we can dig out. If the appearance of a “Noindex:” directive in robots.txt is an experiment, it would make sense that Ms. Googlebot understands and obeys it. Unfortunately, I sold all the source code I've stolen from Google and didn't keep a copy for myself, so I need to speculate a little.

Last time I looked, Google's cool robots.txt validator merely emulated crawler behavior, which means the crawlers may understand syntax that the validator doesn't handle correctly. Maybe this has changed in the meantime, perhaps the validator pulls its code from the “real thing” now, or at least the “Noindex:” experiment may have found its way into the validator's portfolio. So I thought that testing the newish robots.txt statement “Noindex:” in the Webmaster Console was worth a try. And yes, it told me that Googlebot understands this command, and interprets it as “Disallow:”.
Blocked by line 27: Noindex: /noindex/

Since validation is no proof of crawler behavior, I've set up a page “blocked” with a “Noindex:” directive in robots.txt and linked it in my sidebar. The noindex statement was in place long enough before I uploaded and linked the spider trap, so the engines shouldn't use a cached robots.txt when they follow my links. My test is public, feel free to check out my robots.txt as well as the crawler log.
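(In case you want to run a similar test: a trap page doesn't need more than a few lines of logging. Here's a minimal sketch; file name, format and the dummy output are just examples, not the actual script behind my crawler log.)

<?php
// Hypothetical logger for a robots.txt "Noindex:" trap page: record who
// requests it, so crawler behavior can be compared against the directive.
$logFile = dirname(__FILE__) . '/noindex-crawler.log';
$entry = sprintf(
    "%s\t%s\t%s\t%s\n",
    date('Y-m-d H:i:s'),
    $_SERVER['REMOTE_ADDR'],
    @gethostbyaddr($_SERVER['REMOTE_ADDR']), // reverse DNS, can be slow
    isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '-'
);
file_put_contents($logFile, $entry, FILE_APPEND);

echo '<html><head><title>Noindex test page</title></head><body>';
echo '<p>This URL is covered by a "Noindex:" line in robots.txt.</p>';
echo '</body></html>';
?>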

While I’m waiting for the expected growth of my noindex crawler log, I’m speculating. Why the heck would Google use a new robots.txt directive which behaves like the good old Disallow: statement? Makes no sense to me.

Let's not forget that this mysterious noindex statement was discovered in the robots.txt of Google's ad server, not in the better known and closely watched robots.txt of google.com. Google is not the only search engine trying to better understand client sided code. None of the major engines should be interested in crawling ads for ranking purposes. The MSN/LiveSearch referrer spam fiasco demonstrates that search engine bots can fetch and render Google ads outputted in iFrames on pagead2.googlesyndication.com.

Since nobody except Google supports the X-Robots-Tag (sending “noindex” and other REP directives in the HTTP header) to date, maybe the engines have a silent deal that content marked with “Noindex:” in robots.txt shouldn't be indexed. Microsoft's bogus spam bot, which doesn't bother with robots.txt because it somewhat haplessly tries to emulate a human surfer, is not considered a crawler; its existence just proves that “software shop” is not a valid label for M$.

This theory has a few weak points, but it could point to something. If noindex in robots.txt really prevents indexing of content crawled by accident, or of non-HTML content that can't supply robots meta tags, that would be a very useful addition to the robots exclusion protocol. Of course we'd then need Noarchive:, Nofollow: and Nopreview: too, probably more, but I'm not really in a greedy mood today.

Back to my crawler trap. Refreshing the log reveals that 30 minutes after spreading links pointing to it, Googlebot has fetched the page. That seems to prove that the Noindex: statement doesn't prevent crawling, regardless of the false (?) information handed out by Google's robots.txt validator.

(Or didn't I give Ms. Googlebot enough time to refetch my robots.txt? Dunno. The robots.txt copy in my Google Webmaster Console still doesn't show the Noindex: statement, but I doubt that's the version Googlebot uses, because according to the last-downloaded timestamp in GWC the robots.txt had already been changed at the time of the download. Never mind. If I was way too impatient, I can still test whether a newly discovered noindex directive in robots.txt actually deindexes stuff or not.)

On with the show. The next interesting question is: will the crawler trap page make it into Google's search index? Without the possibly ineffective noindex directive, a few hundred links should be able to accomplish that. Alas, a quoted search query delivers zilch so far.

Of course I've asked Google for more information, but haven't received a conclusive answer so far. While waiting for an official statement, I'll take a break from live blogging this quick research in favor of terrorizing a few folks with disrespectful blog comments. Stay tuned. Be right back.


Well, meanwhile I had dinner, the kids fell asleep –hopefully until tomorrow morning–, but nothing else happened. A very nice and friendly Googler is trying to find out what the noindex in robots.txt fuss is all about; thanks, and I can't wait! However, I suspect the info is either forgotten or deeply buried in some well-secured top secret code libraries, hence I'll push the red button soon.


Thanks to Google’s great Webmaster Central team, especially Susan, I learned that I was flogging a dead horse. Here is Google’s take on Noindex in robots.txt:

As stated in my previous note, I wasn’t aware that we recognized any directives other than Allow/Disallow/Sitemap, so I did some asking around.

Unfortunately, I don’t have an answer that I can currently give you. […] I can’t contribute any clarifications right now.

Thank you Susan!

Update: John Müller from Google has just confirmed that their crawler understands the Noindex: syntax, but it’s not yet set in stone.




Act out your sophisticated affiliate link paranoia

My recent posts on managing affiliate links and nofollow cloaking paid links led to so many reactions from my readers that I thought explaining possible protection levels could make sense. Google's request to condomize affiliate links is a bit, well, thin when it comes to technical tips and tricks:

Links purchased for advertising should be designated as such. This can be done in several ways, such as:
* Adding a rel="nofollow" attribute to the <a> tag
* Redirecting the links to an intermediate page that is blocked from search engines with a robots.txt file

Also, Google doesn't define paid links that clearly, so try this paid link definition instead before you read on. Here is my linking guide for the paranoid affiliate marketer.

Google recommends hiding any content provided by affiliate programs from their crawlers. That means not only links and banner ads, so think about tactics to hide content pulled from a merchant's data feed too. Linked graphics along with text links, testimonials and whatnot copied from an affiliate program's sales tools page count as duplicate content (snippets) in the worst case.

Pasting code copied from a merchant's site into a page's or template's HTML is not exactly a smart way to place ads. Those ads are neither manageable nor trackable, and when anything must be changed, editing tons of files is a royal PITA. Even when you're just running a few ads on your blog, a simple ad management script allows flexible administration of your adverts.

There are tons of such scripts out there, so I won't post a complete solution, just the code that saves your ass when a search engine that hates your ads and paid links comes by. To keep it simple and stupid, my code snippets are mostly taken from this blog, so if you run a WordPress blog you can adapt them with ease.

Cover your ass with a linking policy

Googlers as well as hired guns do review Web sites for violations of Google's guidelines, and competitors might be in the mood to turn you in with a spam report or paid links report. A (prominently linked) full disclosure of your linking attitude can help you pass a human review by search engine staff. By the way, having a policy for dofollowed blog comments is also a good idea.

Since crawler directives like link condoms are for search engines (only), and those pay attention to your source code as well as to hints addressing search engines like robots.txt, you should leave a note there too; look into the source of this page for an example.

Block crawlers from your propaganda scripts

Put all your stuff related to advertising (scripts, images, movies…) in a subdirectory and disallow search engine crawling in your /robots.txt file:
User-agent: *
Disallow: /propaganda/

Of course you'll use an innocuous name like “gnisitrevda” for this folder, which lacks a default document and can't be browsed because you have an
Options -Indexes

statement in your .htaccess file. (Watch out, Google knows what “gnisitrevda” means, so be creative or cryptic.)

Crawlers sent out by major search engines do respect robots.txt, hence it's guaranteed that regular spiders won't fetch it. As long as you don't cheat too much, you're not haunted by those legendary anti-webspam bots sneakily accessing your site via AOL proxies or Level3 IPs. A robots.txt block doesn't protect you from search engine staff surfing your site, but I won't tell you things you'd better hide from Matt's gang.

Detect search engine crawlers

Basically there are three common methods to detect requests by search engine crawlers.

  1. Testing the user agent name (HTTP_USER_AGENT) for strings like “Googlebot”, “Slurp”, “MSNbot” or so which identify crawlers. That’s easy to spoof, for example PrefBar for FireFox lets you choose from a list of user agents.
  2. Checking the user agent name, and only when it indicates a crawler, verifying the requestor’s IP address with a reverse lookup, respectively against a cache of verified crawler IP addresses and host names.
  3. Maintaining a list of all search engine crawler IP addresses known to man, checking the requestor’s IP (REMOTE_ADDR) against this list. (That alone isn’t bullet-proof, but I’m not going to write a tutorial on industrial-strength cloaking IP delivery, I leave that to the real experts.)

For our purposes we use methods 1) and 2). When it comes to outputting ads or other paid links, checking the user agent is safe enough. Also, this allows your business partners to evaluate your linkage using a crawler user agent name. Some affiliate programs won't activate your account without testing your links. When crawlers try to follow affiliate links, on the other hand, you need to verify their IP addresses for two reasons. First, you should be able to upsell spoofing users too. Second, if you allow crawlers to follow your affiliate links, this may have an impact on the merchants' search engine rankings, and that's evil in Google's eyes.

We use two PHP functions to detect search engine crawlers. checkCrawlerUA() returns TRUE and sets an expected crawler host name, if the user agent name identifies a major search engine's spider, or FALSE otherwise. checkCrawlerIP($string) verifies the requestor's IP address and returns TRUE if the user agent is indeed a crawler, or FALSE otherwise. checkCrawlerIP() does primitive caching in a flat file, so that once a crawler has been verified on its very first content request, it can be detected from this cache, avoiding pretty slow DNS lookups. The input parameter is any string which will make it into the log file. checkCrawlerIP() does not verify an IP address if the user agent string doesn't match a crawler name.

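Here's a rough sketch of the two helpers. It's simplified for illustration; the crawler host-name suffixes, the cache file name and the spoofer logging are examples, so double-check them before you rely on this:

<?php
$crawlerHostSuffix = ''; // set by checkCrawlerUA()

function checkCrawlerUA() {
    global $crawlerHostSuffix;
    $ua = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
    // user agent substring => expected host name suffix of the crawler's IP
    $crawlers = array(
        'googlebot' => '.googlebot.com',
        'slurp'     => '.crawl.yahoo.net',
        'msnbot'    => '.search.msn.com',
    );
    foreach ($crawlers as $needle => $hostSuffix) {
        if (stripos($ua, $needle) !== false) {
            $crawlerHostSuffix = $hostSuffix;
            return TRUE;
        }
    }
    return FALSE;
}

function checkCrawlerIP($logString) {
    global $crawlerHostSuffix;
    if (!checkCrawlerUA()) {
        return FALSE; // no DNS lookups for user agents that don't even claim to be a crawler
    }
    $ip        = $_SERVER['REMOTE_ADDR'];
    $cacheFile = dirname(__FILE__) . '/crawler-ip-cache.txt';

    // primitive flat file cache of already verified crawler IPs
    $cache = is_readable($cacheFile) ? file($cacheFile, FILE_IGNORE_NEW_LINES) : array();
    if (in_array($ip, $cache)) {
        return TRUE;
    }

    // forward confirmed reverse DNS: IP -> host -> IP must match, and the
    // host name must belong to the engine the user agent claims to be
    $host = gethostbyaddr($ip);
    if ($host && substr($host, -strlen($crawlerHostSuffix)) === $crawlerHostSuffix
        && gethostbyname($host) === $ip) {
        file_put_contents($cacheFile, $ip . "\n", FILE_APPEND);
        return TRUE;
    }

    // not a genuine crawler: log the spoofer along with the caller's string
    error_log("Spoofed crawler UA from $ip ($logString)");
    return FALSE;
}
?>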

Grab and implement the PHP source, then you can code statements like
$isSpider = checkCrawlerUA ();
...
if ($isSpider) {
$relAttribute = " rel=\"nofollow\" ";
}
...
$affLink = "<a href=\"$affUrl\" $relAttribute>call for action</a>";

or
$isSpider = checkCrawlerIP ($sponsorUrl);
...
if ($isSpider) {
// don't redirect to the sponsor, return a 403 or 410 instead
}

More on that later.

Don’t deliver your advertising to search engine crawlers

It's possible to serve totally clean pages to crawlers, that is without any advertising, not even JavaScript ads like AdSense's script calls. Whether you go that far or not depends on the degree of your paranoia. Suppressing ads on a (thin|sheer) affiliate site can make sense. Bear in mind that hiding all promotional links and related content can't guarantee indexing, because Google doesn't index shitloads of templated pages which hide duplicate content as well as ads from crawling but don't carry a single piece of somewhat compelling content.

Here is how you could output a totally uncrawlable banner ad:
...
$isSpider = checkCrawlerIP ($PHP_SELF);
...
print "<div class=\"css-class-sidebar robots-nocontent\">";
// output RSS buttons or so
if (!$isSpider) {
    print "<script type=\"text/javascript\" src=\"http://sebastians-pamphlets.com/propaganda/output.js.php?adName=seobook&adServed=banner\"></script>";
    ...
}
...
print "</div>\n";
...

Let's look at the code above. First we detect crawlers “without doubt” (well, in some rare cases it can still happen that a suspected Yahoo crawler comes from a non-'.crawl.yahoo.net' host but from another IP owned by Yahoo, Inktomi, Altavista or AllTheWeb/FAST, and I've seen similar reports of such misbehavior for other engines too, but that might have been employees surfing with a crawler UA).

Currently the robots-nocontent class name in the DIV is not supported by Google, MSN and Ask, but it tells Yahoo that everything in this DIV shall not be used for ranking purposes. That doesn't conflict with class names used in your CSS, because each X/HTML element can have an unlimited list of space-delimited class names. Like Google's section targeting, that's a crappy crawler directive, though. However, it doesn't hurt to make use of this Yahoo feature with all sorts of screen real estate that is not relevant for search engine ranking algos, for example RSS links (use autodetect and pings to submit), “buy now”/“view basket” links or references to TOS pages and the like, templated text like terms of delivery (but not the street address provided for local search) … and of course ads.

Ads aren't outputted when a crawler requests a page. Of course that's cloaking, but unless the united search engine geeks come out with a standardized procedure to handle code and contents which aren't relevant for indexing, that's not deceitful cloaking in my opinion. Interestingly, in many cases cloaking is the last weapon in a webmaster's arsenal that s/he can fire up to comply with search engine rules when everything else fails, because the crawlers behave more and more like browsers.

Delivering user-specific content is generally fine with the engines; geo targeting, profile/logout links, or buddy lists shown to registered users only, and stuff like that, aren't penalized. Since Web robots can't pull out the plastic, there's no reason to serve them ads just to waste bandwidth. In some cases search engines even require cloaking, for example to prevent their crawlers from fetching URLs with tracking variables and unavoidable duplicate content. (Example from Google: “Allow search bots to crawl your sites without session IDs or arguments that track their path through the site” is a call for search engine friendly URL cloaking.)
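For illustration, stripping a (hypothetical) sid tracking variable for spiders, using the checkCrawlerUA() helper sketched above, could look like this; the parameter and file names are made up:

<?php
// Assumed file holding checkCrawlerUA() from the crawler detection section.
require_once dirname(__FILE__) . '/crawler-detection.php';

// Crawlers get 301'ed to the canonical URL without the tracking variable,
// so they never index session ID permutations of the same page.
if (checkCrawlerUA() && isset($_GET['sid'])) {
    $params = $_GET;
    unset($params['sid']);
    $query     = http_build_query($params);
    $canonical = 'http://' . $_SERVER['HTTP_HOST']
               . parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH)
               . ($query ? '?' . $query : '');
    header('HTTP/1.1 301 Moved Permanently');
    header('Location: ' . $canonical);
    exit;
}
?>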

Is hiding ads from crawlers “safe with Google” or not?

Cloaking ads away is a double-edged sword from a search engine's perspective. Way too strictly interpreted, that's against the cloaking rule which states “don't show crawlers other content than humans”, and search engines like to be aware of advertising in order to rank estimated user experiences algorithmically. On the other hand they provide us with mechanisms (Google's section targeting or Yahoo's robots-nocontent class name) to disable such page areas for ranking purposes, and they code their own ads in a way that crawlers don't count them as on-the-page contents.

Although Google says that AdSense text link ads are content too, they ignore their textual contents in ranking algos. Actually, their crawlers and indexers don’t render them, they just notice the number of script calls and their placement (at least if above the fold) to identify MFA pages. In general, they ignore ads as well as other content outputted with client sided scripts or hybrid technologies like AJAX, at least when it comes to rankings.

Since in theory the contents of JavaScript ads aren't considered food for rankings, cloaking them completely away (suppressing the JS code when a crawler fetches the page) can't be wrong. Of course these script calls as well as on-page JS code are a ranking factor. Google possibly counts ads, maybe even calculates ratios like screen space used for advertising vs. space used for content presentation to determine whether a particular page provides a good surfing experience for their users or not, but they can't seriously argue that hiding such tiny signals –which they use for the sole purpose of possible downranks– is against their guidelines.

For ages search engine reps used to encourage webmasters to obfuscate all sorts of stuff they want to hide from crawlers, like commercial links or redundant snippets, by linking/outputting with JavaScript instead of crawlable X/HTML code. Just because their crawlers evolve, that doesn't mean they can take back this advice. All this JS stuff is out there, on gazillions of sites, often on pages which will never be edited again.

Dear search engines, if it doesn't count, then you can't demand that it stays crawlable. Well, a few super mega white hat trolls might disagree, and depending on the implementation on individual sites maybe hiding ads isn't totally riskless in every case, so decide for yourself. I just cloak machine-readable disclosures, because crawler directives are not for humans, but I don't try to hide the fact that I run ads on this blog.

Usually I don't argue with fair vs. unfair, because we're talking about war business here, which means anything goes. However, Google does everything to talk the whole Internet into obfuscating disclosing ads with link condoms of any kind, and they take a lot of flak for such campaigns, hence I doubt they would cry foul today when webmasters hide both client sided as well as server sided delivery of advertising from their crawlers. Penalizing for delivery of sheer contents would be unfair. ;) (Of course that's stuff for a great debate. If Google decides that hiding ads from spiders is evil, they will react and won't care about bad press. So please don't take my opinion as professional advice. I might change my mind tomorrow, because actually I can imagine why Google might raise their eyebrows over such statements.)

Outputting ads with JavaScript, preferably in iFrames

Delivering adverts with JavaScript does not mean that one can’t use server sided scripting to adjust them dynamically. With content management systems it’s not always possible to use PHP or so. In WordPress for example, PHP is executable in templates, posts and pages (requires a plugin), but not in sidebar widgets. A piece of JavaScript on the other hand works (nearly) everywhere, as long as it doesn’t come with single quotes (WordPress escapes them for storage in its MySQL database, and then fails to output them properly, that is single quotes are converted to fancy symbols which break eval’ing the PHP code).

Let's see how that works. Here is a banner ad created with a PHP script and delivered via JavaScript:

And here is the JS call of the PHP script:
<script type="text/javascript" src="http://sebastians-pamphlets.com/propaganda/output.js.php?adName=seobook&adServed=banner"></script>

The PHP script /propaganda/output.js.php evaluates the query string to pull the requested ad's components. In case the ad has expired (e.g. conference promotions, an affiliate program that went belly up, or so) it looks for an alternative (there are tons of neat ways to deliver different ads depending on the requestor's location and whatnot, but that's not the point here, hence the lack of more examples). Then it checks whether the requestor is a crawler. If the user agent indicates a spider, it adds rel=nofollow to the ad's links. Once the HTML code is ready, it outputs a JavaScript statement:
document.write('<a href="http://sebastians-pamphlets.com/propaganda/router.php?adName=seobook&adServed=banner" title="DOWNLOAD THE BOOK ON SEO!"><img src="http://sebastians-pamphlets.com/propaganda/seobook/468-60.gif" width="468" height="60" border="0" alt="The only current book on SEO" title="The only current book on SEO" /></a>');
which the browser executes within the script tags (replace single quotes in the HTML code with double quotes). A static ad for surfers using ancient browsers goes into the noscript tag.
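To make that flow concrete, here's a stripped-down sketch of what a script like /propaganda/output.js.php could look like. The ad inventory array and the expiry handling below are made up for illustration; the real thing does a bit more (alternative ads, stats, and so on):

<?php
// Assumed file holding checkCrawlerUA() from the crawler detection section.
require_once dirname(__FILE__) . '/crawler-detection.php';

header('Content-Type: text/javascript');

// hypothetical ad inventory, keyed by the adName/adServed query string variables
$ads = array(
    'seobook' => array(
        'banner' => array(
            'href'    => 'http://sebastians-pamphlets.com/propaganda/router.php?adName=seobook&adServed=banner',
            'img'     => 'http://sebastians-pamphlets.com/propaganda/seobook/468-60.gif',
            'alt'     => 'The only current book on SEO',
            'expires' => '2099-12-31',
        ),
    ),
);

$adName   = isset($_GET['adName'])   ? $_GET['adName']   : '';
$adServed = isset($_GET['adServed']) ? $_GET['adServed'] : '';

if (!isset($ads[$adName][$adServed])
    || $ads[$adName][$adServed]['expires'] < date('Y-m-d')) {
    exit; // or pick an alternative ad here
}
$ad = $ads[$adName][$adServed];

// condomize the link for spiders only; humans get a clean link
$relAttribute = checkCrawlerUA() ? ' rel="nofollow"' : '';

// double quotes inside the HTML, single quotes around the document.write argument
$html = '<a href="' . $ad['href'] . '"' . $relAttribute . '>'
      . '<img src="' . $ad['img'] . '" width="468" height="60" border="0"'
      . ' alt="' . $ad['alt'] . '" title="' . $ad['alt'] . '" /></a>';

print "document.write('" . $html . "');";
?>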

Matt Cutts said that JavaScript links don’t prevent Googlebot from crawling, but that those links don’t count for rankings (not long ago I read a more recent quote from Matt where he stated that this is future-proof, but I can’t find the link right now). We know that Google can interpret internal and external JavaScript code, as long as it’s fetchable by crawlers, so I wouldn’t say that delivering advertising with client sided technologies like JavaScript or Flash is a bullet-proof procedure to hide ads from Google, and the same goes for other major engines. That’s why I use rel-nofollow –on crawler requests– even in JS ads.

Change your user agent name to Googlebot or so, install Matt’s show nofollow hack or something similar, and you’ll see that the affiliate-URL gets nofollow’ed for crawlers. The dotted border in firebrick is extremely ugly, detecting condomized links this way is pretty popular, and I want to serve nice looking pages, thus I really can’t offend my readers with nofollow’ed links (although I don’t care about crawler spoofing, actually that’s a good procedure to let advertisers check out my linking attitude).

We'll look at the affiliate URL from the code above later on; first let's discuss other ways to make ads more search engine friendly. Search engines don't count pages displayed in iFrames as on-page contents, especially not when the iFrame's content is hosted on another domain. Here is an example straight from the horse's mouth:
<iframe name="google_ads_frame" src="http://pagead2.googlesyndication.com/pagead/ads?very-long-and-ugly-query-string" marginwidth="0" marginheight="0" vspace="0" hspace="0" allowtransparency="true" frameborder="0" height="90" scrolling="no" width="728"></iframe>
In a noframes tag we could put a static ad for surfers using browsers which don’t support frames/iFrames.

If for some reason you don't want to detect crawlers, or it makes sound sense to hide ads from other Web robots too, you could encode your JavaScript ads. This way you deliver totally and utterly useless gibberish to anybody but browsers; only browsers requesting a page will render the ads. Example: any sort of text or HTML block that you would like to encrypt and hide from snoops, scrapers, parasites, or bots, can be run through Michael's Full Text/HTML Obfuscator Tool (hat tip to Donna).

Always redirect to affiliate URLs

There’s absolutely no point in using ugly affiliate URLs on your pages. Actually, that’s the last thing you want to do for various reasons.

  • For example, affiliate URLs as well as source codes can change, and you don’t want to edit tons of pages if that happens.
  • When an affiliate program doesn’t work for you, goes belly up or bans you, you need to route all clicks to another destination when the shit hits the fan. In an ideal world, you’d replace outdated ads completely with one mouse click or so.
  • Tracking ad clicks is no fun when you need to pull your stats from various sites, all of them in another time zone, using their own –often confusing– layouts, providing different views on your data, and delivering program specific interpretations of impressions or click throughs. Also, if you don’t track your outgoing traffic, some sponsors will cheat and you can’t prove your gut feelings.
  • Scrapers can steal revenue by replacing affiliate codes in URLs, but may overlook hard coded absolute URLs which don’t smell like affiliate URLs.

When you replace all affiliate URLs with the URL of a smart redirect script on one of your domains, you can really manage your affiliate links. There are many more good reasons for utilizing ad servers, for example dealing with smart search engines which might think that your advertising is overwhelming.

Affiliate links provide great footprints. Unique URL parts, respectively query string variable names, gathered by Google from all affiliate programs out there are one clear signal they use to identify affiliate links. The values identify the individual affiliate marketer. Google loves to identify networks of ((thin) affiliate) sites by affiliate IDs. That does not mean that Google detects each and every affiliate link at the time of the very first fetch by Ms. Googlebot and the possibly following indexing. Processes identifying pages with (many) affiliate links and sites plastered with ads instead of unique contents can run afterwards, utilizing a well-indexed database of links and linking patterns, reporting the findings to the search index, respectively delivering minus points to the query engine. Also, that doesn't mean that affiliate URLs are the one and only trackable footprint Google relies on. But it's one trackable footprint you can avoid to some degree.

If the redirect script's location is on the same server (in fact it's not, thanks to symlinks) and not named “adserver” or so, chances are that a heuristic check won't identify the link's intent as promotional. Of course statistical methods can discover your affiliate links by analyzing patterns, but those might be similar to patterns which have nothing to do with advertising, for example click tracking of editorial votes, links to contact pages which aren't crawlable with parameters, or similar “legit” stuff. However, you can't fool smart algos forever, but if you have a good reason to hide ads, every little bit might help. Of course, providing lots of great content counterbalances lots of ads (from a search engine's point of view, and users might agree on this).

Besides all these (pseudo) black hat thoughts and reasoning, there is a way more important advantage of redirecting links to sponsors: blocking crawlers. Yup, search engine crawlers must not follow affiliate URLs, because it doesn't benefit you (usually). Actually, every affiliate link is a useless PageRank leak. Why should you boost the merchant's search engine rankings? Better take care of your own rankings by hiding such outgoing links from crawlers, and stopping crawlers before they spot the redirect if they by accident find an affiliate link without a link condom.

The behavior of an adserver URL masking an affiliate link

Let's look at the redirect script's URL from my code example above:
/propaganda/router.php?adName=seobook&adServed=banner
On request of router.php the $adName variable identifies the affiliate link, $adServed tells which sort/type/variation of ad was clicked, and all that gets stored with a timestamp under title and URL of the page carrying the advert.

Now that we've covered the statistical requirements, router.php calls the checkCrawlerIP() function, which sets $isSpider to TRUE only when both the user agent and the host name of the requestor's IP address identify a search engine crawler, and the forward lookup of that host name equals the requestor's IP addy.

If the requestor is not a verified crawler, router.php does a 307 redirect to the sponsor’s landing page:
$sponsorUrl = "http://www.seobook.com/262.html";
// figure out whether the client speaks HTTP/1.1 (or newer)
$requestProtocol = $_SERVER["SERVER_PROTOCOL"];
$protocolArr = explode("/", $requestProtocol);
$protocolName = trim($protocolArr[0]);
$protocolVersion = trim($protocolArr[1]);
if (stristr($protocolName, "HTTP") && $protocolVersion > "1.0") {
    $httpStatusCode = 307; // not cachable, changes take effect immediately
}
else {
    $httpStatusCode = 302; // HTTP/1.0 clients don't know 307
}
$httpStatusLine = "$requestProtocol $httpStatusCode Temporary Redirect";
@header($httpStatusLine, TRUE, $httpStatusCode);
@header("Location: $sponsorUrl");
exit;

A 307 redirect avoids caching issues, because 307 redirects must not be cached by the user agent. That means that changes of sponsor URLs take effect immediately, even when the user agent has cached the destination page from a previous redirect. If the request came in via HTTP/1.0, we must perform a 302 redirect, because the 307 response code was introduced with HTTP/1.1 and some older user agents might not be able to handle 307 redirects properly. User agents can cache the locations provided by 302 redirects, so possibly when they run into a page known to redirect, they might request the outdated location. For obvious reasons we can’t use the 301 response code, because 301 redirects are always cachable. (More information on HTTP redirects.)

If the requestor is a major search engine’s crawler, we perform the most brutal bounce back known to man:
if ($isSpider) {
    @header("HTTP/1.1 403 Sorry Crawlers Not Allowed", TRUE, 403);
    @header("X-Robots-Tag: nofollow,noindex,noarchive");
    exit;
}

The 403 response code translates to “kiss my ass and get the fuck outta here”. The X-Robots-Tag in the HTTP header instructs crawlers that the requested URL must not be indexed, doesn't provide links the poor beast could follow, and must not be publicly cached by search engines. In other words the HTTP header tells the search engine “forget this URL, don't request it again”. Of course we could use the 410 response code instead, which tells the requestor that a resource is irrevocably dead, gone, vanished, non-existent, and further requests are forbidden. Both the 403-Forbidden response as well as the 410-Gone return code prevent URL-only listings on the SERPs (once the URL was crawled). Personally, I prefer the 403 response, because it perfectly and unmistakably expresses my opinion on this sort of search engine guidelines, although currently nobody except Google understands or supports X-Robots-Tags in HTTP headers.

If you don't use URLs provided by affiliate programs, your affiliate links can never influence search engine rankings, hence the engines are happy because you did their job so obediently. Not that they would otherwise count (most of) your affiliate links for rankings, but forcing you to castrate your links yourself makes their life much easier, and you don't need to live in fear of penalties.

Before you output a page carrying ads, paid links, or other selfish links with commercial intent, check if the requestor is a search engine crawler, and act accordingly.

Don’t deliver different (editorial) contents to users and crawlers, but also don’t serve ads to crawlers. They just don’t buy your eBook or whatever you sell, unless a search engine sends out Web robots with credit cards able to understand Ajax, respectively authorized to fill out and submit Web forms.

Your ads look plain ugly with dotted borders in firebrick, hence don’t apply rel=”nofollow” to links when the requestor is not a search engine crawler. The engines are happy with machine-readable disclosures, and you can discuss everything else with the FTC yourself.

No nay never use links or content provided by affiliate programs on your pages. Encapsulate this kind of content delivery in AdServers.

Do not allow search engine crawlers to follow your affiliate links, paid links, nor other disliked votes as per search engine guidelines. Of course condomizing such links is not your responsibility, but getting penalized for not doing Google’s job is not exactly funny.

I admit that some of the stuff above is for extremely paranoid folks only, but knowing how to be paranoid might prevent you from making silly mistakes. Just because you believe that you're not paranoid, that doesn't mean Google won't chase you down. You really don't need to be a so-called black hat to displease Google. Not knowing, respectively not understanding, Google's 12 commandments doesn't protect you from being spanked for sins you've never heard of. If you're keen on Google's nicely targeted traffic, better play by Google's rules, at least on crawler requests.

Feel free to contribute your tips and tricks in the comments.




Internet marketing is one big popularity contest, and that’s not a good thing

This is a guest post by Tanner Christensen.

What are you doing to make Internet marketing a better industry to be a part of? As it sits now: Internet marketing is one big popularity contest, and that’s not a good thing. Internet marketers are making it nearly impossible for the average person to find valuable content.

The real online content providers - the websites who deserve all of your attention - are becoming harder and harder to discover because of Internet marketers like us. Though Internet marketers - both you and I - can’t really be blamed, our job is all about getting attention. The more attention we get for our website(s), the more popular our website(s) become, the more money we can make.

But because of the recent surge of interest in Internet marketing and search engine optimization, websites that focus on providing content - rather than getting attention - are being ignored. And because these content-focused websites are being cast into the shadows of attention-focused websites, they too are jumping on the Internet marketing popularity contest bandwagon.

Even though every webmaster and his or her mother is jumping on the bandwagon, it's not accurate to say that Internet marketers are making all less-important, less-helpful, and less-useful websites more popular than really helpful websites, but there is definitely the possibility of real news and information being masked by attention-seeking content.

So what do we do? What do Internet marketers and search engine optimizers do to make sure that the Internet popularity contest doesn't become a contest of lies and attention-seeking tactics, but rather a contest of quality, helpful, interesting, important, groundbreaking content?

The first step is to become a part of the online community. I'm not talking about the Internet marketing community - it's biased in a lot of ways. I'm talking about the real online communities. Doing so will help create a universal sense of online morals: what's good information and what's bad information.

And discovering where the real helpful and important websites are online will help Internet marketers such as ourselves learn where the websites we work with really should be ranked.

Sure, there are still those people who don't care about quality of content and only care about the almighty dollar sign. But poor content will eventually catch up with them, when websites that really deserve attention in the online popularity contest are lost in the fold and the dollar sign loses its value.

Tanner is a Web specialist and designer who writes helpful, inspiring, and creative internet-related articles. A while ago I contributed an article to his blog Internet Hunger: The anatomy of a debunking post. I think “can aggressive SMO tactics push crap over the long haul” would be an interesting, and related, discussion. I mean, search engines evolve too, not only in Web search, so kinda fair rankings of well-linked crap as well as good stuff that's not on the SM radar might be possible to some extent.




Text link broker woes: Google’s smart paid link sniffers

After the recent toolbar PageRank massacre link brokers are in the spotlight. One of them, TNX (beta), asked me to post a paid review of their service. It took a while to explain that nobody can buy a sales pitch here. I offered to write a pitiless honest review for a low hourly fee, provided a sample at their request, but got no order or payment yet. Never mind. Since the topic is hot, here's my review, paid or not.

So what does TNX offer? Basically it’s a semi-automated link exchange where everybody can sign up to sell and/or purchase text links. TNX takes 25% commission, 12.5% from the publisher, and 12.5% from the advertiser. They calculate the prices based on Google’s toolbar PageRank and link popularity pulled from Yahoo. For example a site putting five blocks of four links each on one page with toolbar PageRank 4/10 and four pages with a toolbar PR 3/10 will earn $46.80 monthly.

TNX provides a tool to vary the links, so that when an advertiser purchases for example 100 links, it's possible to output those in 100 variations of anchor text as well as surrounding text before and after the A element, on possibly 100 different sites. Also TNX has a solution to increase the number of links slowly, so that search engines won't find a gazillion uniform links to a (new) site all of a sudden. Whether or not that's sufficient to simulate natural link growth remains an unanswered question, because I have no access to their algorithm.

Links as well as participating sites are reviewed by TNX staff, and frequently checked with bots. Links shouldn’t appear on pages which aren’t indexed by search engines or viewed by humans, or on 404 pages, pages with long and ugly URLs and such. They don’t accept PPC links or offensive ads.

All links are outputted server sided, which requires PHP or Perl (ASP/ASPX coming soon). There is a cache option, so it's not necessary to download the links from the TNX servers for each page view. TNX recommends renaming the /cache/ directory to avoid an easily detectable sign of the occurrence of TNX paid links on a Web site. Links are stored as plain HTML; besides the target="_blank" attribute there is no obvious footprint or pattern on the link level. Example:
Have a website? See this <a href="http://www.example.com" target="_blank">free affiliate program</a>.
Have a blog? Check this <a href="http://www.example.com" target="_blank">affiliate program with high comissions</a> for publishers.

Webmasters can enter any string as delimiter, for example <br /> or “•”:

Have a website? See this free affiliate program. • Have a blog? Check this affiliate program with high comissions for publishers.

Publishers can choose from 17 niches, 7 languages, 5 linkpop levels, and 7 toolbar PageRank values to target their ads.

From the system stats in the members area the service is widely used:

  • As of today [2007-11-06] we have 31,802 users (daily growth: +0.62%)
  • Links in the system: 31,431,380
  • Links created in last hour: 1,616
  • Number of pages indexed by TNX: 37,221,398

Long story short, TNX jumped through many hoops to develop a system which is supposed to trade paid links that are undetectable by search engines. Is that so?

The major weak point is the system's growth, and that its users are humans. Even if such a system were perfect, users will make mistakes and reveal the whole network to search engines. Here is how Google has identified most if not all of the TNX paid links:

Some Webmasters put their TNX links in sidebars under a label that identifies them as paid links. Google crawled those pages, and stored the link destinations in its paid links database. Also, they devalued at least the labelled links; possibly the whole page or even the complete site lost its ability to pass link juice, because the few paid links aren't condomized.

Many Webmasters implemented their TNX links in templates, so that they appear on a large number of pages. Actually, that’s recommended by TNX. Even if the advertisers have used the text variation tool, their URLs appeared multiple times on each site. Google can detect site wide links, even if not each and every link appears on all pages, and flags them accordingly.

Maybe even a few Googlers have signed up and served the TNX links on their personal sites to gather examples, although that wasn't necessary, because so many Webmasters with URLs in their signatures have told Google in this DP thread that they've signed up and at least tested TNX links on their pages.

Next Google compared the anchor text as well as the surrounding text of all flagged links, and found some patterns. Of course putting text before and after the linked anchor text seems to be a smart way to fake a natural link, but in fact Webmasters applied a bullet-proof procedure to outsmart themselves, because with multiple occurrences of the same text constellations pointing to a URL, especially when found on unrelated sites (different owners, hosts, etc.; topical irrelevancy plays no role in this context), paid link detection is a breeze. Linkage like that may be “natural” with regard to patterns like site wide advertising or navigation, but a lookup in Google's links database revealed that the same text constellations and URLs were found on n other sites too.

Now that Google had compiled the seed, each and every instance of Googlebot delivered more evidence. It took Google only one crawl cycle to identify most sites carrying TNX links, and all TNX advertisers. Paid link flags from pages on sites with a low crawling frequency were delivered in addition. Meanwhile Google has drawn a comprehensive picture of the whole TNX network.

I developed such a link network many years ago (it's defunct now). It was successful because only very experienced Webmasters controlling a fair number of squeaky clean sites were invited. Allowing newbies to participate in such an organized link swindle is the kiss of death, because newbies do make newbie mistakes, and Google makes use of newbie mistakes to catch all participants. By the way, with the capabilities Google has today, my former approach to manipulating rankings with artificial linkage would be detectable with statistical methods similar to the algo outlined above, despite the closed circle of savvy participants.

From reading the various DP threads about TNX as well as their sales pitches, I’ve recognized a very popular misunderstanding of Google’s mentality. Folks are worrying whether an algo can detect the intention of links or not, usually focusing on particular links or linking methods. Google on the other hand looks at the whole crawlable Web. When they develop a paid link detection algo, they have a copy of the known universe to play with, as well as a complete history of each and every hyperlink crawled by Ms. Googlebot since 1998 or so. Naturally, their statistical methods will catch massive artificial linkage first, but fine tuning the sensitivity of paid link sniffers respectively creating variants to cover different linking patterns is no big deal. Of course there is always a way to hide a paid link, but nobody can hide millions of them.

Unfortunately, the unique selling point of the TNX service –that goes for all link brokers by the way– is manipulation of search engine rankings, hence even if they offered nofollow'ed links to trade traffic instead of PageRank, most probably they would be forced to reduce the prices. Since TNX links are rather cheap, I'm not sure that would pay. It would be a shame if they decided to change the business model and it didn't pay off for TNX, because the underlying concept is great. It just shouldn't be used to exchange clean links. All the tricks developed to outsmart Google, like the text variation tool or not putting links on not exactly trafficked pages, are suitable for serving non-repetitive ads (coming with attractive CTRs) to humans.

I’ve asked TNX: I’ve decided to review your service on my blog, regardless whether you pay me or not. The result of my research is that I can’t recommend TNX in its current shape. If you still want a paid review, and/or a quote in the article, I’ve a question: Provided Google has drawn a detailed picture of your complete network, are you ready to switch to nofollow’ed links in order to trade traffic instead of PageRank, possibly with slightly reduced prices? Their answer:

We would be glad to accept your offer of a free review, because we don’t want to pay for a negative review.
Nobody can draw a detailed picture of our network - it’s impossible for one advertiser to buy links from all or a majority sites of our network. Many webmasters choose only relevant advertisers.
We will not switch to nofollow’ed links, but we are planning not to use Google PR for link pricing in the near future - we plan to use our own real-time page-value rank.

Well, it’s not necessary to find one or more links on all sites to identify a network.




If you suffer from a dead slow FireFox browser …

… and you've tried all the free-memory-on-minimize tricks out there, you could do a couple of things.

You could try to update to the newest version. If you're using old extensions like I do, FireFox has already sent a bazillion security alerts trying to talk you into the update process. Well, you'd get rid of the update alerts, but that doesn't solve all your problems, because your beloved extensions would be deactivated automatically.

A reinstall is brutal, but helps. Unfortunately you’ll lose a lot of stuff.

Waiting until your history.dat file exceeds 12 gigs and the session saver has forced the creation of prefs-9999.js in your profile is another way to handle the crisis. prefs-9999.js is the ultimate prefs file (the prefs file stores all your settings, by the way). Once FireFox creates it, it stops working all of a sudden and cannot be restarted.

I figured out that such a prefs-n.js file is created up to a few times daily, and with every new file FireFox slows down a bit. It is absolutely brilliant that in times of 64-bit integers the file counter has such a low hard limit. I mean, in a few weeks I wouldn't be able to surf any more if my browser kept getting that much slower.

So I appreciated the heart attack forcing me to look into the issue, deleted all prefs-*.js files but kept prefs.js itself, and now I have my fast FireFox 1.5 back. I lost a few open tabs and such, because I was too lazy to rename prefs-9999.js to prefs-1.js. Memory allocation right after starting the browser went down to a laughable 300 megs and still counting, so I guess there's some work left before I can continue surfing with hundreds of tabs in several project-specific windows. Sigh.

I'm an old fart and tend to forget such things, hence I'm posting it, and next time my browser slows down I'll just ask you guys on Twitter what to do about it. Thanks in advance for your help.




Gaming Sphinn is not worth it

OMFG, yet another post on Sphinn? Yup. I tell you why gaming Sphinn is counterproductive, because I just don't want to read another whiny rant along the lines of “why do you ignore my stuff whilst A listers [whatever this undefined term means] get their crap sphunn hot in no time”. Also, discussions assuming that success equals bad behavior like this or this one aren't exactly funny nor useful. As for the whiners: Grow the fuck up and produce outstanding content, then network politely but unobtrusively to promote it. As for the gamers: Think before you ruin your reputation!

What motivates a wannabe Internet marketer to game Sphinn?

Traffic of course, but that’s a myth. Sphinn sends very targeted traffic but also very few visitors (see my stats below).

Free uncondomized links. Ok, that works, one can gain enough link love to get a page indexed by the search engines, but for this purpose it’s not necessary to push the submission to the home page.

Attention is up next. Yep, Sphinn is an eldorado for attention whores, but not everybody is an experienced high-class call girl. Most are amateurs giving it a (first) try, or wrecked hookers pushing too hard to attract positive attention.

The keyword is positive attention. Sphinners are smart, they know every trick in the book. Many of them make a living with gaming creative use of social media. Cheating professional gamblers is a waste of time, and will not produce positive attention. Even worse, the shit sticks to the handle of the unsuccessful cheater (and in many cases the real name). So if you want to burn your reputation, go found a voting club to feed your crap.

Fortunately, getting caught for artificial voting at Sphinn comes with devalued links too. The submitted stories are taken off the list, that means no single link at Sphinn (besides profile pages) feeds them any more, hence search engines forget them. Instead of a good link from an unpopular submission you get zilch when you try to cheat your way to the popular links pages.

Although Sphinn doesn't send shitloads of traffic, this traffic is extremely valuable. Many sphinners operate or control blogs and tend to link to outstanding articles they found at Sphinn. Many sphinners have accounts on other SM sites too, and bookmark/cross-submit good content. It's not unusual that 10 visits from Sphinn result in hundreds or even thousands of hits from StumbleUpon & Co. — but sphinners don't bookmark/blog/cross-submit/stumble crap.

So either write great content and play by the rules, or get nowhere with your crappy submission. The first “10 reasons why 10 tricks posts about 10 great tips to write 10 numbered lists” submission was fun. The 10,000 plagiarisms that followed were just boring noise. Nobody except your buddies or vote bots sphinn crap like that, so don't bother to provide the community with footprints of your lousy gaming.

If you’re playing number games, here is why ruining a reputation by gaming Sphinn is not worth it. Look at my visitor stats from July to today. I got 3.6k referrals in 4 months from Sphinn because a few of my posts went hot. When a post sticks at 1-5 votes, you won’t attract many more click-throughs than from those 1-5 folks who sphunn it (that would give 100-200 hits or so with the same number of submissions). When you cheat, the story gets buried and you get nothing but flames. Think about that. Thanks.

Rank Last Date/Time Referral Site Count
1 Oct 09, 2007 @ 23:29 http://sphinn.com/story/1622 504
2 Oct 23, 2007 @ 14:53 http://sphinn.com/story/2764 419
3 Nov 01, 2007 @ 03:42 http://sphinn.com 293
4 Oct 08, 2007 @ 04:21 http://sphinn.com/story/5469 288
5 Nov 02, 2007 @ 13:35 http://sphinn.com/story/8883 192
6 Oct 09, 2007 @ 23:38 http://sphinn.com/story/4335 185
7 Oct 22, 2007 @ 23:55 http://sphinn.com/story/5362 139
8 Oct 29, 2007 @ 15:02 http://sphinn.com/upcoming 131
9 Nov 02, 2007 @ 13:34 http://sphinn.com/story/7170 131
10 Sep 10, 2007 @ 09:09 http://sphinn.com/story/1976 116
11 Oct 15, 2007 @ 22:40 http://sphinn.com/story/6122 113
12 Sep 22, 2007 @ 13:39 http://sphinn.com/story/3593 90
13 Oct 05, 2007 @ 21:56 http://sphinn.com/story/5648 87
14 Sep 22, 2007 @ 13:25 http://sphinn.com/story/4072 80
15 Oct 14, 2007 @ 17:24 http://sphinn.com/story/5973 77
16 Aug 30, 2007 @ 04:17 http://sphinn.com/story/1796 72
17 Oct 16, 2007 @ 05:46 http://sphinn.com/story/6761 61
18 Oct 11, 2007 @ 05:56 http://sphinn.com/story/1447 60
19 Sep 13, 2007 @ 12:27 http://sphinn.com/story/4548 54
20 Nov 02, 2007 @ 22:14 http://sphinn.com/story/11547 53
21 Sep 03, 2007 @ 09:34 http://sphinn.com/story/4068 44
22 Oct 09, 2007 @ 23:40 http://sphinn.com/story/5093 42
23 Nov 02, 2007 @ 01:46 http://sphinn.com/story/248 41
24 Sep 14, 2007 @ 05:58 http://sphinn.com/story/2287 36
25 Oct 31, 2007 @ 06:17 http://sphinn.com/story/11205 35
26 Oct 07, 2007 @ 12:07 http://sphinn.com/story/6124 25
27 Nov 01, 2007 @ 09:41 http://sphinn.com/user/view/profile/Sebastian 22
28 Aug 08, 2007 @ 10:52 http://sphinn.com/story/245 21
29 Sep 02, 2007 @ 19:17 http://sphinn.com/story/3877 17
30 Sep 22, 2007 @ 00:42 http://sphinn.com/story/4968 17
31 Oct 01, 2007 @ 12:49 http://sphinn.com/story/5310 17
32 Aug 30, 2007 @ 08:20 http://sphinn.com/story/4143 14
33 Sep 11, 2007 @ 21:38 http://sphinn.com/story/3783 13
34 Nov 01, 2007 @ 15:50 http://sphinn.com/published/page/2 11
35 Sep 01, 2007 @ 23:03 http://sphinn.com/story/597 10
36 Oct 24, 2007 @ 18:17 http://sphinn.com/story/1767 10
37 Sep 15, 2007 @ 08:26 http://sphinn.com/story.php?id=5469 8
38 Oct 30, 2007 @ 09:42 http://sphinn.com/upcoming/mostpopular 7
39 Oct 24, 2007 @ 18:38 http://sphinn.com/story/10881 7
40 Oct 30, 2007 @ 01:19 http://sphinn.com/upcoming/page/2 6
41 Sep 20, 2007 @ 07:09 http://sphinn.com/user/view/profile/login/Sebastian 5
42 Jul 22, 2007 @ 09:39 http://sphinn.com/story/1017 5
43 Oct 13, 2007 @ 08:34 http://sphinn.com/published/week 5
44 Sep 08, 2007 @ 04:17 http://sphinn.com/story/4653 5
45 Oct 31, 2007 @ 06:55 http://sphinn.com/story/11614 5
46 Aug 13, 2007 @ 03:06 http://sphinn.com/story/2764/editcomment/4018 4
47 Aug 23, 2007 @ 07:52 http://sphinn.com/story.php?id=3593 4
48 Sep 20, 2007 @ 06:21 http://sphinn.com/published/page/1 4
49 Oct 23, 2007 @ 15:01 http://sphinn.com/story/748 3
50 Jul 29, 2007 @ 10:47 http://sphinn.com/story/title/Google-launched-a-free-ranking-checker 3
51 Sep 30, 2007 @ 21:13 http://sphinn.com/category/Google/parent_name/Google 3
52 Aug 25, 2007 @ 04:47 http://sphinn.com/story.php?id=3735 3
53 Sep 15, 2007 @ 11:28 http://sphinn.com/story.php?id=5648 3
54 Sep 29, 2007 @ 01:35 http://sphinn.com/story/7058 3
55 Oct 28, 2007 @ 22:56 http://sphinn.com/greatesthits 3
56 Oct 23, 2007 @ 04:44 http://sphinn.com/story/10380 3
57 Oct 27, 2007 @ 04:10 http://sphinn.com/story/11233 3
58 Jul 13, 2007 @ 04:23 Google Search: http://sphinn.com 2
59 Jul 21, 2007 @ 03:19 http://sphinn.com/story.php?id=849 2
60 Jul 27, 2007 @ 10:06 http://sphinn.com/story.php?id=1447 2
61 Jul 30, 2007 @ 20:09 http://sphinn.com/story.php?id=1796 2
62 Aug 07, 2007 @ 10:01 http://sphinn.com/published/page/3 2
63 Aug 13, 2007 @ 11:20 http://sphinn.com/story.php?id=2764 2
64 Sep 05, 2007 @ 05:23 http://sphinn.com/story/3735 2
65 Aug 28, 2007 @ 01:56 http://sphinn.com/story.php?id=3877 2
66 Aug 27, 2007 @ 10:01 http://sphinn.com/submit.php?url=http://sebastians-pamphlets.com/links/categories 2
67 Aug 31, 2007 @ 14:13 http://sphinn.com/story.php?id=4335 2
68 Sep 02, 2007 @ 14:29 http://sphinn.com/story.php?id=1622 2
69 Sep 08, 2007 @ 19:48 http://sphinn.com/story.php?id=4548 2
70 Sep 05, 2007 @ 01:07 http://sphinn.com/submit.php?url=http://sebastians-pamphlets.com/why-ebay-and-wikipedia-rule-googles-serps 2
71 Sep 06, 2007 @ 13:22 http://sphinn.com/published/page/4 2
72 Sep 16, 2007 @ 13:30 http://sphinn.com/story.php?id=3783 2
73 Sep 18, 2007 @ 11:55 http://sphinn.com/story.php?id=5973 2
74 Sep 19, 2007 @ 08:15 http://sphinn.com/story.php?id=6122 2
75 Sep 19, 2007 @ 14:37 http://sphinn.com/story.php?id=6124 2
76 Oct 23, 2007 @ 00:07 http://sphinn.com/story/10387 2
77 Jul 16, 2007 @ 18:21 http://sphinn.com/upcoming/category/AllCategories/parent_name/All Categories 1
78 Jul 19, 2007 @ 20:19 http://sphinn.com/story/864 1
79 Jul 20, 2007 @ 15:57 http://sphinn.com/story/title/Buy-Viagra-from-Reddit 1
80 Jul 27, 2007 @ 10:48 http://sphinn.com/story/title/Blogger-to-rule-search-engine-visibility 1
81 Jul 31, 2007 @ 06:07 http://sphinn.com/story/title/The-Unavailable-After-tag-is-totally-and-utterly-useless 1
82 Aug 02, 2007 @ 14:45 http://sphinn.com/user/view/history/login/Sebastian 1
83 Aug 03, 2007 @ 10:59 http://sphinn.com/story.php?id=1976 1
84 Aug 06, 2007 @ 03:59 http://sphinn.com/user/view/commented/login/Sebastian 1
85 Aug 15, 2007 @ 08:27 http://sphinn.com/category/LinkBuilding 1
86 Aug 15, 2007 @ 14:17 http://sphinn.com/story/2764/editcomment/4362 1
87 Aug 28, 2007 @ 13:42 http://sphinn.com/story/849 1
88 Sep 09, 2007 @ 15:15 http://sphinn.com/user/view/commented/login/flyingrose 1
89 Sep 10, 2007 @ 05:15 http://sphinn.com/published/page/20 1
90 Sep 10, 2007 @ 05:55 http://sphinn.com/published/page/19 1
91 Sep 11, 2007 @ 12:22 http://sphinn.com/published/page/8 1
92 Sep 11, 2007 @ 23:13 http://sphinn.com/category/Blogging 1
93 Sep 12, 2007 @ 09:04 http://sphinn.com/story.php?id=5362 1
94 Sep 13, 2007 @ 06:36 http://sphinn.com/category/GoogleSEO/parent_name/Google 1
95 Sep 14, 2007 @ 08:21 http://hwww.sphinn.com 1
96 Sep 16, 2007 @ 14:52 http://sphinn.com/GoogleSEO/Did-Matt-Cutts-by-accident-reveal-a-sure-fire-procedure-to-identify-supplemental-results 1
97 Sep 18, 2007 @ 08:05 http://sphinn.com/story/5721 1
98 Sep 18, 2007 @ 09:08 http://sphinn.com/story/title/If-yoursquore-not-an-Amway-millionaire-avoid-BlogRush-like-the-plague 1
99 Sep 18, 2007 @ 10:02 http://sphinn.com/story/5973#wholecomment8559 1
100 Sep 19, 2007 @ 11:48 http://sphinn.com/user/view/voted/login/bhancock 1
101 Sep 19, 2007 @ 20:27 http://sphinn.com/published/page/5 1
102 Sep 20, 2007 @ 00:39 http://blogmarks.net/my/marks,new?title=How to get the perfect logo for your blog&url=http://sebastians-pamphlets.com/how-to-get-the-perfect-logo-for-your-blog/&summary=&via=http://sphinn.com/story/6122 1
103 Sep 20, 2007 @ 01:34 http://sphinn.com/user/page/3/voted/Wiep 1
104 Sep 24, 2007 @ 15:49 http://sphinn.com/greatesthits/page/3 1
105 Sep 24, 2007 @ 19:51 http://sphinn.com/story.php?id=6761 1
106 Sep 24, 2007 @ 22:32 http://sphinn.com/greatesthits/page/2 1
107 Sep 26, 2007 @ 15:13 http://sphinn.com/story.php?id=7170 1
108 Sep 29, 2007 @ 05:27 http://sphinn.com/category/SphinnZone 1
109 Oct 09, 2007 @ 11:44 http://sphinn.com/story.php?id=8883 1
110 Oct 10, 2007 @ 10:04 http://sphinn.com/published/month 1
111 Oct 24, 2007 @ 15:07 http://sphinn.com/story.php?id=10881 1
112 Oct 26, 2007 @ 09:53 http://sphinn.com/story.php?id=11205 1
113 Oct 30, 2007 @ 08:58 http://sphinn.com/upcoming/page/3 1
114 Oct 30, 2007 @ 12:31 http://sphinn.com/upcoming/most 1
Total 3,688



The day the routers died

Why the fuck do we dumb and clueless Internet marketers care about Google’s Toolbar PageRank when the Internet faces real issues? Well, both the toolbar slider and the IPv4 address space are somewhat finite.

I can hear the IM crowd singing “The day green pixels died” … whilst Matt’s gang in building 43 intones “No mercy, smack paid links, no place to hide for TLA links” … Enjoy this video, it’s friggin’ hilarious:

 

Since Gary Feldman’s song “The Day The Routers Died” will become an evergreen soon, I thought you might be interested in a transcript:

A long long time ago
I can still remember
When my laptop could connect elsewhere.

And I tell you all there was a day
The network card I threw away
Had a purpose and it worked for you and me.

But 18 years completely wasted
With each address we’ve aggregated
The tables overflowing
The traffic just stopped flowing.

And now we’re bearing all the scars
And all my traceroutes showing stars
The packets would travel faster in cars
The day the routers died.

So bye bye, folks at RIPE:55
Be persuaded to upgrade it or your network will die
IPv6 makes me let out a sigh
But I spose we’d better give it a try
I suppose we’d better give it a try!

Now did you write an RFC
That dictated how we all should be
Did we listen like we should that day?

Now were you back at RIPE fifty-four
Where we heard the same things months before
And the people knew they’d have to change their ways.

And we knew that all the ISPs
Could be future proof for centuries.

But that was then not now
Spent too much time playing WoW.

Ooh there was time we sat on IRC
Making jokes on how this day would be
Now there’s no more use for TCP
The day the routers died.

So bye bye, folks at RIPE:55
Be persuaded to upgrade it or your network will die
IPv6 just makes me let out a sigh
But I spose we’d better give it a try
I suppose we’d better give it a try!

I remember those old days I mourn
Sitting in my room, downloading porn
Yeah that’s how it used to be.

When the packets flowed from A to B
Via routers that could talk IP
There was data [that] could be exchanged between you and me.

Oh but I could see you all ignore
The fact we’d fill up IPv4!

But we all lost the nerve
And we got what we deserved!

And while we threw our network kit away
And wished we’d heard the things they say
Put all our lives in disarray
The day the routers died.

So bye bye, folks at RIPE:55
Be persuaded to upgrade it or your network will die
IPv6 just makes me let out a sigh
But I spose we’d better give it a try
I suppose we’d better give it a try!

Saw a man with whom I used to peer
Asked him to rescue my career
He just sighed and turned away.

I went down to the ‘net cafe
That I used to visit everyday
But the man there said I might as well just leave.

[And] now we’ve all lost our purpose
My cisco shares completely worthless
No future meetings for me
At the Hotel Krasnapolsky.

And the men that make us push and push
Like Geoff Huston and Randy Bush
Should’ve listened to what they told us
The day the routers died.

So bye bye, folks at RIPE:55
Be persuaded to upgrade it or your network will die
IPv6 just makes me let out a sigh
But I spose we’d better give it a try
[I suppose we’d better give it a try!]

Recorded at the RIPE:55 meeting in Amsterdam (NL) at the Krasnapolsky Hotel between 22 and 26 October 2007.

Just in case the video doesn’t load, here is another recording.




A pragmatic defence against Google’s anti paid links campaign

Google’s recent shot across the bows of a gazillion sites handling paid links, advertising, or internal cross links not compliant with Google’s idea of a natural link is a call to action. Google’s message is clear: “condomize your commercial links or suffer” (from deducted toolbar PageRank, links stripped of the ability to pass real PageRank and relevancy signals, or perhaps even penalties).

Paid links: good versus evil
Of course that’s somewhat evil, because applying nofollow values to all sorts of links is not exactly a natural thing to do; visitors don’t care about invisible link attributes and sometimes they’re even pissed when they get redirected to a URL not displayed in their status bar. Also, this requirement forces Webmasters to invest enormous efforts in code maintenance for the sole purpose of satisfying search engines. The argument “if Google doesn’t like these links, then they can discount them in their system, without bothering us” has its merits, but unfortunately that’s not the way Google’s cookie crumbles, for various reasons. Hence let’s develop a pragmatic procedure to handle those links.

The problem

Google thinks that uncondomized paid links as well as commercial links to sponsors or affiliated entities aren’t natural, because the terms “sponsor|pay for review|advertising|my other site|sign-up|…” and “editorial vote” are not compatible in the sense of Google’s guidelines. This view of the Web’s linkage is pretty black vs. white.

Either you link out because a sponsor bought ads, or you don’t sell ads and link out for free because you honestly think your visitors will like a page. Links to sponsors without condom are black, links to sites you like and which you don’t label “sponsor” are white.

There’s nothing in between; gray areas like links to hand-picked sponsors on a page with a gazillion links count as black. Google doesn’t care whether or not your clean links actually pass a reasonable amount of PageRank to link destinations which buy ad space too; the sole possibility that those links could influence search results is enough to qualify you as sort of a link seller.

The same goes for paid reviews on blogs and whatnot, see for example Andy’s problem with his honest reviews which Google classifies as paid links, and of course all sorts of traffic deals, affiliate links, banner ads and stuff like that.

You don’t even need to label a clean link as an advert or sponsored. If the link destination matches a domain in Google’s database of on-line advertisers, link buyers, e-commerce sites / merchants etcetera, or Google figures out that you link too much to affiliated sites or other sites you own or control, then your toolbar PageRank is toast and most probably your outgoing links will be penalized. Possibly these penalties have an impact on your internal links too, which results in less PageRank landing on subsidiary pages. Less PageRank gathered by your landing pages means less crawling, less ranking, fewer SERP referrers, less revenue.

The solution

You’re absolutely right when you say that such search engine nitpicking should not force you to throw nofollow crap on your links like confetti. From your and my point of view condomizing links is wrong, but sometimes it’s better to pragmatically comply with such policies in order to stay in the game.

Although uncrawlable redirect scripts have advantages in some cases, the simplest procedure to condomize a link is the rel-nofollow microformat. Here is an example of a googlified affiliate link:
<a href="http://sponsor.com/?affID=1" rel="nofollow">Sponsor</a>

Why serve your visitors search engine crawler directives?

Complying with Google’s laws does not mean that you must deliver crawler directives like rel="nofollow" to your visitors. Since Google is concerned about search engine rankings influenced by uncondomized links with commercial intent, serving crawler directives to crawlers and clean links to users is perfectly in line with Google’s goals. Actually, initiatives like the X-Robots-Tag make clear that hiding crawler directives from users is fine with Google. To underline that, here is a quote from Matt Cutts:

[…] If you want to sell a link, you should at least provide machine-readable disclosure for paid links by making your link in a way that doesn’t affect search engines. […]

The other best practice I’d advise is to provide human readable disclosure that a link/review/article is paid. You could put a badge on your site to disclose that some links, posts, or reviews are paid, but including the disclosure on a per-post level would be better. Even something as simple as “This is a paid review” fulfills the human-readable aspect of disclosing a paid article. […]

Google’s quality guidelines are more concerned with the machine-readable aspect of disclosing paid links/posts […]

To make sure that you’re in good shape, go with both human-readable disclosure and machine-readable disclosure, using any of the methods [uncrawlable redirects, rel-nofollow] I mentioned above.
[emphasis mine]

Since Google devalues paid links anyway, search engine friendly cloaking of rel-nofollow for Googlebot is a non-issue with advertisers, as long as this fact is disclosed. I bet most link buyers look at the magic green pixels anyway, but that’s their problem.
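As an aside, the X-Robots-Tag mentioned above is a crawler directive delivered in the HTTP response header rather than in the page’s HTML, so visitors never get to see it at all. A minimal PHP sketch (the noindex value here is just an example, not a recommendation for your paid posts):
<?php
// Crawler directive sent as an HTTP header; browsers simply ignore it.
header("X-Robots-Tag: noindex");
?>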

How to cloak rel-nofollow for search engine crawlers

I’ll discuss a PHP/Apache example, but this method is easily adaptable to other server-side scripting languages like ASP. If you have a static site and PHP is available on your (*ix) host, you need to tell Apache that you’re using PHP in .html (.htm) files. Put this statement in your root’s .htaccess file:
AddType application/x-httpd-php .html .htm

Next create a plain text file, insert the code below, and upload it as “funct_nofollow.php” or so to your server’s root directory (or a subdirectory, but then you need to change some code below).
<?php
function makeRelAttribute ($linkClass) {
  $relValue = "";
  $numargs = func_num_args();
  // optional 2nd input parameter: $relValue
  if ($numargs >= 2) {
    $relValue = func_get_arg(1) ." ";
  }
  // does the visitor come from a Google or Yahoo SERP?
  $referrer = isset($_SERVER["HTTP_REFERER"]) ? $_SERVER["HTTP_REFERER"] : "";
  $refUrl = parse_url($referrer);
  $isSerpReferrer = FALSE;
  if (isset($refUrl["host"]) &&
      (stristr($refUrl["host"], "google.") ||
       stristr($refUrl["host"], "yahoo.")))
    $isSerpReferrer = TRUE;
  // is the requestor a search engine crawler?
  $userAgent = isset($_SERVER["HTTP_USER_AGENT"]) ? $_SERVER["HTTP_USER_AGENT"] : "";
  $isCrawler = FALSE;
  if (stristr($userAgent, "Googlebot") ||
      stristr($userAgent, "Slurp"))
    $isCrawler = TRUE;
  // serve the link condom values to crawlers only
  if ($isCrawler /*|| $isSerpReferrer*/ ) {
    if ($linkClass == "ad")   $relValue .= "advertising nofollow";
    if ($linkClass == "paid") $relValue .= "sponsored nofollow";
    if ($linkClass == "own")  $relValue .= "affiliated nofollow";
    if ($linkClass == "vote") $relValue .= "editorial dofollow";
  }
  if (empty($relValue))
    return "";
  return " rel=\"" .trim($relValue) ."\" ";
} // end function makeRelAttribute
?>

Next put the code below in a PHP file you’ve included in all scripts, for example header.php. If you have static pages, then insert the code at the very top.
<?php
@include($_SERVER["DOCUMENT_ROOT"] ."/funct_nofollow.php");
?>

Do not paste the function makeRelAttribute itself! If you spread code this way, you’ll have to edit tons of files when you need to change the functionality later on.

Now you can use the function makeRelAttribute($linkClass, $relValue) within the scripts or HTML pages. The function has an input parameter $linkClass and knows the (self-explanatory) values “ad”, “paid”, “own” and “vote”. The second (optional) input parameter is a value for the A element’s REL attribute itself. If you provide it, it gets appended, or, if makeRelAttribute doesn’t detect a spider, it creates a REL attribute with just this value. Examples below. You can add more user agents, or serve rel-nofollow to visitors coming from SERPs by enabling the || $isSerpReferrer condition (remove the /* */ comment markers).

When you code a hyperlink, just add the function to the A tag. Here is a PHP example:
print "<a href=\"http://google.com/\"" .makeRelAttribute("ad") .">Google</a>";

will output
<a href="http://google.com/" rel="advertising nofollow" >Google</a>
when the user agent is Googlebot, and
<a href="http://google.com/">Google</a>
to a browser.

If you can’t write nice PHP code, for example because you have to follow crappy guidelines and worst practices with a WordPress blog, then you can mix HTML and PHP tags:
<a href="http://search.yahoo.com/"<?php print makeRelAttribute("paid"); ?>>Yahoo</a>

Please note that this method is not safe with search engines or unfriendly competitors when you want to cloak for other purposes. Also, the link condoms are served to crawlers only, that means search engine staff reviewing your site with a non-crawler user agent name won’t spot the nofollow’ed links unless they check the engine’s cached page copy. An HTML comment in HEAD like “This site serves machine-readable disclosures, e.g. crawler directives like rel-nofollow applied to links with commercial intent, to Web robots only.” as well as a similar comment line in robots.txt would certainly help to pass reviews by humans.

A Google-friendly way to handle paid links, affiliate links, and cross linking

Load this page with different user agents and referrers. You can do this for example with a Firefox extension like PrefBar. For testing purposes you can use these user agent names:
Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)

and these SERP referrer URLs:
http://google.com/search?q=viagra
http://search.yahoo.com/search?p=viagra&ei=utf-8&iscqry=&fr=sfp

Just enter these values in PrefBar’s user agent and referrer spoofing options (click “Customize” on the toolbar, select “User Agent” / “Referrerspoof”, click “Edit”, add a new item, label it, then insert the strings above). Here is the code above in action:

Referrer URL: (your current referrer)
User Agent Name: (your current user agent)
Ad makeRelAttribute("ad"): Google
Paid makeRelAttribute("paid"): Yahoo
Own makeRelAttribute("own"): Sebastian’s Pamphlets
Vote makeRelAttribute("vote"): The Link Condom
External makeRelAttribute("", "external"): W3C rel="external"
Without parameters makeRelAttribute(""): Sphinn

When you change your browser’s user agent to a crawler name, or fake a SERP referrer, the REL value will appear in the right column.
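If you’d rather test on the server instead of spoofing your browser, a quick-and-dirty sketch like the one below works too. The file name test_nofollow.php and the list of test user agents are made up for this example; it overrides the server variables the function inspects and prints the generated markup for each case.
<?php
// test_nofollow.php: simulate crawler and browser requests (test sketch).
require($_SERVER["DOCUMENT_ROOT"] ."/funct_nofollow.php");

$testAgents = array(
  "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
  "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)",
  "Mozilla/5.0 (Windows; U; Windows NT 5.1) Gecko Firefox" // stands in for a regular browser
);
foreach ($testAgents as $userAgent) {
  // fake the request headers makeRelAttribute() looks at
  $_SERVER["HTTP_USER_AGENT"] = $userAgent;
  $_SERVER["HTTP_REFERER"] = "http://google.com/search?q=viagra";
  print htmlspecialchars($userAgent) ."<br />\n";
  print "ad: " .htmlspecialchars("<a href=\"http://sponsor.com/\"" .makeRelAttribute("ad") .">Sponsor</a>") ."<br />\n";
  print "vote: " .htmlspecialchars("<a href=\"http://example.com/\"" .makeRelAttribute("vote") .">Editorial link</a>") ."<br />\n";
}
?>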

When you’ve developed a better solution, or when you have a nofollow-cloaking tutorial for other programming languages or platforms, please let me know in the comments. Thanks in advance!




Google Toolbar PageRank deductions make sense

Google policing the Web's linkage
Toolbar PR has been stale since April, and now only a few sites were “updated”, without any traffic losses, so I can imagine that’s just a “watch out” signal from Google, not yet a penalty. Of course it’s not a conventional toolbar PageRank update, because new pages aren’t affected. That means the deductions are not caused by a finite amount of PageRank spread over more pages discovered by Google since the last toolbar PR update.

Unfortunately, in the current toolbar PR hysteria next to nobody tries to figure out Google’s message. Crying foul is not very helpful, since Google is not exactly known as a company revising such decisions based on Webmaster rants lashing “unfair penalties”.

By the way, I think Andy is spot on. Paid links are definitely a cause of toolbar PageRank downgrades. Artificial links of any kind are another issue. Google obviously has a different take on interlinking and cross-linking, for example. Site owners argue that it makes business sense, but Google might think most of these links come without value for their users. And there are tons more pretty common instances of “link monkey business”.

Maybe Google alerts all sorts of sites violating the SEO bible’s twelve commandments with a few less green pixels, before they roll out new filters which would catch those sins and penalize the offending pages accordingly. Actually, this would make a lot of sense.

All site owners and Webmasters monitor their toolbar PR. Any significant changes are discussed in a huge community. If the crowd assumes that artificial links cause toolbar PR deductions, many sites will change their linkage. This happened already after the first shot across the bows two weeks ago. And it will work again. Google gets the desired results: less disliked linkage, fewer sites selling uncondomized links.

That’s quite smart. Google has learned that they can’t ban or over-penalize popular sites, because that leads to fucked up search results, and not only for navigational search queries; in other words, pissed searchers. Taking back a few green pixels from the toolbar, on the other hand, is not an effective penalty, because toolbar PR is unrelated to everything that matters. It is, however, a message with guaranteed delivery.

Running algos in development stage on the whole index and using their findings to manipulate toolbar PageRank data hurts nobody, but might force many Webmasters to change their stuff in order to comply with Google’s laws. As a side effect, this procedure even helps to avoid too much collateral damage when the actual filters become active later on.

There seems to exist another pattern. Most sites targeted by the recent toolbar PageRank deductions are SEO aware to some degree. They will spread the word. And complain loudly. Google has quite a few folks on the payroll who monitor the blogosphere, SEO forums, Webmaster hangouts and whatnot. Analyzing such reactions is a great way to gather input usable to validate and fine tune not yet launched algos.

Of course that’s sheer speculation. What do you think, does Google use toolbar PR as a “change your stuff or find yourself kicked out soon” message? Or is it just an attempt to make link selling less attractive?

Update: Insightful posts on Google’s toolbar PageRank manipulations:

And here is a pragmatic answer to Google’s paid links requirements: Cloak the hell out of your links with commercial intent!



