When your referrer stats turn into a porn TGP

When you wonder why your top referrers are porn galleries, make-you-rich-in-a-second scams and other pages which don’t carry your link but try to sell you something, read further.

Referrer spamming is done by bots requesting pages from your site, leaving a bogus HTTP_REFERER. These spam bots come from various IPs, change their user agents on the fly, and use other sneaky techniques to slip through spam protection. Some of them are somewhat clever: they scale the number of bogus requests to your site’s Alexa stats, to ensure their “visits” appear on limited realtime referrer lists and other stats by referrer. Some of them even fetch whole pages from your server, and a few even follow redirects.

So what can you do? Not much. You can’t really get rid of these log entries, because the logs are written before your spam protection handles those requests. But you can reduce the waste of bandwidth and server resources. If you redirect these requests, your server sends only a header, but not the contents. Here is a way to accomplish that:

First of all, extract the bogus referrers from your logs or stats pages, and save them in a plain text file. Reduce this to a list of domains, truncating subdomains like “www” or “galleries”, and add .htaccess code:

SetEnvIf Referer \.collegefuckfest\.com GoFuckYourself=1
SetEnvIf Referer \.asstraffic\.com GoFuckYourself=1
SetEnvIf Referer \.allinternal\.com GoFuckYourself=1
SetEnvIf Referer \.mature-lessons\.com GoFuckYourself=1
SetEnvIf Referer \.wildpass\.com GoFuckYourself=1
SetEnvIf Referer \.promote-biz\.net GoFuckYourself=1

This code creates an environment variable “GoFuckYourself” with the value “1”. The following statements can now work with these marked requests:

RewriteEngine On
RewriteCond %{ENV:GoFuckYourself} =1
RewriteRule .* %{HTTP_REFERER} [R=301,L]

This redirects the request to its referrer, so if the bogus bot follows redirects, it will request a page from the spammer’s domain. Of course you can redirect to a static URL too:
RewriteRule .* http://www.example.com/gofuckyourself [R=301,L]

You could also use the environment variable in deny statements:
order allow,deny
allow from all
deny from env=GoFuckYourself

but that will serve a complete page, and may produce an infinite loop. Deny, as well as the similar RewriteRule .* - [F], enforces a 403-Forbidden. If you have an ErrorDocument 403 /getthefuckouttahere.html directive, the request for the error page runs into the 403 itself. This process calls itself over and over until it gets terminated after 20 or so loops.
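
If your list of bogus referrers grows long, turning it into SetEnvIf lines by hand gets tedious. Here is a minimal Python sketch of that step. The function names and the neutral variable name SpamReferrer are made up for illustration, and the subdomain truncation simply keeps the last two labels of the host name, which is an assumption that fails for domains like example.co.uk:

```python
from urllib.parse import urlsplit


def referrer_domain(url):
    """Reduce a referrer URL (or bare host name) to its last two host labels."""
    # urlsplit only recognizes a host when the string contains "//".
    host = urlsplit(url if "//" in url else "//" + url).hostname or ""
    return ".".join(host.split(".")[-2:])


def setenvif_lines(referrers, var="SpamReferrer"):
    """Build one SetEnvIf line per distinct domain, sorted alphabetically."""
    domains = sorted({referrer_domain(r.strip()) for r in referrers if r.strip()})
    # Host names consist of letters, digits, hyphens and dots,
    # so escaping the dots is sufficient for the regular expression.
    return [
        "SetEnvIf Referer \\." + d.replace(".", "\\.") + " " + var + "=1"
        for d in domains
        if d
    ]


if __name__ == "__main__":
    sample = [
        "http://www.wildpass.com/some/page",
        "http://galleries.wildpass.com/",
        "promote-biz.net",
    ]
    print("\n".join(setenvif_lines(sample)))
```

Paste the output into your .htaccess file above the rewrite rules, and check the list by hand before you go live, because a wrongly truncated domain can lock out legit visitors.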




German spammers banning all domains out there

If you receive an email in German from Google’s Search Quality team (donotreply@gmail.com) telling you that your site was banned by Google for 30 days, please don’t worry. It’s a fake. Legit (similarly phrased) emails must come from a google.com email address. If the hoax email comes with an attachment, don’t save or open the attached file (zipped google_webmastertools.exe)!

Here is the email:
Entfernung Ihrer Webseite [domain] aus dem Google Index (“Removal of your website [domain] from the Google index”)
The email looks pretty authentic, its style and wording are somewhat Google-ish. I speak German, hence I’m sure that gazillions of innocent Webmasters and site owners buy it and panic. Unfortunately most filters let the zipped attachment (google_webmastertools.exe) pass through. I didn’t open it myself and I bet it’s not a bright idea to try it.

Google told me that Stefanie from the real Search Quality team over in Dublin will soon post a warning on the German blog.

Here is an original penalty warning in German:
Entfernung Ihrer Webseite aus dem Google Index (“Removal of your website from the Google index”)
These emails are sent from donotreply@google.com without attachments.

Update 05/10/2007: Here is Google’s official statement (in German) and the English version by Vanessa. The attached .exe is a joke, it executes cmd.exe c\:clear complete harddisc (Hoax.BAT.Small.a).

Update 05/11/2007: Because these emails are easy to mistake for authentic ones from the Search Quality team, Google temporarily discontinued sending them as they work on ways to provide more secure communication mechanisms. This update reads as if Google has stopped sending out penalty notification emails in all languages: “… as we’ve temporarily stopped sending emails about guidelines violations, you can safely assume that any email you receive isn’t from us. Note that we do provide information about some violations in webmaster tools.”

Update 06/19/2007: German forums and blogs report another flood of these faked emails, and this post got tons of visits from searches for quotes from the email quoted above. Calm down, don’t panic: Google still doesn’t send out penalty notifications via email (in German). So please ignore the spam and refer to the diagnostics tab in your Webmaster Central account when you suspect a penalty.

Update 07/18/2007: Google released the message center where site owners can poll for penalty notifications. They are still working on a safe solution for emails. Probably ‘-950/-30/-n penalties’ won’t get announced any time soon.




Brittany-Spear-Nude-mesothelioma-ringtones

John Brittany Spear, blogging nude about mesothelioma and ringtones all day long, asked me to introduce the GoogleWhack Brittany-Spear-Nude-mesothelioma-ringtones.

Well, that gets me somewhat nervous, coz Brittany Spear sounds more like a pretty handsome gal. Anybody got a nude pic to download? Actually, I can’t imagine a blog titled John “Brittany Spear” talking about mesothelioma ringtones. By the way, I know what a ringtone is, but how the heck can a cell phone sound like a tumor? Is there a place to download free mesothelioma ringtones for my mobile phone? Or is that a lawyer’s trick to get me sick on Brittany Spear, whoever that may be, and however she may look undressed? Not that I dislike nude Brittanies, in fact I do love a naked Brittany for breakfast, but I’m not sure I’d download a nudist suffering from mesothelioma at an all for free ringtone site.

Also, what will the almighty Google think about my keyword stuffing when it comes to ringtones related to mesothelioma discussed by a nude Brittany Spear selling PR8 links for as low as $299.00? Is that fun or spam? Go figure …

Actually, I deserve the pain caused by exposure to asbestos fibres, particularly those of crocidolite, the fibres of which are thin and straight and penetrate to the deep layers of the lung.




Yahoo! search going to torture Webmasters

According to Danny, Yahoo! supports a multi-class nonsense called the robots-nocontent tag. CRAP ALERT!

Can you senseless and cruel folks at Yahoo! search imagine how many of my clients who’d like to use that feature have copied and pasted their pages? Do you have a clue how many sites out there don’t make use of SSI, PHP or ASP includes, how many sites have never heard of dynamic content delivery, and how many sites can’t use proper content delivery techniques because they have to deal with legacy systems and ancient business processes? Did you ask how common templated Web design is, and I mean the weird static variant, where a new page gets built from a randomly selected source page saved as new-page.html?

It’s great that you came out with a bastardized copy of Google’s somewhat hapless (in the sense of cluttering structured code) section targeting, because we dreadfully need that functionality across all engines. And I admit that your approach is a little better than AdSense section targeting because you don’t mark payload by paydirt in comments. But why the heck did you design it that crappy? The thoughtless draft of a microformat from which you’ve “stolen” that unfortunate idea didn’t become a standard for very good reasons. Because it’s crap. Assigning multiple class names to markup elements for the sole purpose of setting crawler directives is as crappy as inline style assignments.

Well, due to my zero-bullshit tolerance I’m somewhat upset, so I repeat: Yahoo’s robots-nocontent class name is crap by design. Don’t use it, boycott it, because if you make use of it you’ll change gazillions of files for each and every proprietary syntax supported by a single search engine in the future. When the united search geeks can agree on flawed standards like rel-nofollow, they should be able to talk about a sensible evolution of robots.txt.

There’s a way easier solution, which doesn’t require editing tons of source files: standardizing a CSS-like syntax to assign crawler directives to existing classes and DOM-IDs. For example, extend the robots.txt syntax like this:

A.advertising { rel: nofollow; } /* devalue aff links */

DIV.hMenu, TD#bNav { content:noindex; rel:nofollow; } /* make site wide links unsearchable */

Unsupported robots.txt syntax doesn’t harm, proprietary attempts do harm!
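
To show that such rules would be trivial to digest for a crawler, here is a rough Python sketch which parses the two example lines above. Bear in mind that the syntax is only my proposal, no search engine supports it, and the function name parse_crawler_rules is made up for this example:

```python
import re

# One rule: a selector list followed by a block of declarations in braces.
RULE = re.compile(r"([^{}]+)\{([^{}]*)\}")
# CSS-style comments, possibly spanning lines.
COMMENT = re.compile(r"/\*.*?\*/", re.S)


def parse_crawler_rules(text):
    """Map each selector to a dict of crawler directives."""
    rules = {}
    for selectors, body in RULE.findall(COMMENT.sub("", text)):
        directives = {}
        for declaration in body.split(";"):
            if ":" in declaration:
                name, value = declaration.split(":", 1)
                directives[name.strip().lower()] = value.strip().lower()
        # A rule may list several comma separated selectors.
        for selector in selectors.split(","):
            if selector.strip():
                rules.setdefault(selector.strip(), {}).update(directives)
    return rules


if __name__ == "__main__":
    example = """
    A.advertising { rel: nofollow; } /* devalue aff links */
    DIV.hMenu, TD#bNav { content:noindex; rel:nofollow; }
    """
    for selector, directives in parse_crawler_rules(example).items():
        print(selector, directives)
```

A crawler would still have to match these selectors against each page’s DOM, which this sketch doesn’t attempt.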

Dear search engines, get together and define something useful, before each of you comes out with different half-baked workarounds like section targeting or robots-nocontent class values. Thanks!




Google hunts paid links and reciprocal linkage

Matt Cutts and Adam Lasnik have clarified Google’s take on paid links and overdone reciprocal linkage. Some of their statements are old news, but it surely helps to have a comprehensive round-up in the context of the current debate on paid links.

So what –in short– does Google consider linkspam:
Artificial link schemes, paid links and uncondomized affiliate links, overdone reciprocal linkage and interlinking.

All sorts of link schemes designed to increase a site’s ranking or PageRank. Link scheme means for example mass exchange of links pages, repeated chunks of links per site, fishy footer links, triangular PageRank boosting, 27-way-linkage where in the end only the initiator earns a few inbounds because the participants are confused, and “genial” stuff like that. Google’s pretty good at identifying link farming, and bans or penalizes accordingly. That’s old news, but such techniques are still used, widely.

Advice: don’t participate, Google will catch you eventually.

Paid links, if detected or reported, get devalued. That is, they don’t help the link destination’s search engine rankings, and in some cases the source will lose its ability to pass reputation via links. Google has been doing this more or less silently since 2003 at least, probably longer, but until today there was no precise definition of risky paid links.

That’s going to change. Adam Lasnik, commenting on Eric Enge’s “It seems to me that one of the more challenging aspects of all of this is that people have gotten really good at buying a link that show no indication that they are purchased.”:

Yes and no, actually. One of the things I think Matt has commented about in his blog; it’s what we joking refer to as famous last words, which is “well, I have come up with a way to buy links that is completely undetectable”.

As people have pointed out, Google buys advertising, and a lot of other great sites engage in both the buying and selling of advertising. There is no problem with that whatsoever. The problem is that we’ve seen quite a bit of buying and selling for the very clear purpose of transferring PageRank. Some times we see people out there saying “hey, I’ve got a PR8 site” and, “this will give you some great Google boost, and I am selling it for just three hundred a month”. Well, that’s blunt, and that’s clearly in violation of the “do not engage in linking schemes that are not permitted within the webmaster guidelines”.

Two, taking a step back, our goal is not to catch one hundred percent of paid links [emphasis mine]. It’s to try to address the egregious behavior of buying and selling the links that focus on the passing of PageRank. That type of behavior is a lot more readily identifiable then I think people give us credit for.

So it seems Google’s just after PageRank selling. Adam’s following comments on the use and abuse of rel-nofollow emphasize this interpretation:

I understand there has been some confusion on that, both in terms of how it [rel=nofollow] works or why it should be used. We want links to be treated and used primarily as votes for a site, or to say I think this is an interesting site, and good site. The buying and selling of links without the use of Nofollow, or JavaScript links, or redirects has unfortunately harmed that goal. We realize we cannot turn the web back to when it was completely noncommercial and we don’t want to do that [emphasis mine]. Because, obviously as Google, we firmly believe that commerce has an important role on the Internet. But, we want to bring a bit of authenticity back to the linking structure of the web. […] our interest isn’t in finding and taking care of a hundred percent of links that may or may not pass PageRank. But, as you point out relevance is definitely important and useful, and if you previously bought or sold a link without Nofollow, this is not the end of the world. We are looking for larger and more significant patterns [emphasis mine].

Don’t miss out on Eric Enge’s complete interview with Adam Lasnik, it’s really worth bookmarking for future reference!

Matt Cutts has updated (May 12th, 2007) an older and well linked post on paid links. It also covers thoughts on the value of directory links. Here are a few quotes, but don’t miss out on Matt’s post:

… we’re open to semi-automatic approaches to ignore paid links, which could include the best of algorithmic and manual approaches.

Q: Now when you say “paid links”, what exactly do you mean by that? Do you view all paid links as potential violations of Google’s quality guidelines?
A: Good question. As someone working on quality and relevance at Google, my bottom-line concern is clean and relevant search results on Google. As such, I care about paid links that flow PageRank and attempt to game Google’s rankings. I’m not worried about links that are paid but don’t affect search engines. So when I say “paid links” it’s pretty safe to add in your head “paid links that flow PageRank and attempt to game Google’s rankings.”

Q: This is all well and fine, but I decide what to do on my site. I can do anything I want on it, including selling links.
A: You’re 100% right; you can do absolutely anything you want on your site. But in the same way, I believe Google has the right to do whatever we think is best (in our index, algorithms, or scoring) to return relevant results.

Q: Hey, as long as we’re talking about directories, can you talk about the role of directories, some of whom charge for a reviewer to evaluate them?
A: I’ll try to give a few rules of thumb to think about when looking at a directory. When considering submitting to a directory, I’d ask questions like:
- Does the directory reject URLs? If every URL passes a review, the directory gets closer to just a list of links or a free-for-all link site.
- What is the quality of urls in the directory? Suppose a site rejects 25% of submissions, but the urls that are accepted/listed are still quite low-quality or spammy. That doesn’t speak well to the quality of the directory.
- If there is a fee, what’s the purpose of the fee? For a high-quality directory, the fee is primarily for the time/effort for someone to do a genuine evaluation of a url or site.
Those are a few factors I’d consider. If you put on your user hat and ask “Does this seem like a high-quality directory to me?” you can usually get a pretty good sense as well, or ask a few friends for their take on a particular directory.

To get a better idea on how Google’s search quality team chases paid links, read Brian White’s post Paid Link Schemes Inside Original Content.

Advice: either nofollow paid links, or don’t get caught. If you buy links, pay only for the traffic, because with or without link condom there’s no search engine love involved.

Affiliate links are seen as kinda subset of paid links. Google can identify most (unmasked) affiliate links. Frankly, there’s no advantage in passing link love to sponsors.

Advice: nofollow.

Reciprocal links without much doubt nullify each other. Overdone reciprocal linkage may even cause penalties, that is, the reciprocal links area of a site gets qualified as a link farm; for possible consequences scroll up a bit. Reciprocal links are natural links, and Google honors them if the link profile of a site or network does not consist of an unnaturally high number of reciprocal or triangular link exchanges. It may be that natural reciprocal links pass (at least a portion of) PageRank, but no relevancy via anchor text, trust or other link reputation (or less of it than one-way links).

Matt Cutts discussing “Google Hell”:

Reciprocal links by themselves aren’t automatically bad, but we’ve communicated before that there is such a thing as excessive reciprocal linking. […] As Google changes algorithms over time, excessive reciprocal links will probably carry less weight. That could also account for a site having more pages in supplemental results if excessive reciprocal links (or other link-building techniques) begin to be counted less. As I said in January: “The approach I’d recommend in that case is to use solid white-hat SEO to get high-quality links (e.g. editorially given by other sites on the basis of merit).”

Advice: It’s safe to consider reciprocal links somewhat helpful, but don’t actively chase reciprocal links.

Interlinking all sites in a network can be counterproductive, but selfish cross-linking is not penalized in general. There’s no “interlinking penalty” when these links make sound business sense, even when the interlinked sites aren’t topically related. Interlinking sites handling each and every yellow page category on the other hand may be considered overdone. In some industries like adult entertainment, where it’s hard to gain natural links, many webmasters try to boost their rankings with links from other (unrelated) sites they own or control. Operating hundreds or thousands of interlinked travel sites spread on many domains and subdomains is risky too. In the best case such linking patterns may be just ignored by Google, that is, they have no or very low impact on rankings at all, but it’s easy to convert an honest network into a link farm by mistake.

Advice: Carefully interlink your own sites in smaller networks, but partition these links by theme or branch in huge clusters. Consider consolidating closely related sites.

So what does all that mean for Webmasters?

Some might argue “if it ain’t broke don’t fix it”, in other words “why should I revamp my linkage when I rank fine?”. Well, rules like “any attempt to improve on a system that already works is pointless and may even be detrimental” are pointless and detrimental in a context where everything changes daily. That’s especially true when the tiny link systems designed to fool another system passively interact with that huge system (the search engine polls linkage data for all kinds of analyses). In that case the large system can change the laws of the game at any time to outsmart all the tiny cheats. So just because Google didn’t discover all link schemes or shabby reciprocal link cycles out there, that does not mean the participants are safe forever. Nothing’s set in stone, not even rankings, so better revise your ancient sins.

Bear in mind that Google maintains a database containing all links in the known universe back to 1998 or so, and that a current penalty may be the result of a historical analysis of a site’s link attitude. So when a site is squeaky clean today but doesn’t rank adequately, consider a reinclusion request if you’ve cheated in the past.

Before you think of penalties as the cause of downranked or even vanished pages, analyze your inbound links that might have started counting for less. Pull all your inbound links from Site Explorer or Webmaster Central, then remove questionable sources from the list:

  • Paid links and affiliate links where you 301-redirect all landing pages with affiliate IDs in the query string to a canonical landing page,
  • Links from fishy directories, links lists, FFAs, top rank lists, DMOZ-clones and stuff like that,
  • Links from URLs which may be considered search results,
  • Links from sites you control or which live off your contents,
  • Links from sites engaged in reciprocal link swaps with your sites,
  • Links from sites which link out to too many questionable pages in link directories or where users can insert links without editorial control,
  • Links from shabby sites regardless of their toolbar PageRank,
  • Links from links pages which don’t provide editorial contents,
  • Links from blog comments, forum signatures, guestbooks and other places where you can easily drop URLs,
  • Nofollow’ed links and links routed via uncrawlable redirect scripts.

Judge by content quality, traffic figures if available, and user friendliness, not by toolbar PageRank. Just because a link appears in reverse citation results, that does not mean it carries any weight.

Look at the shrunken list of inbound links and ask yourself where on the SERPs a search engine should rank your stuff based on these remaining votes. Frustrated? Learn the fine art of link building from an expert in the field.




Categorizing posts with blogger (rant)

Google knows everything about AJAX. Why the heck can’t I assign categories to old posts without hassles? “Edit posts - change number of listed posts - scroll down - edit - scroll down - choose/enter categories - publish - repeat” is just 7 full page reloads/actions too many. On a slow DSL connection this archaic procedure drives me nuts.

Dear readers, when you click on “Labels” you most probably won’t find related posts :( I’m adding categories when I update an old post, but UI flaws keep me from categorizing the whole archive. Sorry.




How Google & Yahoo handle the link condom

Loren Baker over at SEJ got a few official statements on use and abuse of the rel-nofollow microformat by the major players: How Google, Yahoo & Ask treat NoFollow’ed links. Great job, thanks!

Ask doesn’t “officially” support nofollow, whatever that means. Loren didn’t ask MSN, probably because he didn’t expect that they’ve even noticed that they officially support nofollow since 2005, same procedure with sitemaps by the way. Yahoo implemented it along the specs, and Google stepped way over the line the norm sets. So here is the difference:

1. Do you follow a nofollow’ed link?
Google: No (longer)
Yahoo: Yes

2. Do you index the linked page following a nofollow’ed link?
Google: Obsolete, see 1.
Yahoo: Yes

3. Do your ranking algos factor in reputation, anchor/alt/title text or whichever link love sourced from a nofollow’ed link?
Google: Obsolete, see 1.
Yahoo: No

4. Do you show nofollow’ed links in reverse citation results?
Google: Yes (in link: searches by accident, in Webmaster Central if the source page didn’t make it into the supplemental index)
Yahoo: Yes (Site Explorer)

Q&A#4 is made up but accurate. I think it’s safe to assume that MSN handles the link condom like Yahoo. (Update: As Loren clarifies in the comments, he asked MSN search but they didn’t answer in a timely fashion.)

And here’s a remarkable statement from Google’s search evangelist Adam Lasnik, who may like nofollow or not:

On a related note, though, and echoing Matt’s earlier sentiments … we hope and expect that more and more sites — including Wikipedia — will adopt a less-absolute approach to no-follow … expiring no-follows, not applying no-follows to trusted contributors, and so on.

Bravo!

Related link: rel="nofollow" Google, Yahoo and MSN




Erol to ship a Patch Fixing Google Troubles

Background: read these four posts on Google penalizing or deindexing e-commerce sites. Long story short: Recently Google’s enhanced algos began to deindex e-commerce sites powered by Erol’s shopping cart software. The shopping cart maintains a static HTML file which redirects user agents executing JavaScript to another URL. This happens with each and every page, so it’s quite understandable that Ms. Googlebot was not amused. I got involved as a few worried store owners asked for help in Google’s Webmaster Forum. After lots of threads and posts on the subject Erol’s managing director got in touch with me and we agreed to team up to find a solution to help the store owners suffering from a huge traffic loss. Here’s my report of the first technical round.

Understanding how Erol 4.x (and all prior versions) works:

The software generates an HTML page offline, which functions as an XML-like content source (called “x-page”; I use that term because all Erol customers are familiar with it). The “x-page” gets uploaded to the server and is crawlable, but not really viewable. Requested by a robot, it responds with 200-Ok. Requested by a human, it does a JavaScript redirect to a complex frameset, which loads the “x-page” and visualizes its contents. The frameset responds to browsers if directly called, but returns a 404-NotFound error to robots. Example:

“x-page”: x999.html
Frameset: erol.html#999x0&&

To view the source of the “x-page” disable JavaScript before you click the link.

Understanding how search engines handle Erol’s pages:

There are two major weak points with regard to crawling and indexing. The crawlable page redirects, and the destination does not exist if requested by a crawler. This leads to these scenarios:

  1. A search engine ignoring JavaScript on crawled pages fetches the “x-page” and indexes it. That’s the default behavior of yesterday’s crawlers, and it still works this way at several search engines.
  2. A search engine not executing JavaScript on crawled pages fetches the “x-page”, analyzes the client sided script, and discovers the redirect (please note that a search engine crawler may change its behavior, so this can happen all of a sudden to properly indexed pages!). Possible consequences:
    • It tries to fetch the destination, gets the 404 response multiple times, and deindexes the “x-page” eventually. That would mean that, depending on the crawling frequency and depth per domain, the pages disappear quite fast or rather slowly until the last page is phased out. Google would keep a copy in the supplemental index for a while, but this listing cannot return to the main index.
    • It’s trained to consider the unconditional JavaScript redirect “sneaky” and flags the URL accordingly. This can result in temporary or permanent deindexing as well.
  3. A search engine executing JavaScript on crawled pages fetches the “x-page”, performs the redirect (thus ignores the contents of the “x-page”), and renders the frameset for indexing. Chances are it gives up on the complexity of the nested frames, indexes the noframes tag of the frameset and perhaps a few snippets from subframes, considers the whole conglomerate thin, hence assigns the lowest possible priority for the query engine and moves on.

Unfortunately the search engine delivering the most traffic began to improve its crawling and indexing, hence many sites formerly receiving a fair amount of Google traffic began to suffer from scenario 2 — deindexing.

Outlining a possible workaround to get the deleted pages back in the search index:

In six months or so Erol will ship version 5 of its shopping cart, and this software dumps frames, JavaScript redirects and ugly stuff like that in favor of clean XHTML and CSS. By the way, Erol has asked me for my input on their new version, so you can bet it will be search engine friendly. So what can we do in the meantime to help legions of store owners running version 4 and below?

We’ve got the static “x-page” which should not get indexed because it redirects, and which cannot be changed to serve the contents itself. The frameset cannot be indexed because it doesn’t exist for robots, and even if a crawler could eat it, we don’t consider it easy to digest spider fodder.

Let’s look at Google’s guidelines, which are the strictest around, thus applicable for other engines as well:

  1. Don’t […] present different content to search engines than you display to users, which is commonly referred to as “cloaking.”
  2. Don’t employ cloaking or sneaky redirects.

If we find a way to suppress the JavaScript code on the “x-page” when a crawler requests it, the now more sophisticated crawlers will handle the “x-page” like their predecessors, that is, they would fetch the “x-pages” and hand them over to the indexer without vicious remarks. Serving identical content under different URLs to users and crawlers does not contradict the first prescript. And we’d comply with the second rule, because loading a frameset for human visitors but not for crawlers is definitely not sneaky.

Ok, now how to tell the static page that it has to behave dynamically, that is outputting different contents server sided depending on the user agent’s name? Well, Erol’s desktop software which generates the HTML can easily insert PHP tags too. The browser would not render those on a local machine, but who cares when it works after the upload on the server. Here’s the procedure for Apache servers:

In the root’s .htaccess file we enable PHP parsing of .html files:
AddType application/x-httpd-php .html

Next we create a PHP include file xinc.php which prevents crawlers from reading the offending JavaScript code:
<?php
// Substrings identifying the user agents of major search engine crawlers.
$crawlerUAs = array("Googlebot", "Slurp", "MSNbot", "teoma", "Scooter", "Mercator", "FAST");
$isSpider = FALSE;
$userAgent = (string) getenv("HTTP_USER_AGENT");
foreach ($crawlerUAs as $crawlerUA) {
    // Case-insensitive substring match against the request's user agent.
    if (stristr($userAgent, $crawlerUA)) {
        $isSpider = TRUE;
        break;
    }
}
if (!$isSpider) {
    // Browsers get the JavaScript which redirects to the frameset.
    print "<script type=\"text/javascript\"> [a whole bunch of JS code] </script>\n";
} else {
    // Crawlers get an explanatory HTML comment instead.
    print "<!-- Dear search engine staff: we’ve suppressed the JavaScript code redirecting browsers to \"erol.html\", that’s a frameset serving this page’s contents more pleasant for human eyes. -->\n";
}
?>

Erol’s HTML generator now puts <?php @include("xinc.php"); ?> instead of a whole bunch of JavaScript code.

The implementation for other environments is quite similar. If PHP is not available we can do it with SSI and Perl. On Windows we can tell IIS to process all .html extensions as ASP (App Mappings) and use an ASP include. That would give three versions of that patch which should help 99% of all Erol customers until they can upgrade to version 5.
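
Whatever the environment, the logic stays the same: test the user agent against a list of crawler names, then output either the script or the comment. Just to illustrate how little code that takes, here is the same check as a Python sketch. That’s not one of the planned patch versions, and the function names is_crawler and redirect_snippet are made up for this example:

```python
# Same substrings as in the PHP include; matching is case-insensitive.
CRAWLER_TOKENS = ("Googlebot", "Slurp", "MSNbot", "teoma", "Scooter", "Mercator", "FAST")


def is_crawler(user_agent):
    """Return True if the user agent contains one of the crawler substrings."""
    ua = (user_agent or "").lower()
    return any(token.lower() in ua for token in CRAWLER_TOKENS)


def redirect_snippet(user_agent, script_html):
    """Return the markup to put where the JavaScript redirect used to be."""
    if is_crawler(user_agent):
        # Crawlers get a plain HTML comment instead of the redirect script.
        return "<!-- JavaScript redirect to the frameset suppressed for crawlers -->\n"
    return script_html + "\n"


if __name__ == "__main__":
    googlebot = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
    print(redirect_snippet(googlebot, "<script>/* redirect */</script>"), end="")
```

Plain substring matching is crude. A short token like “FAST” can hit a harmless browser by accident, and a user agent name is easy to fake, so treat the list as a starting point.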

This solution comes with two disadvantages. First, the cached page copies, clickable from the SERPs and toolbars, would render pretty ugly because they lack the JavaScript code. Second, perhaps automated tools searching for deceitful cloaking might red-flag the URLs for a human review. Hopefully the search engine executioner reading the comment in the source code will be fine with it and give it a go. If not, there’s still the reinclusion request. I think store owners can live with that when they get their Google traffic back.

Rolling out the patch:

Erol thinks the above makes sense and there is a chance of implementing it soon. While the developers are at work, please provide feedback if you think we didn’t interpret Google’s Webmaster Guidelines strictly enough. Keep in mind that this is an interim solution and that the new version will handle things in a more standardized way. Thanks.

Paid-Links-Disclosure: I do this pro bono job for the sake of the suffering store owners. Hence the links pointing to Erol and Erol’s customers are not nofollow’ed. Not that I’d nofollow them otherwise ;)




AdSense asks me "Are You Gay?", but why?

John emailed me this screenshot:
Are you gay?

I wondered why the heck AdSense considers a post on Google’s new URL removal tool gay in nature. Warning! These links (Google search results) are not safe for work:

0.08% gay: “fell in love”
0.11% gay: “neat”
0.05% gay: “terminator”
0.02% gay: “competitors”
0.04% gay: “user”
14.7% gay: “friendly”
56.4% gay: “tool”

Who can help me to figure out the remaining 28.6% gayness? I mean when putting two ads asking “Are you gay” above and below the post, Ol’ AdSense should be at least 100.0% certain that this question will not offend me. Actually, I’m not offended. I’m just curious to learn more about a possible coming out.




More anchor text analysis from Webmaster Central

If you didn’t spot my update posted a few hours ago, log in to Webmaster Central and view your anchor text stats. Find way more phrases and play with the variations, these should allow you to track down sources by quoted search queries. Also, the word-stats are back.
Have fun!


