<?xml version="1.0" encoding="UTF-8"?>
<!-- generator="wordpress/2.2.3" -->
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	>

<channel>
	<title>Sebastian's Pamphlets &#187; Search Quality</title>
	<link>http://sebastians-pamphlets.com</link>
	<description>If you've read my articles somewhere on the Internet, expect something different here.</description>
	<pubDate>Wed, 11 Aug 2010 18:57:05 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.2.3</generator>
	<language>en</language>
			<item>
		<title>OMFG - Google sends porn punters to my website &#8230;</title>
		<link>http://sebastians-pamphlets.com/make-risk-free-beer-money-from-porn-traffic/</link>
		<comments>http://sebastians-pamphlets.com/make-risk-free-beer-money-from-porn-traffic/#comments</comments>
		<pubDate>Wed, 11 Aug 2010 18:11:48 +0000</pubDate>
		<dc:creator>Sebastian</dc:creator>
		
		<category><![CDATA[Internet Marketing]]></category>

		<category><![CDATA[Redirects]]></category>

		<category><![CDATA[Search Quality]]></category>

		<category><![CDATA[Google]]></category>

		<guid isPermaLink="false">http://sebastians-pamphlets.com/make-risk-free-beer-money-from-porn-traffic/</guid>
		<description><![CDATA[
In today&#8217;s GWC doctor&#8217;s office, the webmaster of an innocent orphanage website asks Google&#8217;s Matt Cutts:
[My site] is showing up for searches on &#8216;girls in bathrooms&#8217; because they have an article about renovating the girls bathroom! What do you think of the idea if a negative keyword meta tag to block irrelevant searches? [sic!]
Well, we [...]]]></description>
			<content:encoded><![CDATA[
<p>In today&#8217;s GWC doctor&#8217;s office, the webmaster of an innocent orphanage website asks Google&#8217;s Matt Cutts:</p>
<blockquote><p>[My site] is showing up for searches on &#8216;girls in bathrooms&#8217; because they have an article about renovating the girls bathroom! What do you think of the idea if a negative keyword meta tag to block irrelevant searches? [sic!]</p></blockquote>
<p><b>Well, we don&#8217;t know what the friendly guy from Google recommends &#8230;</b></p>
<p><object style="display:inline; height: 100px; width: 133px;" onmouseover="this.height='344px'; this.width='425px';" onmouseout="this.height='100px'; this.width='133px';">
<param name="movie" value="http://www.youtube.com/v/mvYZa3NZ1HE">
<param name="allowFullScreen" value="true">
<param name="allowScriptAccess" value="always"><embed src="http://www.youtube.com/v/mvYZa3NZ1HE" type="application/x-shockwave-flash" allowfullscreen="true" allowScriptAccess="always" ></object><img src="http://sebastians-pamphlets.com/img/posts/omfg-women.png" style="margin-left:50px; margin-bottom:25px;" /></p>
<p><b>&#8230; but my dear readers do know that my bullshit detector, faced with such a moronic idea, shouts out in agony:</b></p>
<h3>There&#8217;s no such thing as bad traffic, just weak monetizing!</h3>
<p>Ok, Ok, Ok &#8230; every now and then each and every webmaster out there suffers from misguided search engine ranking algos that send shitloads of totally unrelated search traffic. For example, when you search for [<a href="http://google.com/search?q=how+to+fuck+a+click&#038;safe=off">how to fuck a click</a>], you won&#8217;t expect that Google considers <a href="http://sebastians-pamphlets.com/how-to-turn-click-tracking-into-miserable-failure/">this geeky pamphlet</a> the very best search result. Of course Google should&#8217;ve detected your <a href="http://google.com/search?q=how+to+fuck+a+chick&#038;safe=off">NSFW-typo</a>. Shit happens. Deal with it.</p>
<p>On the other hand, search traffic is free, so there&#8217;s no valid reason to complain. Instead of asking Google for a minus-keyword REP directive, one should think of clever ways to monetize unrelated traffic without wasting bandwidth.</p>
<p>You want to monetize irrelevant traffic from searches for smut in a way that nobody can associate your site with porn. That&#8217;s doable. Here&#8217;s how it works:</p>
<h3>Make risk-free beer money from porn traffic with a non-adult site</h3>
<p>Copy those slimy phrases from your keyword stats and paste them into Google&#8217;s search box. Once you find an adult site that seems to match the smut surfer&#8217;s needs better than your site, click on the search result, and on the landing page search for a &#8220;webmasters&#8221; link that points to their affiliate program. Sign up and save your customized affiliate link.</p>
<p>Next add some PHP code to your scripts. Make absolutely sure it gets executed before you output any other content, even whitespace:</p>
<p><code>&lt;?php </code> &nbsp;<a onclick="showContent('code_getOffsiteUri');">Show all code</a></p>
<p id="code_getOffsiteUri" style="display:none;"><code>function getReferrer() {<br />
    // the Referer header is optional, so fall back to an empty string<br />
    return isset($_SERVER["HTTP_REFERER"]) ? $_SERVER["HTTP_REFERER"] : "";<br />
}<br />
function getOffsiteUri() {<br />
    // pull the query string out of the referring SERP URI<br />
    $queryString = parse_url(getReferrer(), PHP_URL_QUERY);<br />
    if (!$queryString) {<br />
        return FALSE;<br />
    }<br />
    // parse_str() decodes "+" and "%20" to spaces<br />
    parse_str($queryString, $params);<br />
    if (!isset($params["q"]) || !is_string($params["q"])) {<br />
        return FALSE;<br />
    }<br />
    // collapse repeated whitespace<br />
    $searchQuery = trim(preg_replace('/\s+/', ' ', $params["q"]));<br />
    // map irrelevant search queries to sponsor URIs<br />
    if (stristr($searchQuery, "teens in bathroom")) {<br />
        return "http://someteenpornsite.com/landingpage?affID=4711";<br />
    }<br />
    return FALSE;<br />
}</code></p>
<p><code>$betterMatch = getOffsiteUri();<br />
if ($betterMatch) {<br />
   header("HTTP/1.1 307 Here's your smut", TRUE, 307);<br />
   header("Location: $betterMatch");<br />
   exit;<br />
}<br />
?&gt;</code> Refine the simplified code above. Use a database table to store the mappings &#8230;</p>
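<p>For what it&#8217;s worth, here&#8217;s a minimal sketch of such a mapping lookup. It assumes a plain PHP array standing in for the database table; the names <code>$sponsorMappings</code> and <code>findSponsorUri()</code> are made up for illustration, and in real life you&#8217;d load the rows from your database instead:</p>

```php
<?php
// Hypothetical mapping "table": search phrase => sponsor URI.
// In production these rows would come from a database table.
$sponsorMappings = array(
    "teens in bathroom" => "http://someteenpornsite.com/landingpage?affID=4711",
);

// Returns the sponsor URI of the first phrase found in the search query,
// or FALSE when no phrase matches. The comparison is case-insensitive.
function findSponsorUri($searchQuery, array $mappings) {
    foreach ($mappings as $phrase => $uri) {
        if (stripos($searchQuery, $phrase) !== FALSE) {
            return $uri;
        }
    }
    return FALSE;
}
```

<p>With something like that in place, <code>getOffsiteUri()</code> could hand the cleaned-up search query to <code>findSponsorUri()</code> instead of hard-coding a single phrase.</p>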
<p>Now a surfer coming from a SERP like <code><br />http://google.com/search?num=100&#038;q=nude+teens+in+bathroom&#038;safe=off</code> <br />will get redirected to <code><br />http://someteenpornsite.com/landingpage?affID=4711</code><br /> You&#8217;re using a 307 redirect because user agents don&#8217;t cache it unless you explicitly tell them to, so when you later find a porn site that converts your traffic better, you can redirect visitors to another URI.</p>
<p>As you probably know, search engines don&#8217;t approve of duplicate content. Hence it wouldn&#8217;t be a bright idea to put up x-rated stuff (all smut is duplicate content by design) on your site to fulfil the misled searcher&#8217;s needs.</p>
<p>Of course you can use the technique outlined above to protect searchers from landing on your contact/privacy page, too, when in fact your signup page is their desired destination.</p>
<h3>Shiny whitehat disclaimer</h3>
<p>If you&#8217;re afraid of the possibility that the almighty Google might punish you for your well-meant attempt to fix its bugs, relax.</p>
<p>A search engine that misinterprets your content so badly has failed miserably. Your bugfix actually improves their search quality. Search engines can&#8217;t force you to report such flaws, they just kindly ask for voluntary feedback.</p>
<p>If search engines dislike smart websites that find related content on the Interwebs in case the search engine delivers shitty search results, they can act themselves. Instead of penalizing webmasters that react to flaws in their algos, they&#8217;re well advised to adjust their scoring. I mean, if they stop sending smut traffic to non-porn sites, their users don&#8217;t get redirected any longer. It&#8217;s that simple.</p>
<hr />Copyright &copy; 2010 <strong><a href="http://sebastians-pamphlets.com/">Sebastian`s Pamphlets</a></strong>. This Feed is for personal non-commercial use only. If you are not reading this material in your news aggregator/feed reader, the site you are looking at is guilty of copyright infringement and will be put down immediately. Please contact sebastians-pamphlets.com so we can take legal action immediately.<br /><span style="float: right;font-size: 7pt"><a href="http://blog.taragana.com/index.php/archive/wordpress-plugins-provided-by-taraganacom/">Plugin</a> by <a href="http://www.taragana.com/">Taragana</a></span><div class="topsy_widget_data topsy_theme_light-green" style="float: right;margin-left: 0.75em;"><!-- { "url": "http://sebastians-pamphlets.com/make-risk-free-beer-money-from-porn-traffic/", "style": "big", "title": "OMFG - Google sends porn punters to my website ..." } --></div>
]]></content:encoded>
			<wfw:commentRss>http://sebastians-pamphlets.com/make-risk-free-beer-money-from-porn-traffic/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Cloaking is good for you. Just ignore Bing&#8217;s/Google&#8217;s guidelines.</title>
		<link>http://sebastians-pamphlets.com/cloaking-is-good-for-your-vistors/</link>
		<comments>http://sebastians-pamphlets.com/cloaking-is-good-for-your-vistors/#comments</comments>
		<pubDate>Mon, 05 Jul 2010 18:24:08 +0000</pubDate>
		<dc:creator>Sebastian</dc:creator>
		
		<category><![CDATA[Web development]]></category>

		<category><![CDATA[Usability]]></category>

		<category><![CDATA[Search Quality]]></category>

		<category><![CDATA[Webspam]]></category>

		<category><![CDATA[SEO]]></category>

		<category><![CDATA[Cloaking]]></category>

		<category><![CDATA[Google]]></category>

		<guid isPermaLink="false">http://sebastians-pamphlets.com/cloaking-is-good-for-your-vistors/</guid>
		<description><![CDATA[
Summary first: If you feel the need to cloak, just do it within reason. Don&#8217;t cloak because you can, but because it&#8217;s technically the most elegant procedure to accomplish a Web development task. Bing and Google can&#8217;t detect your (in no way deceptive) intent algorithmically. Don&#8217;t spam away, though, because you might leave trails besides [...]]]></description>
			<content:encoded><![CDATA[
<p>Summary first: If you feel the need to cloak, just do it within reason. Don&#8217;t cloak because you can, but because it&#8217;s technically the most elegant procedure to accomplish a Web development task. Bing and Google can&#8217;t detect your (in no way deceptive) intent algorithmically. Don&#8217;t spam away, though, because you might leave trails besides cloaking alone, if you aren&#8217;t good enough at spamming search engines. Keep your users&#8217; interests in mind. Don&#8217;t treat search engine guidelines as set in stone, but do comply with them to a reasonable level, for example when those <a href="http://www.youtube.com/watch?v=XWfqyy7J34s">force you to comply with Web standards</a> that make more sense than the fancy idea you&#8217;ve developed on internationalization, based on detecting browser language settings or so.</p>
<p><img src="http://sebastians-pamphlets.com/img/posts/penalizing-cloaking-is--bullshit.png" width="250" height="376" align="right" alt="search engine guidelines are bullshit WRT cloaking" title="Search engines must not penalize cloaking" style="margin-left:5px;" />This pamphlet is an opinion piece. What I said above should be considered best practice, even by search engines. Of course it&#8217;s not, because search engines can and do fail, just like a webmaster who takes my statement &#8220;go cloak away if it makes sense&#8221; as technical advice and gets his search engine visibility tanked the hard way.</p>
<h3>WTF is cloaking?</h3>
<p>Cloaking, also known as IP delivery, means delivering content tailored for specific users who are identified primarily by their IP addresses, but also by user agent (browser, crawler, screen reader&#8230;) names, and whatnot. Here&#8217;s a simple demonstration of this technique. The content of the next paragraph differs depending on the user requesting this page. Googlebot, Googlers, as well as Matt Cutts at work, will read a personalized message:</p>
<p><em>Dear visitor, thanks for your visit from 38.107.191.89 (38.107.191.89).</em></p>
<p>You surely can imagine that cloaking opens <del>a can of worms</del> <ins>lots of opportunities to enhance a user&#8217;s surfing experience</ins>, besides &#8220;stalking&#8221; particular users like Google&#8217;s head of WebSpam.</p>
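<p>Stripped down to the bare bones, such a demonstration could look like the sketch below. It assumes the crawler is spotted by nothing but a &#8220;Googlebot&#8221; substring in the user agent name; the function name and the crawler greeting are made up for this example. Since user agent names are trivially spoofed, a serious setup would also verify the requesting IP, for example with a reverse DNS lookup.</p>

```php
<?php
// Hypothetical helper: picks the message by user agent name and
// personalizes it with the visitor's IP address.
function visitorGreeting($ip, $userAgent) {
    // case-insensitive check for the crawler's name in the UA string
    if (stripos($userAgent, "Googlebot") !== FALSE) {
        return "Dear Googlebot, thanks for crawling from " . $ip . ".";
    }
    return "Dear visitor, thanks for your visit from " . $ip . ".";
}

// Typical call in a page script (both values come from the request):
// echo htmlspecialchars(visitorGreeting($_SERVER["REMOTE_ADDR"], $_SERVER["HTTP_USER_AGENT"]));
```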
<h3>Why do search engines dislike cloaking?</h3>
<p>Apparently they don&#8217;t. They use IP delivery themselves. When you&#8217;re traveling in Europe, you&#8217;ll get hints like &#8220;go to Google.fr&#8221; or &#8220;go to Google.at&#8221; all the time. That&#8217;s google.com checking where you are, trying to lure you into their regional services.</p>
<p>More seriously, there&#8217;s a so-called &#8220;dark side of cloaking&#8221;. Say you&#8217;re a <a href="http://fantomaster.com/fantomNews/archives/2010/07/08/fantomas-shadowmaker/">seasoned Internet marketer</a>, then you could show Googlebot an educational page with compelling content under a URI like &#8220;/games/poker&#8221; with an X-Robots-Tag HTTP header saying &#8220;noarchive&#8221;, whilst surfers (search engine users) supplying an HTTP_REFERER and not coming from employee.google.com get redirected to poker dot com (simplified example).</p>
<p>That&#8217;s hard to detect for Google&#8217;s WebSpam team. Because they don&#8217;t do evil themselves, they can&#8217;t officially operate sneaky bots that use for example AOL as their ISP to compare your spider fodder to pages/redirects served to actual users.</p>
<p>Bing sends out spam bots that request your pages &#8220;as a surfer&#8221; in order to discover deceptive cloaking. Of course those bots can be identified, so professional spammers serve them their spider fodder. Besides burning the bandwidth of non-cloaking sites, Bing doesn&#8217;t accomplish anything useful in terms of search quality.</p>
<p>Because search engines can&#8217;t detect cloaking properly, not to speak of a cloaking webmaster&#8217;s intentions, they&#8217;ve launched webmaster guidelines (FUD) that forbid cloaking altogether. All Google/Bing reps tell you that cloaking is an evil black hat tactic that will get your site penalized or even banned. By the way, the same goes for perfectly legit &#8220;hidden content&#8221; that&#8217;s invisible on page load, but viewable after a mouse click on a &#8220;learn more&#8221; widget/link or so.</p>
<h3>Bullshit.</h3>
<p>If your competitor makes creative use of IP delivery to enhance their visitors&#8217; surfing experience, you can file a spam report for cloaking and Google/Bing will ban the site eventually. Just because cloaking <em>can</em> be used with deceptive intent. And yes, it works this way. See below.</p>
<p>Actually, those spam reports trigger a review by a human, so maybe your competitor gets away with it. But search engines also use spam reports to develop spam filters that penalize crawled pages fully automatically. Such filters can fail, and &#8211;trust me&#8211; they do fail often. Once you must optimize your content delivery for particular users or user groups yourself, such a filter could tank your very own stuff by accident. So don&#8217;t snitch on your competitors, because tomorrow they&#8217;ll return the favor.</p>
<h3>Enforcing a &#8220;do not cloak&#8221; policy is evil</h3>
<p>At least Google&#8217;s WebSpam team comes with cojones. They&#8217;ve even <a href="http://searchengineland.com/google-adwords-help-cloaks-to-google-gets-banned-45541">banned their very own help pages</a> for &#8220;<a href="http://google.com/search?hl=en&#038;q=matt+cutts+cloaking&#038;num=13&#038;safe=off">cloaking</a>&#8221;, although those didn&#8217;t serve porn to minors searching for SpongeBob images with safe-search=on.</p>
<p>That&#8217;s over the top, because the help files of any Google product aren&#8217;t usable without a search facility. When I click &#8220;help&#8221; in any Google service like AdWords, I get blank pages, and/or links within the help system are broken because the destination pages were deindexed for cloaking. Plain evil, and counterproductive.</p>
<p>Just because Google&#8217;s help software doesn&#8217;t show ads and related links to Googlebot, those pages aren&#8217;t guilty of deceptive cloaking. Ms Googlebot won&#8217;t pull the plastic, so it makes no sense to serve her advertisements. Related links are context sensitive just like ads, so it makes no sense to persist them in Google&#8217;s crawling cache, or even in Google&#8217;s search index. Also, as a user I really don&#8217;t care whether Google has crawled the same heading I see on a help page or not, as long as I get directed to relevant content, that is a paragraph or more that answers my question.</p>
<p>When a search engine intentionally doesn&#8217;t deliver the very best search results, just because those pages violate an outdated and utterly useless policy that targets fraudulent tactics in a form last used in the last century and doesn&#8217;t take into account how the Internet works today, I&#8217;m pissed.</p>
<p>Maybe that&#8217;s not bad at all when applied to Google products? Bullshit, again. The same happens to any other website that doesn&#8217;t fit Google&#8217;s weird idea of &#8220;serving the same content to users and crawlers&#8221;. I mean, as long as Google&#8217;s crawlers come from US IPs only, how can a US based webmaster serve the same content in German language to a user coming from Austria and Googlebot, both requesting a URI like &#8220;/shipping-costs?lang=de&#8221; that has to be different for each user because shipping a parcel to Germany costs $30.00 and a parcel of the same weight shipped to Vienna costs $40.00? Don&#8217;t tell me bothering a user with shipping fees for all regions in CH/AT/DE all on one page is a good idea, when I can reduce the information overload to just the one shipping fee that my user expects to see, followed by a link to a page that lists shipping costs for all European countries, or all countries where at least some folks might speak/understand German.</p>
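<p>Here&#8217;s a rough sketch of that tailored shipping info. It assumes you&#8217;ve already mapped the visitor&#8217;s IP address to a two-letter country code with whatever geo-IP lookup you trust; the array and function names are invented for this example, and the fees are the ones from the paragraph above.</p>

```php
<?php
// Fees from the example above; countries not mentioned there are left out on purpose.
$shippingFees = array("DE" => 30.00, "AT" => 40.00);

// Returns the fee for the visitor's country, or NULL when there is no
// tailored fee, so the page can fall back to the full list of shipping costs.
function tailoredShippingFee($countryCode, array $fees) {
    $countryCode = strtoupper($countryCode);
    return isset($fees[$countryCode]) ? $fees[$countryCode] : NULL;
}
```

<p>A page script would print the single fee when the function returns one, and just the link to the complete list when it returns <code>NULL</code>.</p>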
<p>Back to Google&#8217;s ban of its very own help pages that hid AdSense code from Googlebot. Of course Google wants to see what surfers see in order to deliver relevant search results, and that might include advertisements. However, surrounding ads don&#8217;t necessarily obfuscate the page&#8217;s content. Ads served instead of content do. So when Google wants to detect ad laden thin pages, they need to become smarter. Penalizing pages that don&#8217;t show ads to search engine crawlers is a bad idea for a search engine, because not showing ads to crawlers is a good idea, not only bandwidth-wise, for a webmaster.</p>
<p>Managing this dichotomy is the search engine&#8217;s job. They shouldn&#8217;t expect webmasters to help them solve their very own problems (maintaining search quality). In fact, bothering webmasters with policies put in place solely because search engine algos are fallible and incapable is plain evil. The same applies to instruments like rel-nofollow (launched to help Google devalue spammy links but backfiring enormously) or Google&#8217;s war on paid links (as if not each and every link on the whole Internet is paid/bartered for, somehow).</p>
<p>What do you think, should search engines ditch their way too restrictive &#8220;don&#8217;t cloak&#8221; policies? <a href="http://twitter.com/home?status=Hey+@Google+@Bing,+go+ditch+your+outdated+webmaster+guidelines!+http%3A%2F%2Fsebastians-pamphlets.com/cloaking-is-good-for-your-vistors/" target="twitter" title="Stop search engines that tyrannize webmasters!"><b>Click to vote:</b> <img src="http://sebastians-pamphlets.com/img/twitter-icon.gif" width="10" height="10" style="border:none;" alt="Stop search engines that tyrannize webmasters!"  /></a></p>
<p><b>Update 2010-07-06:</b> Don&#8217;t miss out on Danny Sullivan&#8217;s &#8220;<strong>Google be fair!</strong>&#8221; appeal, posted today: <a href="http://searchengineland.com/why-google-should-ban-its-own-help-pages-45781">Why Google Should Ban Its Own Help Pages — But Also Shouldn’t</a></p>
]]></content:encoded>
			<wfw:commentRss>http://sebastians-pamphlets.com/cloaking-is-good-for-your-vistors/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Ditch the spam on SERPs, pretty please?</title>
		<link>http://sebastians-pamphlets.com/i-want-clean-serps/</link>
		<comments>http://sebastians-pamphlets.com/i-want-clean-serps/#comments</comments>
		<pubDate>Thu, 17 Jun 2010 21:15:06 +0000</pubDate>
		<dc:creator>Sebastian</dc:creator>
		
		<category><![CDATA[Search Quality]]></category>

		<category><![CDATA[Webspam]]></category>

		<category><![CDATA[Spam Report]]></category>

		<category><![CDATA[Crap]]></category>

		<category><![CDATA[Google]]></category>

		<guid isPermaLink="false">http://sebastians-pamphlets.com/i-want-clean-serps/</guid>
		<description><![CDATA[
Say there&#8217;s a search engine that tries very hard to serve relevant results for long tail search queries. Maybe it even accepted that an algo change &#8211;supposed to wipe out shitloads of thin pages from its long tail search result pages (SERPs)&#8211; is referred to as #MayDay. One should think that this search engine isn&#8217;t [...]]]></description>
			<content:encoded><![CDATA[
<p>Say there&#8217;s a search engine that tries very hard to serve relevant results for long tail search queries. Maybe it even accepted that an algo change &#8211;supposed to wipe out shitloads of thin pages from its long tail search result pages (SERPs)&#8211; is referred to as #MayDay. One should think that this search engine isn&#8217;t exactly eager to annoy its users with crappy mash-up pages consisting of shabby stuff scraped from all known sources of duplicate content on the whole InterWebs.</p>
<p>Wrong.</p>
<p>Prominent SE spammers like <a href="http://mahalo.com" rel="nofollow" onclick="alert('You didn\'t really believe that I let you go to view spam, right?'); return false;">Mahalo</a> still flood the visible part of search indexes with boatloads of crap that never should be able to cheat its way onto any SERP, not even via a [site:spam.com] search. Learn more from <a href="http://www.seobook.com/black-hat-seo-case-study">Aaron</a> and <a href="http://smackdown.blogsblogsblogs.com/category/scams/">Michael</a>, who&#8217;ve both invested their valuable time to craft <a href="http://smackdown.blogsblogsblogs.com/2010/06/17/need-help-understanding-the-latest-mahalo-spam/">detailed spam reports</a>, <a href="http://twitter.com/mattcutts/status/16420030375">to no avail</a>. </p>
<p>Frustrating. </p>
<p>Wait. Why does a bunch of spammy Web pages create such a fuss? Because they&#8217;re findable in the search index. Of course a search engine must crawl all the WebSpam out there, and its indexer has to judge the value of all the content it gets fed. But there&#8217;s absolutely no need to bother the query engine, which gathers and ranks the stuff presented on the SERPs, with crap like that.</p>
<p>Dear Google, why do you annoy your users with spam created by &#8220;a scheme that your automated system handles quite well&#8221; at all? Those awesome spam filters should just flag crappy pages as not-SERP-worthy, so that they can never see the daylight at google.com/search. I mean, why should any searcher be at risk of pulling useless search results from your index? Hopefully not because these misled searchers tend to click on lots of Google ads on said pages, right?</p>
<p>I&#8217;d rather enjoy an empty SERP for an exotic search query, than suffer from a single link to a useless page plastered with huge ads, even if it comes with a tiny portion of stolen content that might be helpful if pointing to the source.</p>
<p>Do you feel like me? Speak out!</p>
<p><a href="http://twitter.com/home?status=Hey+@GoogleWebspam,+I+dislike+Mahalo+spam+on+your+SERPs!+http%3A%2F%2Fsebastians-pamphlets.com/i-want-clean-serps/+%23spam-report" target="twitter" title="Tweet Your Spam Report!"><strong>Hey Google, I dislike spam on your SERPs! #spam-report</strong> <img src="http://sebastians-pamphlets.com/img/twitter-icon.gif" width="10" height="10" style="border:none;" alt="Tweet Your Plea For Clean SERPs!"  /></a></p>
]]></content:encoded>
			<wfw:commentRss>http://sebastians-pamphlets.com/i-want-clean-serps/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Google went belly-up: SERPs sneakily redirect to FPAs</title>
		<link>http://sebastians-pamphlets.com/google-serps-sneakily-redirect-to-ads/</link>
		<comments>http://sebastians-pamphlets.com/google-serps-sneakily-redirect-to-ads/#comments</comments>
		<pubDate>Wed, 12 May 2010 17:06:19 +0000</pubDate>
		<dc:creator>Sebastian</dc:creator>
		
		<category><![CDATA[Search Quality]]></category>

		<category><![CDATA[Redirects]]></category>

		<category><![CDATA[Webspam]]></category>

		<category><![CDATA[Spam Report]]></category>

		<category><![CDATA[Cloaking]]></category>

		<category><![CDATA[Crap]]></category>

		<category><![CDATA[Google]]></category>

		<guid isPermaLink="false">http://sebastians-pamphlets.com/google-serps-sneakily-redirect-to-ads/</guid>
		<description><![CDATA[
I&#8217;m pissed. I do know I shouldn&#8217;t blog in rage, but Google redirecting search engine result pages to totally useless InternetExplorer ads just fires up my ranting machine.
What does the almighty Google say about URIs that should deliver useful content to searchers, but sneakily redirect to full page ads? Here you go. Google&#8217;s webmaster guidelines [...]]]></description>
			<content:encoded><![CDATA[
<p>I&#8217;m pissed. I do know I shouldn&#8217;t blog in rage, but Google redirecting search engine result pages to totally useless InternetExplorer ads just fires up my ranting machine.</p>
<p>What does the almighty Google say about URIs that should deliver useful content to searchers, but sneakily redirect to full page ads? <a href="http://www.google.com/support/webmasters/bin/answer.py?answer=35769">Here you go</a>. Google&#8217;s webmaster guidelines explicitly forbid such black hat tactics: </p>
<p>&#8220;<strong>Don&#8217;t use cloaking or sneaky redirects.</strong>&#8221; Google just did the latter with its very own <a href="http://google.com/ie?q=buy+viagra+online">SERPs</a>. The search interface <a href="http://sebastians-pamphlets.com/google-serps-sneakily-redirect-to-ads/#goog-ie-ui">google.com/ie</a>, out in the wild for nearly a decade, redirects to a piece of sidebar HTML offering a download of IE8 optimized for Google. That&#8217;s a helpful redirect for some IE6 users who don&#8217;t suffer from an IT department stuck with this outdated browser, but it&#8217;s plain misleading in the eyes of all those searchers who appreciated this clean and totally uncluttered search interface. Interestingly, <abbr title="User Agent">UA</abbr> cloaking is the only way to heal this sneaky behavior.</p>
<p>&#8220;<strong>Don&#8217;t create pages with malicious behavior.</strong>&#8221; Google&#8217;s guilty, too. Instead of checking for the user&#8217;s browser, redirecting only IE6 requests from <a href="http://www.google.com/search?output=ie&#038;num=100&#038;hl=en&#038;safe=off&#038;q=google+discontinues+IE6+support">Google&#8217;s discontinued IE6 support</a> (IE6 toolbar &#8230;) to the IE8 advertisement, whilst all other user agents get their desired search box, or their SERPs, under a google.com/search?output=ie&amp;&#8230; URI, Google performs an unconditional redirect to a page that&#8217;s utterly useless and also totally unexpected for many searchers. I consider misleading redirects malicious.</p>
<p>&#8220;<strong>Avoid links to web spammers or &#8216;bad neighborhoods&#8217; on the web.</strong>&#8221; I consider the propaganda for IE that Google displays instead of the search results I&#8217;d expect a bad neighborhood on the Web, because IE constantly ignores Web standards, forcing developers and designers to implement superfluous workarounds. (Ok, ok, ok &#8230; Google&#8217;s lack of geekiness doesn&#8217;t exactly count as violation of their webmaster guidelines, but it sounds good, doesn&#8217;t it?)</p>
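<p>For the record, the conditional redirect I&#8217;m asking for in the second point isn&#8217;t rocket science. Below is a naive sketch, certainly not Google&#8217;s actual code: it assumes that an &#8220;MSIE 6.&#8221; token in the user agent name is good enough to spot IE6 (it skips Opera, which used to masquerade as IE), and the function names are invented for this example.</p>

```php
<?php
// Naive IE6 check based on the user agent name alone.
// That's an assumption: UA strings can be faked or stripped.
function isIe6($userAgent) {
    return strpos($userAgent, "MSIE 6.") !== FALSE
        && stripos($userAgent, "Opera") === FALSE;
}

// Returns the URI of the IE8 page for IE6 users, or NULL for everybody
// else, who should simply get their search results.
function redirectTargetFor($userAgent) {
    return isIe6($userAgent)
        ? "http://www.google.com/toolbar/ie8/sidebar.html"
        : NULL;
}

// Hypothetical usage at the very top of a script, before any output:
// $ua = isset($_SERVER["HTTP_USER_AGENT"]) ? $_SERVER["HTTP_USER_AGENT"] : "";
// $target = redirectTargetFor($ua);
// if ($target) { header("Location: $target", TRUE, 302); exit; }
```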
<p><a href="http://twitter.com/home?status=Hey+@MattCutts,+about+time+to+ban+google.com/ie?q=spam!+http%3A%2F%2Fsebastians-pamphlets.com/google-serps-sneakily-redirect-to-ads/" target="twitter" title="Tweet That!"><strong>Hey Matt Cutts, about time to ban google.com/ie!</strong> <img src="http://sebastians-pamphlets.com/img/twitter-icon.gif" width="10" height="10" style="border:none;" alt="Click to tweet that"  /></a></p>
<p id="goog-ie-ui"><a href="http://sebastians-pamphlets.com/rediscover-googles-free-ranking-checker/">Google&#8217;s very best search interface</a> is history. Here is what you got under<code><br />
<b>http://www.google.com/ie?num=100&#038;hl=en&#038;safe=off&#038;q=minimalistic</b></code>:</p>
<p><img  src="http://sebastians-pamphlets.com/img/posts/google-awesome-ie-serp.png" width="448" height="503" style="text-align:center; display:block;" align="middle" alt="Google's famous minimalistic search UI" title="Google's famous minimalistic search UI" /></p>
<p>And here is where Google sneakily redirects you to when you load the SERP link above (even with Chrome!):<code><br />
<b>http://www.google.com/toolbar/ie8/sidebar.html</b></code>:</p>
<p><img  src="http://sebastians-pamphlets.com/img/posts/google-fpa-ie8.png" width="268" height="569" style="border:dotted red 1px; text-align:center; display:block;" align="middle" alt="Google's sneaky IE8 propaganda" title="Google's sneaky IE8 propaganda" /></p>
<p id="goog-ie-spam-report">It&#8217;s sad that a browser vendor like Google (and yes, Google Chrome <b>is</b> my favorite browser) feels the need to mislead its users with propaganda for a competiting browser that&#8217;s slower and doesn&#8217;t render everything as it should render it. But when this particular browser vendor also leads Web search, and makes use of black hat techniques that it bans webmasters for, then that&#8217;s a scandal. So, if you agree, please submit a spam report to Google:</p>
<p><a href="http://twitter.com/home?status=Hey+@MattCutts,+about+time+to+ban+google.com/ie! %23spam-report+http%3A%2F%2Fsebastians-pamphlets.com/google-serps-sneakily-redirect-to-ads/" target="twitter" title="Tweet Your Spam Report!"><strong>Hey Matt Cutts, about time to ban google.com/ie! #spam-report</strong> <img src="http://sebastians-pamphlets.com/img/twitter-icon.gif" width="10" height="10" style="border:none;" alt="Tweet Your Spam Report"  /></a></p>
<p>2010-05-17 I&#8217;ve updated this pamphlet because it didn&#8217;t explain the &#8220;sneakiness&#8221; clearly enough. As of today, the unconditional redirect is still sneaky IMHO. Google needs to deliver searchers their desired search results, and show ads for a somewhat better browser only to stubborn IE6 users.</p>
<p>2010-05-18 <b>Q:</b> You&#8217;re pissed solely because your SERP scraping scripts broke. <b>A:</b> Glad you asked. Yes, I&#8217;ve <a href="http://www.scroogle.org/cgi-bin/scraper.htm" rel="crap nofollow">scraped Google&#8217;s /ie search</a> too. Not because I&#8217;m a <a href="http://www.google-watch.org/" rel="crap nofollow">privacy nazi</a> like Daniel Brandt. I just checked (my) rankings. However, when I spotted the redirects I didn&#8217;t even remember the location of the scripts that scraped this service, because I hadn&#8217;t looked at ranking reports for years. I&#8217;m interested in actual traffic, and revenues. Ego food annoys me. I just love the /ie search interface. So the answer is a bold &#8220;no&#8221;. I don&#8217;t give a fucking dead rat&#8217;s ass what ranking reports based on scraped SERPs could tell.</p>
<hr />Copyright &copy; 2010 <strong><a href="http://sebastians-pamphlets.com/">Sebastian`s Pamphlets</a></strong>. This Feed is for personal non-commercial use only. If you are not reading this material in your news aggregator/feed reader, the site you are looking at is guilty of copyright infringement and will be put down immediately. Please contact sebastians-pamphlets.com so we can take legal action immediately.<br /><span style="float: right;font-size: 7pt"><a href="http://blog.taragana.com/index.php/archive/wordpress-plugins-provided-by-taraganacom/">Plugin</a> by <a href="http://www.taragana.com/">Taragana</a></span><div class="topsy_widget_data topsy_theme_light-green" style="float: right;margin-left: 0.75em;"><!-- { "url": "http://sebastians-pamphlets.com/google-serps-sneakily-redirect-to-ads/", "style": "big", "title": "Google went belly-up: SERPs sneakily redirect to FPAs" } --></div>
]]></content:encoded>
			<wfw:commentRss>http://sebastians-pamphlets.com/google-serps-sneakily-redirect-to-ads/feed/</wfw:commentRss>
		</item>
		<item>
		<title>The anatomy of a deceptive Tweet spamming Google Real-Time Search</title>
		<link>http://sebastians-pamphlets.com/how-to-spam-google-real-time-search-via-twitter/</link>
		<comments>http://sebastians-pamphlets.com/how-to-spam-google-real-time-search-via-twitter/#comments</comments>
		<pubDate>Thu, 10 Dec 2009 10:12:44 +0000</pubDate>
		<dc:creator>Sebastian</dc:creator>
		
		<category><![CDATA[Webspam]]></category>

		<category><![CDATA[Search Quality]]></category>

		<category><![CDATA[Redirects]]></category>

		<category><![CDATA[Internet Marketing]]></category>

		<category><![CDATA[Spam]]></category>

		<category><![CDATA[Twitter]]></category>

		<category><![CDATA[SEO]]></category>

		<category><![CDATA[Cloaking]]></category>

		<category><![CDATA[Crap]]></category>

		<category><![CDATA[Google]]></category>

		<guid isPermaLink="false">http://sebastians-pamphlets.com/how-to-spam-google-real-time-search-via-twitter/</guid>
		<description><![CDATA[
Minutes after the launch of Google&#8217;s famous Real Time Search, the Internet marketing community began to spam the scrolling SERPs. Google gave birth to a new spam industry.
I&#8217;m sure Google&#8217;s WebSpam team will pull the plug sooner or later, but as of today Google&#8217;s real time search results are extremely vulnerable to questionable content.
The somewhat [...]]]></description>
			<content:encoded><![CDATA[
<p><img  src="http://sebastians-pamphlets.com/img/posts/spamming-google-real-time-search.png" width="250" height="345" align="right" style="margin-left:5px;" alt="Google real time search spammed and abused" title=""  />Minutes after the <a href="http://googleblog.blogspot.com/2009/12/relevance-meets-real-time-web.html?utm_source=sebastian&#038;utm_medium=pamphlet&#038;utm_campaign=thou+shalt+not+fuck+with+my+uris">launch</a> of Google&#8217;s <a href="http://searchengineland.com/search-real-time-madness-31668">famous</a> Real Time Search, the Internet marketing community <a href="http://sphinn.com/story/135685">began</a> to <a href="http://outspokenmedia.com/seo/google-real-time-spam/">spam</a> the <a href="http://www.google.com/search?hl=en&#038;safe=off&#038;esrch=RTSearch&#038;tbo=1&#038;num=100&#038;q=spam&#038;tbs=rltm:1">scrolling SERPs</a>. Google gave birth to a <a href="http://www.seo-theory.com/2009/12/07/google-launches-a-new-spam-industry/">new spam industry</a>.</p>
<p>I&#8217;m sure Google&#8217;s <a href="http://friendfeed.com/dannysullivan/d973e438/real-time-spam-google-says-been-fighting-so-long">WebSpam</a> team will pull the plug sooner or later, but as of today Google&#8217;s real time search results are extremely vulnerable to questionable content.</p>
<p>The somewhat shady approach to making creative use of real time search that I&#8217;m outlining below will not work forever. It can be used for really evil purposes, and Google is aware of the problem. Frankly, if I were the Googler in charge, I&#8217;d dump the whole real-time thingy until the spam defense lines are rock solid.</p>
<p id="rtss-recipe"><strong>Here&#8217;s the recipe from Dr Evil&#8217;s WebSpam-Cook-Book:</strong></p>
<h3 id="rtss-ingredients">Ingredients</h3>
<ul>
<li>1 <a href="http://www.google.com/trends?q=spam+google">popular topic</a> that pulls lots of searches, but not so many that the results scroll down too fast.</li>
<li>1 <a href="http://www.google.com/products?q=spam+google&#038;hl=en&#038;aq=f">landing page</a> that makes the punter pull out the plastic in no time.</li>
<li>1 <a href="http://www.google.com/support/webmasters/bin/answer.py?hl=en&#038;answer=93713">trusted authority page</a> totally lacking commercial intentions. View its source code, it must have a valid TITLE element with an appealing call for action related to your topic in its HEAD section.</li>
<li>1 <a href="http://goo.gl/">short</a> domain, 1 cheap Web hosting plan (Apache, PHP), 1 plain text editor, 1 FTP client, 1 Twitter account, and a pinch of basic coding skills.</li>
</ul>
<h3 id="rtss-preparation">Preparation</h3>
<p>Create a new text file and name it <code>hot-topic.php</code> or so. Then code:<code><br />
&lt;?php<br />
$landingPageUri = "http://affiliate-program.com/?your-aff-id";<br />
$trustedPageUri = "http://google.com/something.py";<br />
if (stristr($_SERVER["HTTP_USER_AGENT"], "Googlebot")) {<br />
   header("HTTP/1.1 307 Here you go today", TRUE, 307);<br />
   header("Location: $trustedPageUri");<br />
}<br />
else {<br />
   header("HTTP/1.1 301 Happy shopping", TRUE, 301);<br />
   header("Location: $landingPageUri");<br />
}<br />
exit;<br />
?&gt;</code></p>
<p>Provided you&#8217;re a savvy spammer, your crawler detection routine will be a little more <a href="http://fantomaster.com/fasvsspy01.html">complex</a>.</p>
<p>Save the file and upload it, then test the URI <code>http://youspamaw.ay/hot-topic.php</code> in your browser.</p>
<h3 id="rtss-serving">Serving</h3>
<ul>
<li>Log in to Twitter and submit lots of nicely crafted, not overly keyword-stuffed messages carrying your spammy URI. Do not use obscene language, e.g. don&#8217;t swear, and sail around phrases like &#8216;buy cheap viagra&#8217; with synonyms like &#8216;brighten up your girlfriend&#8217;s romantic moments&#8217;.</li>
<li>On their SERPs, Google will display the text from the trusted page&#8217;s TITLE element, linked to your URI that leads punters to a sales pitch of your choice.</li>
<li>Just for entertainment, closely monitor Google&#8217;s real time SERPs, and your real-time sales stats as well.</li>
<li>Be happy and get rich by end of the week.</li>
</ul>
<p>Google removes links to untrusted destinations, that&#8217;s why you need to abuse authority pages. As long as you don&#8217;t launch f-bombs, Google&#8217;s profanity filters make flooding their real time SERPs with all sorts of crap a breeze.</p>
<p>Hey <a href="http://twitter.com/GoogleWebspam">Google</a>, for the sake of our children, take that as a spam report!</p>
]]></content:encoded>
			<wfw:commentRss>http://sebastians-pamphlets.com/how-to-spam-google-real-time-search-via-twitter/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Hard facts about URI spam</title>
		<link>http://sebastians-pamphlets.com/troubles-made-by-utm-variables-from-google-analytics/</link>
		<comments>http://sebastians-pamphlets.com/troubles-made-by-utm-variables-from-google-analytics/#comments</comments>
		<pubDate>Tue, 01 Dec 2009 20:00:33 +0000</pubDate>
		<dc:creator>Sebastian</dc:creator>
		
		<category><![CDATA[Search Quality]]></category>

		<category><![CDATA[Duplicate Content]]></category>

		<category><![CDATA[Analytics]]></category>

		<category><![CDATA[Internet Marketing]]></category>

		<category><![CDATA[Webspam]]></category>

		<category><![CDATA[Spam]]></category>

		<category><![CDATA[SEO]]></category>

		<category><![CDATA[Crap]]></category>

		<category><![CDATA[Copy+Paste-Penalties]]></category>

		<category><![CDATA[AdSense]]></category>

		<category><![CDATA[Google]]></category>

		<guid isPermaLink="false">http://sebastians-pamphlets.com/troubles-made-by-utm-variables-from-google-analytics/</guid>
		<description><![CDATA[
I stole this pamphlet&#8217;s title (and more) from Google&#8217;s post Hard facts about comment spam for a reason. In fact, Google spams the Web with useless clutter, too. You doubt it? Read on. That&#8217;s the URI from the link above:
http://googlewebmastercentral.blogspot.com/2009/11/hard-facts-about-comment-spam.html?utm_source=feedburner&#038;utm_medium=feed&#038;utm_campaign=Feed%3A+blogspot%2FamDG+%28Official+Google+Webmaster+Central+Blog%29
I&#8217;ve bolded the canonical URI; everything after the question mark is clutter added by Google.
When your Google [...]]]></description>
			<content:encoded><![CDATA[
<p>I stole this pamphlet&#8217;s title (and more) from Google&#8217;s post <a href="http://googlewebmastercentral.blogspot.com/2009/11/hard-facts-about-comment-spam.html?utm_source=feedburner&#038;utm_medium=feed&#038;utm_campaign=Feed%3A+blogspot%2FamDG+%28Official+Google+Webmaster+Central+Blog%29">Hard facts about comment spam</a> for a reason. In fact, Google spams the Web with useless clutter, too. You doubt it? Read on. That&#8217;s the URI from the link above:</p>
<p><code><b title="Canonical URI" style="color:black;">http://googlewebmastercentral.blogspot.com/2009/11/hard-facts-about-comment-spam.html</b><i title="Google's query string clutter" style="color:red;">?utm_source=feedburner&#038;utm_medium=feed<br />&#038;utm_campaign=Feed%3A+blogspot%2FamDG+%28Official+Google+Webmaster<br />+Central+Blog%29</i></code></p>
<p><img src="http://sebastians-pamphlets.com/img/posts/ga-kraken.png" width="260" height="301" style="margin-left:5px;" align="right" alt="GA Kraken" title="Google Analytics fucks your canonical URIs" />I&#8217;ve bolded the canonical URI; everything after the question mark is <a href="http://analytics.blogspot.com/2009/11/integration-with-feedburner.html?utm_source=sebastian&#038;utm_medium=pamphlet&#038;utm_campaign=thou+shalt+not+fuck+with+my+uris">clutter added by Google</a>.</p>
<p>When your Google account lists both Feedburner and GoogleAnalytics as active services, Google will automatically screw your URIs when somebody clicks a link to your site in a feed reader (you can opt out, <a href="http://sebastians-pamphlets.com/troubles-made-by-utm-variables-from-google-analytics/#utm-opt-out">see below</a>).</p>
<h3 id="utm-bad">Why is it bad?</h3>
<p>FACT: <strong>Google&#8217;s method to track traffic from feeds to URIs creates new URIs.</strong> And lots of them. Depending on the number of possible values for each query string variable (<code>utm_source</code>, <code>utm_medium</code>, <code>utm_campaign</code>, <code>utm_content</code>, <code>utm_term</code>), the number of cluttered URIs pointing to the same piece of content can add up to dozens or more.</p>
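<p>To put a number on that, here&#8217;s a back-of-the-envelope sketch in Python. The parameter values and the example.com URI are made up for illustration (they are not taken from any real feed); the point is merely that the variants multiply:</p>

```python
from itertools import product

# Hypothetical tracking values, invented for this example.
UTM_VALUES = {
    "utm_source": ["feedburner", "twitter", "newsletter"],
    "utm_medium": ["feed", "email"],
    "utm_campaign": ["launch", "weekly", "xmas", "relaunch"],
}

def cluttered_variants(canonical_uri, utm_values):
    """Build every tracking variant that points to the same piece of content."""
    names = list(utm_values)
    variants = []
    for combo in product(*(utm_values[name] for name in names)):
        query = "&".join(f"{name}={value}" for name, value in zip(names, combo))
        variants.append(f"{canonical_uri}?{query}")
    return variants

variants = cluttered_variants("http://example.com/post.html", UTM_VALUES)
print(len(variants))  # 3 * 2 * 4 = 24 URIs for one piece of content
```

<p>Three sources, two mediums and four campaigns already yield 24 distinct URIs for a single post, before <code>utm_content</code> and <code>utm_term</code> enter the game.</p>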
<p>FACT: Bloggers (publishers, authors, anybody) naturally copy those cluttered URIs to paste them into their posts. The same goes for user link drops at Twitter and elsewhere. These links get crawled and indexed. Currently Google&#8217;s search index is flooded with <a href="http://www.google.com/search?hl=en&#038;q=inurl:utm_source&#038;utm_source=sebastian&#038;utm_medium=pamphlet&#038;utm_campaign=thou+shalt+not+fuck+with+my+uris">28,900,000 cluttered URIs</a> mostly originating from copy+paste links. <a href="http://www.bing.com/search?q=inurl:utm_source">Bing</a> and <a href="http://search.yahoo.com/search?p=inurl:utm_source">Yahoo</a> haven&#8217;t indexed GA tracking parameters yet.</p>
<p>That&#8217;s 29 million URIs with tracking variables that point to duplicate content as of today. With every link copied from a feed reader, this number will increase. <a href="http://mattcutts.com/blog/">Matt Cutts</a> <a href="http://friendfeed.com/mattcutts/6309e560/graywolf-i-think-johnmu-suggestions-were-solid">said</a> &#8220;I don&#8217;t think utm will cause dupe issues&#8221; and pointed to <a href="http://johnmu.com/">John Müller</a>&#8217;s <a href="http://www.seroundtable.com/archives/021170.html">helpful advice</a> (<a href="http://www.cre8asiteforums.com/forums/index.php?showtopic=73804">methods</a> a site owner can apply to tidy up Google&#8217;s mess).</p>
<p>Maybe Google can handle this growing duplicate content chaos in their very own search index. Let&#8217;s forget that Google is the search engine that <a href="http://googlewebmastercentral.blogspot.com/2009/08/optimize-your-crawling-indexing.html?utm_source=sebastian&#038;utm_medium=pamphlet&#038;utm_campaign=thou+shalt+not+fuck+with+my+uris">advocated</a> URI canonicalization for ages, invented sitemaps, rel=canonical, and countless highly sophisticated algos to merge indexed clutter under the canonical URI. It&#8217;s all water under the bridge now that Google is in the create-multiple-URIs-pointing-to-the-same-piece-of-content business itself.</p>
<p>So far that&#8217;s just disappointing. To understand why it&#8217;s downright evil, let&#8217;s look at the implications from a technical point of view.</p>
<h3 id="utm-evil">Spamming URIs with <i>utm</i> tracking variables breaks lots of things</h3>
<p>Look at this URI: <code>http://www.<span title="This URI exists with another server name">example</span>.com/search.aspx<b style="color:red;">?</b>Query=musical+mobile<b style="color:red;">?</b>utm_source=Referral&#038;utm_medium=Internet&#038;utm_campaign=celebritybabies</code></p>
<p>Google added a query string to a query string. Two URI segment delimiters (<a href="http://www.w3.org/Addressing/URL/4_URI_Recommentations.html">&#8220;?&#8221;</a>) can cause all sorts of troubles at the landing page.</p>
<p>Some scripts will process only variables from Google&#8217;s query string, because they extract GET input from the URI&#8217;s last question mark to the fragment delimiter &#8220;#&#8221; or end of URI; some scripts expecting input variables in a particular sequence will be confused at least; some scripts might even use the same variable names &#8230; the number of possible errors caused by amateurishly extended query strings is infinite. Even if there&#8217;s only one &#8220;?&#8221; delimiter in the URI.</p>
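<p>Here&#8217;s a small Python sketch of two of those failure modes, using the example URI from above. The two parser functions are hypothetical stand-ins for whatever a landing page script might do; only <code>urlsplit</code> and <code>parse_qs</code> come from Python&#8217;s standard library:</p>

```python
from urllib.parse import parse_qs, urlsplit

URI = ("http://www.example.com/search.aspx?Query=musical+mobile"
       "?utm_source=Referral&utm_medium=Internet&utm_campaign=celebritybabies")

def parse_from_first_delimiter(uri):
    """Standard-library parsing: the query string starts at the first '?'."""
    return parse_qs(urlsplit(uri).query)

def parse_from_last_delimiter(uri):
    """Naive script: treats everything after the last '?' as GET input."""
    without_fragment = uri.split("#", 1)[0]
    return parse_qs(without_fragment.rsplit("?", 1)[-1])

first = parse_from_first_delimiter(URI)
last = parse_from_last_delimiter(URI)

print(first["Query"])   # the search phrase now carries tracking clutter
print("Query" in last)  # the naive parser lost the search phrase entirely
```

<p>In the first case the search phrase swallows <code>?utm_source=Referral</code> and the script searches for garbage; in the second case the script never sees the search phrase at all.</p>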
<p>In some cases the page the user gets faced with will lack the expected content, or will display a prominent error message like 404, or will consist of white space only because the underlying script failed so badly that the Web server couldn&#8217;t even show a 5xx error.</p>
<p>Regardless of whether a landing page can handle query string parameters added to the original URI or not (most can), changing someone&#8217;s URI for tracking purposes is plain evil, IMHO, when implemented as opt-out instead of opt-in.</p>
<p>Appended UTM query strings can make trackbacks vanish, too. When a blog checks whether the trackback URI is carrying a link to the blog or not, for example with this <a href="http://sw-guide.de/wordpress/plugins/simple-trackback-validation/">plug-in</a>, the comparison can fail and the trackback gets deleted on arrival, without notice. If I dug a little deeper, I could most probably compile a huge list of other functionalities on the Internet that are broken by Google&#8217;s UTM clutter.</p>
<p>Finally, GoogleAnalytics is not the one and only stats tool out there, and it doesn&#8217;t fulfil all needs. Many webmasters rely on simple server reports, for example referrer stats or tools like awstats, for various technical purposes. Broken. Specialized content management tools fed by real-time traffic data. Broken. Countless tools for linkpop analysis group inbound links by landing page URI. Broken. URI canonicalization routines. Broken, or rather now counterproductive with regard to GA reporting. Google&#8217;s UTM clutter has an impact on lots of tools that make sense <em>in addition</em> to Google Analytics. All broken.</p>
<p>What a glorious mess. Frankly, I&#8217;m somewhat puzzled. Google has hired tens of thousands of this planet&#8217;s brightest minds &#8211;I really mean that, literally!&#8211;, and they came up with half-assed crap like that? Un-fucking-believable.</p>
<h3 id="utm-opt-out">What can I do to avoid URI spam on my site?</h3>
<p><strong>Boycott Google&#8217;s poor man&#8217;s approach to link feed traffic data to Web analytics.</strong> Go to <a href="http://feedburner.google.com/?utm_source=sebastian&#038;utm_medium=pamphlet&#038;utm_campaign=thou+shalt+not+fuck+with+my+uris">Feedburner</a>. For each of your feeds click on &#8220;Configure stats&#8221; and uncheck &#8220;Track clicks as a traffic source in Google Analytics&#8221;. Done. Wait for a suitable solution.</p>
<p>If you really can&#8217;t live with traffic sources gathered from a somewhat <a href="http://sebastians-pamphlets.com/webkit-please-rescue-the-http_referer/">unreliable HTTP_REFERER</a>, and you&#8217;ve deep pockets, then hire a WebDev crew to revamp all your affected code. Coward!</p>
<p>As a matter of fact, Google is responsible for this royal pain in the ass. Don&#8217;t fix Google&#8217;s errors on your site. Let Google do the fault recovery. They own the root of all UTM evil, so they have to fix it. There&#8217;s absolutely no reason why a gazillion webmasters and developers should do Google&#8217;s job, <a href="http://sebastians-pamphlets.com/rip-rel-nofollow-funeral-party/">again and again</a>.</p>
<h3 id="utm-alternatives">What can Google do?</h3>
<p>Well, that&#8217;s quite simple. Instead of adding utterly useless crap to URIs found in feeds, Google can make use of a clever redirect script. When Feedburner serves feed items to anybody, the values of all GA tracking variables are available.</p>
<p>Instead of adding clutter to these URIs, Feedburner could replace them with a script URI that stores the timestamp, the user&#8217;s IP addy, and whatnot, then performs a 301 redirect to the canonical URI. The GA script invoked on the landing page can access and process these data quite accurately. </p>
<p>Perhaps this procedure would be even more accurate, because link drops can no longer mimic feed traffic.</p>
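<p>A minimal sketch of such a redirect script, in Python, might look like the following. Everything in it is an assumption for illustration: the item ID, the lookup table, the in-memory log and the function name are hypothetical, and it says nothing about how Feedburner actually works internally.</p>

```python
import time

# Hypothetical lookup table: click-tracking ID -> canonical URI of the feed item.
FEED_ITEMS = {"a1b2": "http://example.com/my-post/"}

# Stands in for whatever storage a real service would use.
CLICK_LOG = []

def handle_feed_click(item_id, ip_address, now=None):
    """Record the click, then answer with a 301 to the untouched canonical URI."""
    canonical_uri = FEED_ITEMS.get(item_id)
    if canonical_uri is None:
        return 404, {}
    CLICK_LOG.append({
        "item": item_id,
        "ip": ip_address,
        "timestamp": now if now is not None else time.time(),
        "source": "feedburner",  # tracking values live server-side,
        "medium": "feed",        # not in the visitor's address bar
    })
    return 301, {"Location": canonical_uri}

status, headers = handle_feed_click("a1b2", "192.0.2.1", now=1259697633)
print(status, headers["Location"])
```

<p>The tracking data stays on the server, and the only URI a reader can copy and paste after the redirect is the canonical one.</p>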
<h3 id="utm-speak-out">Speak out!</h3>
<p>So, if you don&#8217;t approve that Feedburner, GoogleReader, AdSense4Feeds, and GoogleAnalytics gang rape your well designed URIs, then link out to everything Google with a descriptive query string, like:</p>
<p><textarea readonly style="width:500px; height:55px; background:white; color:black; font-size:11pt;" wrap="virtual">?utm_source=sebastian&#038;utm_medium=pamphlet&#038;utm_campaign=thou+shalt+not+fuck+with+my+uris</textarea></p>
<p>I mean, nicely designed canonical URIs should be the search engineer&#8217;s porn, so perhaps somebody at Google will listen. Will ya?</p>
<p><b>Update:</b><a href="http://www.semmys.org/2010/search-tech-all-2010-nominees/"><img id="semmy2010" style="border:0;" align="right" src="http://www.semmys.org/dm/badges/10/LBnom.gif" alt="2010 SEMMY Nominee" /></a></p>
<p>I&#8217;ve just added a <a href="http://sebastians-pamphlets.com/stuff/utm-killer/">&#8220;UTM Killer&#8221; tool</a>, where you can enter a screwed URI and get a clean URI &#8212; all &#8216;utm_&#8217; crap and multiple &#8216;?&#8217; delimiters removed &#8212; in return. That&#8217;ll help when you copy URIs from your feedreader to use them in your blog posts.</p>
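<p>If you&#8217;d rather clean URIs in your own scripts, the idea can be sketched in a few lines of Python. This is not the code behind the tool linked above, just a rough approximation under one loud assumption: every additional &#8216;?&#8217; inside the query string is treated as a parameter separator, which would mangle a URI that legitimately carries a literal question mark in a parameter value.</p>

```python
from urllib.parse import urlsplit, urlunsplit

def strip_utm(uri):
    """Return the URI without utm_* parameters and without extra '?' delimiters."""
    parts = urlsplit(uri)
    # Assumption: a second '?' in the query string was meant as '&'.
    pairs = parts.query.replace("?", "&").split("&")
    kept = [pair for pair in pairs if pair and not pair.lower().startswith("utm_")]
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, "&".join(kept), parts.fragment)
    )

print(strip_utm(
    "http://www.example.com/search.aspx?Query=musical+mobile"
    "?utm_source=Referral&utm_medium=Internet&utm_campaign=celebritybabies"
))
```

<p>The parameters that survive are passed through as they were, so their encoding is left alone.</p>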
<p>By the way, please <a href="http://www.semmys.org/category/search-tech/">vote up this pamphlet</a> so that I get the 2010 SEMMY Award. Thanks in advance!</p>
]]></content:encoded>
			<wfw:commentRss>http://sebastians-pamphlets.com/troubles-made-by-utm-variables-from-google-analytics/feed/</wfw:commentRss>
		</item>
		<item>
		<title>As if sloppy social media users ain&#8217;t bad enough &#8230; search engines support traffic theft</title>
		<link>http://sebastians-pamphlets.com/put-an-end-to-uri-shortening/</link>
		<comments>http://sebastians-pamphlets.com/put-an-end-to-uri-shortening/#comments</comments>
		<pubDate>Mon, 26 Oct 2009 19:18:54 +0000</pubDate>
		<dc:creator>Sebastian</dc:creator>
		
		<category><![CDATA[Social Web]]></category>

		<category><![CDATA[MSN]]></category>

		<category><![CDATA[URI shortening]]></category>

		<category><![CDATA[Search Quality]]></category>

		<category><![CDATA[Spam]]></category>

		<category><![CDATA[Yahoo]]></category>

		<category><![CDATA[Risky Linkage]]></category>

		<category><![CDATA[Twitter]]></category>

		<category><![CDATA[Google]]></category>

		<guid isPermaLink="false">http://sebastians-pamphlets.com/put-an-end-to-uri-shortening/</guid>
		<description><![CDATA[
Prepare for a dose of techy tin foil hattery. [Skip rant] Again, I&#8217;m going to rant about a nightmare that Twitter &#038; Co created with their crappy, thoughtless and shortsighted software designs: URI shorteners (yup, it&#8217;s URI, not URL).
Recap: Each and every 3rd party URI shortener is evil by design. Those questionable services do/will steal [...]]]></description>
			<content:encoded><![CDATA[
<p>Prepare for a dose of <a href="http://sebastians-pamphlets.com/dear-search-engines-please-rescue-our-shortened-urls/#comment-1832">techy tin foil hattery</a>. <span style="color:gray;">[<a href="http://sebastians-pamphlets.com/put-an-end-to-uri-shortening/#se-howto-suri"  style="color:gray;">Skip rant</a>]</span> Again, I&#8217;m going to rant about a <a href="http://sebastians-pamphlets.com/dear-search-engines-please-rescue-our-shortened-urls/">nightmare</a> that Twitter &#038; Co created with their <a href="http://tag.us.com/uri-shorteners-suck-ass.htm#twitter-crap">crappy, thoughtless and shortsighted software designs</a>: <a href="http://tag.us.com/uri-shorteners-suck-ass.htm">URI shorteners</a> (yup, it&#8217;s <a href="http://en.wikipedia.org/wiki/Uniform_Resource_Identifier" rel="nofollow">UR<b>I</b></a>, not <a href="http://en.wikipedia.org/wiki/Uniform_Resource_Locator" rel="nofollow">URL</a>).</p>
<p><img src="http://sebastians-pamphlets.com/img/posts/evil-urishorteners-seduce-social-media-users-.png" width="200" height="301" align="left" alt="don't get seduced by URI shorteners" style="margin-right:3px;" /><strong>Recap:</strong> Each and every 3rd party URI shortener is evil by design. Those questionable services do/will steal your traffic and your Google juice, mislead and piss off your potential <strike>visitors</strike> customers, and hurt you in countless other ways. If you consider yourself south of sanity, do not make use of shortened URIs you don&#8217;t own.</p>
<p>Actually, this pamphlet is not about sloppy social media users who shoot themselves in both feet, and it&#8217;s not about unscrupulous micro blogging platforms that force their users to hand over their assets to felonious traffic thieves. It&#8217;s about search engines that, in my humble opinion, handle the <b>sURL dilemma</b> totally wrong.</p>
<p>Some of my claims are based on experiments that I&#8217;m not willing to reveal (yet). For example I won&#8217;t explain sneaky URI hijacking or how I stole a portion of tinyurl.com&#8217;s search engine traffic with a shortened URI, passing searchers to a charity site, although it seems the search engine I&#8217;ve gamed has closed this particular loophole now. There are still way too many playgrounds for deceptive tactics involving <a href="http://sebastians-pamphlets.com/links/categories/?cat=s-url">shortened URIs</a> &#8230; </p>
<h3 id="se-howto-suri">How should a search engine handle a shortened URI?</h3>
<p>Handling an URI as <i><span title="SEO copywriting. Of course that's URI, not URL.">shortened URL</span></i> requires a bulletproof method to detect shortened URIs. That&#8217;s a breeze.</p>
<ul>
<li>Redirect patterns: URI shorteners receive lots of external inbound links that get redirected to 3rd party sites. Linking pages, stopovers and destination pages usually reside on different domains. The method of redirection can vary. Most URI shorteners perform 301 redirects, some use 302 or 307 HTTP response codes, some frame the destination page displaying ads on the top frame, and I&#8217;ve even seen a few of them making use of meta refreshes and client-side redirects. Search engines can detect all those procedures.</li>
<li>Link appearance: redirecting URIs that belong to URI shorteners often appear on pages and in feeds hosted by social media services (Twitter, Facebook &#038; Co).</li>
<li>Seed: trusted sources like LongURL.org provide <a href="http://longurl.org/services">lists of domains</a> owned by URI shortening services. Social media outlets providing their own URI shorteners don&#8217;t hide server name patterns (like su.pr &#8230;).</li>
<li>Self exposure: the <a href="http://rickroll.it/" rel="crap nofollow">root index pages</a> of URI shorteners, as well as <a href="http://tag.us.com/_2h">other pages</a> on those domains that serve a 200 response code, usually mention explicit terms like &#8220;shorten your URL&#8221; et cetera.</li>
<li>URI length: the length of an URI string, if less than or equal to 20 characters, is an indicator at most, because some URI shortening services offer keyword rich short URIs, and many sites provide natural URIs this short.</li>
</ul>
<p>Search engine crawlers bouncing at short URIs should do a lookup, following the complete chain of redirects. (Some wacky services shorten everything that looks like an URI, even shortened URIs, or do a lookup themselves, replacing the original short URI with another short URI that they can track. Yup, that&#8217;s some crazy insanity.)</p>
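<p>Such a lookup could be sketched as follows, in Python. This is a simplified illustration, not a description of what any search engine actually runs: <code>fetch</code> is a hypothetical caller-supplied function returning a status code and a <code>Location</code> value, the responses are simulated so the sketch runs without network access, and framed pages, meta refreshes and client-side redirects are left out.</p>

```python
REDIRECT_CODES = {301, 302, 303, 307, 308}

def resolve_chain(uri, fetch, max_hops=10):
    """Follow redirects hop by hop; return (stopovers, destination).

    `fetch` maps a URI to a (status_code, location_or_None) tuple,
    for example a thin wrapper around an HTTP HEAD request.
    """
    stopovers = []
    seen = set()
    current = uri
    for _ in range(max_hops):
        if current in seen:
            raise ValueError(f"redirect loop at {current}")
        seen.add(current)
        status, location = fetch(current)
        if status not in REDIRECT_CODES or not location:
            return stopovers, current
        stopovers.append(current)
        current = location
    raise ValueError(f"gave up after {max_hops} hops")

# Simulated responses (all hosts are made up).
FAKE_WEB = {
    "http://short.example/abc": (301, "http://tracker.example/x9"),
    "http://tracker.example/x9": (302, "http://www.example.com/long/article.html"),
    "http://www.example.com/long/article.html": (200, None),
}

stopovers, destination = resolve_chain("http://short.example/abc", FAKE_WEB.get)
print(stopovers)
print(destination)
```

<p>The crawler ends up with a list of stopovers and exactly one destination, which is all it needs to decide who gets the credit.</p>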
<p>Each and every stopover (shortened URI) should get indexed as an alias of the destination page, but must not appear on SERPs unless the search query contains the short URI or the destination URI (that means not on [site:tinyurl.com] SERPs, but on a [site:tinyurl.com shortURI] or a [destinationURI] search result page). 3rd party stopovers mustn&#8217;t gain reputation (PageRank™, anchor text, or whatever), regardless of the method of redirection. All the link juice belongs to the destination page.</p>
<p>In other words: search engines should make use of their knowledge of shortened URIs in response to navigational search queries. In fact, <a href="http://sebastians-pamphlets.com/dear-search-engines-please-rescue-our-shortened-urls">search engines could even solve the problem of vanished and abused short URIs</a>. </p>
<p>Now let&#8217;s see <strong>how major search engines handle shortened URIs</strong>, and how they could improve their SERPs.</p>
<h3 id="bing-surl-fail">Bing doesn&#8217;t get redirects at all</h3>
<p><img src="http://sebastians-pamphlets.com/img/posts/bing-301-only-uri-listings.png" width="121" height="138" align="left" style="margin-right:3px;" alt="Bing 301 messed up SERPs" />Oh what a mess. The candidate from Redmond fails totally at understanding the <a href="http://sebastians-pamphlets.com/the-anatomy-of-http-redirects-301-302-307/" title="with regard to redirects">HTTP protocol</a>. Their search index is flooded with a bazillion URI-only listings that all do a 301 redirect, more than 200,000 from tinyurl.com alone. You&#8217;ll also find URIs in their index that do a permanent redirect and have nothing to do with URI shortening. </p>
<p>I can&#8217;t be bothered with checking what Bing does in response to other redirects, since the 301 test fails so badly. Clicking on their first results for [site:tinyurl.com], I&#8217;ve noticed that many lead to <code>mailto://working-email-addy</code> type of destinations. Dear Bing, please remove those search results as soon as possible, before anyone figures out how to use your SERPs/APIs to launch massive email spam campaigns. As for tips on how to improve your short-URI-SERPs, please learn more under <a href="http://sebastians-pamphlets.com/put-an-end-to-uri-shortening/#yahoo-surl-not-bad">Yahoo</a> and <a href="http://sebastians-pamphlets.com/put-an-end-to-uri-shortening/#google-surl-not-bad">Google</a>.</p>
<h3 id="yahoo-surl-not-bad">Yahoo does an awesome job, with a tiny exception</h3>
<p><img src="http://sebastians-pamphlets.com/img/posts/yahoo-surl-listing.png" width="200" height="29" align="left" style="margin-right:3px;" alt="Yahoo 301 somewhat Ok" />Yahoo has done a better job. They index short URIs and show the destination page, at least via their site explorer. When I search for a tinyURL, the SERP link points to the URI shortener; that could be improved by linking to the destination page. </p>
<p>By the way, Yahoo is the only search engine that handles abusive short-URIs totally right (I will not elaborate on this issue, so please don&#8217;t ask for detailed information if you&#8217;re not a SE engineer). Yahoo bravely passed the 301 test, as well as others (including pretty evil tactics). I so hope that MSN will adopt Yahoo&#8217;s bright logic before Bing overtakes Yahoo search. By the way, that can be accomplished without sending out spammy bots (hint2bing).</p>
<h3 id="google-surl-not-bad">Google does it by the book, but there&#8217;s room for improvements</h3>
<p><img src="http://sebastians-pamphlets.com/img/posts/google-surl-no-link-to-destination-page.png" width="366" height="72" align="left" style="margin-right:3px;" alt="Google fails with merits" />As for tinyURLs, Google indexes only pages on the tinyurl.com domain, including previews. Unfortunately, the snippets don&#8217;t provide a link to the destination page. Although that&#8217;s the expected behavior (those URIs aren&#8217;t linked on the crawled page), that&#8217;s sad. At least Google didn&#8217;t fail on the 301 test.</p>
<p>As for the somewhat evil tactics I&#8217;ve applied in my tests so far, Google fell in love with some abusive short-URIs. Google &#8211;under particular circumstances&#8211; indexes shortened URIs that game Googlebot, and has sent SERP traffic to sneakily shortened URIs (that confront the searcher with huge ads) instead of the destination page. Since I&#8217;ve begun to deploy sneaky sURLs, Google has greatly improved their spam filters, but they&#8217;re not yet perfect.</p>
<p>Since Google is responsible for most of this planet&#8217;s SERP traffic, I&#8217;ve put better sURL handling at the very top of my xmas wish list.</p>
<h3 id="about-abusive-suris">About abusive short URIs</h3>
<p>Shortened URIs do poison the Internet. They vanish, alter their destination, mislead surfers &#8230; in other words they are abusive by definition. <b>There&#8217;s no such thing as a persistent short URI!</b></p>
<p>A long time ago <a href="http://www.w3.org/People/Berners-Lee/">Tim Berners-Lee</a> told you that <a href="http://www.w3.org/Provider/Style/URI?iseewhatyoudidthere"><strike>URI shorteners are evil</strike> fucking with URIs is a very bad habit</a>. Did you listen? Do you make use of shortened URIs? <strong>If you post URIs that get shortened at Twitter, or if you make use of 3rd party URI shorteners elsewhere, consider yourself trapped into a low-life traffic theft scam.</strong> Shame on you, and shame on Twitter &#038; Co.</p>
<p><img src="http://sebastians-pamphlets.com/img/posts/stop-evil-uri-shorteners-now.png" width="200" height="261" align="right" alt="fight evil URI shorteners" style="margin-left:2px;" />Besides my somewhat shady experiments that hijacked URIs, stole SERP positions, and converted &#8220;borrowed&#8221; SERP traffic, there are so many other ways to abuse shortened URIs. Many of them are outright evil. Many of them do hurt your kids, and mine. Basically, that&#8217;s not any search engine&#8217;s problem, but search engines could help us get rid of the root of all sURL evil by handling shortened URIs with common sense, even when the last short URI has vanished.</p>
<h3 id="fight-suri">Fight shortened URIs!</h3>
<p><strong>It&#8217;s up to you. Go stop it. As long as you can&#8217;t avoid URI shortening, roll your own URI shortener and make sure it can&#8217;t <a href="http://www.davidnaylor.co.uk/dangers-of-custom-shortened-urls.html">get</a> <a href="http://www.davidnaylor.co.uk/dont-make-the-same-mistakes-as-bit-ly-and-tr-im.html">abused</a>. For the sake of our children, do not use or support 3rd party URI shorteners. Deprive these utterly useless scumbags of their livelihood.</strong></p>
<p>Unfortunately, as a father and as a webmaster, I don&#8217;t believe in common sense applied by social media services. Hence, I see a &#8220;Twitter actively bypasses safe-search filters tricking my children into viewing hardcore porn&#8221; post coming. Dear Twitter &#038; Co. &#8212; and that addresses all services that make use of or transport shortened URIs &#8212; put an end to shortened URIs. Now!</p>
<hr />Copyright &copy; 2010 <strong><a href="http://sebastians-pamphlets.com/">Sebastian`s Pamphlets</a></strong>. This Feed is for personal non-commercial use only. If you are not reading this material in your news aggregator/feed reader, the site you are looking at is guilty of copyright infringement and will be put down immediately. Please contact sebastians-pamphlets.com so we can take legal action immediately.<br /><span style="float: right;font-size: 7pt"><a href="http://blog.taragana.com/index.php/archive/wordpress-plugins-provided-by-taraganacom/">Plugin</a> by <a href="http://www.taragana.com/">Taragana</a></span><div class="topsy_widget_data topsy_theme_light-green" style="float: right;margin-left: 0.75em;"><!-- { "url": "http://sebastians-pamphlets.com/put-an-end-to-uri-shortening/", "style": "big", "title": "As if sloppy social media users ain't bad enough ... search engines support traffic theft" } --></div>
]]></content:encoded>
			<wfw:commentRss>http://sebastians-pamphlets.com/put-an-end-to-uri-shortening/feed/</wfw:commentRss>
		</item>
		<item>
		<title>How to handle a machine-readable pandemic that search engines cannot control</title>
		<link>http://sebastians-pamphlets.com/rip-rel-nofollow-funeral-party/</link>
		<comments>http://sebastians-pamphlets.com/rip-rel-nofollow-funeral-party/#comments</comments>
		<pubDate>Fri, 19 Jun 2009 20:59:51 +0000</pubDate>
		<dc:creator>Sebastian</dc:creator>
		
		<category><![CDATA[Search Quality]]></category>

		<category><![CDATA[Web development]]></category>

		<category><![CDATA[X-Robots-Tag]]></category>

		<category><![CDATA[Blogging]]></category>

		<category><![CDATA[Risky Linkage]]></category>

		<category><![CDATA[Paid Links]]></category>

		<category><![CDATA[Microformats]]></category>

		<category><![CDATA[Google]]></category>

		<category><![CDATA[SEO]]></category>

		<category><![CDATA[Cloaking]]></category>

		<category><![CDATA[Nofollow]]></category>

		<guid isPermaLink="false">http://sebastians-pamphlets.com/rip-rel-nofollow-funeral-party/</guid>
		<description><![CDATA[
If you&#8217;re familiar with my various rants on the ever morphing rel-nofollow microformat infectious link disease, don&#8217;t read further. This post is not polemic, ironic, insulting, or otherwise meant to entertain you. I&#8217;m just raving about a way to delay the downfall of the InterWeb.

Let&#8217;s recap: The World Wide Web is based on hyperlinks. Hyperlinks [...]]]></description>
			<content:encoded><![CDATA[
<p><img src="http://sebastians-pamphlets.com/img/posts/rel-nofollow-rest-in-peace.png" width="200" height="248" align="right" style="margin-left:2px;" alt="R.I.P. rel-nofollow" title="Rest In Peace, rel-nofollow!" />If you&#8217;re familiar with my various rants on the ever morphing <strike>rel-nofollow microformat</strike> <a href="http://sebastians-pamphlets.com/links/categories/?cat=nofollow">infectious link disease</a>, don&#8217;t read further. This post is not polemic, ironic, insulting, or otherwise meant to entertain you. I&#8217;m just raving about a way to delay the downfall of the InterWeb.</p>
<div style="margin-left:5px; border-left:thin dotted red; padding-left:5px;">
<p style="margin-left:5px;"><b>Let&#8217;s recap:</b> The World Wide Web is based on hyperlinks. Hyperlinks are supposed to lead humans to interesting stuff they want to consume. This simple and therefore brilliant concept worked great for years. The Internet grew up, bubbled a bit, but eventually it gained world domination. Internet traffic was counted, sold, bartered, purchased, and even exchanged for free in units called &#8220;hits&#8221;. (A &#8220;hit&#8221; means one human surfer landing on a sales pitch. That is a popup hell designed in a way that somebody involved just has to make a sale).</p>
<p style="margin-left:5px;">Then in the past century two smart guys discovered that links scraped from Web pages can be misused to provide humans with very accurate search results. They even created a new currency on the Web, and quickly assigned their price tags to Web pages. Naturally, folks began to trade green pixels instead of traffic. After a short while the Internet voluntarily transferred its world domination to the company founded by those two smart guys from Stanford.</p>
<p style="margin-left:5px;">Of course the huge amount of green pixel trades made the search results based on link popularity somewhat useless, because the webmasters gathering the most incoming links got the top 10 positions on the search result pages (SERPs). Search engines claimed that a few webmasters cheated on their way to the first SERPs, although lawyers say there&#8217;s no evidence of any illegal activities related to search engine optimization (SEO).</p>
<p style="margin-left:5px;">However, after suffering from heavy attacks from a whiny blogger, the Web&#8217;s dominating search engine got somewhat upset and required all webmasters to assign a machine-readable tag (link condom) to links sneakily inserted into their Web pages by other webmasters. &#8220;Sneakily inserted links&#8221; meant references to authors as well as links embedded in content supplied by users. All blogging platforms, CMS vendors and the like implemented the link condom, eliminating presumably 5.00% of the Web&#8217;s linkage at this time.</p>
<p style="margin-left:5px;">A couple of months later the world dominating search engine demanded that webmasters condomize their banner ads, intercompany linkage and other commercial links, as well as all hyperlinked references that do not count as pure academic citation (aka editorial links). The whole InterWeb complied, since this company controlled nearly all the free traffic available from Web search, as well as the Web&#8217;s purchasable traffic streams.</p>
<p style="margin-left:5px;">Roughly 3.00% of the Web&#8217;s links were condomized when the search giant spotted that their users (searchers) missed out on lots and lots of valuable content covered by link condoms. Ooops. Kinda dilemma. Taking back the link condom requirements was no option, because this would have flooded the search index with billions of unwanted links empowering commercial content to rank above boring academic stuff.</p>
<p style="margin-left:5px;">So the handling of link condoms in the search engine&#8217;s crawling engine as well as in its ranking algorithm was changed silently. Without telling anybody outside their campus, some condomized links gained power, whilst others were kept impotent. In fact they&#8217;ve developed a method to judge each and every link on the whole Web without a little help from their <strike>friends</strike> link condoms. In other words, the link condom became obsolete.</p>
<p style="margin-left:5px;">Of course that&#8217;s what they should have done in the first place, without asking the world&#8217;s webmasters for gazillions of free-of-charge man years producing shitloads of useless code bloat. Unfortunately, they didn&#8217;t have the balls to stand up and admit &#8220;sorry folks, we&#8217;ve failed miserably, link condoms are history&#8221;. Therefore the Web community still has to bother with an obsolete microformat. And if they &#8211;the link condoms&#8211; are not dead, then they live today. In your markup. Hurting your rankings.</p>
</div>
<p style="margin-left:30px;"><small>If you, dear reader, are a Googler, then please don&#8217;t feel too annoyed. You may have thought that you didn&#8217;t do evil, but the above said reflects what webmasters outside the &#8216;Plex got from your actions. Don&#8217;t ignore it, please think about it from our point of view. Thanks.</small></p>
<p>Still here and attentive? Great. Now let&#8217;s talk about scenarios in WebDev where you still can&#8217;t avoid rel-nofollow. If there are any &#8212; We&#8217;ll see.</p>
<h3>PageRank&trade; sculpting</h3>
<p>Dude, PageRank&trade; sculpting with rel-nofollow doesn&#8217;t work for the average webmaster. It might even fail when applied as a highly sophisticated SEO tactic. So don&#8217;t even think about it. Simply remove the <code>rel=nofollow</code> from links to your TOS, imprint, and contact page. Cloak away your links to signup pages, login pages, shopping carts and stuff like that.</p>
<h3>Link monkey business</h3>
<p>I leave this paragraph empty, because when you know what you do, you don&#8217;t need advice.</p>
<h3>Affiliate links</h3>
<p>There&#8217;s no point in serving <a href="http://www.smart-it-consulting.com/article.htm?node=155&#038;page=90">A elements</a> to Googlebot at all. If you haven&#8217;t cloaked your aff links yet, go see an SEO doctor.</p>
<h3>Advanced SEO purposes</h3>
<p>See above.</p>
<p><b>So what&#8217;s left?</b> User generated content. Let&#8217;s concentrate our extremely superfluous condomizing efforts on the one and only occasion that might justify applying rel-nofollow to a hyperlink on request of a major search engine, if there&#8217;s any good reason to paint shit brown at all.</p>
<h3>Blogging</h3>
<p>If you link out in a blog post, then you vouch for the link&#8217;s destination. In case you disagree with the link destination&#8217;s content, just put the link as</p>
<p><strong id="enemylink" title="http://example.com/"><code>&lt;strong class="blue_underlined" title="http://myworstenemy.org/" onclick="window.location=this.title;"&gt;<span onclick="window.location=document.getElementById('enemylink').title; return false;" style="color:blue; text-decoration:underline;">My Worst Enemy</span>&lt;/strong&gt;</code></strong></p>
<p>or so. The surfer can click the link and land at the expected URI, but search engines don&#8217;t pass reputation. Also, they don&#8217;t evaporate link juice, because they don&#8217;t interpret the markup as a hyperlink.</p>
<h3>Blog comments</h3>
<p>My rule of thumb is: <strong>Moderate, DoFollow quality, DoDelete crap</strong>. Install a conditional do-follow plug-in, set everything on moderation, use captchas or something similar, then let the comment&#8217;s link juice flow. You can maintain a white list that allows instant appearance of comments from your buddies.</p>
<h3>Forums, guestbooks and unmoderated stuff like that</h3>
<p>Separate all Web site areas that handle user generated content. Serve &#8220;index,nofollow&#8221; meta tags or X-Robots-Tag headers for all those pages, and link them from a site map or so. If you gather index-worthy content from users, then feed crawlers the content in a parallel &#8211;crawlable&#8211; structure, without submit buttons, perhaps with links from trusted users, and redirect human visitors to the interactive pages. Vice versa, redirect crawlers requesting live pages to the spider fodder. All those redirects go with a 301 HTTP response code.</p>
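<p>The routing described above might look roughly like the following Python sketch. Everything in it is an assumption made for illustration: the path prefixes <code>/forum/</code> and <code>/forum-archive/</code>, the function names, and the deliberately naive user agent check, which a real setup would replace with something more robust.</p>

```python
CRAWLER_TOKENS = ("googlebot", "slurp", "msnbot")  # naive, assumed UA substrings
LIVE_PREFIX = "/forum/"             # hypothetical interactive area
STATIC_PREFIX = "/forum-archive/"   # hypothetical crawlable copy


def is_crawler(user_agent):
    """Very rough crawler detection by user agent substring."""
    ua = user_agent.lower()
    return any(token in ua for token in CRAWLER_TOKENS)


def route(path, user_agent):
    """Return (status, headers) for a request, without rendering a body."""
    bot = is_crawler(user_agent)
    if path.startswith(LIVE_PREFIX):
        if bot:
            # Crawlers requesting live pages get sent to the spider fodder.
            target = STATIC_PREFIX + path[len(LIVE_PREFIX):]
            return 301, {"Location": target}
        # Interactive page; the header matters for any crawler that the
        # naive check above failed to recognise.
        return 200, {"X-Robots-Tag": "index, nofollow"}
    if path.startswith(STATIC_PREFIX):
        if not bot:
            # Humans landing on the crawlable copy go to the live page.
            target = LIVE_PREFIX + path[len(STATIC_PREFIX):]
            return 301, {"Location": target}
        return 200, {}
    return 200, {}
```

<p>For example, <code>route("/forum/thread-1", "Googlebot/2.1")</code> yields a 301 to <code>/forum-archive/thread-1</code>, whereas a browser requesting the same path gets the interactive page with an <code>X-Robots-Tag</code> header.</p>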
<p>If you lack the technical skills to accomplish that, then edit your <code>/robots.txt</code> file as follows:</p>
<p><code>User-agent: Googlebot<br />
# Dear Googlebot, drop me a line when you can handle forum pages<br />
# w/o rel-nofollow crap. Then I'll allow crawling.<br />
# Treat that as conditional disallow:<br />
Disallow: /forum</code></p>
<p>As soon as Google can handle your user generated content naturally, they might send you a message in their Webmaster console.</p>
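<p>If you want to check what such a rule actually blocks, Python&#8217;s standard library ships a robots.txt parser. The snippet below is just a quick sanity check, not a statement about how Googlebot itself parses the file; <code>example.com</code> and the thread paths are placeholders.</p>

```python
from urllib.robotparser import RobotFileParser

# The rule from above, with the comments shortened.
ROBOTS_TXT = """\
User-agent: Googlebot
# Treat that as conditional disallow:
Disallow: /forum
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Googlebot is locked out of everything whose path starts with /forum ...
googlebot_forum = parser.can_fetch("Googlebot", "http://example.com/forum/thread-1")
# ... but may still crawl the rest of the site.
googlebot_about = parser.can_fetch("Googlebot", "http://example.com/about")
# Crawlers without a matching record are not restricted at all.
other_bot_forum = parser.can_fetch("Slurp", "http://example.com/forum/thread-1")

print(googlebot_forum, googlebot_about, other_bot_forum)
```

<p>Keep in mind that <code>Disallow: /forum</code> is a prefix match, so it also covers paths like <code>/forums</code> or <code>/forum-rules.html</code>.</p>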
<h3>Anything else</h3>
<p>Judge yourself. Most probably you&#8217;ll find a way to avoid rel-nofollow.</p>
<h3>Conclusion</h3>
<p><strong>Absolutely nobody needs the rel-nofollow microformat. Not even search engines for the sake of their index.</strong> Hence webmasters as well as search engines can stop wasting resources. Farewell <code>rel="nofollow"</code>, rest in peace. We won&#8217;t miss you.</p>
]]></content:encoded>
			<wfw:commentRss>http://sebastians-pamphlets.com/rip-rel-nofollow-funeral-party/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Vaporize yourself before Google burns your linking power</title>
		<link>http://sebastians-pamphlets.com/dear-google-please-vaporize-yourself-and-dont-bother-us-webmasters/</link>
		<comments>http://sebastians-pamphlets.com/dear-google-please-vaporize-yourself-and-dont-bother-us-webmasters/#comments</comments>
		<pubDate>Tue, 16 Jun 2009 19:14:12 +0000</pubDate>
		<dc:creator>Sebastian</dc:creator>
		
		<category><![CDATA[Webspam]]></category>

		<category><![CDATA[Paid Links]]></category>

		<category><![CDATA[Search Quality]]></category>

		<category><![CDATA[Web development]]></category>

		<category><![CDATA[Internet Marketing]]></category>

		<category><![CDATA[Crawler Directives]]></category>

		<category><![CDATA[Crap]]></category>

		<category><![CDATA[Google]]></category>

		<category><![CDATA[Microformats]]></category>

		<category><![CDATA[SEO]]></category>

		<category><![CDATA[Cloaking]]></category>

		<category><![CDATA[Anchor Text]]></category>

		<category><![CDATA[Nofollow]]></category>

		<guid isPermaLink="false">http://sebastians-pamphlets.com/dear-google-please-vaporize-yourself-and-dont-bother-us-webmasters/</guid>
		<description><![CDATA[
I couldn&#8217;t care less about PageRank&#8482; sculpting, because a well thought out link architecture does the job with all search engines, not just Google. That&#8217;s where Google is right on the money.
They own PageRank&#8482;, hence they can burn, evaporate, nillify, and even divide by zero or multiply by -1 as much PageRank&#8482; as they like; [...]]]></description>
			<content:encoded><![CDATA[
<p><img id="pic1" src="http://sebastians-pamphlets.com/img/posts/google-page-rank-factory-2007.png" width="218" height="367" border="0" align="right" alt="PIC-1: Google PageRank(tm) 2007" title="Google PageRank(tm) 2007" />I couldn&#8217;t care less about PageRank&trade; sculpting, because a well thought out link architecture does the job with all search engines, not just Google. That&#8217;s where Google is right on the money.</p>
<p>They own PageRank&trade;, hence they can burn, evaporate, nillify, and even divide by zero or multiply by -1 as much PageRank&trade; as they like; of course as long as they rank my stuff nicely above my competitors.</p>
<p><a href="http://sebastians-pamphlets.com/dear-google-please-vaporize-yourself-and-dont-bother-us-webmasters/#pic1">Picture 1</a> shows Google&#8217;s PageRank&trade; factory as of 2007 or so. Actually, it&#8217;s a pretty simplified model, but since they&#8217;ve changed the PageRank&trade; algo anyway, you don&#8217;t need to bother with all the geeky details.</p>
<p>As a side note: you might ask why I don&#8217;t link to <a id="dullestlink" onmouseout="alert('Ok, you couldn`t resist ... and here you go. That is, if you`re able to click the darn link to Matt`s blog ...'); return true;" onclick="alert('Matt didn`t pay me to link out, so go search his blog on the Interweb'); window.location=document.getElementById('dullestlink').rev; return true;" rev="http://www.dullest.com/blog/pagerank-sculpting/" style="color:#0063dc;font-weight:400;text-decoration:none;border-bottom:1px solid #ccc;">Matt Cutts</a> and <a href="http://searchengineland.com/pagerank-sculpting-is-dead-long-live-pagerank-sculpting-21102" id="searchenginelandlink" onmouseout="alert('Why do you think that`s a link? Never trust underlined blue text anymore! Guess where you`ll land, search cowboy ...'); return true;">Danny Sullivan</a> discussing the whole mess on their blogs? Well, probably Matt can&#8217;t afford my advertising rates, and the whole SEO industry has linked to Danny anyway. If you&#8217;re nosy, check out my source code to learn more about state of the art linkage very compliant to <span onclick="alert('Gotcha!  That`s not a link, it`s a fucking fake as per Google`s request. I can`t link out to Google`s guidelines any more, coz they steal my link juice.'); return true;" style="color:#0063dc;font-weight:400;text-decoration:none;border-bottom:1px solid #ccc;" title="High quality nonsense">Google&#8217;s newest guidelines for advanced SEOs</span> (summary: &#8220;Don&#8217;t trust underlined blue text on Web pages any longer!&#8221;).</p>
<p><img id="pic2" src="http://sebastians-pamphlets.com/img/posts/google-page-rank-factory-2009.png" width="218" height="429" border="0" align="right" alt="PIC-2: Google PageRank(tm) 2009" title="Google PageRank(tm) 2009" />What really matters is <a href="http://sebastians-pamphlets.com/dear-google-please-vaporize-yourself-and-dont-bother-us-webmasters/#pic2">picture 2</a>, revealing Google&#8217;s new PageRank&trade; facilities, silently launched in 2008. Again, geeky details are of minor interest. If you really want to know everything, then search for  [<a href="http://www.seofaststart.com/blog/googles-operation-bendover-exposed-nofollow-pagerank-sculpting" rel="dofollow highly-recommended" style="text-decoration:none;"><code style="font-weight:bolder;font-size:1em;">operation bendover</code></a>] at !Yahoo (it&#8217;s still top secret, and therefore not searchable at Google).</p>
<p>Unfortunately, advanced SEO folks <small>(whatever that means, I use this term just because it seems to be an essential property assigned to the participants of the current PageRank&trade; <strike>uprising</strike> discussion)</small> always try to confuse you with <a href="http://www.seomoz.org/blog/google-says-yes-you-can-still-sculpt-pagerank-no-you-cant-do-it-with-nofollow">overcomplicated graphics and formulas</a> when it comes to PageRank&trade;. Instead, I ask you to focus on the (important) hard core stuff. So go grab a magnifier, and work out the differences:</p>
<ul>
<li><a href="http://sebastians-pamphlets.com/dear-google-please-vaporize-yourself-and-dont-bother-us-webmasters/#pic2">PageRank&trade; 2009</a> in comparison to <a href="http://sebastians-pamphlets.com/dear-google-please-vaporize-yourself-and-dont-bother-us-webmasters/#pic1">PageRank&trade; 2007</a> comes with a pipeline supplying unlimited fuel. Also, it seems they&#8217;ve implemented the green new deal, switching from gas to natural gas. That means they can vaporize way more link juice than ever before.</li>
<li>PageRank&trade; 2009 produces more steam, and the clouds look slightly different. Whilst PageRank&trade; 2007 ignored <a href="http://sebastians-pamphlets.com/links/categories/?cat=nofollow">nofollow crap</a> as well as links put with client-side scripting, PageRank&trade; 2009 evaporates not only juice covered with <a href="http://link-condom.com/">link condoms</a>, but also tons of other permutations of the <a href="http://www.smart-it-consulting.com/article.htm?node=155&#038;page=90">standard A element</a>. </li>
<li>To compensate for the huge overall loss of PageRank&trade; caused by those changes, Google has decided to pass link juice from condomized links to their target URI hidden to Googlebot with JavaScript. Of course Google formerly recommended the use of JavaScript-links to protect webmasters from penalties for so-called &#8220;questionable&#8221; outgoing links. Just as they&#8217;ve not only invented rel-nofollow, but heavily recommended the use of this microformat with all links disliked by Google, and now they take that back as if a gazillion links on the Web could magically change just because Google tweaks their algos. Doh! I really hope that the WebSpam-team checks the age of such links before they penalize everything implemented according to their guidelines before mid-2009 or the InterWeb&#8217;s downfall, whatever comes last. </li>
</ul>
<p>I guess in the meantime you&#8217;ve figured out that I&#8217;m somewhat pissed. Not that the secretly changed flow of PageRank&trade; a year ago in 2008 had any impact on my rankings, or SERP traffic. I&#8217;ve always designed my stuff with PageRank&trade; flow in mind, but without any misuses of rel=&#8221;nofollow&#8221;, so I&#8217;m still fine with Google.</p>
<p>What I can&#8217;t stand is when a search engine tries to tell me how I have to link (out). Google engineers are really smart folks, they&#8217;re perfectly able to develop a PageRank&trade; algo that can decide how much Google-juice a particular link should pass. So dear Googlers, please &#8211;WRT the implementation of hyperlinks&#8211; leave us webmasters alone, dump the rel-nofollow crap and rank our stuff in the best interest of your searchers. No longer bother us with linking guidelines that change yearly. It&#8217;s not our job nor responsibility to act as your <strike>cannon fodder</strike> slavish code monkeys when you spot a loophole in your ranking- or spam-detection-algos.</p>
<p>Of course the above said is based on common sense, so Google won&#8217;t listen (remember: I&#8217;m really upset, hence polemic statements are absolutely appropriate). To protect webmasters from irrational actions by misled search engines, I hereby introduce the</p>
<h3>Webmaster guidelines for search engine friendly links</h3>
<p>What follows is pseudo-code; implement it with your preferred server-side scripting language.</p>
<p><code>if (getAttribute($link, 'rel') matches '*nofollow*' &#038;&#038;<br />
&nbsp;&nbsp;&nbsp;&nbsp;$userAgent matches '*Googlebot*') {<br />
&nbsp;&nbsp;&nbsp;&nbsp;print '&lt;strong rev="' + getAttribute($link, 'href') + '"'<br />
&nbsp;&nbsp;&nbsp;&nbsp;+ ' style="color:blue; text-decoration:underline;"'<br />
&nbsp;&nbsp;&nbsp;&nbsp;+ ' onmousedown="window.location=this.getAttribute(\'rev\');"'<br />
&nbsp;&nbsp;&nbsp;&nbsp;+ '&gt;' + getAnchorText($link) + '&lt;/strong&gt;';<br />
}<br />
else {<br />
&nbsp;&nbsp;&nbsp;&nbsp;print $link;<br />
}</code></p>
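<p>For those who prefer something they can actually run, here is one possible Python rendition of the pseudo-code above. It is a sketch only: <code>render_link</code> and its parameters are names made up for this example, and the user agent test is as naive as the one in the pseudo-code.</p>

```python
from html import escape


def render_link(href, anchor_text, rel, user_agent):
    """Render one link, following the pseudo-code's branch on rel and UA."""
    if "nofollow" in rel.lower() and "googlebot" in user_agent.lower():
        # Googlebot gets clickable text that is not an A element.
        return (
            f'<strong rev="{escape(href)}"'
            ' style="color:blue; text-decoration:underline;"'
            " onmousedown=\"window.location=this.getAttribute('rev');\""
            f">{escape(anchor_text)}</strong>"
        )
    # Everybody else gets the plain link, rel attribute included if set.
    rel_attr = f' rel="{escape(rel)}"' if rel else ""
    return f'<a href="{escape(href)}"{rel_attr}>{escape(anchor_text)}</a>'


if __name__ == "__main__":
    print(render_link("http://example.com/", "Example", "nofollow", "Googlebot/2.1"))
    print(render_link("http://example.com/", "Example", "nofollow", "Mozilla/5.0"))
```

<p>The <code>escape()</code> calls are an addition that the pseudo-code doesn&#8217;t mention; they keep quotes or angle brackets in URIs and anchor text from breaking the generated markup.</p>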
<p>Probably it&#8217;s a good idea to snip both the onmousedown trigger code as well as the rev attribute, when the script gets executed by Googlebot. Just because today Google states that they&#8217;re going to pass link juice to URIs grabbed from the onclick trigger, that doesn&#8217;t mean they&#8217;ll never look at the onmousedown event or misused (X)HTML attributes.</p>
<p>This way you can deliver Googlebot exactly the same stuff that the <strike>punter</strike> surfer gets. You&#8217;re perfectly compliant with Google&#8217;s cloaking restrictions. There&#8217;s no need to bother with complicated stuff like iFrames or even disabled blog comments, forums or guestbooks.</p>
<p>Just feed the crawlers with all the crap the search engines require, then concentrate all your efforts on your UI for human visitors. Web robots (bots, crawlers, spiders, &#8230;) don&#8217;t supply your signup-forms w/ credit card details. Humans do. That is, if you find the time to upsell them while search engines keep you busy with thoughtless change requests all day long.</p>
]]></content:encoded>
			<wfw:commentRss>http://sebastians-pamphlets.com/dear-google-please-vaporize-yourself-and-dont-bother-us-webmasters/feed/</wfw:commentRss>
		</item>
		<item>
		<title>@ALL: Give Google your feedback on NOINDEX, but read this pamphlet beforehand!</title>
		<link>http://sebastians-pamphlets.com/give-google-your-feedback-on-noindex/</link>
		<comments>http://sebastians-pamphlets.com/give-google-your-feedback-on-noindex/#comments</comments>
		<pubDate>Mon, 25 Feb 2008 11:08:34 +0000</pubDate>
		<dc:creator>Sebastian</dc:creator>
		
		<category><![CDATA[URL removal]]></category>

		<category><![CDATA[Search Quality]]></category>

		<category><![CDATA[X-Robots-Tag]]></category>

		<category><![CDATA[Robots Meta Tags]]></category>

		<category><![CDATA[Crawler Directives]]></category>

		<category><![CDATA[SEO]]></category>

		<category><![CDATA[robots.txt]]></category>

		<category><![CDATA[Google]]></category>

		<guid isPermaLink="false">http://sebastians-pamphlets.com/give-google-your-feedback-on-noindex/</guid>
		<description><![CDATA[
Matt Cutts asks us How should Google handle NOINDEX? That&#8217;s a tough question worth thinking twice before you submit a comment to Matt&#8217;s post. Here is Matt&#8217;s question, all the background information you need, and my opinion.
What is NOINDEX?
Noindex is an indexer directive defined in the Robots Exclusion Protocol (REP) from 1996 for use in [...]]]></description>
			<content:encoded><![CDATA[
<p><img src="http://sebastians-pamphlets.com/img/google/dear-google-please-respect-noindex.png" width="250" height="230" align="right" style="margin-left:4px;" alt="Dear Google, please respect NOINDEX" title="Dear Google, please respect NOINDEX, it means don't mention on SERPs!" />Matt Cutts asks us <a href="http://www.mattcutts.com/blog/google-noindex-behavior/">How should Google handle NOINDEX?</a> That&#8217;s a tough question worth thinking twice before you <a href="http://www.mattcutts.com/blog/google-noindex-behavior/#postcomment">submit a comment to Matt&#8217;s post</a>. Here is Matt&#8217;s question, all the background information you need, and my opinion.</p>
<h3>What is NOINDEX?</h3>
<p><a href="http://www.robotstxt.org/meta.html">Noindex</a> is an indexer directive defined in the <a href="http://sebastians-pamphlets.com/links/categories/?cat=crawler-directives">Robots Exclusion Protocol</a> (REP) from 1996 for use in <a href="http://sebastians-pamphlets.com/links/categories/?cat=robots-meta-tags">robots meta tags</a>. Putting a <b>NOINDEX</b> value in a page&#8217;s robots meta tag or <a href="http://sebastians-pamphlets.com/links/categories/?cat=x-robots-tag">X-Robots-Tag</a> <b>tells search engines that they shall not index the page content</b>, but may follow links provided on the page.</p>
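<p>To make the two places concrete where NOINDEX can show up, here is a small Python sketch that checks a response for the directive. It is an illustration under simplifying assumptions, not any search engine&#8217;s logic: <code>is_noindexed</code> and <code>RobotsMetaParser</code> are made-up names, the header lookup is case-sensitive, and crawler-specific variants (a user agent prefix in the header, or meta tags named after a particular bot) are ignored.</p>

```python
from html.parser import HTMLParser


def _split(value):
    """Split a comma-separated directive list into lower-case tokens."""
    return {part.strip().lower() for part in value.split(",") if part.strip()}


class RobotsMetaParser(HTMLParser):
    """Collect the directives of every <meta name="robots"> element."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            self.directives.update(_split(attr.get("content", "")))


def is_noindexed(headers, html):
    """True if the X-Robots-Tag header or a robots meta tag says noindex."""
    directives = _split(headers.get("X-Robots-Tag", ""))
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in (directives | parser.directives)
```

<p>Calling <code>is_noindexed({}, '&lt;meta name="robots" content="noindex, follow"&gt;')</code> reports the page as excluded from the index, and so does a response that carries <code>X-Robots-Tag: noindex</code> without any meta tag.</p>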
<p>To <a href="http://sebastians-pamphlets.com/robots-exclusion-protocol-round-up-2008-01/">get a grip on NOINDEX&#8217;s role in the REP</a> please read my <a href="http://www.seomoz.org/blog/robots-exclusion-protocol-101">Robots Exclusion Protocol summary at SEOmoz</a>. Also, <a href="http://sebastians-pamphlets.com/stealthy-rep-experiments-google-jumping-the-shark/">Google experiments with NOINDEX as crawler directive</a> in <a href="http://sebastians-pamphlets.com/links/categories/?cat=robotstxt">robots.txt</a>, more on that later.</p>
<h3>How major search engines treat NOINDEX</h3>
<p>Of course you could <a href="http://sebastians-pamphlets.com/links/categories/?definitions=TRUE">read a ton of my pamphlets</a> to extract this information, but <a href="http://www.mattcutts.com/blog/handling-noindex-meta-tags/">Matt&#8217;s summary</a> is still accurate and easier to digest:</p>
<blockquote><ul>[Matt Cutts on August 30, 2006]
<li>Google doesn’t show the page in any way.</li>
<li>Ask doesn’t show the page in any way.</li>
<li>MSN shows a URL reference and cached link, but no snippet. Clicking the cached link doesn’t return anything.</li>
<li>Yahoo! shows a URL reference and cached link, but no snippet. Clicking on the cached link returns the cached page.</li>
</ul>
<p>Personally, I’d prefer it if every search engine treated the noindex meta tag by not showing a page in the search results at all. [Meanwhile Matt might have a slightly different opinion.]</p>
</blockquote>
<p>Google&#8217;s experimental support of NOINDEX as a crawler directive in robots.txt also includes the DISALLOW functionality (an instruction that forbids crawling), and most probably URIs tagged with NOINDEX in robots.txt cannot accumulate PageRank. In my humble opinion <a href="http://sebastians-pamphlets.com/standardization-of-rep-tags-as-robots-txt-directives/#existing-rep-tags">the DISALLOW behavior of NOINDEX in robots.txt is completely wrong</a>, and without any doubt in no way compliant with the Robots Exclusion Protocol.</p>
<h3>Matt&#8217;s question: How should Google handle NOINDEX in the future?</h3>
<p>To simplify <a href="http://www.mattcutts.com/blog/wp-content/plugins/democracy/democracy.php?dem_action=show_vote_screen&#038;dem_poll_id=6">Matt&#8217;s poll</a>, let&#8217;s assume he&#8217;s talking about NOINDEX as an <b>indexer directive</b>, regardless of where a Webmaster has put it (robots meta tag, X-Robots-Tag, or robots.txt).</p>
<blockquote><p>The question is whether Google should completely drop a NOINDEX’ed page from our search results vs. show a reference to the page, or something in between?</p>
</blockquote>
<p>Here are the arguments, or pros and cons, for each variant:</p>
<dl>
<dt>Google should completely drop a NOINDEX’ed page from their search results</dt>
<dd>
<p>Obviously that&#8217;s what most Webmasters would prefer:</p>
<blockquote><p>This is the behavior that we&#8217;ve done for the last several years, and webmasters are used to it. The NOINDEX meta tag gives a good way &#8212; in fact, one of the only ways &#8212; to completely remove all traces of a site from Google (another way is our <a href="http://www.google.com/webmasters/tools/removals">url removal tool</a>). That&#8217;s incredibly useful for webmasters.</p>
</blockquote>
<p><b>NOINDEX means don&#8217;t index</b>. Search engines must respect such directives, even when the content isn&#8217;t <a href="http://sebastians-pamphlets.com/all-search-engines-except-msn-live-search-respect-the-401-barrier/">password protected</a> or <a href="http://sebastians-pamphlets.com/getting-urls-out-of-google-the-good-popular-definitive-way/">cloaked away</a> (redirected or hidden from crawlers but not from visitors).</p>
<p>The corner case that Google discovers a link and lists it on their SERPs before the page that carries a NOINDEX directive is crawled and deindexed isn&#8217;t crucial, and could be avoided by a (new) NOINDEX indexer directive in robots.txt, which is requested by search engines quite frequently. Ok, maybe Google&#8217;s <abbr title="Ms. Googlebot">BlitzCrawler&trade;</abbr> has to request robots.txt more often then.</p>
</dd>
<dt>Google should show a reference to NOINDEX&#8217;ed pages on their SERPs</dt>
<dd>
<p>Search quality and user experience are strong arguments:</p>
<blockquote><p>Our highest duty has to be to our users, not to an individual webmaster. When a user does a navigational query and we don&#8217;t return the right link because of a NOINDEX tag, it hurts the user experience (plus it looks like a Google issue). If a webmaster really wants to be out of Google without even a single trace, they can use Google&#8217;s url removal tool. The numbers are small, but we definitely see some sites accidentally remove themselves from Google. For example, if a webmaster adds a NOINDEX meta tag to finish a site and then forgets to remove the tag, the site will stay out of Google until the webmaster realizes what the problem is. In addition, we recently saw a spate of high-profile Korean sites not returned in Google because they all have a NOINDEX meta tag. If high-profile sites like [3 linked examples] aren&#8217;t showing up in Google because of the NOINDEX meta tag, that&#8217;s bad for users (and thus for Google).</p>
</blockquote>
<p>Search quality and searchers&#8217; user experience are also strong arguments for totally delisting NOINDEX&#8217;ed pages, because most Webmasters use this indexer directive to keep stuff that doesn&#8217;t provide value for searchers out of the search indexes. &lt;polemic&gt;I mean, how much weight do a few Korean sites carry when it comes to decisions that affect the whole Web?&lt;/polemic&gt;</p>
<p>If a Webmaster adds a NOINDEX directive by accident, that&#8217;s easy to spot in the site&#8217;s stats, considering the volume of traffic that Google controls. I highly doubt that a simple URI reference on Google SERPs, with an anchor text scraped from external links, would heal such a mistake. Also, Matt said that Google could add a NOINDEX check to the Webmaster Console.</p>
<p>The reference to the URI removal tools is out of context, because these tools remove a URI only for a short period of time, and all removal requests have to be resubmitted every few weeks. NOINDEX, on the other hand, is a way to keep a URI out of the index as long as this indexer directive is provided.</p>
<p>I&#8217;d say the sole argument for listing references to NOINDEX&#8217;ed pages that counts is misleading navigational searches. Of course that does not mean that Google may ignore the NOINDEX directive to show &#8211;with a linked reference&#8211; that they know a resource, despite the fact that the site owner has strictly forbidden such references on SERPs.</p>
</dd>
<dt>Something in between, Google should find a reasonable way to please both Webmasters and searchers</dt>
<dd>
<p>Quoting Matt again:</p>
<blockquote><p>The vast majority of webmasters who use NOINDEX do so deliberately and use the meta tag correctly (e.g. for parked domains that they don&#8217;t want to show up in Google). Users are most discouraged when they search for a well-known site and can&#8217;t find it. What if Google treated NOINDEX differently if the site was well-known? For example, if the site was in the Open Directory, then show a reference to the page even if the site used the NOINDEX meta tag. Otherwise, don&#8217;t show the site at all. The majority of webmasters could remove their site from Google, but Google would still return higher-profile sites when users searched for them.</p>
</blockquote>
<p>Whether or not a site is popular must not impact a search engine&#8217;s respect for a Webmaster&#8217;s decision to keep search engines, and their users, out of her realm. That reads like &#8220;Hey, Google is popular, so we&#8217;ve the right to go to Mountain View to pillage the Googleplex, acquiring everything we can steal for the public domain&#8221;. Neither Webmasters nor search engines should mimic Robin Hood. Also, lots of Webmasters highly doubt that Google&#8217;s idea of (link) popularity should rule the Web. <img src='http://sebastians-pamphlets.com/wp-includes/images/smilies/icon_wink.gif' alt=';)' class='wp-smiley' /> </p>
<p>Whether or not a site is listed in the ODP directory is definitely not an indicator that can be applied here. Last time I looked, the majority of the Web&#8217;s content wasn&#8217;t listed at DMOZ due to the lack of editors and various other reasons, and that includes gazillions of great and useful resources. I&#8217;m not bashing DMOZ here, but as a matter of fact it&#8217;s not comprehensive enough to serve as an indicator for anything, especially not importance and popularity.</p>
<p>I strongly believe that there&#8217;s no such thing as a criterion suitable to mark out a two class Web.</p>
</dd>
</dl>
<h3>My take: Yes, No, Depends</h3>
<p>Google could enhance navigational queries &#8211;and even &#8220;I&#8217;m Feeling Lucky&#8221; queries&#8211; that lead to a NOINDEX&#8217;ed page with a message like &#8220;The best matching result for this query was blocked by the site&#8221;. I wouldn&#8217;t mind if they mention the URI as long as it&#8217;s not linked.</p>
<p>In fact, the problem is the granularity of the existing indexer directives. NOINDEX is neither meant for nor capable of serving that many purposes. It is wrong to assign DISALLOW semantics to NOINDEX, and it is wrong to create two classes of NOINDEX support. Fortunately, we&#8217;ve more REP indexer directives that could play a role in this discussion.</p>
<p>NOODP, NOYDIR, NOARCHIVE and/or NOSNIPPET in combination with NOINDEX on a site&#8217;s home page, that is either a domain or subdomain, could indicate that search engines must not show references to the URI in question. Otherwise, if no other indexer directives elaborate NOINDEX, search engines could show references to NOINDEX&#8217;ed main pages. The majority of navigational search queries should lead to main pages, so that would solve the search quality issues.</p>
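<p>The heuristic from the previous paragraph can be written down as a small decision function. This is only a sketch under stated assumptions: the function name <code>may_show_reference</code> and the <code>is_home_page</code> flag are hypothetical, and treating NOINDEX on inner pages as &#8220;no reference at all&#8221; is one reading of the paragraph, not something any search engine has confirmed.</p>

```python
# Directives that, per the idea above, would "elaborate" NOINDEX on a home page.
ELABORATING = {"NOODP", "NOYDIR", "NOARCHIVE", "NOSNIPPET"}


def may_show_reference(directives, is_home_page):
    """Return True if showing the page (or a bare URI reference) would be acceptable."""
    directives = {d.upper() for d in directives}
    if "NOINDEX" not in directives:
        return True  # the page is indexable anyway
    if not is_home_page:
        return False  # assumption: NOINDEX keeps its current meaning on inner pages
    # On a home page, any additional directive signals "no reference, please".
    return not (directives & ELABORATING)


print(may_show_reference({"noindex"}, is_home_page=True))
print(may_show_reference({"noindex", "noarchive"}, is_home_page=True))
```

<p>With this reading, a home page carrying only NOINDEX may still be referenced for navigational queries, while NOINDEX combined with NOARCHIVE (or one of the other three directives) suppresses the reference.</p>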
<p>Of course that&#8217;s not precise enough due to the lack of a specific directive that deals with references to forbidden URIs, but it&#8217;s way better than ignoring NOINDEX in its current meaning. </p>
<h3>A fair solution: NOREFERENCE</h3>
<p>If I had to make the decision at Google and couldn&#8217;t live with a <em>best matching search result blocked</em> message, I&#8217;d go for a new REP tag:</p>
<p>&#8220;NOINDEX, NOREFERENCE&#8221; in a robots meta tag &#8211;respectively Googlebot meta tag&#8211; or X-Robots-Tag forbids search engines to show a reference on their SERPs. In robots.txt this would look like <code><br />
<b>NOINDEX: /<br />
NOINDEX: /blog/<br />
NOINDEX: /members/<br />
&#8230;<br />
NOREFERENCE: /<br />
NOREFERENCE: /blog/<br />
NOREFERENCE: /members/<br />
&#8230;</b></code><br />
Search engines would crawl these URIs, and follow their links as long as there&#8217;s no NOFOLLOW directive either in robots.txt or a page specific instruction.</p>
<p>NOINDEX without a NOREFERENCE directive would instruct search engines not to index a page, but would allow references on SERPs. Supporting this indexer directive both in robots.txt and on the page (respectively in the HTTP header for X-Robots-Tags) makes it easy to add NOREFERENCE on sites that hate search engine traffic. Also, a syntax variant like <code><b>NOINDEX=NOREFERENCE</b></code> for robots.txt could tell search engines how they have to treat NOINDEX statements on site level, or even on site area level.</p>
<p>Even more appealing would be <code><b>NOINDEX=REFERENCE</b></code>, because only the very few Webmasters that would like to see their NOINDEX&#8217;ed URIs on Google&#8217;s SERPs would have to add a directive to their robots.txt at all. Unfortunately, that&#8217;s not doable for Google unless they can convince three well-known Korean sites to edit their robots.txt. <img src='http://sebastians-pamphlets.com/wp-includes/images/smilies/icon_wink.gif' alt=';)' class='wp-smiley' /> </p>
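<p>The per-path NOINDEX and NOREFERENCE lines proposed above could be evaluated roughly as in the following Python sketch. Everything in it is hypothetical: NOREFERENCE is a proposal, not a supported directive, the function names are invented for illustration, and plain prefix matching is assumed because that is how DISALLOW paths are commonly matched. The <code>NOINDEX=NOREFERENCE</code> and <code>NOINDEX=REFERENCE</code> variants are not covered, and a NOREFERENCE line without a matching NOINDEX line is simply ignored.</p>

```python
def parse_rep_lines(robots_txt):
    """Collect path prefixes for the hypothetical NOINDEX/NOREFERENCE fields."""
    rules = {"NOINDEX": [], "NOREFERENCE": []}
    for line in robots_txt.splitlines():
        field, sep, value = line.partition(":")
        field = field.strip().upper()
        if sep and field in rules and value.strip():
            rules[field].append(value.strip())
    return rules


def matches(path, prefixes):
    """Simple prefix match, assumed to mirror DISALLOW-style matching."""
    return any(path.startswith(prefix) for prefix in prefixes)


def serp_treatment(path, rules):
    """Return 'index', 'reference' (URI only) or 'hide' for a URL path."""
    if not matches(path, rules["NOINDEX"]):
        return "index"
    if matches(path, rules["NOREFERENCE"]):
        return "hide"
    return "reference"


sample = """\
NOINDEX: /blog/
NOINDEX: /members/
NOREFERENCE: /members/
"""
rules = parse_rep_lines(sample)
for path in ("/about/", "/blog/post-1/", "/members/profile/"):
    print(path, serp_treatment(path, rules))
```

<p>For the sample rules, <code>/about/</code> stays indexable, <code>/blog/post-1/</code> may appear as a bare reference, and <code>/members/profile/</code> is hidden completely.</p>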
<p>&nbsp;</p>
<p>By the way, don&#8217;t miss out on my draft asking for <a href="http://sebastians-pamphlets.com/standardization-of-rep-tags-as-robots-txt-directives/">REP tag support in robots.txt</a>!</p>
<p>Anyway: <b>Dear Google, please don&#8217;t touch NOINDEX!</b> <img src='http://sebastians-pamphlets.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<hr />Copyright &copy; 2010 <strong><a href="http://sebastians-pamphlets.com/">Sebastian`s Pamphlets</a></strong>. This Feed is for personal non-commercial use only. If you are not reading this material in your news aggregator/feed reader, the site you are looking at is guilty of copyright infringement and will be put down immediately. Please contact sebastians-pamphlets.com so we can take legal action immediately.
]]></content:encoded>
			<wfw:commentRss>http://sebastians-pamphlets.com/give-google-your-feedback-on-noindex/feed/</wfw:commentRss>
		</item>
	</channel>
</rss>
