<?xml version="1.0" encoding="UTF-8"?>
<!-- generator="wordpress/2.2.3" -->
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	>

<channel>
	<title>Sebastian's Pamphlets &#187; Search Quality</title>
	<link>http://sebastians-pamphlets.com</link>
	<description>If you've read my articles somewhere on the Internet, expect something different here.</description>
	<pubDate>Mon, 02 Jan 2012 20:39:44 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.2.3</generator>
	<language>en</language>
			<item>
		<title>Get the Google cop outta my shopping cart!</title>
		<link>http://sebastians-pamphlets.com/get-the-google-cop-out-of-my-shopping-cart/</link>
		<comments>http://sebastians-pamphlets.com/get-the-google-cop-out-of-my-shopping-cart/#comments</comments>
		<pubDate>Thu, 02 Dec 2010 16:46:06 +0000</pubDate>
		<dc:creator>Sebastian</dc:creator>
		
		<category><![CDATA[Search Quality]]></category>

		<category><![CDATA[Crap]]></category>

		<category><![CDATA[Google]]></category>

		<guid isPermaLink="false">http://sebastians-pamphlets.com/get-the-google-cop-out-of-my-shopping-cart/</guid>
		<description><![CDATA[
So now Google ranks my shopping SERPs by its opinion of customer service quality?
Do not want!
I&#8217;m perfectly satisfied with shopping search results ordered by relevancy and (link) popularity. I do not want Google to decide where I have to buy my stuff, just because an assclown treating his customers like shit got coverage in the [...]]]></description>
			<content:encoded><![CDATA[
<p>So now Google ranks my shopping <abbr title="Search Engine Result Page(s)">SERP</abbr>s by <a href="http://googleblog.blogspot.com/2010/12/being-bad-to-your-customers-is-bad-for.html">its opinion of customer service quality</a>?</p>
<h3><code style="color:red; font-size:16pt; font-weight:bolder; text-align:center;">Do not want!</code></h3>
<p><img  src="http://sebastians-pamphlets.com/img/posts/google-cop.png" width="239" height="371" align="right" alt="Google Cop" style="margin-left:5px;" />I&#8217;m perfectly satisfied with shopping search results ordered by relevancy and (link) popularity. I do not want Google to decide where I have to buy my stuff, just because an assclown treating his customers like shit got coverage in the <a rel="nofollow disgusting-story" href="http://www.nytimes.com/2010/11/28/business/28borker.html?_r=1&#038;pagewanted=all">NYT</a>.</p>
<p>If I&#8217;m old enough to have free access to the Internet and a credit card, then I&#8217;m capable of checking out a Web shop before I buy. I don&#8217;t need to be extremely Web savvy to fire up a search for [XXX sucks] before I click on &#8220;add to cart&#8221;. Hey, even my 13yo son applies way more sophisticated methods.  <strong>Google cannot and never will be able to create anything more reliable than my built-in bullshit-detector.</strong></p>
<p>Of course, it&#8217;s Google&#8217;s search engine. Matt&#8217;s right when he <a href="http://twitter.com/mattcutts/status/10066882308608000">states</a> &#8220;two different court cases have held that our search results are our <em>opinion</em> and protected under 1st amendment&#8221;. The problem is, sometimes I disagree with Google&#8217;s opinions. </p>
<p>Expressing an opinion about a site&#8217;s customer service by not showing it on the SERPs that more than 60% of this planet&#8217;s population use to find stuff is a slippery slope. A <a href="http://sphinn.com/story/163471#79994">very</a> slippery slope. It means that for example I cannot buy a pair of shoes for $40 (time of delivery 10 days, free shipping), because Google only points me to shops that sell the same pair of shoes for $100 (plus fedex overnight fees). Since when did Google&#8217;s mission statement change to &#8220;organize the world&#8217;s shopping expeditions&#8221;? Maybe I didn&#8217;t get an <a href="http://www.boutiques.com/about.py" title="A googly shopping experience">important memo</a>.</p>
<p>Not only that. Google is well known for producing heavy collateral damage when applying changes to commercial rankings. A simple software glitch could peculate the best deals on the Web, or ruin totally legit businesses suffering from fraudulent review spam spread by their competitors.</p>
<p>And finally, cross your heart, do <b>you</b> trust a search engine that far? Do you really expect Google to sort out the Web for you, not even asking how much of Google&#8217;s opinion you want to get applied when it comes to judging what appears on your personal search results? Not that Google will ever implement a <a href="http://www.seroundtable.com/archives/002003.html" title="like Yahoo's Mindset">slider</a> where you can tell how much of your common sense you&#8217;re willing to invest vs. Google&#8217;s choice of goog, er, good customer service &#8230; </p>
<p>Well, I could live with a warning put as an anchor text like &#8220;show what boatloads of ripped-off customers told Googlebot about XXX&#8221; or so, but I do want to get the whole picture, uncensored. </p>
<p>End of rant.</p>
<h3>Let&#8217;s look at the algo change from a technical point of view:</h3>
<p>Credit where credit is <a href="http://twitter.com/NathanJohns/status/10395763276255232">due</a>, developing and deploying a filter that catches a fraudulent Web shop &#8220;gaming Google&#8221; out of billions of indexed pages within a few days is not trivial (which translates to &#8216;awesome job&#8217; coming from a geek).</p>
<p>It&#8217;s not so astonishing that this filter also picked 100 clones of the jerk <a href="http://searchengineland.com/googles-gold-standard-results-take-hit-new-york-times-57081">mentioned by the New York Times</a> for Google&#8217;s newish shitlist. Of course it didn&#8217;t catch just <a href="http://sphinn.com/story/163471#80001">another</a> fishy site, same <abbr title="Standard Operating Procedure">SOP</abbr>, owned by the same guy. That makes it kinda hand job, just executed by an algorithm. Explained in my <a href="http://twitter.com/mterenzio/status/10318118798753792">Twitter stream</a>: &#8220;<a href="http://twitter.com/davewiner/status/10277322070433792">@DaveWiner</a> I read that <a href="http://googleblog.blogspot.com/2010/12/being-bad-to-your-customers-is-bad-for.html">Google post</a> as &#8216;We realize there is a problem that we can&#8217;t solve yet. We have a short term fix for this jerk.&#8217;&#8221;, <a href="http://twitter.com/graywolf/status/10111710778105856">or</a> &#8220;so yeah, I stand by my statement: it&#8217;s a hand job to manipulate the press and keep the stock from moving.&#8221;</p>
<p>And that&#8217;s good news, at least for today&#8217;s shape of Google&#8217;s Web search. It means that Google does not yet rank the results of each and every search with commercial intent by Google&#8217;s rough estimate of the shop&#8217;s customer service quality.</p>
<p>Google&#8217;s ranking is still based on link popularity, <a href="http://sphinn.com/story/163471#79999">so</a> <strong>negative links are still a vote of confidence</strong>.</p>
<p>There are only so many not-totally-weak signals out there, and Google&#8217;s not to blame for heavily relying on one of the better ones: links. I don&#8217;t believe they&#8217;ll lower the importance of links anytime soon, at least not significantly. And why should they? I surely don&#8217;t want that. And I doubt it makes much sense, plus I doubt that Google can do that.</p>
<p>As for the meaning of links, well, I just hope that Google doesn&#8217;t try to guess intentions out of plain <a href="http://www.smart-it-consulting.com/article.htm?node=155&#038;page=90">A</a> elements and their context. That&#8217;s a must-fail project. I&#8217;ve developed some faith in the sanity and smartness of Google&#8217;s engineers over the years. I hope they won&#8217;t disappoint me now.</p>
<p>Of course one can express a link&#8217;s intention in a machine-readable way. For example with a microformat like <a href="http://microformats.org/wiki/vote-links" rel="nofollow">VoteLinks</a>. Unfortunately, nobody cares enough to actually make use of it.</p>
<p>Google&#8217;s very own <a href="http://sebastians-pamphlets.com/links/categories/?cat=nofollow">misconception</a>, er, microformat rel-nofollow, is even less reliable. Imagine a dead tired and overworked algo in the cellar of building 43 trying to figure out whether a particular link&#8217;s rel=&#8221;nofollow&#8221; was set</p>
<ul>
<li>to mark a <a href="http://sebastians-pamphlets.com/links/categories/?cat=paid-links">paid link</a></li>
<li>because the SEO next door said PageRank&reg; hoarding is cool</li>
<li>because at the webmaster&#8217;s preferred hangout nofollow&#8217;ing links was the topic of week 53/2005</li>
<li>because the webmaster <a href="http://www.seroundtable.com/archives/023329.html">bought Google&#8217;s FUD</a> and castrates all links except those leading to google.com just in case Google could penalize him for a badass one</li>
<li>to express that the link&#8217;s destination is a 404 page, so that the &#8220;PageRank&trade; leak&#8221;, er, link isn&#8217;t worth any link juice</li>
<li>because the author thankfully links back to a leading Web resource in his industry that linked to him as an honest recommendation, but is afraid of a <a href="http://sebastians-pamphlets.com/links/categories/?cat=reciprocal-links">reciprocal</a> <a href="http://sebastians-pamphlets.com/links/categories/?cat=risky-linkage">link penalty</a></li>
<li>because the author agrees with the linked page&#8217;s message, but doesn&#8217;t like the foul language used over there</li>
<li>because the author disagrees with the discussed, and therefore linked, destination page</li>
<li>just because some crappy CMS condomizes every 3rd link automatically for reasons not known to man</li>
<li>&#8230;</li>
</ul>
<p>Well, not even all Googlers like it. In fact, some teams decided to ignore it because of its weakness and widespread abuse.</p>
<p>The above applies only to links embedded in markup that allows machine-readable tagging of links. Even if such tags were reliable, they don&#8217;t cover all references, aka hyperlinks, on the Web. Think of PDF, Flash, some client-side scripting, &#8230; and what about the gazillions of un-tagged links out there, put by folks who never heard of microformats?</p>
<p>Also, nobody links out anymore. We paste URIs into tiny textareas limited to 140 characters that don&#8217;t have room for meta data like microformats at all. And since Bing as well as Google use links in tweets for ranking purposes (<a href="http://searchengineland.com/what-social-signals-do-google-bing-really-count-55389">Web search and news</a>), how the fuck could even a smartass algo decide whether a tweet&#8217;s link points to crap or gold? Go figure.</p>
<p>And please don&#8217;t get me started on a possible use of <a href="http://textmap.com/company/google.htm">sentiment analysis</a> in rankings. To summarize, &#8220;FAIL&#8221; is printed in big bold letters all over Google&#8217;s (or any search engine for that matter) approach to rank search results by the quality of customer service based on signals scraped from unstructured data crawled on the Interwebs. So please, for the sake of my thin wallet, <strong>DEAR GOOGLE DON&#8217;T EVEN TRY IT!</strong> Thanks in advance.</p>
<hr />Copyright &copy; 2012 <strong><a href="http://sebastians-pamphlets.com/">Sebastian`s Pamphlets</a></strong>. This Feed is for personal non-commercial use only. If you are not reading this material in your news aggregator/feed reader, the site you are looking at is guilty of copyright infringement and will be put down immediately. Please contact sebastians-pamphlets.com so we can take legal action immediately.<br /><span style="float: right;font-size: 7pt"><a href="http://blog.taragana.com/index.php/archive/wordpress-plugins-provided-by-taraganacom/">Plugin</a> by <a href="http://www.taragana.com/">Taragana</a></span><div class="topsy_widget_data topsy_theme_light-green" style="float: right;margin-left: 0.75em;"><!-- { "url": "http://sebastians-pamphlets.com/get-the-google-cop-out-of-my-shopping-cart/", "style": "big", "title": "Get the Google cop outta my shopping cart!" } --></div>
]]></content:encoded>
			<wfw:commentRss>http://sebastians-pamphlets.com/get-the-google-cop-out-of-my-shopping-cart/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Buy Free VIAGRA® Online! No Shipping Costs!</title>
		<link>http://sebastians-pamphlets.com/buy-prescription-free-viagra-online-no-shipping-costs/</link>
		<comments>http://sebastians-pamphlets.com/buy-prescription-free-viagra-online-no-shipping-costs/#comments</comments>
		<pubDate>Tue, 30 Nov 2010 13:59:06 +0000</pubDate>
		<dc:creator>Sebastian</dc:creator>
		
		<category><![CDATA[Search Quality]]></category>

		<category><![CDATA[Webspam]]></category>

		<category><![CDATA[Google]]></category>

		<guid isPermaLink="false">http://sebastians-pamphlets.com/buy-prescription-free-viagra-online-no-shipping-costs/</guid>
		<description><![CDATA[
Your search for prescription free Viagra® ends here.

Pfizer just released the amazingly easy-to-understand Ultimate VIAGRA® DIY Guide (PDF, 30 illustrated pages). Look at the simple molecule on page one, cloning it is a breeze. Go brew your own! With a little help from your local alchemist, er, pharmacist, you can make even pills and paint [...]]]></description>
			<content:encoded><![CDATA[
<h3>Your search for prescription free <a href="http://www.viagra.com/">Viagra®</a> ends here.</h3>
<p><a href="http://buy.viagra.com/buy-real-viagra/buying-viagra-online.aspx" target="viagra"><img  src="http://sebastians-pamphlets.com/img/posts/original-viagra-pills.png" alt="Original VIAGRA® pills © viagra.com" width="500" height="216" /></a></p>
<p><a href="http://www.pfizer.com/">Pfizer</a> just released the amazingly easy-to-understand <a href="http://www.pfizer.com/files/products/uspi_viagra.pdf">Ultimate VIAGRA® DIY Guide</a> (PDF, 30 illustrated pages). Look at the simple molecule on page one, cloning it is a breeze. Go brew your own! With a little help from your local alchemist, er, pharmacist, you can make even pills and paint them blue. Next get an empty packet and glue, then print out six copies of the image above. As a seasoned DIY professional you&#8217;ll certainly manage to fake Pfizer&#8217;s pill box. Congrats. You&#8217;re awesome. </p>
<p>As for the promise of &#8220;no shipping costs&#8221;: Well, I don&#8217;t ship Viagra®, so it wouldn&#8217;t be fair to charge you with UPS costs * 7.5 (I&#8217;m such an angel sometimes!), don&#8217;t you agree?</p>
<p>By the way, if the above said sounds too complicated, there&#8217;s a shortcut: click on the image.</p>
<h3>Seriously</h3>
<p>Barry&#8217;s post about <a href="http://www.rustybrick.com/free-viagra-spam.html">Free Viagra® Links</a> inspired this pamphlet. Google&#8217;s [buy viagra online] <a href="http://google.com/search?q=buy+viagra+online">SERP</a> <a onclick="alert('It was never ever accurate!'); return false;" title="Your RSS reader will not execute this :(">still</a> is a <a href="http://sebastians-pamphlets.com/buying-cheap-viagra-algorithmically/">mess</a>. Obviously, Google doesn&#8217;t care about link spam influencing search results for money terms. Even low-life links can boost crap to the first SERP.</p>
<h3>About time to change that!</h3>
<p>Since Google doesn&#8217;t tidy up its Viagra® SERPs, let&#8217;s help ourselves to the search quality we deserve. Most probably you&#8217;ve spotted that this pamphlet was created to funnel (search) traffic to Pfizer&#8217;s Viagra® outlet. Therefore, if you&#8217;re into search quality, put up some links to this post. I promise there&#8217;s no better <del>way</del> <ins>magic</ins> to create clean Viagra® SERPs at Google.</p>
<p><textarea rows="3" cols="58" readonly><a href="http://sebastians-pamphlets.com/buy-prescription-free-viagra-online-no-shipping-costs/" rel="honest recommendation">Buy Cheap Viagra® Online</a></textarea></p>
<p>Dear reader, please copy the HTML code above and paste it onto your signatures, blog posts, social media profiles &#8230; everywhere. If you keep your links up forever, Google&#8217;s SERPs will remain useful until the Internet vanishes.</p>
<p>Disclaimer: No, I can&#8217;t even spiel &#8216;linkbait&#8217;. And no, I don&#8217;t promise not to replace this page with a sales pitch for some fake-ish Viagra®-clone once your link juice gained yours truly a top spot on said SERP. D&#8217;oh!</p>
]]></content:encoded>
			<wfw:commentRss>http://sebastians-pamphlets.com/buy-prescription-free-viagra-online-no-shipping-costs/feed/</wfw:commentRss>
		</item>
		<item>
		<title>How to spam the hell out of Google&#8217;s new source attribution meta elements</title>
		<link>http://sebastians-pamphlets.com/spam-google-news-source-attribution-tags/</link>
		<comments>http://sebastians-pamphlets.com/spam-google-news-source-attribution-tags/#comments</comments>
		<pubDate>Tue, 16 Nov 2010 16:34:47 +0000</pubDate>
		<dc:creator>Sebastian</dc:creator>
		
		<category><![CDATA[Search Quality]]></category>

		<category><![CDATA[Testing]]></category>

		<category><![CDATA[Webspam]]></category>

		<category><![CDATA[Spam]]></category>

		<category><![CDATA[Plagiarism]]></category>

		<category><![CDATA[Google]]></category>

		<guid isPermaLink="false">http://sebastians-pamphlets.com/spam-google-news-source-attribution-tags/</guid>
		<description><![CDATA[
The moment you&#8217;ve read Google&#8217;s announcement and Matt&#8217;s question &#8220;What about spam?&#8221; you concluded &#8220;spamming it is a breeze&#8221;, right? You&#8217;re not alone.
Before we discuss how to abuse it, it might be a good idea to define it within its context, ok?
Playground
First of all, Google announced these meta tags on the official Google News blog&#160; [...]]]></description>
			<content:encoded><![CDATA[
<p>The moment you&#8217;ve read <a href="http://googlenewsblog.blogspot.com/2010/11/credit-where-credit-is-due.html">Google&#8217;s announcement</a> and <a href="http://searchengineland.com/google-creates-metatags-to-help-id-original-news-sources-56115">Matt&#8217;s question</a> &#8220;What about spam?&#8221; you concluded &#8220;spamming it is a breeze&#8221;, right? <a href="http://www.readwriteweb.com/archives/googles_new_honor_system_for_highlighting_original_journalism.php">You&#8217;re not</a> <a href="http://www.webpronews.com/topnews/2010/11/16/can-trust-in-journalism-be-boiled-down-to-meta-tags">alone</a>.</p>
<p>Before we discuss how to abuse it, it might be a good idea to define it within its context, ok?</p>
<h3>Playground</h3>
<p>First of all, Google announced these <span title="darn SEO copywriting, of course it's meta ELEMENTs">meta tags</span> on the official <i>Google <b>News</b> blog</i>&nbsp; for a reason. So when you plan to abuse it with your countless MFA proxies of Yahoo Answers, you most probably jumped on the wrong band wagon. Google supports the meta elements below in Google News only.</p>
<h3>syndication-source</h3>
<p>The first new indexer hint is <b>syndication-source</b>. It&#8217;s meant to tell Google the permalink of a particular news story, hence the author and all the folks spreading the word are asked to use it to point to the one &#8211;and only one&#8211; URI considered the source:</p>
<p><code>&lt;meta name="syndication-source" content="http://outerspace.com/news/ubercool-geeks-launched-google-hotpot.html" /&gt;</code></p>
<p>The meta element above is for instances of the story served from<code><br />
http://outerspace.com/breaking/page1.html<br />
http://outerspace.com/yyyy-mm-dd/page2.html<br />
http://outerspace.com/news/aliens-appreciate-google-hotpot.html<br />
http://outerspace.com/news/ubercool-geeks-launched-google-hotpot.html<br />
http://newspaper.com/main/breaking.html<br />
http://tabloid.tv/rehashed/from/rss/hot:alien-pot-in-your-bong.html<br />
</code>&#8230;</p>
<p>Don&#8217;t confuse it with the cross-domain rel-canonical link element. It&#8217;s not about canning duplicate content, it marks a particular story, regardless of whether it&#8217;s somewhat rewritten or just reprinted with a different headline. It tells Google News to use the original URI when the story can be crawled from different URIs on the author&#8217;s server, and when syndicated stories on other servers are so similar to the initial piece that Google News prefers to use the original (the latter is my educated guess).</p>
<h3>original-source</h3>
<p>The second new indexer hint is <b>original-source</b>. It&#8217;s meant to tell Google the origin of the news itself, so the author/enterprise digging it out of the mud, as well as all the folks using it later on, are asked to declare who broke the story:</p>
<p><code>&lt;meta name="original-source" content="http://outerspace.com/news/ubercool-geeks-launched-google-hotpot.html" /&gt;</code></p>
<p>Say we&#8217;ve got two or more related news stories, like &#8220;Google fell from Mars&#8221; by cnn.com and &#8220;Google landed in Mountain View&#8221; by sfgate.com. Then it makes sense for latimes.com to publish a piece like &#8220;Google fell from Mars and landed in Mountain View&#8221;. Because latimes.com is a serious newspaper, they credit their sources not only with a mention or even embedded links, they do it in a machine-readable way, too:</p>
<p><code>&lt;meta name="original-source" content="http://cnn.com/google-fell-from-mars.html" /&gt;</code><br />
<code>&lt;meta name="original-source" content="http://sfgate.com/google-landed-in-mountain-view.html" /&gt;</code></p>
<p>It&#8217;s a matter of course that both cnn.com and sfgate.com provide such an original-source meta element on their pages, in addition to the syndication-source meta element, both pointing to their very own coverage. </p>
<p>If a journalist grabbed his breaking news from a secondary source telling &#8220;CNN reported five minutes ago that Google&#8217;s mothership started from Venus, and the LA Times spotted it crashing on Jupiter&#8221;, he can&#8217;t be bothered with looking at the markup and locating those meta elements in the head section, he has a deadline for his piece &#8220;Why Web search left Planet Earth&#8221;. It&#8217;s just fine with Google News when he puts</p>
<p><code>&lt;meta name="original-source" content="http://cnn.com/" /&gt;</code><br />
<code>&lt;meta name="original-source" content="http://sfgate.com/" /&gt;</code></p>
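<p>If you want to check what a page actually declares, both meta elements can be pulled out of the markup with a few lines of Python. Take this as a minimal sketch, not as anything Google published: the names <code>SourceAttributionParser</code> and <code>extract_sources</code> are made up for illustration, only the two meta names come from the announcement.</p>
<pre>
```python
from html.parser import HTMLParser

# The two meta names come from the announcement; everything else is illustrative.
SOURCE_META_NAMES = ("syndication-source", "original-source")


class SourceAttributionParser(HTMLParser):
    """Collects the content URIs of source attribution meta elements."""

    def __init__(self):
        super().__init__()
        # One list per meta name, because original-source may appear more than once.
        self.sources = {name: [] for name in SOURCE_META_NAMES}

    def handle_starttag(self, tag, attrs):
        # HTMLParser routes self-closing tags through this handler as well.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = (attributes.get("name") or "").lower()
        content = attributes.get("content")
        if name in self.sources and content:
            self.sources[name].append(content)


def extract_sources(markup):
    """Returns a dict mapping each meta name to the list of URIs found in markup."""
    parser = SourceAttributionParser()
    parser.feed(markup)
    parser.close()
    return parser.sources
```
</pre>
<p>Fed with the head section of the hypothetical latimes.com piece above, <code>extract_sources</code> would return both original-source URIs in document order, and an empty list for syndication-source if that element is missing.</p>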
<h3>Fine-prints</h3>
<p>As always, the most interesting stuff is hidden on a <a href="http://www.google.com/support/news_pub/bin/answer.py?answer=191283">help page</a>:</p>
<blockquote><p>At this time, Google News will not make any changes to article ranking based on this tags.</p></blockquote>
<blockquote><p>If we detect that a site is using these metatags inaccurately (e.g., only to promote their own content), we&#8217;ll reduce the importance we assign to their metatags. And, as always, we reserve the right to remove a site from Google News if, for example, we determine it to be spammy.</p></blockquote>
<blockquote><p>As with any other publisher-supplied metadata, we will be taking steps to ensure the integrity and reliability of this information.</p></blockquote>
<h3>It&#8217;s a field test</h3>
<blockquote><p>We think it is a promising method for detecting originality among a diverse set of news articles, but we won&#8217;t know for sure until we&#8217;ve seen a lot of data. By releasing this tag, we&#8217;re asking publishers to participate in an experiment that we hope will improve Google News and, ultimately, online journalism. [&#8230;] Eventually, if we believe they prove useful, these tags will be incorporated among the many other signals that go into ranking and grouping articles in Google News. <b>For now, syndication-source will only be used to distinguish among groups of duplicate identical articles, while original-source is only being studied and will not factor into ranking.</b> [emphasis mine]</p></blockquote>
<h3>Spam potential</h3>
<p>Well, we do know that Google Web search has a spam problem, IOW even a few so-1999-webspam-tactics still work to some extent. So we tend to classify a vague threat like &#8220;If we find sites abusing these tags, we may [&#8230;] remove [those] from Google News entirely&#8221; as FUD, and spam away. Common sense and experience tell us that a smart marketer will make money from everything spammable. </p>
<p>But: we&#8217;re not talking about Web search. Google News is a clearly laid out environment. There are only so many sites covered by Google News. Even if Google weren&#8217;t able to develop algos analyzing all source attribution attributes out there, they do have the resources to identify abuse using manpower alone. Most probably they will do both.</p>
<p>They clearly told us that they will compare those meta data to <a href="http://www.seobythesea.com/?p=4609">other signals</a>. And that&#8217;s not only very weak indicators like &#8220;timestamp first crawled&#8221; or &#8220;first heard of via pubsubhubbub&#8221;. It&#8217;s not that hard to isolate particular news, gather each occurrence as well as source mentions within, and arrange those on a time line with clickable links for QC folks who most certainly will identify the actual source. Even a few spot tests daily will soon reveal the sites whose source attribution meta tags are questionable, or even spammy.</p>
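<p>To make the time line idea a bit more tangible, here is a toy sketch in Python. It is purely my assumption of how such a spot check could be pre-sorted for QC folks, not a description of anything Google runs: the <code>Occurrence</code> record, its fields, and both functions are hypothetical, and since &#8220;first seen&#8221; is a weak signal, the result is only a list of candidates for a human look, not a verdict.</p>
<pre>
```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Occurrence:
    """One crawled instance of a story (all field names are hypothetical)."""
    url: str
    first_seen: datetime
    declared_sources: list = field(default_factory=list)


def build_timeline(occurrences):
    """Orders the occurrences of one story from earliest to latest sighting."""
    return sorted(occurrences, key=lambda occurrence: occurrence.first_seen)


def questionable_claims(occurrences):
    """Returns URLs that name themselves as original source but were not seen first."""
    timeline = build_timeline(occurrences)
    if not timeline:
        return []
    earliest = timeline[0]
    return [
        occurrence.url
        for occurrence in timeline
        if occurrence is not earliest and occurrence.url in occurrence.declared_sources
    ]
```
</pre>
<p>A site that keeps popping up in such lists across many stories is the kind of candidate a reviewer would look at first. The earliest sighting is deliberately never flagged here, which is a simplification: a fast scraper can beat the crawler to the real source.</p>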
<p>If you&#8217;re still not convinced, fair enough. Go spam away. Once you&#8217;ve lost your entry on the whitelist, your free traffic from Google News, as well as from news-one-box results on conventional SERPs, is toast.</p>
<h3>Last but not least, a fair warning</h3>
<p>Now, if you still want to use source attribution meta elements on your non-newsworthy MFA sites to claim ownership of your scraped content, feel free to do so. Most probably Matt&#8217;s team will appreciate just another &#8220;I&#8217;m spamming Google&#8221; signal.</p>
<p>Not that reprinting scraped content is considered shady any more: <a href="http://www.huffingtonpost.com/2010/11/12/george-bush-book-decision-points_n_782731.html">even a former president does it shamelessly</a>. It&#8217;s just the almighty Google in all of its evilness that penalizes you for considering all on-line content public domain.</p>
]]></content:encoded>
			<wfw:commentRss>http://sebastians-pamphlets.com/spam-google-news-source-attribution-tags/feed/</wfw:commentRss>
		</item>
		<item>
		<title>WTF have Google, Bing, and Yahoo cooking?</title>
		<link>http://sebastians-pamphlets.com/wtf-seo-standard/</link>
		<comments>http://sebastians-pamphlets.com/wtf-seo-standard/#comments</comments>
		<pubDate>Fri, 17 Sep 2010 09:08:25 +0000</pubDate>
		<dc:creator>Sebastian</dc:creator>
		
		<category><![CDATA[MSN]]></category>

		<category><![CDATA[Search Quality]]></category>

		<category><![CDATA[SEO]]></category>

		<category><![CDATA[Yahoo]]></category>

		<category><![CDATA[Google]]></category>

		<guid isPermaLink="false">http://sebastians-pamphlets.com/wtf-seo-standard/</guid>
		<description><![CDATA[
Folks, I’ve got good news. As a matter of fact, they’re so good that they will revolutionize SEO. A little bird told me that the major search engines secretly teamed up to solve the problem of context and meaning as a ranking factor. 
They’ve invented a new Web standard that allows content producers to steer [...]]]></description>
			<content:encoded><![CDATA[
<p>Folks, I’ve got good news. As a matter of fact, they’re so good that they will revolutionize SEO. A little bird told me that the major search engines secretly teamed up to solve the problem of context and meaning as a ranking factor. </p>
<p>They’ve invented a new Web standard that allows content producers to steer search engine ranking algos. Its code name is <strong>ADL</strong>, probably standing for Aided Derivative Latch, a smart technology based on the groundwork of addressing tidbits of information developed by Hollerith and Neumann decades ago.</p>
<p>According to my sources, ADL will be launched next month at <a href="http://searchmarketingexpo.com/east">SMX East</a> in New York City. In order to get you guys primed in a timely manner, here I’m going to leak the specs:</p>
<h3>WTF - The official SEO standard, supported by Google, Yahoo &amp; Bing</h3>
<p>Word Targeting Funnel (<b>WTF</b>) is a set of indexer directives that get applied to Web resources as meta data. WTF comes with a few subsets for special use cases, details below.  Here’s an example:</p>
<p><code>&lt;meta name=&quot;WTF&quot; content=&quot;document context&quot;  href=&quot;http://google.com/search?q=WTF&quot;  /&gt;</code><br />
<br />This directive tells search engines that the content of the page is closely related to the resource supplied in the META element’s HREF attribute.</p>
<p>As you’ve certainly noticed, you can target a specific SERP, too. That’s somewhat complicated, because the engineers couldn’t agree <i>which</i> search engine should define a document’s search query context. Fortunately, they finally found this compromise:<br />
<br /><code>&lt;meta name=&quot;WTF&quot; content=&quot;document context&quot;  href=&quot;http://google.com/search?q=WTF || http://www.bing.com/search?q=WTF || http://search.yahoo.com/search?q=WTF&quot;  /&gt;</code><br />
<br />As far as I know, this will even work if you change the order of URIs. That is, if you’re a Bing fanboy, you can mention Bing before Google and Yahoo.</p>
<p>A more practical example, taken from an affiliate’s sales pitch for viagra that participated in the BETA test, leads us to the first subset:</p>
<h3>Subset <b>WTFm</b> &#8212; Word Targeting Funnel for medical terms</h3>
<p><code>&lt;meta name=&quot;WTF&quot; content=&quot;document context&quot;  href=&quot;http://www.pfizer.com/files/products/uspi_viagra.pdf&quot;  /&gt;</code><br />
<br />This directive will convince search engines that the offered product indeed is not a clone like Cialis.</p>
<h3>Subset <b>WTFa</b> &#8212; Word Targeting Funnel for acronyms</h3>
<p><code>&lt;meta name=&quot;WTFa&quot; content=&quot;WTF&quot;  href=&quot;http://www.wtf.org/&quot;  /&gt;</code><br />
<br />When a Web resource contains the acronym &#8220;WTF&#8221;, search engines will link it to the <em>World Taekwondo Federation</em>,  not to <em>Your Ranting and Debating Resource</em> at www.wtf.com.</p>
<h3>Subset <b>WTFo</b> &#8212; Word Targeting Funnel for offensive language</h3>
<p><code>&lt;meta name=&quot;WTFo&quot; content=&quot;meaning of terms&quot;  href=&quot;http://www.noslang.com/&quot;  /&gt;</code><br />
<br />If a search engine doesn’t know the meaning of terms I really can’t quote here, it will look up the <a href="http://www.noslang.com/">Internet Slang Directory</a>. You can define alternatives, though:<br />
<br /><code>&lt;meta name=&quot;WTFo&quot; content=&quot;alternate meaning of terms&quot;  href=&quot;http://dictionary.babylon.com/language/slang/low-life-glossary/&quot;  /&gt;</code></p>
<h3>WTF, even more?</h3>
<p>Of course we’ve got more subsets, like <b>WTFi</b> for instant searches. Because I appreciate unfair advantages, I won’t reveal more. Just one more goody: it works for PDF, Flash content and heavily ajax’ed stuff, too.</p>
<p>This is the very first newish indexer directive that search engines introduce with support for both META elements and HTTP headers. Like with the <a href="http://sebastians-pamphlets.com/handling-googles-neat-x-robots-tag-sending-rep-header-tags-with-php/">X-Robots-Tag</a>, you can use an <code><b>X-WTF-Tag</b></code> HTTP header:<br />
<code>X-WTF-Tag: Name: WTFb, Content: SEO Bullshit, Href: <a href="http://seobullshit.com/">http://seobullshit.com/</a></code> </p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<p>As for the little bird, well, that’s a lie. Sorry. There’s no such bird. It’s bugs I left last time I visited <a href="http://labs.google.com/">Google’s labs</a>:<br />
<code>&lt;meta name=&quot;WTF&quot; content=&quot;bug,bugs,bird,birds&quot;  href=&quot;<a href="http://www.spylife.com/keysnoop.html">http://www.spylife.com/keysnoop.html</a>&quot;  /&gt;</code></p>
<hr />Copyright &copy; 2012 <strong><a href="http://sebastians-pamphlets.com/">Sebastian`s Pamphlets</a></strong>. This Feed is for personal non-commercial use only. If you are not reading this material in your news aggregator/feed reader, the site you are looking at is guilty of copyright infringement and will be put down immediately. Please contact sebastians-pamphlets.com so we can take legal action immediately.<br /><span style="float: right;font-size: 7pt"><a href="http://blog.taragana.com/index.php/archive/wordpress-plugins-provided-by-taraganacom/">Plugin</a> by <a href="http://www.taragana.com/">Taragana</a></span><div class="topsy_widget_data topsy_theme_light-green" style="float: right;margin-left: 0.75em;"><!-- { "url": "http://sebastians-pamphlets.com/wtf-seo-standard/", "style": "big", "title": "WTF have Google, Bing, and Yahoo cooking?" } --></div>
]]></content:encoded>
			<wfw:commentRss>http://sebastians-pamphlets.com/wtf-seo-standard/feed/</wfw:commentRss>
		</item>
		<item>
		<title>OMFG - Google sends porn punters to my website &#8230;</title>
		<link>http://sebastians-pamphlets.com/make-risk-free-beer-money-from-porn-traffic/</link>
		<comments>http://sebastians-pamphlets.com/make-risk-free-beer-money-from-porn-traffic/#comments</comments>
		<pubDate>Wed, 11 Aug 2010 18:11:48 +0000</pubDate>
		<dc:creator>Sebastian</dc:creator>
		
		<category><![CDATA[Internet Marketing]]></category>

		<category><![CDATA[Redirects]]></category>

		<category><![CDATA[Search Quality]]></category>

		<category><![CDATA[Google]]></category>

		<guid isPermaLink="false">http://sebastians-pamphlets.com/make-risk-free-beer-money-from-porn-traffic/</guid>
		<description><![CDATA[
In today&#8217;s GWC doctor&#8217;s office, the webmaster of an innocent orphanage website asks Google&#8217;s Matt Cutts:
[My site] is showing up for searches on &#8216;girls in bathrooms&#8217; because they have an article about renovating the girls bathroom! What do you think of the idea if a negative keyword meta tag to block irrelevant searches? [sic!]
Well, we [...]]]></description>
			<content:encoded><![CDATA[
<p>In today&#8217;s GWC doctor&#8217;s office, the webmaster of an innocent orphanage website asks Google&#8217;s Matt Cutts:</p>
<blockquote><p>[My site] is showing up for searches on &#8216;girls in bathrooms&#8217; because they have an article about renovating the girls bathroom! What do you think of the idea if a negative keyword meta tag to block irrelevant searches? [sic!]</p></blockquote>
<p><b>Well, we don&#8217;t know what the friendly guy from Google recommends &#8230;</b></p>
<p><object style="display:inline; height: 100px; width: 133px;" onmouseover="this.height='344px'; this.width='425';" onmouseout="this.height='100px;'; this.width='133px;';">
<param name="movie" value="http://www.youtube.com/v/mvYZa3NZ1HE">
<param name="allowFullScreen" value="true">
<param name="allowScriptAccess" value="always"><embed src="http://www.youtube.com/v/mvYZa3NZ1HE" type="application/x-shockwave-flash" allowfullscreen="true" allowScriptAccess="always" ></object><img src="http://sebastians-pamphlets.com/img/posts/omfg-women.png" style="margin-left:50px; margin-bottom:25px;" /></p>
<p><b>&#8230; but my dear readers do know that my bullshit detector, faced with such a moronic idea, shouts out in agony:</b></p>
<h3>There&#8217;s no such thing as bad traffic, just weak monetizing!</h3>
<p>Ok, Ok, Ok &#8230; every now and then each and every webmaster out there suffers from misguided search engine ranking algos that send shitloads of totally unrelated search traffic. For example, when you search for [<a href="http://google.com/search?q=how+to+fuck+a+click&#038;safe=off">how to fuck a click</a>], you won&#8217;t expect that Google considers <a href="http://sebastians-pamphlets.com/how-to-turn-click-tracking-into-miserable-failure/">this geeky pamphlet</a> the very best search result. Of course Google should&#8217;ve detected your <a href="http://google.com/search?q=how+to+fuck+a+chick&#038;safe=off">NSFW-typo</a>. Shit happens. Deal with it.</p>
<p>On the other hand, search traffic is free, so there&#8217;s no valid reason to complain. Instead of asking Google for a minus-keyword REP directive, one should think of clever ways to monetize unrelated traffic without wasting bandwidth.</p>
<p>You want to monetize irrelevant traffic from searches for smut in a way that nobody can associate your site with porn. That&#8217;s doable. Here&#8217;s how it works:</p>
<h3>Make risk-free beer money from porn traffic with a non-adult site</h3>
<p>Copy those slimy phrases from your keyword stats and paste them into Google&#8217;s search box. Once you find an adult site that seems to match the smut surfer&#8217;s needs better than your site, click on the search result, and on the landing page search for a &#8220;webmasters&#8221; link that points to their affiliate program. Sign up and save your customized affiliate link.</p>
<p>Next add some PHP code to your scripts. Make absolutely sure it gets executed before you output any other content, even whitespace:</p>
<p><code>&lt;?php </code> &nbsp;<a onclick="showContent('code_getOffsiteUri');">Show all code</a></p>
<p id="code_getOffsiteUri" style="display:none;"><code>function getReferrer () {<br />
    return isset($_SERVER["HTTP_REFERER"]) ? $_SERVER["HTTP_REFERER"] : "";<br />
}<br />
function getOffsiteUri() {<br />
    $searchQuery = stristr(getReferrer(), "q=");<br />
    $trash = stristr($searchQuery, "&#038;");<br />
    $searchQuery = str_replace($trash, "", $searchQuery);<br />
    $searchQuery = str_replace("+", " ", $searchQuery);<br />
    $searchQuery = str_replace("&#038;", " ", $searchQuery);<br />
    $searchQuery = str_replace("%20", " ", $searchQuery);<br />
    while (stristr($searchQuery, "  ")) {<br />
        $searchQuery = str_replace("  ", " ", $searchQuery);<br />
    }<br />
    // map irrelevant search queries to sponsor URIs<br />
    if (stristr($searchQuery, "teens in bathroom")) {<br />
        return "http://someteenpornsite.com/landingpage?affID=4711";<br />
    }<br />
}</code></p>
<p><code>$betterMatch = getOffsiteUri();<br />
if ($betterMatch) {<br />
   header("HTTP/1.1 307 Here's your smut", TRUE, 307);<br />
   header("Location: $betterMatch");<br />
   exit;<br />
}<br />
?&gt;</code> Refine the simplified code above. Use a database table to store the mappings &#8230;</p>
<p>Now a surfer coming from a SERP like <code><br />http://google.com/search?num=100&#038;q=nude+teens+in+bathroom&#038;safe=off</code> <br />will get redirected to <code><br />http://someteenpornsite.com/landingpage?affID=4711</code><br /> You&#8217;re using a 307 redirect because it&#8217;s not cached by a user agent, so that when you later on find a porn site that converts your traffic better, you can redirect visitors to another URI.</p>
<p>As you probably know, search engines don&#8217;t approve of duplicate content. Hence it wouldn&#8217;t be a bright idea to put x-rated stuff (all smut is duplicate content by design) on your site to fulfil the misled searcher&#8217;s needs.</p>
<p>Of course you can use the technique outlined above to protect searchers from landing on your contact/privacy page, too, when in fact your signup page is their desired destination.</p>
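<p>The query extraction can be done more robustly than with chained <code>str_replace()</code> calls. Here&#8217;s a minimal sketch, assuming the search engine passes the query in a <code>q</code> parameter. The function name <code>getSearchQuery</code> is made up for illustration; <code>parse_url()</code> and <code>parse_str()</code> are PHP built-ins that split and decode the referrer for you:</p>
<p><code>&lt;?php<br />
// Hypothetical helper: returns the decoded search query, or "" if there is none<br />
function getSearchQuery($referrer) {<br />
    // parse_url() yields NULL (or FALSE on malformed input) when there is no query string<br />
    $queryString = parse_url($referrer, PHP_URL_QUERY);<br />
    if (!is_string($queryString)) {<br />
        return "";<br />
    }<br />
    // parse_str() decodes "+" and "%20" into spaces<br />
    parse_str($queryString, $params);<br />
    if (!isset($params["q"]) || !is_string($params["q"])) {<br />
        return "";<br />
    }<br />
    // collapse runs of whitespace into single spaces<br />
    return trim(preg_replace('/\s+/', ' ', $params["q"]));<br />
}</code></p>
<p>For a referrer like <code>http://google.com/search?num=100&#038;q=contact+form&#038;safe=off</code> this should return <code>contact form</code>, which you can then look up in your mapping table.</p>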
<h3>Shiny whitehat disclaimer</h3>
<p>If you&#8217;re afraid of the possibility that the almighty Google might punish you for your well-meant attempt to fix its bugs, relax.</p>
<p>A search engine that misinterprets your content so badly has failed miserably. Your bugfix actually improves their search quality. Search engines can&#8217;t force you to report such flaws, they just kindly ask for voluntary feedback.</p>
<p>If search engines dislike smart websites that find related content on the Interwebs in case the search engine delivers shitty search results, they can act themselves. Instead of penalizing webmasters that react to flaws in their algos, they&#8217;re well advised to adjust their scoring. I mean, if they stop sending smut traffic to non-porn sites, their users don&#8217;t get redirected any longer. It&#8217;s that simple.</p>
]]></content:encoded>
			<wfw:commentRss>http://sebastians-pamphlets.com/make-risk-free-beer-money-from-porn-traffic/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Cloaking is good for you. Just ignore Bing&#8217;s/Google&#8217;s guidelines.</title>
		<link>http://sebastians-pamphlets.com/cloaking-is-good-for-your-vistors/</link>
		<comments>http://sebastians-pamphlets.com/cloaking-is-good-for-your-vistors/#comments</comments>
		<pubDate>Mon, 05 Jul 2010 18:24:08 +0000</pubDate>
		<dc:creator>Sebastian</dc:creator>
		
		<category><![CDATA[Web development]]></category>

		<category><![CDATA[Usability]]></category>

		<category><![CDATA[Search Quality]]></category>

		<category><![CDATA[Webspam]]></category>

		<category><![CDATA[SEO]]></category>

		<category><![CDATA[Cloaking]]></category>

		<category><![CDATA[Google]]></category>

		<guid isPermaLink="false">http://sebastians-pamphlets.com/cloaking-is-good-for-your-vistors/</guid>
		<description><![CDATA[
Summary first: If you feel the need to cloak, just do it within reason. Don&#8217;t cloak because you can, but because it&#8217;s technically the most elegant procedure to accomplish a Web development task. Bing and Google can&#8217;t detect your (in no way deceptive) intent algorithmically. Don&#8217;t spam away, though, because you might leave trails besides [...]]]></description>
			<content:encoded><![CDATA[
<p>Summary first: If you feel the need to cloak, just do it within reason. Don&#8217;t cloak because you can, but because it&#8217;s technically the most elegant procedure to accomplish a Web development task. Bing and Google can&#8217;t detect your (in no way deceptive) intent algorithmically. Don&#8217;t spam away, though, because you might leave traces other than the cloaking itself if you aren&#8217;t good enough at spamming search engines. Keep your users&#8217; interests in mind. Don&#8217;t treat search engine guidelines as set in stone, but do comply to a reasonable level, for example when they <a href="http://www.youtube.com/watch?v=XWfqyy7J34s">force you to comply with Web standards</a> that make more sense than the fancy idea you&#8217;ve developed on internationalization, based on detecting browser language settings or so.</p>
<p><img src="http://sebastians-pamphlets.com/img/posts/penalizing-cloaking-is--bullshit.png" width="250" height="376" align="right" alt="search engine guidelines are bullshit WRT cloaking" title="Search engines must not penalize cloaking" style="margin-left:5px;" />This pamphlet is an opinion piece. The above should be considered best practice, even by search engines. Of course it&#8217;s not, because search engines can and do fail, just like a webmaster who takes my statement &#8220;go cloak away if it makes sense&#8221; as technical advice and gets his search engine visibility tanked the hard way.</p>
<h3>WTF is cloaking?</h3>
<p>Cloaking, also known as IP delivery, means delivering content tailored for specific users who are identified primarily by their IP addresses, but also by user agent (browser, crawler, screen reader&#8230;) names, and whatnot. Here&#8217;s a simple demonstration of this technique. The content of the next paragraph differs depending on the user requesting this page. Googlebot, Googlers, as well as Matt Cutts at work, will read a personalized message:</p>
<p><em>Dear visitor, thanks for your visit from 38.107.179.211 (38.107.179.211).</em></p>
<p>You surely can imagine that cloaking opens <del>a can of worms</del> <ins>lots of opportunities to enhance a user&#8217;s surfing experience</ins>, besides &#8220;stalking&#8221; particular users like Google&#8217;s head of WebSpam.</p>
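<p>A personalized paragraph like the one above takes only a few lines of PHP. This is just a sketch of how such a message could be built, not necessarily how this page does it. The function name <code>buildVisitorGreeting</code> is an assumption, while <code>gethostbyaddr()</code> is PHP&#8217;s built-in reverse DNS lookup:</p>
<p><code>&lt;?php<br />
// Hypothetical helper: greets the visitor with host name and IP address<br />
function buildVisitorGreeting($ip) {<br />
    // gethostbyaddr() returns the IP unchanged if no host name resolves, FALSE on malformed input<br />
    $host = @gethostbyaddr($ip);<br />
    if ($host === FALSE) {<br />
        $host = $ip;<br />
    }<br />
    return "Dear visitor, thanks for your visit from " . htmlspecialchars($host) . " (" . htmlspecialchars($ip) . ").";<br />
}<br />
echo "&lt;p&gt;&lt;em&gt;" . buildVisitorGreeting($_SERVER["REMOTE_ADDR"]) . "&lt;/em&gt;&lt;/p&gt;";</code></p>
<p>Restricting the message to particular visitors would then be a matter of checking the IP address or user agent before echoing it.</p>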
<h3>Why do search engines dislike cloaking?</h3>
<p>Apparently they don&#8217;t. They use IP delivery themselves. When you&#8217;re traveling in Europe, you&#8217;ll get hints like &#8220;go to Google.fr&#8221; or &#8220;go to Google.at&#8221; all the time. That&#8217;s google.com checking where you are, trying to lure you into their regional services.</p>
<p>More seriously, there&#8217;s a so-called &#8220;dark side of cloaking&#8221;. Say you&#8217;re a <a href="http://fantomaster.com/fantomNews/archives/2010/07/08/fantomas-shadowmaker/">seasoned Internet marketer</a>, then you could show Googlebot an educational page with compelling content under a URI like &#8220;/games/poker&#8221; with an X-Robots-Tag HTTP header telling &#8220;noarchive&#8221;, whilst surfers (search engine users) supplying an HTTP_REFERER and not coming from employee.google.com get redirected to poker dot com (simplified example).</p>
<p>That&#8217;s hard to detect for Google&#8217;s WebSpam team. Because they don&#8217;t do evil themselves, they can&#8217;t officially operate sneaky bots that use for example AOL as their ISP to compare your spider fodder to pages/redirects served to actual users.</p>
<p>Bing sends out spam bots that request your pages &#8220;as a surfer&#8221; in order to discover deceptive cloaking. Of course those bots can be identified, so professional spammers serve them their spider fodder. Besides burning the bandwidth of non-cloaking sites, Bing doesn&#8217;t accomplish anything useful in terms of search quality.</p>
<p>Because search engines can&#8217;t detect cloaking properly, not to speak of a cloaking webmaster&#8217;s intentions, they&#8217;ve launched webmaster guidelines (FUD) that forbid cloaking altogether. All Google/Bing reps tell you that cloaking is an evil black hat tactic that will get your site penalized or even banned. By the way, the same goes for perfectly legit &#8220;hidden content&#8221; that&#8217;s invisible on page load, but viewable after a mouse click on a &#8220;learn more&#8221; widget/link or so.</p>
<h3>Bullshit.</h3>
<p>If your competitor makes creative use of IP delivery to enhance their visitors&#8217; surfing experience, you can file a spam report for cloaking and Google/Bing will ban the site eventually. Just because cloaking <em>can</em> be used with deceptive intent. And yes, it works this way. See below.</p>
<p>Actually, those spam reports trigger a review by a human, so maybe your competitor gets away with it. But search engines also use spam reports to develop spam filters that penalize crawled pages fully automatically. Such filters can fail, and &#8211;trust me&#8211; they do fail often. Once you have to optimize your content delivery for particular users or user groups yourself, such a filter could tank your very own stuff by accident. So don&#8217;t snitch on your competitors, because tomorrow they&#8217;ll return the favor.</p>
<h3>Enforcing a &#8220;do not cloak&#8221; policy is evil</h3>
<p>At least Google&#8217;s WebSpam team comes with cojones. They&#8217;ve even <a href="http://searchengineland.com/google-adwords-help-cloaks-to-google-gets-banned-45541">banned their very own help pages</a> for &#8220;<a href="http://google.com/search?hl=en&#038;q=matt+cutts+cloaking&#038;num=13&#038;safe=off">cloaking</a>&#8221;, although those didn&#8217;t serve porn to minors searching for SpongeBob images with safe-search=on.</p>
<p>That&#8217;s over the top, because the help files of any Google product aren&#8217;t usable without a search facility. When I click &#8220;help&#8221; in any Google service like AdWords, I get blank pages, broken links within the help system, or both, because the destination pages were deindexed for cloaking. Plain evil, and counterproductive.</p>
<p>Just because Google&#8217;s help software doesn&#8217;t show ads and related links to Googlebot, those pages aren&#8217;t guilty of deceptive cloaking. Ms Googlebot won&#8217;t pull the plastic, so it makes no sense to serve her advertisements. Related links are context sensitive just like ads, so it makes no sense to persist them in Google&#8217;s crawling cache, or even in Google&#8217;s search index. Also, as a user I really don&#8217;t care whether Google has crawled the same heading I see on a help page or not, as long as I get directed to relevant content, that is a paragraph or more that answers my question.</p>
<p>When a search engine doesn&#8217;t deliver the very best search results intentionally, just because those pages violate an outdated and utterly useless policy that addresses fraudulent tactics in a shape last used in the previous century and doesn&#8217;t take into account how the Internet works today, I&#8217;m pissed.</p>
<p>Maybe that&#8217;s not bad at all when applied to Google products? Bullshit, again. The same happens to any other website that doesn&#8217;t fit Google&#8217;s weird idea of &#8220;serving the same content to users and crawlers&#8221;. I mean, as long as Google&#8217;s crawlers come from US IPs only, how can a US based webmaster serve the same content in German language to a user coming from Austria and Googlebot, both requesting a URI like &#8220;/shipping-costs?lang=de&#8221; that has to be different for each user because shipping a parcel to Germany costs $30.00 and a parcel of the same weight shipped to Vienna costs $40.00? Don&#8217;t tell me bothering a user with shipping fees for all regions in CH/AT/DE all on one page is a good idea, when I can reduce the information overload to a tailored info of just one shipping fee that my user expects to see, followed by a link to a page that lists shipping costs for all European countries, or all countries where at least some folks might speak/understand German.</p>
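<p>The tailored shipping info boils down to a lookup keyed by the visitor&#8217;s country. A minimal sketch, assuming the two-letter country code has already been derived from the visitor&#8217;s IP by some geo database (that lookup is not shown here). The function name <code>getShippingFee</code> is hypothetical, the fees are the ones from the example above:</p>
<p><code>&lt;?php<br />
// Hypothetical helper: returns the shipping fee in USD, or NULL for unknown countries<br />
function getShippingFee($countryCode) {<br />
    // fees from the example; a real shop would load them from a database<br />
    $fees = array("DE" =&gt; 30.00, "AT" =&gt; 40.00);<br />
    $countryCode = strtoupper($countryCode);<br />
    return isset($fees[$countryCode]) ? $fees[$countryCode] : NULL;<br />
}</code></p>
<p>When the function returns <code>NULL</code>, the page would fall back to the link that lists shipping costs for all countries.</p>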
<p>Back to Google&#8217;s ban of its very own help pages that hid AdSense code from Googlebot. Of course Google wants to see what surfers see in order to deliver relevant search results, and that might include advertisements. However, surrounding ads don&#8217;t necessarily obfuscate the page&#8217;s content. Ads served instead of content do. So when Google wants to detect ad laden thin pages, they need to become smarter. Penalizing pages that don&#8217;t show ads to search engine crawlers is a bad idea for a search engine, because not showing ads to crawlers is a good idea, not only bandwidth-wise, for a webmaster.</p>
<p>Managing this dichotomy is the search engine&#8217;s job. They shouldn&#8217;t expect webmasters to help them solve their very own problems (maintaining search quality). In fact, bothering webmasters with policies put in place solely because search engine algos are fallible and incapable is plain evil. The same applies to instruments like rel-nofollow (launched to help Google devalue spammy links but backfiring enormously) or Google&#8217;s war on paid links (as if not each and every link on the whole Internet is paid/bartered for, somehow).</p>
<p>What do you think, should search engines ditch their way too restrictive &#8220;don&#8217;t cloak&#8221; policies? <a href="http://twitter.com/home?status=Hey+@Google+@Bing,+go+ditch+your+outdated+webmaster+guidelines!+http%3A%2F%2Fsebastians-pamphlets.com/cloaking-is-good-for-your-vistors/" target="twitter" title="Stop search engines that tyrannize webmasters!"><b>Click to vote:</b> <img src="http://sebastians-pamphlets.com/img/twitter-icon.gif" width="10" height="10" style="border:none;" alt="Stop search engines that tyrannize webmasters!"  /></a></p>
<p> </p>
<p><b>Update 2010-07-06:</b> Don&#8217;t miss out on Danny Sullivan&#8217;s &#8220;<strong>Google be fair!</strong>&#8221; appeal, posted today: <a href="http://searchengineland.com/why-google-should-ban-its-own-help-pages-45781">Why Google Should Ban Its Own Help Pages — But Also Shouldn’t</a></p>
]]></content:encoded>
			<wfw:commentRss>http://sebastians-pamphlets.com/cloaking-is-good-for-your-vistors/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Ditch the spam on SERPs, pretty please?</title>
		<link>http://sebastians-pamphlets.com/i-want-clean-serps/</link>
		<comments>http://sebastians-pamphlets.com/i-want-clean-serps/#comments</comments>
		<pubDate>Thu, 17 Jun 2010 21:15:06 +0000</pubDate>
		<dc:creator>Sebastian</dc:creator>
		
		<category><![CDATA[Search Quality]]></category>

		<category><![CDATA[Webspam]]></category>

		<category><![CDATA[Spam Report]]></category>

		<category><![CDATA[Crap]]></category>

		<category><![CDATA[Google]]></category>

		<guid isPermaLink="false">http://sebastians-pamphlets.com/i-want-clean-serps/</guid>
		<description><![CDATA[
Say there&#8217;s a search engine that tries very hard to serve relevant results for long tail search queries. Maybe it even accepted that an algo change &#8211;supposed to wipe out shitloads of thin pages from its long tail search result pages (SERPs)&#8211; is referred to as #MayDay. One should think that this search engine isn&#8217;t [...]]]></description>
			<content:encoded><![CDATA[
<p>Say there&#8217;s a search engine that tries very hard to serve relevant results for long tail search queries. Maybe it even accepted that an algo change &#8211;supposed to wipe out shitloads of thin pages from its long tail search result pages (SERPs)&#8211; is referred to as #MayDay. One should think that this search engine isn&#8217;t exactly eager to annoy its users with crappy mash-up pages consisting of shabby stuff scraped from all known sources of duplicate content on the whole InterWebs.</p>
<p>Wrong.</p>
<p>Prominent SE spammers like <a href="http://mahalo.com" rel="nofollow" onclick="alert('You didn\'t really believe that I let you go to view spam, right?'); return false;">Mahalo</a> still flood the visible part of search indexes with boatloads of crap that never should be able to cheat its way onto any SERP, not even via a [site:spam.com] search. Learn more from <a href="http://www.seobook.com/black-hat-seo-case-study">Aaron</a> and <a href="http://smackdown.blogsblogsblogs.com/category/scams/">Michael</a>, who&#8217;ve both invested their valuable time to craft <a href="http://smackdown.blogsblogsblogs.com/2010/06/17/need-help-understanding-the-latest-mahalo-spam/">detailed spam reports</a>, <a href="http://twitter.com/mattcutts/status/16420030375">to no avail</a>. </p>
<p>Frustrating. </p>
<p>Wait. Why does a bunch of spammy Web pages create such a fuss? Because they&#8217;re findable in the search index. Of course a search engine must crawl all the WebSpam out there, and its indexer has to judge the value of all the content it gets fed. But there&#8217;s absolutely no need to bother the query engine, which gathers and ranks the stuff presented on the SERPs, with crap like that.</p>
<p>Dear Google, why do you annoy your users with spam created by &#8220;a scheme that your automated system handles quite well&#8221; at all? Those awesome spam filters should just flag crappy pages as not-SERP-worthy, so that they can never see the daylight at google.com/search. I mean, why should any searcher be at risk of pulling useless search results from your index? Hopefully not because these misled searchers tend to click on lots of Google ads on said pages, right?</p>
<p>I&#8217;d rather enjoy an empty SERP for an exotic search query, than suffer from a single link to a useless page plastered with huge ads, even if it comes with a tiny portion of stolen content that might be helpful if pointing to the source.</p>
<p>Do you feel like me? Speak out!</p>
<p><a href="http://twitter.com/home?status=Hey+@GoogleWebspam,+I+dislike+Mahalo+spam+on+your+SERPs!+http%3A%2F%2Fsebastians-pamphlets.com/i-want-clean-serps/+%23spam-report" target="twitter" title="Tweet Your Spam Report!"><strong>Hey Google, I dislike spam on your SERPs! #spam-report</strong> <img src="http://sebastians-pamphlets.com/img/twitter-icon.gif" width="10" height="10" style="border:none;" alt="Tweet Your Plea For Clean SERPs!"  /></a></p>
]]></content:encoded>
			<wfw:commentRss>http://sebastians-pamphlets.com/i-want-clean-serps/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Google went belly-up: SERPs sneakily redirect to FPAs</title>
		<link>http://sebastians-pamphlets.com/google-serps-sneakily-redirect-to-ads/</link>
		<comments>http://sebastians-pamphlets.com/google-serps-sneakily-redirect-to-ads/#comments</comments>
		<pubDate>Wed, 12 May 2010 17:06:19 +0000</pubDate>
		<dc:creator>Sebastian</dc:creator>
		
		<category><![CDATA[Search Quality]]></category>

		<category><![CDATA[Redirects]]></category>

		<category><![CDATA[Webspam]]></category>

		<category><![CDATA[Spam Report]]></category>

		<category><![CDATA[Cloaking]]></category>

		<category><![CDATA[Crap]]></category>

		<category><![CDATA[Google]]></category>

		<guid isPermaLink="false">http://sebastians-pamphlets.com/google-serps-sneakily-redirect-to-ads/</guid>
		<description><![CDATA[
I&#8217;m pissed. I do know I shouldn&#8217;t blog in rage, but Google redirecting search engine result pages to totally useless Internet Explorer ads just fires up my ranting machine.
What does the almighty Google say about URIs that should deliver useful content to searchers, but sneakily redirect to full page ads? Here you go. Google&#8217;s webmaster guidelines [...]]]></description>
			<content:encoded><![CDATA[
<p>I&#8217;m pissed. I do know I shouldn&#8217;t blog in rage, but Google redirecting search engine result pages to totally useless Internet Explorer ads just fires up my ranting machine.</p>
<p>What does the almighty Google say about URIs that should deliver useful content to searchers, but sneakily redirect to full page ads? <a href="http://www.google.com/support/webmasters/bin/answer.py?answer=35769">Here you go</a>. Google&#8217;s webmaster guidelines explicitly forbid such black hat tactics: </p>
<p>&#8220;<strong>Don&#8217;t use cloaking or sneaky redirects.</strong>&#8221; Google just did the latter with its very own <a href="http://google.com/ie?q=buy+viagra+online">SERPs</a>. The search interface <a href="http://sebastians-pamphlets.com/google-serps-sneakily-redirect-to-ads/#goog-ie-ui">google.com/ie</a>, out in the wild for nearly a decade, redirects to a piece of sidebar HTML offering a download of IE8 optimized for Google. That&#8217;s a helpful redirect for some IE6 users who don&#8217;t suffer from an IT department stuck with this outdated browser, but it&#8217;s plain misleading in the eyes of all those searchers who appreciated this clean and totally uncluttered search interface. Interestingly, <abbr title="User Agent">UA</abbr> cloaking is the only way to heal this sneaky behavior.</p>
<p>&#8220;<strong>Don&#8217;t create pages with malicious behavior.</strong>&#8221; Google&#8217;s guilty, too. Instead of checking for the user&#8217;s browser, redirecting only IE6 requests from <a href="http://www.google.com/search?output=ie&#038;num=100&#038;hl=en&#038;safe=off&#038;q=google+discontinues+IE6+support">Google&#8217;s discontiued IE6 support</a> (IE6 toolbar &#8230;) to the IE8 advertisement, whilst all other user agents get their desired search box, respectively their SERPs, under a google.com/search?output=ie&amp;&#8230; URI, Google performs an unconditional redirect to a page that&#8217;s utterly useless and also totally unexpected for many searchers. I consider misleading redirects malicious.</p>
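<p>The conditional redirect described above could look roughly like this. It&#8217;s a sketch, not Google&#8217;s code: the function name <code>isIE6</code> is made up, and it assumes that IE6 announces itself with an <code>MSIE 6.</code> token in its user agent string (Opera builds of that era could send the same token, hence the extra check):</p>
<p><code>&lt;?php<br />
// Hypothetical helper: TRUE if the user agent string looks like IE6<br />
function isIE6($userAgent) {<br />
    return stripos($userAgent, "MSIE 6.") !== FALSE<br />
        &#038;&#038; stripos($userAgent, "Opera") === FALSE;<br />
}<br />
$userAgent = isset($_SERVER["HTTP_USER_AGENT"]) ? $_SERVER["HTTP_USER_AGENT"] : "";<br />
if (isIE6($userAgent)) {<br />
    // only IE6 users get sent to the IE8 offer<br />
    header("Location: http://www.google.com/toolbar/ie8/sidebar.html", TRUE, 307);<br />
    exit;<br />
}<br />
// every other user agent gets its search box or SERP as usual</code></p>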
<p>&#8220;<strong>Avoid links to web spammers or &#8216;bad neighborhoods&#8217; on the web.</strong>&#8221; I consider the propaganda for IE that Google displays instead of the search results I&#8217;d expect a bad neighborhood on the Web, because IE constantly ignores Web standards, forcing developers and designers to implement superfluous workarounds. (Ok, ok, ok &#8230; Google&#8217;s lack of geekiness doesn&#8217;t exactly count as a violation of their webmaster guidelines, but it sounds good, doesn&#8217;t it?)</p>
<p><a href="http://twitter.com/home?status=Hey+@MattCutts,+about+time+to+ban+google.com/ie?q=spam!+http%3A%2F%2Fsebastians-pamphlets.com/google-serps-sneakily-redirect-to-ads/" target="twitter" title="Tweet That!"><strong>Hey Matt Cutts, about time to ban google.com/ie!</strong> <img src="http://sebastians-pamphlets.com/img/twitter-icon.gif" width="10" height="10" style="border:none;" alt="Click to tweet that"  /></a></p>
<p id="goog-ie-ui"><a href="http://sebastians-pamphlets.com/rediscover-googles-free-ranking-checker/">Google&#8217;s very best search interface</a> is history. Here is what you got under<code><br />
<b>http://www.google.com/ie?num=100&#038;hl=en&#038;safe=off&#038;q=minimalistic</b></code>:</p>
<p><img  src="http://sebastians-pamphlets.com/img/posts/google-awesome-ie-serp.png" width="448" height="503" style="text-align:center; display:block;" align="middle" alt="Google's famous minimalistic search UI" title="Google's famous minimalistic search UI" /></p>
<p>And here is where Google sneakily redirects you to when you load the SERP link above (even with Chrome!):<code><br />
<b>http://www.google.com/toolbar/ie8/sidebar.html</b></code>:</p>
<p><img  src="http://sebastians-pamphlets.com/img/posts/google-fpa-ie8.png" width="268" height="569" style="border:dotted red 1px; text-align:center; display:block;" align="middle" alt="Google's sneaky IE8 propaganda" title="Google's sneaky IE8 propaganda" /></p>
<p id="goog-ie-spam-report">It&#8217;s sad that a browser vendor like Google (and yes, Google Chrome <b>is</b> my favorite browser) feels the need to mislead its users with propaganda for a competiting browser that&#8217;s slower and doesn&#8217;t render everything as it should render it. But when this particular browser vendor also leads Web search, and makes use of black hat techniques that it bans webmasters for, then that&#8217;s a scandal. So, if you agree, please submit a spam report to Google:</p>
<p><a href="http://twitter.com/home?status=Hey+@MattCutts,+about+time+to+ban+google.com/ie!+%23spam-report+http%3A%2F%2Fsebastians-pamphlets.com/google-serps-sneakily-redirect-to-ads/" target="twitter" title="Tweet Your Spam Report!"><strong>Hey Matt Cutts, about time to ban google.com/ie! #spam-report</strong> <img src="http://sebastians-pamphlets.com/img/twitter-icon.gif" width="10" height="10" style="border:none;" alt="Tweet Your Spam Report"  /></a></p>
<p>2010-05-17 I&#8217;ve updated this pamphlet because it didn&#8217;t explain the &#8220;sneakiness&#8221; clearly enough. As of today, the unconditional redirect is still sneaky IMHO. Google needs to deliver searchers their desired search results, and only stubborn IE6 users ads for a somewhat better browser.</p>
<p>2010-05-18 <b>Q:</b> You&#8217;re pissed solely because your SERP scraping scripts broke. <b>A:</b> Glad you&#8217;ve asked. Yes, I&#8217;ve <a href="http://www.scroogle.org/cgi-bin/scraper.htm" rel="crap nofollow">scraped Google&#8217;s /ie search</a> too. Not because I&#8217;m a <a href="http://www.google-watch.org/" rel="crap nofollow">privacy nazi</a> like Daniel Brandt. I&#8217;ve just checked (my) rankings. However, when I spotted the redirects I didn&#8217;t even remember the location of the scripts that scraped this service, because I hadn&#8217;t looked at ranking reports for years. I&#8217;m interested in actual traffic, and revenues. Ego food annoys me. I just love the /ie search interface. So the answer is a bold &#8220;no&#8221;. I don&#8217;t give a fucking dead rat&#8217;s ass what ranking reports based on scraped SERPs could tell.</p>
<hr />Copyright &copy; 2012 <strong><a href="http://sebastians-pamphlets.com/">Sebastian`s Pamphlets</a></strong>. This Feed is for personal non-commercial use only. If you are not reading this material in your news aggregator/feed reader, the site you are looking at is guilty of copyright infringement and will be put down immediately. Please contact sebastians-pamphlets.com so we can take legal action immediately.<br /><span style="float: right;font-size: 7pt"><a href="http://blog.taragana.com/index.php/archive/wordpress-plugins-provided-by-taraganacom/">Plugin</a> by <a href="http://www.taragana.com/">Taragana</a></span><div class="topsy_widget_data topsy_theme_light-green" style="float: right;margin-left: 0.75em;"><!-- { "url": "http://sebastians-pamphlets.com/google-serps-sneakily-redirect-to-ads/", "style": "big", "title": "Google went belly-up: SERPs sneakily redirect to FPAs" } --></div>
]]></content:encoded>
			<wfw:commentRss>http://sebastians-pamphlets.com/google-serps-sneakily-redirect-to-ads/feed/</wfw:commentRss>
		</item>
		<item>
		<title>The anatomy of a deceptive Tweet spamming Google Real-Time Search</title>
		<link>http://sebastians-pamphlets.com/how-to-spam-google-real-time-search-via-twitter/</link>
		<comments>http://sebastians-pamphlets.com/how-to-spam-google-real-time-search-via-twitter/#comments</comments>
		<pubDate>Thu, 10 Dec 2009 10:12:44 +0000</pubDate>
		<dc:creator>Sebastian</dc:creator>
		
		<category><![CDATA[Webspam]]></category>

		<category><![CDATA[Search Quality]]></category>

		<category><![CDATA[Redirects]]></category>

		<category><![CDATA[Internet Marketing]]></category>

		<category><![CDATA[Spam]]></category>

		<category><![CDATA[Twitter]]></category>

		<category><![CDATA[SEO]]></category>

		<category><![CDATA[Cloaking]]></category>

		<category><![CDATA[Crap]]></category>

		<category><![CDATA[Google]]></category>

		<guid isPermaLink="false">http://sebastians-pamphlets.com/how-to-spam-google-real-time-search-via-twitter/</guid>
		<description><![CDATA[
Minutes after the launch of Google&#8217;s famous Real Time Search, the Internet marketing community began to spam the scrolling SERPs. Google gave birth to a new spam industry.
I&#8217;m sure Google&#8217;s WebSpam team will pull the plug sooner or later, but as of today Google&#8217;s real time search results are extremely vulnerable to questionable content.
The somewhat [...]]]></description>
			<content:encoded><![CDATA[
<p><img  src="http://sebastians-pamphlets.com/img/posts/spamming-google-real-time-search.png" width="250" height="345" align="right" style="margin-left:5px;" alt="Google real time search spammed and abused" title=""  />Minutes after the <a href="http://googleblog.blogspot.com/2009/12/relevance-meets-real-time-web.html?utm_source=sebastian&#038;utm_medium=pamphlet&#038;utm_campaign=thou+shalt+not+fuck+with+my+uris">launch</a> of Google&#8217;s <a href="http://searchengineland.com/search-real-time-madness-31668">famous</a> Real Time Search, the Internet marketing community <a href="http://sphinn.com/story/135685">began</a> to <a href="http://outspokenmedia.com/seo/google-real-time-spam/">spam</a> the <a href="http://www.google.com/search?hl=en&#038;safe=off&#038;esrch=RTSearch&#038;tbo=1&#038;num=100&#038;q=spam&#038;tbs=rltm:1">scrolling SERPs</a>. Google gave birth to a <a href="http://www.seo-theory.com/2009/12/07/google-launches-a-new-spam-industry/">new spam industry</a>.</p>
<p>I&#8217;m sure Google&#8217;s <a href="http://friendfeed.com/dannysullivan/d973e438/real-time-spam-google-says-been-fighting-so-long">WebSpam</a> team will pull the plug sooner or later, but as of today Google&#8217;s real time search results are extremely vulnerable to questionable content.</p>
<p>The somewhat shady approach to make creative use of real time search I&#8217;m outlining below will not work forever. It can be used for really evil purposes, and Google is aware of the problem. Frankly, if I were the Googler in charge, I&#8217;d dump the whole real-time thingy until the spam defense lines are rock solid.</p>
<p id="rtss-recipe"><strong>Here&#8217;s the recipe from Dr Evil&#8217;s WebSpam-Cook-Book:</strong></p>
<h3 id="rtss-ingredients">Ingredients</h3>
<ul>
<li>1 <a href="http://www.google.com/trends?q=spam+google">popular topic</a> that pulls lots of searches, but not so many that the results scroll down too fast.</li>
<li>1 <a href="http://www.google.com/products?q=spam+google&#038;hl=en&#038;aq=f">landing page</a> that makes the punter pull out the plastic in no time.</li>
<li>1 <a href="http://www.google.com/support/webmasters/bin/answer.py?hl=en&#038;answer=93713">trusted authority page</a> totally lacking commercial intentions. View its source code, it must have a valid TITLE element with an appealing call for action related to your topic in its HEAD section.</li>
<li>1 <a href="http://goo.gl/">short</a> domain, 1 cheap Web hosting plan (Apache, PHP), 1 plain text editor, 1 FTP client, 1 Twitter account, and a pinch of basic coding skills.</li>
</ul>
<h3 id="rtss-preparation">Preparation</h3>
<p>Create a new text file and name it <code>hot-topic.php</code> or so. Then code:<code><br />
&lt;?php<br />
$landingPageUri = "http://affiliate-program.com/?your-aff-id";<br />
$trustedPageUri = "http://google.com/something.py";<br />
if (stristr($_SERVER["HTTP_USER_AGENT"], "Googlebot")) {<br />
   header("HTTP/1.1 307 Here you go today", TRUE, 307);<br />
   header("Location: $trustedPageUri");<br />
}<br />
else {<br />
   header("HTTP/1.1 301 Happy shopping", TRUE, 301);<br />
   header("Location: $landingPageUri");<br />
}<br />
exit;<br />
?&gt;</code></p>
<p>Provided you&#8217;re a savvy spammer, your crawler detection routine will be a little more <a href="http://fantomaster.com/fasvsspy01.html">complex</a>.</p>
<p>Save the file and upload it, then test the URI <code>http://youspamaw.ay/hot-topic.php</code> in your browser.</p>
<h3 id="rtss-serving">Serving</h3>
<ul>
<li>Log in to Twitter and submit lots of nicely crafted, not overly keyword-stuffed messages carrying your spammy URI. Do not use obscene language, e.g. don&#8217;t swear, and sail around phrases like &#8216;buy cheap viagra&#8217; with synonyms like &#8216;brighten up your girl friend&#8217;s romantic moments&#8217;.</li>
<li>On their SERPs, Google will display the text from the trusted page&#8217;s TITLE element, linked to your URI that leads punters to a sales pitch of your choice.</li>
<li>Just for entertainment, closely monitor Google&#8217;s real time SERPs, and your real-time sales stats as well.</li>
<li>Be happy and get rich by end of the week.</li>
</ul>
<p>Google removes links to untrusted destinations, that&#8217;s why you need to abuse authority pages. As long as you don&#8217;t launch f-bombs, Google&#8217;s profanity filters make flooding their real time SERPs with all sorts of crap a breeze.</p>
<p>Hey <a href="http://twitter.com/GoogleWebspam">Google</a>, for the sake of our children, take that as a spam report!</p>
]]></content:encoded>
			<wfw:commentRss>http://sebastians-pamphlets.com/how-to-spam-google-real-time-search-via-twitter/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Hard facts about URI spam</title>
		<link>http://sebastians-pamphlets.com/troubles-made-by-utm-variables-from-google-analytics/</link>
		<comments>http://sebastians-pamphlets.com/troubles-made-by-utm-variables-from-google-analytics/#comments</comments>
		<pubDate>Tue, 01 Dec 2009 20:00:33 +0000</pubDate>
		<dc:creator>Sebastian</dc:creator>
		
		<category><![CDATA[Search Quality]]></category>

		<category><![CDATA[Duplicate Content]]></category>

		<category><![CDATA[Analytics]]></category>

		<category><![CDATA[Internet Marketing]]></category>

		<category><![CDATA[Webspam]]></category>

		<category><![CDATA[Spam]]></category>

		<category><![CDATA[SEO]]></category>

		<category><![CDATA[Crap]]></category>

		<category><![CDATA[Copy+Paste-Penalties]]></category>

		<category><![CDATA[AdSense]]></category>

		<category><![CDATA[Google]]></category>

		<guid isPermaLink="false">http://sebastians-pamphlets.com/troubles-made-by-utm-variables-from-google-analytics/</guid>
		<description><![CDATA[
I stole this pamphlet&#8217;s title (and more) from Google&#8217;s post Hard facts about comment spam for a reason. In fact, Google spams the Web with useless clutter, too. You doubt it? Read on. That&#8217;s the URI from the link above:
http://googlewebmastercentral.blogspot.com/2009/11/hard-facts-about-comment-spam.html?utm_source=feedburner&#038;utm_medium=feed&#038;utm_campaign=Feed%3A+blogspot%2FamDG+%28Official+Google+Webmaster+Central+Blog%29
I&#8217;ve bolded the canonical URI, everything after the question mark is clutter added by Google.
When your Google [...]]]></description>
			<content:encoded><![CDATA[
<p>I stole this pamphlet&#8217;s title (and more) from Google&#8217;s post <a href="http://googlewebmastercentral.blogspot.com/2009/11/hard-facts-about-comment-spam.html?utm_source=feedburner&#038;utm_medium=feed&#038;utm_campaign=Feed%3A+blogspot%2FamDG+%28Official+Google+Webmaster+Central+Blog%29">Hard facts about comment spam</a> for a reason. In fact, Google spams the Web with useless clutter, too. You doubt it? Read on. That&#8217;s the URI from the link above:</p>
<p><code><b title="Canonical URI" style="color:black;">http://googlewebmastercentral.blogspot.com/2009/11/hard-facts-about-comment-spam.html</b><i title="Google's query string clutter" style="color:red;">?utm_source=feedburner&#038;utm_medium=feed<br />&#038;utm_campaign=Feed%3A+blogspot%2FamDG+%28Official+Google+Webmaster<br />+Central+Blog%29</i></code></p>
<p><img src="http://sebastians-pamphlets.com/img/posts/ga-kraken.png" width="260" height="301" style="margin-left:5px;" align="right" alt="GA Kraken" title="Google Analytics fucks your canonical URIs" />I&#8217;ve bolded the canonical URI, everything after the question mark is <a href="http://analytics.blogspot.com/2009/11/integration-with-feedburner.html?utm_source=sebastian&#038;utm_medium=pamphlet&#038;utm_campaign=thou+shalt+not+fuck+with+my+uris">clutter added by Google</a>.</p>
<p>When your Google account lists both Feedburner and GoogleAnalytics as active services, Google will automatically screw your URIs when somebody clicks a link to your site in a feed reader (you can opt out, <a href="http://sebastians-pamphlets.com/troubles-made-by-utm-variables-from-google-analytics/#utm-opt-out">see below</a>).</p>
<h3 id="utm-bad">Why is it bad?</h3>
<p>FACT: <strong>Google&#8217;s method to track traffic from feeds to URIs creates new URIs.</strong> And lots of them. Because every combination of values for the query string variables (<code>utm_source</code> <code>utm_medium</code> <code>utm_campaign</code> <code>utm_content</code> <code>utm_term</code>) yields a distinct URI, the number of cluttered URIs pointing to the same piece of content can add up to dozens or more.</p>
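<p>To put a number on that, here is a small Python sketch that enumerates the combinations. The canonical URI and the parameter values are made up for illustration; only the parameter names come from the list above.</p>

```python
from itertools import product
from urllib.parse import urlencode

# Hypothetical canonical URI and made-up tracking values.
CANONICAL = "http://example.com/post.html"
VALUES = {
    "utm_source": ["feedburner", "twitter", "newsletter"],
    "utm_medium": ["feed", "email"],
    "utm_campaign": ["launch", "weekly"],
}


def cluttered_uris(canonical, values):
    """Build one URI per combination of tracking values."""
    names = list(values)
    combos = product(*(values[name] for name in names))
    return [canonical + "?" + urlencode(dict(zip(names, combo)))
            for combo in combos]


uris = cluttered_uris(CANONICAL, VALUES)
print(len(uris))  # 3 * 2 * 2 = 12 URIs for one piece of content
```

<p>Three sources, two media and two campaigns already give twelve different URIs for a single page; every additional variable multiplies that count.</p>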
<p>FACT: Bloggers (publishers, authors, anybody) naturally copy those cluttered URIs to paste them into their posts. The same goes for user link drops at Twitter and elsewhere. These links get crawled and indexed. Currently Google&#8217;s search index is flooded with <a href="http://www.google.com/search?hl=en&#038;q=inurl:utm_source&#038;utm_source=sebastian&#038;utm_medium=pamphlet&#038;utm_campaign=thou+shalt+not+fuck+with+my+uris">28,900,000 cluttered URIs</a> mostly originating from copy+paste links. <a href="http://www.bing.com/search?q=inurl:utm_source">Bing</a> and <a href="http://search.yahoo.com/search?p=inurl:utm_source">Yahoo</a> haven&#8217;t indexed GA tracking parameters yet.</p>
<p>That&#8217;s 29 million URIs with tracking variables that point to duplicate content as of today. With every link copied from a feed reader, this number will increase. <a href="http://mattcutts.com/blog/">Matt Cutts</a> <a href="http://friendfeed.com/mattcutts/6309e560/graywolf-i-think-johnmu-suggestions-were-solid">said</a> &#8220;I don&#8217;t think utm will cause dupe issues&#8221; and points to <a href="http://johnmu.com/">John Müller</a>&#8217;s <a href="http://www.seroundtable.com/archives/021170.html">helpful advice</a> (<a href="http://www.cre8asiteforums.com/forums/index.php?showtopic=73804">methods</a> a site owner can apply to tidy up Google&#8217;s mess).</p>
<p>Maybe Google can handle this growing duplicate content chaos in their very own search index. Let&#8217;s forget that Google is the search engine that <a href="http://googlewebmastercentral.blogspot.com/2009/08/optimize-your-crawling-indexing.html?utm_source=sebastian&#038;utm_medium=pamphlet&#038;utm_campaign=thou+shalt+not+fuck+with+my+uris">advocated</a> URI canonicalization for ages, invented sitemaps, rel=canonical, and countless highly sophisticated algos to merge indexed clutter under the canonical URI. It&#8217;s all water under the bridge now that Google is in the create-multiple-URIs-pointing-to-the-same-piece-of-content business itself.</p>
<p>So far that&#8217;s just disappointing. To understand why it&#8217;s downright evil, let&#8217;s look at the implications from a technical point of view.</p>
<h3 id="utm-evil">Spamming URIs with <i>utm</i> tracking variables breaks lots of things</h3>
<p>Look at this URI: <code>http://www.<span title="This URI exists with another server name">example</span>.com/search.aspx<b style="color:red;">?</b>Query=musical+mobile<b style="color:red;">?</b>utm_source=Referral&#038;utm_medium=Internet&#038;utm_campaign=celebritybabies</code></p>
<p>Google added a query string to a query string. Two URI segment delimiters (<a href="http://www.w3.org/Addressing/URL/4_URI_Recommentations.html">&#8220;?&#8221;</a>) can cause all sorts of trouble at the landing page.</p>
<p>Some scripts will process only variables from Google&#8217;s query string, because they extract GET input from the URI&#8217;s last question mark to the fragment delimiter &#8220;#&#8221; or end of URI; some scripts expecting input variables in a particular sequence will be confused at least; some scripts might even use the same variable names &#8230; the number of possible errors caused by amateurish extended query strings is infinite. Even if there&#8217;s only one &#8220;?&#8221; delimiter in the URI.</p>
<p>In some cases the page the user gets faced with will lack the expected content, or will display a prominent error message like 404, or will consist of white space only because the underlying script failed so badly that the Web server couldn&#8217;t even show a 5xx error.</p>
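<p>Here is how that plays out with the double-&#8220;?&#8221; URI quoted above, sketched with Python&#8217;s standard <code>urllib.parse</code>. The two helper functions are hypothetical stand-ins for the two kinds of landing page scripts; neither is taken from a real application.</p>

```python
from urllib.parse import parse_qs, urlsplit

URI = ("http://www.example.com/search.aspx?Query=musical+mobile"
       "?utm_source=Referral&utm_medium=Internet&utm_campaign=celebritybabies")


def parse_from_first_delimiter(uri):
    """Standard behaviour: the query starts at the first '?'."""
    return parse_qs(urlsplit(uri).query)


def parse_from_last_delimiter(uri):
    """A script that reads GET input from the last '?' to '#' or end of URI."""
    without_fragment = uri.split("#", 1)[0]
    _, delimiter, query = without_fragment.rpartition("?")
    return parse_qs(query) if delimiter else {}


first = parse_from_first_delimiter(URI)
last = parse_from_last_delimiter(URI)
print(first["Query"])   # the search phrase now carries '?utm_source=Referral'
print("Query" in last)  # the original search variable is gone entirely
```

<p>Parsing from the first delimiter pollutes the <code>Query</code> value and loses <code>utm_source</code>; parsing from the last delimiter drops <code>Query</code> altogether. Either way the landing page no longer sees the input it was built for.</p>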
<p>Regardless of whether a landing page can handle query string parameters added to the original URI or not (most can), changing someone&#8217;s URI for tracking purposes is plain evil, IMHO, when implemented as opt-out instead of opt-in.</p>
<p>Appended UTM query strings can make trackbacks vanish, too. When a blog checks whether the trackback URI is carrying a link to the blog or not, for example with this <a href="http://sw-guide.de/wordpress/plugins/simple-trackback-validation/">plug-in</a>, the comparison can fail and the trackback gets deleted on arrival, without notice. If I dug a little deeper, most probably I could compile a huge list of other functionalities on the Internet that are broken by Google&#8217;s UTM clutter.</p>
<p>Finally, GoogleAnalytics is not the one and only stats tool out there, and it doesn&#8217;t fulfil all needs. Many webmasters rely on simple server reports, for example referrer stats or tools like awstats, for various technical purposes. Broken. Specialized content management tools fed by real-time traffic data. Broken. Countless tools for linkpop analysis group inbound links by landing page URI. Broken. URI canonicalization routines. Broken, respectively now acting counterproductively with regard to GA reporting. Google&#8217;s UTM clutter has impact on lots of tools that make sense <em>in addition</em> to Google Analytics. All broken.</p>
<p>What a glorious mess. Frankly, I&#8217;m somewhat puzzled. Google has hired tens of thousands of this planet&#8217;s brightest minds &#8211;I really mean that, literally!&#8211;, and they came up with half-assed crap like that? Un-fucking-believable.</p>
<h3 id="utm-opt-out">What can I do to avoid URI spam on my site?</h3>
<p><strong>Boycott Google&#8217;s poor man&#8217;s approach to link feed traffic data to Web analytics.</strong> Go to <a href="http://feedburner.google.com/?utm_source=sebastian&#038;utm_medium=pamphlet&#038;utm_campaign=thou+shalt+not+fuck+with+my+uris">Feedburner</a>. For each of your feeds click on &#8220;Configure stats&#8221; and uncheck &#8220;Track clicks as a traffic source in Google Analytics&#8221;. Done. Wait for a suitable solution.</p>
<p>If you really can&#8217;t live with traffic sources gathered from a somewhat <a href="http://sebastians-pamphlets.com/webkit-please-rescue-the-http_referer/">unreliable HTTP_REFERER</a>, and you&#8217;ve deep pockets, then hire a WebDev crew to revamp all your affected code. Coward!</p>
<p>As a matter of fact, Google is responsible for this royal pain in the ass. Don&#8217;t fix Google&#8217;s errors on your site. Let Google do the fault recovery. They own the root of all UTM evil, so they have to fix it. There&#8217;s absolutely no reason why a gazillion webmasters and developers should do Google&#8217;s job, <a href="http://sebastians-pamphlets.com/rip-rel-nofollow-funeral-party/">again and again</a>.</p>
<h3 id="utm-alternatives">What can Google do?</h3>
<p>Well, that&#8217;s quite simple. Instead of adding utterly useless crap to URIs found in feeds, Google can make use of a clever redirect script. When Feedburner serves feed items to anybody, the values of all GA tracking variables are available.</p>
<p>Instead of adding clutter to these URIs, Feedburner could replace them with a script URI that stores the timestamp, the user&#8217;s IP addy, and whatnot, then performs a 301 redirect to the canonical URI. The GA script invoked on the landing page can access and process these data quite accurately. </p>
<p>Perhaps this procedure would be even more accurate, because link drops can no longer mimic feed traffic.</p>
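<p>A rough Python sketch of such a redirect script, under stated assumptions: the item lookup table, the in-memory click log, and the function name are all hypothetical, and how the GA script on the landing page would later fetch the stored record is left open here, just as it is above.</p>

```python
import time

# Hypothetical lookup table: feed item id -> canonical URI.
FEED_ITEMS = {"42": "http://example.com/2009/12/some-post.html"}

# Stand-in for whatever storage a real tracker would write to.
CLICK_LOG = []


def handle_click(item_id, client_ip, source="feedburner", medium="feed"):
    """Log one feed click, then answer with a 301 to the canonical URI."""
    canonical = FEED_ITEMS.get(item_id)
    if canonical is None:
        return 404, {}
    CLICK_LOG.append({
        "timestamp": time.time(),
        "ip": client_ip,
        "utm_source": source,
        "utm_medium": medium,
        "landing_page": canonical,
    })
    # The visitor only ever sees the clean, canonical URI.
    return 301, {"Location": canonical}
```

<p>The tracking values live in the tracker&#8217;s storage instead of the URI, so whatever gets copied from the address bar is the canonical URI. One caveat: browsers may cache a 301, so repeat clicks from the same client might bypass the script.</p>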
<h3 id="utm-speak-out">Speak out!</h3>
<p>So, if you don&#8217;t approve that Feedburner, GoogleReader, AdSense4Feeds, and GoogleAnalytics gang rape your well designed URIs, then link out to everything Google with a descriptive query string, like:</p>
<p><textarea readonly style="width:500px; height:55px; background:white; color:black; font-size:11pt;" wrap="virtual">?utm_source=sebastian&#038;utm_medium=pamphlet&#038;utm_campaign=thou+shalt+not+fuck+with+my+uris</textarea></p>
<p>I mean, nicely designed canonical URIs should be the search engineer&#8217;s porn, so perhaps somebody at Google will listen. Will ya?</p>
<p><b>Update:</b><a href="http://www.semmys.org/2010/search-tech-all-2010-nominees/"><img id="semmy2010" style="border:0;" align="right" src="http://www.semmys.org/dm/badges/10/LBnom.gif" alt="2010 SEMMY Nominee" /></a></p>
<p>I&#8217;ve just added a <a href="http://sebastians-pamphlets.com/stuff/utm-killer/">&#8220;UTM Killer&#8221; tool</a>, where you can enter a screwed URI and get a clean URI &#8212; all &#8216;utm_&#8217; crap and multiple &#8216;?&#8217; delimiters removed &#8212; in return. That&#8217;ll help when you copy URIs from your feedreader to use them in your blog posts.</p>
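<p>For the curious, the same kind of cleanup can be sketched in a few lines of Python. This is an independent illustration, not the code behind the tool linked above; the function name is made up, and treating every extra &#8220;?&#8221; as a parameter separator is an assumption that would mangle a legitimate &#8220;?&#8221; inside a parameter value.</p>

```python
from urllib.parse import urlsplit, urlunsplit


def strip_utm(uri):
    """Drop all utm_* parameters and stacked '?' delimiters from a URI."""
    parts = urlsplit(uri)
    # Treat any further '?' in the query as a parameter separator.
    pairs = parts.query.replace("?", "&").split("&")
    kept = [pair for pair in pairs
            if pair and not pair.lower().startswith("utm_")]
    return urlunsplit((parts.scheme, parts.netloc, parts.path,
                       "&".join(kept), parts.fragment))


print(strip_utm("http://www.example.com/search.aspx?Query=musical+mobile"
                "?utm_source=Referral&utm_medium=Internet"))
# prints http://www.example.com/search.aspx?Query=musical+mobile
```

<p>Parameters that don&#8217;t start with <code>utm_</code> and any fragment are kept as they are; when nothing but tracking clutter was appended, the question mark disappears along with it.</p>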
<p>By the way, please <a href="http://www.semmys.org/category/search-tech/">vote up this pamphlet</a> so that I get the 2010 SEMMY Award. Thanks in advance!</p>
]]></content:encoded>
			<wfw:commentRss>http://sebastians-pamphlets.com/troubles-made-by-utm-variables-from-google-analytics/feed/</wfw:commentRss>
		</item>
	</channel>
</rss>

