Archived posts from the 'AdSense' Category

Hard facts about URI spam

I stole this pamphlet’s title (and more) from Google’s post Hard facts about comment spam for a reason. In fact, Google spams the Web with useless clutter, too. You doubt it? Read on. That’s the URI from the link above:

http://googlewebmastercentral.blogspot.com/2009/11/hard-facts-about-comment-spam.html?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+blogspot%2FamDG+%28Official+Google+Webmaster+Central+Blog%29

The canonical URI is the part before the question mark; everything after it is clutter added by Google.

When your Google account lists both Feedburner and GoogleAnalytics as active services, Google will automatically screw your URIs when somebody clicks a link to your site in a feed reader (you can opt out, see below).

Why is it bad?

FACT: Google’s method to track traffic from feeds to URIs creates new URIs. And lots of them. Depending on the number of possible values for each query string variable (utm_source, utm_medium, utm_campaign, utm_content, utm_term), the number of cluttered URIs pointing to the same piece of content can add up to dozens or more.
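To get a feeling for how quickly the variants multiply, here is a minimal sketch. The canonical URI and the parameter values are made up for illustration; only the utm_ variable names come from Google Analytics.

```python
from itertools import product
from urllib.parse import urlencode

# Hypothetical canonical URI and hypothetical tracking values.
canonical = "http://www.example.com/my-post.html"
values = {
    "utm_source": ["feedburner", "twitterfeed"],
    "utm_medium": ["feed", "email", "twitter"],
    "utm_campaign": ["Feed: example", "newsletter"],
}

# Every combination of values is a distinct URI for the same content.
variants = [
    canonical + "?" + urlencode(dict(zip(values, combo)))
    for combo in product(*values.values())
]

print(len(variants))  # 2 * 3 * 2 = 12
```

Three variables with two or three values each already yield a dozen URIs for one page, before utm_content and utm_term enter the picture.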

FACT: Bloggers (publishers, authors, anybody) naturally copy those cluttered URIs to paste them into their posts. The same goes for user link drops at Twitter and elsewhere. These links get crawled and indexed. Currently Google’s search index is flooded with 28,900,000 cluttered URIs, mostly originating from copy+paste links. Bing and Yahoo haven’t indexed GA tracking parameters yet.

That’s 29 million URIs with tracking variables that point to duplicate content as of today. With every link copied from a feed reader, this number will increase. Matt Cutts said “I don’t think utm will cause dupe issues” and points to John Müller’s helpful advice (methods a site owner can apply to tidy up Google’s mess).

Maybe Google can handle this growing duplicate content chaos in their very own search index. Let’s forget that Google is the search engine that advocated URI canonicalization for ages, invented sitemaps, rel=canonical, and countless highly sophisticated algos to merge indexed clutter under the canonical URI. It’s all water under the bridge now that Google is in the create-multiple-URIs-pointing-to-the-same-piece-of-content business itself.

So far that’s just disappointing. To understand why it’s downright evil, let’s look at the implications from a technical point of view.

Spamming URIs with utm tracking variables breaks lots of things

Look at this URI: http://www.example.com/search.aspx?Query=musical+mobile?utm_source=Referral&utm_medium=Internet&utm_campaign=celebritybabies

Google added a query string to a query string. Two URI segment delimiters (“?”) can cause all sorts of trouble at the landing page.

Some scripts will process only variables from Google’s query string, because they extract GET input from the URI’s last question mark to the fragment delimiter “#” or end of URI; some scripts expecting input variables in a particular sequence will be confused at least; some scripts might even use the same variable names … the number of possible errors caused by amateurishly extended query strings is infinite. Even if there’s only one “?” delimiter in the URI.
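A quick sketch of what the doubled “?” does to the example URI above, using Python’s standard urllib.parse. The “naive” variant is a hypothetical script that reads its input after the last question mark; it stands in for the kind of landing page code described here, not for any particular product.

```python
from urllib.parse import parse_qs, urlsplit

uri = ("http://www.example.com/search.aspx?Query=musical+mobile"
       "?utm_source=Referral&utm_medium=Internet&utm_campaign=celebritybabies")

# A standards-following parser splits at the FIRST "?", so the second "?"
# and the first tracking variable end up inside the value of "Query".
standard = parse_qs(urlsplit(uri).query)

# A hypothetical naive script that reads input after the LAST "?"
# never sees "Query" at all.
naive = parse_qs(uri.rsplit("?", 1)[1])

print(standard["Query"])  # ['musical mobile?utm_source=Referral']
print(sorted(naive))      # ['utm_campaign', 'utm_medium', 'utm_source']
```

Either way the search script gets garbage: a polluted search term in the first case, no search term in the second.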

In some cases the page the user gets faced with will lack the expected content, or will display a prominent error message like 404, or will consist of white space only because the underlying script failed so badly that the Web server couldn’t even show a 5xx error.

Regardless of whether a landing page can handle query string parameters added to the original URI or not (most can), changing someone’s URI for tracking purposes is plain evil, IMHO, when implemented as opt-out instead of opt-in.

Appended UTM query strings can make trackbacks vanish, too. When a blog checks whether the trackback URI is carrying a link to the blog or not, for example with this plug-in, the comparison can fail and the trackback gets deleted on arrival, without notice. If I dug a little deeper, most probably I could compile a huge list of other functionalities on the Internet that are broken by Google’s UTM clutter.
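Why the trackback check fails is easy to show. The following is only a sketch under assumptions: the function names and URIs are made up, and the strict comparison is a guess at what such a plug-in might do, not its actual code.

```python
from urllib.parse import parse_qsl, urlsplit


def strict_match(link: str, permalink: str) -> bool:
    """Assumed plug-in behaviour: the link must equal the permalink exactly."""
    return link == permalink


def tolerant_match(link: str, permalink: str) -> bool:
    """Compare both URIs after dropping utm_ variables from the query."""
    def key(uri: str):
        parts = urlsplit(uri)
        query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
                 if not k.startswith("utm_")]
        return (parts.scheme, parts.netloc.lower(), parts.path, tuple(query))
    return key(link) == key(permalink)


permalink = "http://www.example.com/my-post/"
link = permalink + "?utm_source=feedburner&utm_medium=feed"

print(strict_match(link, permalink))    # False: the trackback gets dropped
print(tolerant_match(link, permalink))  # True
```

The tolerant version works, but it is exactly the kind of cleanup code that every blog owner would have to bolt on just to undo the tracking clutter.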

Finally, GoogleAnalytics is not the one and only stats tool out there, and it doesn’t fulfil all needs. Many webmasters rely on simple server reports, for example referrer stats or tools like awstats, for various technical purposes. Broken. Specialized content management tools fed by real-time traffic data. Broken. Countless tools for linkpop analysis group inbound links by landing page URI. Broken. URI canonicalization routines. Broken, or now counterproductive with regard to GA reporting. Google’s UTM clutter has an impact on lots of tools that make sense in addition to Google Analytics. All broken.

What a glorious mess. Frankly, I’m somewhat puzzled. Google has hired tens of thousands of this planet’s brightest minds (I really mean that, literally!), and they came up with half-assed crap like that? Un-fucking-believable.

What can I do to avoid URI spam on my site?

Boycott Google’s poor man’s approach to link feed traffic data to Web analytics. Go to Feedburner. For each of your feeds click on “Configure stats” and uncheck “Track clicks as a traffic source in Google Analytics”. Done. Wait for a suitable solution.

If you really can’t live with traffic sources gathered from a somewhat unreliable HTTP_REFERER, and you’ve deep pockets, then hire a WebDev crew to revamp all your affected code. Coward!

As a matter of fact, Google is responsible for this royal pain in the ass. Don’t fix Google’s errors on your site. Let Google do the fault recovery. They own the root of all UTM evil, so they have to fix it. There’s absolutely no reason why a gazillion webmasters and developers should do Google’s job, again and again.

What can Google do?

Well, that’s quite simple. Instead of adding utterly useless crap to URIs found in feeds, Google can make use of a clever redirect script. When Feedburner serves feed items to anybody, the values of all GA tracking variables are available.

Instead of adding clutter to these URIs, Feedburner could replace them with a script URI that stores the timestamp, the user’s IP addy, and whatnot, then performs a 301 redirect to the canonical URI. The GA script invoked on the landing page can access and process these data quite accurately.
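A rough sketch of such a redirect script, written as a tiny WSGI app. Everything named here is an assumption made for illustration: the click ID parameter, the lookup table, and the in-memory log are hypothetical, and how the GA script would later read the stored data is left open.

```python
import time
from urllib.parse import parse_qs

# Hypothetical lookup table: click ID -> (canonical URI, tracking values).
CLICK_TARGETS = {
    "abc123": (
        "http://www.example.com/my-post.html",
        {"utm_source": "feedburner", "utm_medium": "feed"},
    ),
}

# Stand-in for whatever storage the analytics backend would read later.
CLICK_LOG = []


def redirect_app(environ, start_response):
    """Log the click, then 301 to the untouched canonical URI."""
    params = parse_qs(environ.get("QUERY_STRING", ""))
    click_id = params.get("id", [""])[0]
    target = CLICK_TARGETS.get(click_id)
    if target is None:
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"unknown click id"]
    canonical_uri, tracking = target
    CLICK_LOG.append({
        "time": time.time(),
        "ip": environ.get("REMOTE_ADDR", ""),
        "uri": canonical_uri,
        **tracking,
    })
    start_response("301 Moved Permanently", [("Location", canonical_uri)])
    return [b""]
```

The point of the design is that the tracking values never touch the landing page URI: the visitor, the crawler, and anybody copying the link from the address bar only ever see the canonical URI.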

Perhaps this procedure would be even more accurate, because link drops can no longer mimic feed traffic.

Speak out!

So, if you don’t approve that Feedburner, GoogleReader, AdSense4Feeds, and GoogleAnalytics gang rape your well designed URIs, then link out to everything Google with a descriptive query string, like:

I mean, nicely designed canonical URIs should be the search engineer’s porn, so perhaps somebody at Google will listen. Will ya?

Update:

I’ve just added a “UTM Killer” tool, where you can enter a screwed URI and get a clean URI in return, with all ‘utm_’ crap and multiple ‘?’ delimiters removed. That’ll help when you copy URIs from your feed reader to use them in your blog posts.
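If you’d rather clean URIs locally, the idea fits in a few lines. This is a minimal sketch and not the code behind the tool; the function name is made up, and it assumes that any extra “?” inside the query string is a misplaced delimiter rather than part of a legitimate value.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_utm(uri: str) -> str:
    """Return the URI without utm_ variables and extra "?" delimiters."""
    parts = urlsplit(uri)
    # Assumption: an extra "?" in the query is a misplaced "&" delimiter.
    pairs = parts.query.replace("?", "&").split("&")
    kept = [p for p in pairs if p and not p.lower().startswith("utm_")]
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, "&".join(kept), parts.fragment)
    )


print(strip_utm(
    "http://www.example.com/search.aspx?Query=musical+mobile"
    "?utm_source=Referral&utm_medium=Internet&utm_campaign=celebritybabies"
))
# http://www.example.com/search.aspx?Query=musical+mobile
```

The remaining variables are passed through untouched, so their order and encoding stay as they were in the original URI.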

By the way, please vote up this pamphlet so that I get the 2010 SEMMY Award. Thanks in advance!




Which Sebastian Foss is a spammer?

Obviously pissed by my post Fraud from the desk of Sebastian Foss, Sebastian Foss sent this email to Smart-IT-Consulting.com:

Remove your insults from your blog about my products and sites… as you may know promote-biz.net is not registered to my name or my company.. just look it up in some whois service. This is some spammer who took my software and is now selling it on his spammer websites. Im only selling my programs under their original .com domains and you did not receive any email from me since im only using doube-optin lists.

You may not know it - but insulting persons and spreading lies is under penalty.

Sebastian Foss
Sebastian Foss e-trinity Marketing Inc.
sebastian@etrinity-mail.com

Well, that’s my personal blog, and I’ve a professional opinion about the software Sebastian Foss sells, more on that later. It’s public knowledge that spammers do register domains under several entities to obfuscate their activities. I’m not a fed, and I’m not willing to track down each and every multiple or virtual personality of a spammer, so I admit that there’s at least a slight possibility that the Sebastian Foss spamming my inbox from promote-biz.net is not the Sebastian Foss who wrote and sells the software promoted by the email spammer Sebastian Foss. Since I still receive email spam from the desk of Sebastian Foss at promote-biz.net, I think there’s no doubt that this Sebastian Foss is a spammer. Well, Sebastian Foss himself calls him a spammer, and so do I. Confused? So am I. I’ll update my other post to reflect that.

Now that we’ve covered the legal stuff, let’s look at the software from the desk of Sebastian Foss.

  • Blog Blaster claims to submit “ads” to 2,000,000 sites. Translation: Blog Blaster automatically submits promotional comments to 2 million blogs. The common description of this kind of “advertising” is comment spam.
    Sebastian Foss tells us that “Blog Blaster will automatically create thousands of links to your website - which will rank your website in a top 10 position!”. The common description of this link building technique is link spam.
    The sales pitch signed by Sebastian Foss explains “I used it [Blog Blaster] to promote my other website called ezinebroadcast.com and Blog Blaster produced thousands of links to ezinebroadcast.com - resulting in a #1 position in Google for the term “ezine advertising service”.” So I understand that Sebastian Foss admits that he is a comment spammer and a link spammer.
    I’d like to see the written permissions of 2,000,000 bloggers allowing Sebastian Foss and his customers to spam their blogs: “Advertising using Blog Blaster is 100% SPAM FREE advertising! You will never be accused of spamming. Your ads are submitted to blogs whose owners have agreed to receive your ads.” Laughable, and obviously a lie. Did Sebastian Foss remember that “spreading lies is under penalty”? Take care, Sebastian Foss!
  • Feed Blaster with a very similar sales pitch aims to create the term feed spam. Also, it seems that FeedBlaster™ is a registered trademark of DigitalGrit Inc. And I don’t think that Microsoft, Sun and IBM are happy to spot their logos on Sebastian Foss’ site e-trinity Internetmarketing GmbH.
  • The Money License System aka Google Cash Machine seems to slip through a legal loophole. Maybe it’s not explicitly illegal to sell software built to trick Google AdWords, AdSense or ClickBank, but using it will result in account terminations and AFAIK legal actions too.
  • Instant Booster claims to spam search engines, and it does, according to many reports. The common term applied to those techniques is web spam.

All these domains (and there are countless more sites selling similar scams from the desk of Sebastian Foss) are registered by Sebastian Foss or his companies e-trinity Internetmarketing GmbH or e-trinity Marketing Inc.

He’s in the business of newsgroup spam, search engine spam, comment spam … probably there’s no target left out. Searching for Sebastian Foss scam and similar search terms leads to tons of rip-off reports.

He’s even too lazy to rephrase his sales pitches: click a few of the links provided above, then search for quoted phrases you saw on every sales pitch to get the big picture. All that may be legal in Germany, I couldn’t care less, but it’s not legit. Creating and selling software for the sole purpose of spamming makes the software vendor a spammer. And he’s proud of it. He openly admits that he uses his software to spam blogs, search engines, newsgroups and whatever. He may make use of affiliates and virtual entities who send out the email spam, perhaps he got screwed by a Chinese copycat selling his software via email spam, but is that relevant when the product itself is spammy?

What do you think, is every instance of Sebastian Foss a spammer? Feel free to vote in the comments.

Update 08/01/2007: Here is the next email from the desk of Sebastian Foss:

Hi,
thanks for the changes on your blog entry - however like i mentioned if you look up the domains which were advertised in the spam mails you will notice that they are not registered to me or my company. You can also see that visiting the sites you will see some guy took my products and is selling them for a lower price on his own websites where he is also copying all of my graphic files. The german police told me that they are receiving spam from your forms and that it goes directly to their trash… however please remove your entries about me from your blog - There is no sense in me selling my own products for a lower price on some cheap, stolen websites - if that would make sense then why do i have my own .com domains for my products ? I just want to make clear that im not sending out any spam mails - please get back to me.

Thanks,
Sebastian

Sebastian Foss
e-trinity Internetmarketing GmbH
sebastian@etrinity-mail.com

It deserves just a short reply:

It makes perfect sense to have an offshore clone in China selling the same outdated and pretty much questionable stuff a little cheaper. This clone can do that because first there are next to no costs like taxes and so on, and second he does it by spamming my inbox on a daily basis, hence probably he sells a lot of the ‘borrowed’ stuff. Whether or not the multiple Sebastian Fosses are the same natural person is not my problem. I claim nothing but leave it up to your speculation, common sense, and probability calculation, dear reader.




AdSense asks me "Are You Gay?", but why?

John emailed me this screenshot:
[Screenshot: AdSense ad asking “Are you gay?”]

I wondered why the heck AdSense considers a post on Google’s new URL removal tool gay in nature. Warning! These links (Google search results) are not safe for work:

0.08% gay: “fell in love”
0.11% gay: “neat”
0.05% gay: “terminator”
0.02% gay: “competitors”
0.04% gay: “user”
14.7% gay: “friendly”
56.4% gay: “tool”

Who can help me to figure out the remaining 28.6% gayness? I mean when putting two ads asking “Are you gay” above and below the post, Ol’ AdSense should be at least 100.0% certain that this question will not offend me. Actually, I’m not offended. I’m just curious to learn more about a possible coming out.


