Plagiarism

Archived posts from the 'Plagiarism' Category

How to spam the hell out of Google’s new source attribution meta elements

Posted on 16 November, 2010

The moment you’ve read Google’s announcement and Matt’s question “What about spam?” you concluded “spamming it is a breeze”, right? You’re not alone.

Before we discuss how to abuse it, it might be a good idea to define it within its context, ok?

Playground

First of all, Google announced these meta tags on the official Google News blog for a reason. So if you plan to abuse them with your countless MFA proxies of Yahoo Answers, you’ve most probably jumped on the wrong bandwagon. Google supports the meta elements below in Google News only.

syndication-source

The first new indexer hint is syndication-source. It’s meant to tell Google the permalink of a particular news story, hence the author and all the folks spreading the word are asked to use it to point to the one -and only one- URI considered the source:

<meta name="syndication-source" content="http://outerspace.com/news/ubercool-geeks-launched-google-hotpot.html" />

The meta element above is for instances of the story served from
http://outerspace.com/breaking/page1.html
http://outerspace.com/yyyy-mm-dd/page2.html
http://outerspace.com/news/aliens-appreciate-google-hotpot.html
http://outerspace.com/news/ubercool-geeks-launched-google-hotpot.html
http://newspaper.com/main/breaking.html
http://tabloid.tv/rehashed/from/rss/hot:alien-pot-in-your-bong.html
…

Don’t confuse it with the cross-domain rel-canonical link element. It’s not about canning duplicate content; it marks a particular story, regardless of whether it’s somewhat rewritten or just reprinted with a different headline. It tells Google News to use the original URI when the story can be crawled from different URIs on the author’s server, and when syndicated stories on other servers are so similar to the initial piece that Google News prefers to use the original (the latter is my educated guess).
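To make the idea concrete, here is a minimal sketch of how copies of one story could be grouped under the URI they declare as their syndication-source. It is purely illustrative and says nothing about how Google News actually does it; the function name and the `declared` mapping are assumptions made up for this example, and the URIs are the ones from the list above.

```python
from collections import defaultdict

def group_by_syndication_source(declared):
    """Group crawled URIs by the syndication-source URI each one declares."""
    groups = defaultdict(list)
    for crawled_uri, source_uri in declared.items():
        groups[source_uri].append(crawled_uri)
    return dict(groups)

SOURCE = "http://outerspace.com/news/ubercool-geeks-launched-google-hotpot.html"

# Hypothetical crawl result: every copy points to the same source URI.
declared = {
    "http://outerspace.com/breaking/page1.html": SOURCE,
    "http://newspaper.com/main/breaking.html": SOURCE,
    "http://tabloid.tv/rehashed/from/rss/hot:alien-pot-in-your-bong.html": SOURCE,
}

groups = group_by_syndication_source(declared)
```

All three copies end up in one group keyed by the permalink, which is the "one -and only one- URI" the hint is meant to identify.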

original-source

The second new indexer hint is original-source. It’s meant to tell Google the origin of the news itself, so the author/enterprise digging it out of the mud, as well as all the folks using it later on, are asked to declare who broke the story:

<meta name="original-source" content="http://outerspace.com/news/ubercool-geeks-launched-google-hotpot.html" />

Say we’ve got two or more related news stories, like “Google fell from Mars” by cnn.com and “Google landed in Mountain View” by sfgate.com. Then it makes sense for latimes.com to publish a piece like “Google fell from Mars and landed in Mountain View”. Because latimes.com is a serious newspaper, they credit their sources not only with a mention or even embedded links, they do it in a machine-readable way, too:

<meta name="original-source" content="http://cnn.com/google-fell-from-mars.html" />
<meta name="original-source" content="http://sfgate.com/google-landed-in-mountain-view.html" />

It’s a matter of course that both cnn.com and sfgate.com provide such an original-source meta element on their pages, in addition to the syndication-source meta element, both pointing to their very own coverage.

If a journalist grabbed his breaking news from a secondary source telling “CNN reported five minutes ago that Google’s mothership started from Venus, and the LA Times spotted it crashing on Jupiter”, he can’t be bothered with looking at the markup and locating those meta elements in the head section, he has a deadline for his piece “Why Web search left Planet Earth”. It’s just fine with Google News when he puts

<meta name="original-source" content="http://cnn.com/" />
<meta name="original-source" content="http://sfgate.com/" />
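If you want to check what a page actually declares, a small parser will do. The following is a sketch using only Python's standard `html.parser`; the class and function names are my own, and the latimes.com URL in the sample page is hypothetical. The two original-source values are the ones from the example above.

```python
from html.parser import HTMLParser

class SourceMetaParser(HTMLParser):
    """Collects syndication-source and original-source meta values."""

    def __init__(self):
        super().__init__()
        self.syndication_sources = []
        self.original_sources = []

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing <meta ... /> elements.
        if tag != "meta":
            return
        attr = dict(attrs)
        name = (attr.get("name") or "").lower()
        content = attr.get("content")
        if not content:
            return
        if name == "syndication-source":
            self.syndication_sources.append(content)
        elif name == "original-source":
            self.original_sources.append(content)

def read_source_hints(html):
    """Return (syndication_sources, original_sources) found in the markup."""
    parser = SourceMetaParser()
    parser.feed(html)
    parser.close()
    return parser.syndication_sources, parser.original_sources

# Sample page; the syndication-source URL is a made-up placeholder.
page = """<html><head>
<meta name="syndication-source" content="http://latimes.com/google-fell-and-landed.html" />
<meta name="original-source" content="http://cnn.com/google-fell-from-mars.html" />
<meta name="original-source" content="http://sfgate.com/google-landed-in-mountain-view.html" />
</head><body></body></html>"""

syndication, originals = read_source_hints(page)
```

A page should come back with exactly one syndication-source, while several original-source values are legitimate.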

Fine-prints

As always, the most interesting stuff is hidden on a help page:

At this time, Google News will not make any changes to article ranking based on this tags.

If we detect that a site is using these metatags inaccurately (e.g., only to promote their own content), we’ll reduce the importance we assign to their metatags. And, as always, we reserve the right to remove a site from Google News if, for example, we determine it to be spammy.

As with any other publisher-supplied metadata, we will be taking steps to ensure the integrity and reliability of this information.

It’s a field test

We think it is a promising method for detecting originality among a diverse set of news articles, but we won’t know for sure until we’ve seen a lot of data. By releasing this tag, we’re asking publishers to participate in an experiment that we hope will improve Google News and, ultimately, online journalism. […] Eventually, if we believe they prove useful, these tags will be incorporated among the many other signals that go into ranking and grouping articles in Google News. For now, syndication-source will only be used to distinguish among groups of duplicate identical articles, while original-source is only being studied and will not factor into ranking. [emphasis mine]

Spam potential

Well, we do know that Google Web search has a spam problem, IOW even a few so-1999-webspam-tactics still work to some extent. So we tend to classify a vague threat like “If we find sites abusing these tags, we may […] remove [those] from Google News entirely” as FUD, and spam away. Common sense and experience tell us that a smart marketer will make money from everything spammable.

But: we’re not talking about Web search. Google News is a clearly laid out environment. There are only so many sites covered by Google News. Even if Google weren’t able to develop algos analyzing all source attribution meta elements out there, they do have the resources to identify abuse using manpower alone. Most probably they will do both.

They clearly told us that they will compare this metadata to other signals. And those aren’t only very weak indicators like “timestamp first crawled” or “first heard of via pubsubhubbub”. It’s not that hard to isolate particular news, gather each occurrence as well as source mentions within, and arrange those on a time line with clickable links for QC folks who most certainly will identify the actual source. Even a few spot tests daily will soon reveal the sites whose source attribution meta tags are questionable, or even spammy.
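Here is a rough sketch of such a time line, just to show how little it takes. This is my guess at the general shape of the check, not Google's process: the `Occurrence` record, the function name and the timestamps are all invented for illustration, and the "first seen wins" rule is exactly the kind of weak indicator a human reviewer would have to confirm.

```python
from dataclasses import dataclass
from datetime import datetime
from urllib.parse import urlparse

@dataclass
class Occurrence:
    uri: str                 # where the story was found
    first_seen: datetime     # hypothetical "first crawled" timestamp
    declared_original: str   # value of the page's original-source meta element

def review_timeline(occurrences):
    """Sort occurrences by time and flag those not crediting the earliest host."""
    timeline = sorted(occurrences, key=lambda o: o.first_seen)
    if not timeline:
        return timeline, []
    earliest_host = urlparse(timeline[0].uri).netloc
    flagged = [
        o.uri for o in timeline
        if urlparse(o.declared_original).netloc != earliest_host
    ]
    return timeline, flagged

occurrences = [
    Occurrence("http://tabloid.tv/rehashed/from/rss/hot:alien-pot-in-your-bong.html",
               datetime(2010, 11, 16, 11, 30), "http://tabloid.tv/"),
    Occurrence("http://outerspace.com/news/ubercool-geeks-launched-google-hotpot.html",
               datetime(2010, 11, 16, 9, 0),
               "http://outerspace.com/news/ubercool-geeks-launched-google-hotpot.html"),
    Occurrence("http://newspaper.com/main/breaking.html",
               datetime(2010, 11, 16, 10, 15), "http://outerspace.com/"),
]

timeline, flagged = review_timeline(occurrences)
```

With this made-up data, only the tabloid.tv copy gets flagged, because it credits itself although it showed up last. The flagged list is a to-do list for the QC folks, not a verdict.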

If you’re still not convinced, fair enough. Go spam away. Once you’ve lost your entry on the whitelist, your free traffic from Google News, as well as from news-one-box results on conventional SERPs, is toast.

Last but not least, a fair warning

Now, if you still want to use source attribution meta elements on your non-newsworthy MFA sites to claim ownership of your scraped content, feel free to do so. Most probably Matt’s team will appreciate just another “I’m spamming Google” signal.

Not that reprinting scraped content is considered shady any more: even a former president does it shamelessly. It’s just the almighty Google in all of its evilness that penalizes you for considering all on-line content public domain.

8 comments Sebastian | Search Quality, Testing, Webspam, Spam, Plagiarism, Google

Dealing with spamming content thieves / plagiarists (oylinki.com)

Posted on 21 January, 2008

When it comes to crap like plagiarism you shouldn’t consider me a gentleman.

If assclowns like Veronica Domb steal my content and publish it along with likewise stolen comments on their blatantly spamming site oylinki.com, I’m somewhat upset.

Then when I leave a polite note asking the thief Veronica Domb from EmeryVille to remove my stuff asap, see my comment marked as “in moderation”, but neither my content gets removed nor my comment is published within 24 hours, I stay annoyed.

When I’m annoyed, I write blog posts like this one. I’m sure it will rank high enough for [Veronica Domb] when the assclown’s banker or taxman searches for her name. I’m sure it’ll be visible on any SERP that any other (potential) business partner submits at a major search engine.

Content Thieves Veronica Domb et al, P.O.BOX 99800, EmeryVille, 94662, CA are blatant spammers

Hey, outing content thieves is way more fun than filing boring DMCA complaints, and way more effective. Plagiarists do ego searches too, and from now on Veronica Domb from EmeryVille will find the footsteps of her criminal activities on the Web with each and every ego search. Isn’t that nice?

Not. Of course Veronica Domb is a pseudonym of Slade Kitchens, Jamil Akhtar, … However, some plagiarists and scam artists aren’t smart enough to hide their identity, so watch out.

Maybe I’ve done some companies a little favor, because they certainly don’t need to send out money sneakily “earned” with Web spam and criminal activities that violate the TOS of most affiliate programs.

AdBrite will love to cancel the account for these affiliate links:
http://ads.adbrite.com/mb/text_group.php?sid=448245&br=1&dk=736d616c6c20627573696e6573735f355f315f776562
http://www.adbrite.com/mb/commerce/purchase_form.php?opid=448245&afsid=1

Google’s webspam team as well as other search engines will most likely delist oylinki.com that comes with 100% stolen text and links and faked whois info as well.

Spamcop and the like will happily blacklist oylinki.com (IP: 66.199.174.80, cwh2.canadianwebhosting.com) because the assclown’s blog software sends out email spam masked as trackbacks.

If anybody is interested, here’s a track of the real “Veronica Domb” from Canada clicking the link to this post from her WP admin panel:
74.14.107.36 - - [21/Jan/2008:07:50:40 -0500] "GET /outing-plagiarist-2008-01-21/ HTTP/1.1" 200 9921 "http://oylinki.com/blog/wp-admin/edit-comments.php" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; SU 3.005; .NET CLR 1.1.4322; InfoPath.1; Alexa Toolbar; .NET CLR 2.0.50727)"

Common sense is not as common as you think.

Disclaimer: I’ve outed plagiarists in the past, because it works. Whether you do that on ego-SERPs or not depends on your ethics. Some folks think that’s even worse than theft and spamming. I say that publishing plagiarisms in the first place deserves bad publicity.

11 comments Sebastian | Webspam, Blogging, Spam, Copyrights, Plagiarism, Crap

Please don’t run your counter on my servers

Posted on 18 April, 2007

I deeply understand that sharing other people’s resources makes sense sometimes. I just ask you to rethink your technical approach. Running your page view stats on my server comes with a serious disadvantage: my server logs and referrer reports are protected, hence you can’t read your stats. Rest assured I’m really not eager to know who views your pages.

So please: when you copy my HTML code, be so kind and steal the invisible 1×1px images too. It’s really not that hard to upload them to your server and edit my HTML in a way that your visitors’ user agents request these images from your server.

Signing up with a free counter service that doesn’t add hidden links to all your pages is less hassle than my reaction when I get annoyed.

Disclaimer: I don’t like it when you steal my code coz for some reason it’s often crappy enough to break your layout. Also, copying code without permission is as bad as content theft. So don’t copy, but feel free to ask.

Go to HTML Basix to figure out how you can block hotlinking with .htaccess:
RewriteEngine on
RewriteCond %{HTTP_REFERER} !^http://(www\.)?sebastianx.blogspot.com(/)?.*$ [NC]
RewriteRule .*\.(gif|jpg|jpeg|bmp|png)$ http://www.smart-it-consulting.com/img/misc/do-not-hotlink-beauty.jpg [R,NC]
But please don’t steal or hotlink the offensive blonde beauty
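If the rewrite syntax looks cryptic, the following sketch replays the same decision in Python: redirect whenever an image is requested and the referrer is not the blog. The two patterns are copied from the rule above; the function name and the sample requests are assumptions for illustration, and Python's `re` module only approximates what mod_rewrite does.

```python
import re

# Same patterns as in the .htaccess snippet; [NC] becomes re.IGNORECASE.
ALLOWED_REFERER = re.compile(
    r"^http://(www\.)?sebastianx.blogspot.com(/)?.*$", re.IGNORECASE)
IMAGE_PATH = re.compile(r".*\.(gif|jpg|jpeg|bmp|png)$", re.IGNORECASE)

def would_redirect(referer, path):
    """True if the request would be sent to the do-not-hotlink image."""
    is_image = IMAGE_PATH.match(path) is not None
    from_blog = ALLOWED_REFERER.match(referer) is not None
    return is_image and not from_blog

hotlinked = would_redirect("http://example.com/stolen-page.html", "img/pixel.gif")
own_page = would_redirect("http://sebastianx.blogspot.com/2007/04/post.html", "img/pixel.gif")
no_referer = would_redirect("", "img/pixel.gif")
not_an_image = would_redirect("http://example.com/", "index.html")
```

Note that a request without any referrer gets redirected too, because an empty string doesn’t match the allowed pattern. Many setups add a condition like `RewriteCond %{HTTP_REFERER} !^$` to let those through; whether you want that is your call.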

8 comments Sebastian | Copyrights, Copy+Paste-Penalties, Plagiarism, Hotlinking, .htaccess

Can you trust a SEO who steals?

Posted on 6 January, 2006

Tahir J. Farooque runs a SEO service in Los Angeles: ~~Cresoft Corporation~~. Instead of writing up articles on search engine optimization for the company’s Web site at ~~cresoft dot com~~, he prefers to steal those from other Web sites.

Tahir J. Farooque is not only a content thief, T.J. Farooque is an incredibly stupid plagiarist. When this not that savvy thief receives a cease & desist letter, the stolen content gets shortened a bit. Laughable to think a thief can get away with theft by rearranging the sales pitch thrown together from stolen content. But that’s Tahir Farooque’s poor mind. Putting a business at legal risk and asking for bad publicity seems to be easier than writing his own copy. Perhaps Tahir F. is not capable of doing his own research, but then he lacks a fundamental skill for a search engine optimizer. Would you assign SEO work to a stupid thief?

I’ve put up a page with screen shots, whois info etc. under Content Theft: Tahir J. Farooque’s plagiarism at ~~CRESOFT.COM (Cresoft Corporation)~~, feel free to link to it with a suitable anchor text.

UPDATE: The content thief sold the company, and the new owner apologized for the plagiarism. That’s why I’ve removed the info page linked above and crossed out the company name.

Tags: Search Engine Optimization (SEO) Content Theft Plagiarism

Be the first to comment Sebastian | Plagiarism, Crap, SEO, Uncategorized

Another Content Thief Caught

Posted on 22 July, 2005

Yesterday CopyScape alerted me to a content thief reprinting my stuff at [*]. This moron scraped a few paragraphs from my tutorial on Google Sitemaps, replaced a link to Google’s SEO page with a commercial call to action, and uploaded the plagiarism as a sales pitch for his dubious and pretty useless SEO tools.

As usual, I’ve documented the case and sent it over to my lawyer. Then I thought I could do more with all the screen shots, WHOIS info etc., and developed a template for a page of evidence [*]. Now it takes me only a few minutes to publish everything others should know about a content thief. Entering a few variables and pushing a button creates a nice page documenting the copyright infringement.

Unfortunately I can’t post the template, because it works with my CMS only, but you’ll get the idea. Be creative yourself, put the thief’s name, company and personal data prominently near terms like ‘evil’ and ‘thief’ all over the page, including the META tags. Then link to the page and submit it to all search engines. After a while do a search for the thief and check out whether you’ve outranked the offending site. If not, consider reading a few of my articles on search engine optimizing.

[*] My content was removed after my outing page had been picked up by the search engines and ranked fine within a few days. Thus I’ve removed the names and links. Here is another example of an outing page: Content Theft: Tahir J. Farooque’s plagiarism at CRESOFT.COM (Cresoft Corporation)

Tags: Search Engine Optimization (SEO) Content Theft Plagiarism

1 comment Sebastian | Plagiarism, Crap

Sebastian’s Pamphlets
