Before we discuss how to abuse it, it might be a good idea to define it within its context, ok?
First of all, Google announced these meta tags on the official Google News blog for a reason. So if you plan to abuse them with your countless MFA proxies of Yahoo Answers, you’ve most probably jumped on the wrong bandwagon. Google supports the meta elements below in Google News only.
The first new indexer hint is syndication-source. It’s meant to tell Google the permalink of a particular news story, hence the author and all the folks spreading the word are asked to use it to point to the one –and only one– URI considered the source:
<meta name="syndication-source" content="http://outerspace.com/news/ubercool-geeks-launched-google-hotpot.html" />
The meta element above is for instances of the story served from
Don’t confuse it with the cross-domain rel-canonical link element. It’s not about canning duplicate content; it marks a particular story, regardless of whether it’s somewhat rewritten or just reprinted with a different headline. It tells Google News to use the original URI when the story can be crawled from different URIs on the author’s server, and when syndicated stories on other servers are so similar to the initial piece that Google News prefers to use the original (the latter is my educated guess).
The second new indexer hint is original-source. It’s meant to tell Google the origin of the news itself, so the author/enterprise digging it out of the mud, as well as all the folks using it later on, are asked to declare who broke the story:
<meta name="original-source" content="http://outerspace.com/news/ubercool-geeks-launched-google-hotpot.html" />
Say we’ve got two or more related news stories, like “Google fell from Mars” by cnn.com and “Google landed in Mountain View” by sfgate.com. Then it makes sense for latimes.com to publish a piece like “Google fell from Mars and landed in Mountain View”. Because latimes.com is a serious newspaper, they credit their sources not only with a mention or even embedded links, they do it in machine-readable form, too:
<meta name="original-source" content="http://cnn.com/google-fell-from-mars.html" />
<meta name="original-source" content="http://sfgate.com/google-landed-in-mountain-view.html" />
It’s a matter of course that both cnn.com and sfgate.com provide such an original-source meta element on their pages, in addition to the syndication-source meta element, both pointing to their very own coverage.
If a journalist grabbed his breaking news from a secondary source saying “CNN reported five minutes ago that Google’s mothership started from Venus, and the LA Times spotted it crashing on Jupiter”, he can’t be bothered with looking at the markup and locating those meta elements in the head section; he has a deadline for his piece “Why Web search left Planet Earth”. It’s just fine with Google News when he puts
<meta name="original-source" content="http://cnn.com/" />
<meta name="original-source" content="http://latimes.com/" />
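For those who can be bothered to look at the markup: locating these meta elements is easy to automate. Below is a minimal sketch using Python’s standard html.parser module. The names SourceAttributionParser and extract_sources are my own invention, not part of any Google tooling, and the sample markup simply reuses the two original-source elements from the latimes.com example above.

```python
from html.parser import HTMLParser

# Hypothetical helper, not part of any Google tooling: collects the
# source attribution meta elements found in a page.
ATTRIBUTION_NAMES = ("syndication-source", "original-source")


class SourceAttributionParser(HTMLParser):
    def __init__(self):
        super().__init__()
        # Maps meta name -> list of content URIs, in document order.
        self.sources = {name: [] for name in ATTRIBUTION_NAMES}

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags like <meta ... />.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = (attributes.get("name") or "").lower()
        content = attributes.get("content")
        if name in self.sources and content:
            self.sources[name].append(content)


def extract_sources(html):
    """Return the syndication-source and original-source URIs of a page."""
    parser = SourceAttributionParser()
    parser.feed(html)
    parser.close()
    return parser.sources


# Sample markup: the two original-source elements from the example above.
sample = """
<head>
<meta name="original-source" content="http://cnn.com/google-fell-from-mars.html" />
<meta name="original-source" content="http://sfgate.com/google-landed-in-mountain-view.html" />
</head>
"""
sources = extract_sources(sample)
```

Here sources["original-source"] holds both URIs in document order, and sources["syndication-source"] stays empty because the sample declares none.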
As always, the most interesting stuff is hidden on a help page:
At this time, Google News will not make any changes to article ranking based on these tags.
If we detect that a site is using these metatags inaccurately (e.g., only to promote their own content), we’ll reduce the importance we assign to their metatags. And, as always, we reserve the right to remove a site from Google News if, for example, we determine it to be spammy.
As with any other publisher-supplied metadata, we will be taking steps to ensure the integrity and reliability of this information.
It’s a field test
We think it is a promising method for detecting originality among a diverse set of news articles, but we won’t know for sure until we’ve seen a lot of data. By releasing this tag, we’re asking publishers to participate in an experiment that we hope will improve Google News and, ultimately, online journalism. […] Eventually, if we believe they prove useful, these tags will be incorporated among the many other signals that go into ranking and grouping articles in Google News. For now, syndication-source will only be used to distinguish among groups of duplicate identical articles, while original-source is only being studied and will not factor into ranking. [emphasis mine]
Well, we do know that Google Web search has a spam problem, IOW even a few so-1999 webspam tactics still work to some extent. So we tend to classify a vague threat like “If we find sites abusing these tags, we may […] remove [those] from Google News entirely” as FUD, and spam away. Common sense and experience tell us that a smart marketer will make money from everything spammable.
But: we’re not talking about Web search. Google News is a clearly laid out environment. There are only so many sites covered by Google News. Even if Google weren’t able to develop algos analyzing all the source attribution meta tags out there, they do have the resources to identify abuse using manpower alone. Most probably they will do both.
They clearly told us that they will compare this metadata to other signals. And those aren’t only very weak indicators like “timestamp first crawled” or “first heard of via pubsubhubbub”. It’s not that hard to isolate a particular news story, gather each occurrence as well as the source mentions within, and arrange those on a timeline with clickable links for QC folks who most certainly will identify the actual source. Even a few spot tests daily will soon reveal the sites whose source attribution meta tags are questionable, or even spammy.
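To illustrate how little it takes, here is a rough sketch of such a timeline check in Python. It is purely my guess at what a QC aid could look like, not anything Google has described: the Occurrence record, the “credits only itself although someone else had the story earlier” rule, and the mfa-proxy.example host are all made up for the example.

```python
from dataclasses import dataclass
from datetime import datetime
from urllib.parse import urlparse


@dataclass
class Occurrence:
    url: str                 # where this copy of the story was found
    first_seen: datetime     # e.g. crawl time; a weak signal on its own
    original_sources: list   # content values of its original-source tags


def host(url):
    """Host name of a URL, without a leading 'www.'."""
    name = urlparse(url).hostname or ""
    return name[4:] if name.startswith("www.") else name


def build_timeline(occurrences):
    """Order all occurrences of one story by when they were first seen."""
    return sorted(occurrences, key=lambda occ: occ.first_seen)


def questionable_claims(occurrences):
    """URLs that credit only their own host although the story was
    seen earlier on a different host (assumed heuristic)."""
    timeline = build_timeline(occurrences)
    flagged = []
    for position, occ in enumerate(timeline):
        claimed_hosts = {host(src) for src in occ.original_sources}
        self_only = claimed_hosts == {host(occ.url)}
        earlier_elsewhere = any(
            host(prev.url) != host(occ.url) for prev in timeline[:position]
        )
        if self_only and earlier_elsewhere:
            flagged.append(occ.url)
    return flagged


# Made-up occurrences of one story; the scraper host is hypothetical.
story = [
    Occurrence("http://mfa-proxy.example/google-fell-from-mars.html",
               datetime(2010, 11, 17, 10, 30),
               ["http://mfa-proxy.example/google-fell-from-mars.html"]),
    Occurrence("http://cnn.com/google-fell-from-mars.html",
               datetime(2010, 11, 17, 10, 0),
               ["http://cnn.com/google-fell-from-mars.html"]),
    Occurrence("http://latimes.com/",
               datetime(2010, 11, 17, 11, 0),
               ["http://cnn.com/google-fell-from-mars.html"]),
]
flagged = questionable_claims(story)
```

With this sample data only the scraper’s URL ends up in flagged: cnn.com credits itself but was first, and latimes.com credits somebody else. A human reviewer would still have to confirm each hit, since crawl timestamps alone prove little.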
If you’re still not convinced, fair enough. Go spam away. Once you’ve lost your entry on the whitelist, your free traffic from Google News, as well as from news-one-box results on conventional SERPs, is toast.
Last but not least, a fair warning
Now, if you still want to use source attribution meta elements on your non-newsworthy MFA sites to claim ownership of your scraped content, feel free to do so. Most probably Matt’s team will appreciate just another “I’m spamming Google” signal.
Not that reprinting scraped content is considered shady any more: even a former president does it shamelessly. It’s just the almighty Google in all of its evilness that penalizes you for considering all on-line content public domain.