Google’s Anchor Text Reports Encourage Spamming and Scraping

Despite the title bait, Google’s new Webmaster toy is an interesting tool, thanks! I’ll look at it from a Webmaster’s perspective, all SEO hats in the wardrobe.

Danny tells us more, for example about the data source:

A few more details about the anchor text data. First, it comes only from external links to your site. Anchor text you use on your own site isn’t counted.
Second, links to any subdomains you have are NOT included in the data.

At first glance I doubted that Google filters internal links properly, so I checked a few sites and found internal anchor text leading the new anchor text stats. Well, it’s not unusual that folks copy and paste links including anchor text. Using page titles as anchor text is also business as usual. But why the heck do page titles and shortcuts from vertical menu bars appear to be the most prominent external anchor text? With large sites having tons of inbound links, that’s not so easy to investigate.

Thus I’ve looked at a tiny site of mine, this blog. It carries a few recent bollocks posts which were so useless that nobody bothered linking to them, I thought. Well, that’s partly the case: there were no links from humans, but enough fully automated links to get Ms. Googlebot’s attention. Thanks to scrapers, RSS aggregators, forums linking the author’s recent blog posts and so on, everything on this planet gets linked, so duplicated post titles make it into the anchor text stats.

Back at the larger sites, I found that scraper sites and indexed search results were responsible for the extremely misleading ordering of Google’s anchor text excerpts. Neither page type should get indexed in the first place, and it’s ridiculous that crappy or irrelevantly accumulated data dilutes a well-meant effort to report inbound linkage to Webmasters.

Ordering inbound anchor text by raw frequency, unweighted, and capping the list at 100 phrases makes this report utterly useless for many sites. Worse, it sends a strong but wrong signal: whatever sits at the top of an ordered list or result set reads as most important in the eye of the beholder.
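To make the point concrete, here is a minimal Python sketch of how the same set of inbound links ranks under a plain frequency count versus a weighted count. Everything in it is an assumption for illustration: the sample links, the source categories and the weights are invented, and Google does not disclose how (or whether) it scores link sources.

```python
from collections import Counter, defaultdict

# Hypothetical inbound links as (anchor text, source type) pairs.
# The data is invented for illustration only.
LINKS = (
    [("Recent Posts", "scraper")] * 40
    + [("Home", "search_results")] * 25
    + [("great SEO analysis", "editorial")] * 3
)

# Assumed per-source weights; nothing Google publishes.
WEIGHTS = {"scraper": 0.0, "search_results": 0.0, "editorial": 1.0}


def by_frequency(links, limit=100):
    """Unweighted ordering: every link counts as 1, list capped at `limit`."""
    counts = Counter(anchor for anchor, _ in links)
    return [anchor for anchor, _ in counts.most_common(limit)]


def by_weight(links, weights, limit=100):
    """Weighted ordering: each link adds its source weight to the anchor's score.

    Anchors whose total score is zero are dropped before the cap is applied.
    """
    scores = defaultdict(float)
    for anchor, source in links:
        scores[anchor] += weights.get(source, 0.0)
    ranked = sorted(
        (item for item in scores.items() if item[1] > 0),
        key=lambda item: item[1],
        reverse=True,
    )
    return [anchor for anchor, _ in ranked[:limit]]


if __name__ == "__main__":
    print(by_frequency(LINKS))
    print(by_weight(LINKS, WEIGHTS))
```

With this toy data the frequency list is led by the scraped menu shortcut, while the weighted list contains only the single phrase a human actually used.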

Joe Webmaster, tracking down the sources of his top-10 inbound links, finds a shitload of low-life pages and thinks: hey, linkage is a game of large numbers after all, if Google says those links are the most important ones. So he launches a gazillion doorways and joins every link farm out there. Bummer. Next week Joe Webmaster pops up in the Google Forum telling the world that his innocent site got tanked because he followed Google’s suggestions, and we’re bothered with yet another huge but useless thread discussing whether scraper links can damage rankings or not, finally inventing the “buffy blonde girl pointy stick penalty” on page 128.

Round trip to the hat rack, SEO hat attached. I perfectly understand that Google is not keen on disclosing trust/quality/reputation scores. I can read these stats because I understand the anatomy (intention, data, methods, context), and perhaps I can even get something useful out of them. Explaining this to an impatient site owner who reluctantly bought the “link quality counts” argument but still believes that nofollow’ed links carry some mysterious weight because they appear in reversed citation results is a completely different story.

Dear Google, if you really can’t hand out information on link weighting by ordering inbound anchor text by trust or other signs of importance, then can you please at least filter out all the useless crap? This shouldn’t be that hard to accomplish, since even simple site-search queries for “scraping” reveal tons of definitely not index-worthy pages which already do not pass any search engine love to the link destinations. Thanks in advance!

When I write “Dear Google”, can Vanessa hear me? Please :)


Update: Check your anchor text stats page every now and then, don’t miss out on updates :)
