Duplicate Content Filters are Sensitive Plants

In their ever lasting war on link and index spam search engines produce way too much collateral damage. Especially hierarchically structured content suffers from over-sensitive spam filters. The crux at this is, that user friendly pages need to duplicate information from upper levels. The old rule “what’s good for users will be honored by the engines” no longer applies.

In fact the problem is not the legitimate duplication of key information from other pages, the problem is that duplicate content filters are sensitive plants not able to distinguish useful repetition from automated generation of artificial spider fodder. The engines won’t lower their spam threshold, that means they will not fix this persistent bug in the near future, so Web site owners have to live with decreasing search engine traffic, or react. The question is, what can a Webmaster do to escape the dilemma without converting the site to a useless nightmare for visitors, because all textual redundancies were eliminated?

The major fault of Google’s newer dupe filters is, that their block level analysis often fails in categorizing page areas. Web page elements in and near the body area, which contain duplicated key information from upper levels, are treated as content blocks, not as part of the page template where they logically belong to. As long as those text blocks reside in separated HTML block level elements, it should be quite easy to rearrange those elements in a way that the duplicated text becomes part of the page template, what should be safe at least with somewhat intelligent dupe filters.

Unfortunately, very often the raw data aren’t normalized, for example the text duplication happens within a description field in a database’s products table. That’s a major design flaw, and it must be corrected in order to manipulate block level elements properly, that is to declare them as part of the template vs. part of the page body.

My article Feed Duplicate Content Filters Properly explains a method to revamp page templates of eCommerce sites on the block level. The principle outlined there can be applied to other hierarchical content structures too.

Tags: ()

Share/bookmark this: del.icio.usGooglema.gnoliaMixxNetscaperedditSphinnSquidooStumbleUponYahoo MyWeb
Subscribe to      Entries Entries      Comments Comments      All Comments All Comments

Be the first to add a Comment to "Duplicate Content Filters are Sensitive Plants"

Leave a reply

[If you don't do the math, or the answer is wrong, you'd better have saved your comment before hitting submit. Here is why.]

Be nice and feel free to link out when a link adds value to your comment. More in my comment policy.