Folks try all sorts of naughty things when a blog’s feed accidentally outranks the HTML version of a post. That tends to happen to less popular blogs, or with very old posts and category feeds that still contain ancient articles.
The problem seems to be that Google’s Web search doesn’t understand the XML structure of feeds, so a feed’s textual contents get indexed as if they came from plain text files. Thanks to “subscribe” buttons and other links, feeds can gather more PageRank than some HTML pages. Interestingly, .xml is considered an unknown file type, and advanced search doesn’t provide a way to search within XML files.
As Webmasters, many of you were probably worried that your RSS or Atom feeds could outrank the accompanying HTML pages in Google’s search results. The appearance of feeds in our search results can be a poor user experience:
1. Feeds increase the probability that the user gets the same search result twice.
2. Users who click on the feed link on a SERP may miss out on valuable content, which is only available on the HTML page referenced in the XML file.
For these reasons, we have removed feeds from our Web search results - with the exception of podcasts (feeds with media files).
[…] We are aware that in addition to the podcasts out there some feeds exist that are not linked with an HTML page, and that is why it is not quite ideal to remove all feeds from the search results. We’re still open for feedback and suggestions for improvements to the handling of feeds. We look forward to your comments and questions in the crawling, indexing and ranking section of our discussion forum for Webmasters. [Translation mine]
I’m not yet sure whether that will end in a ban of all or most XML documents. I hope they suppress only RSS/Atom feeds, and provide improved ways to search for and within other XML resources.
So what does that mean for blog SEO? Unless Google provides a procedure to prevent feeds from accumulating PageRank while still allowing access for blog search crawlers that request feeds (I believe something like that is in the works), it’s still a good idea to nofollow all feed links. There is, however, no longer any reason to block feeds in robots.txt.
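To make the advice concrete, here is a hypothetical template snippet (example.com URLs are made up): the visible subscribe link carries rel="nofollow" so it doesn’t pass PageRank, while the autodiscovery link in the head stays untouched so feed readers and blog search crawlers can still find the feed.

```html
<!-- Hypothetical example: nofollow the visible feed link… -->
<a href="http://example.com/feed.atom" rel="nofollow">Subscribe</a>

<!-- …but keep feed autodiscovery intact for crawlers and readers -->
<link rel="alternate" type="application/atom+xml"
      href="http://example.com/feed.atom" title="Atom feed" />
```

A `Disallow: /feed.atom` line in robots.txt, by contrast, is exactly what you no longer need.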
I think that’s a great move in the right direction, but only a preliminary solution. The XML structure of feeds isn’t that hard to parse, and there are only so many ways to extract the URL of the HTML page. When a relevant feed lands in a raw result set, Google should display a link to the HTML version on the SERP. What do you think?
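To illustrate how little work that extraction is, here is a minimal sketch (sample feed and URLs are hypothetical) that pulls the HTML “alternate” URL out of a well-formed Atom or RSS 2.0 document:

```python
# Sketch: extracting the HTML page URL from a feed.
# Assumes a well-formed Atom or RSS 2.0 document; sample data is made up.
import xml.etree.ElementTree as ET

ATOM_NS = "{http://www.w3.org/2005/Atom}"

def html_url_from_feed(xml_text):
    root = ET.fromstring(xml_text)
    if root.tag == ATOM_NS + "feed":
        # Atom: <link rel="alternate"> points at the HTML version
        # (per the spec, a missing rel attribute defaults to "alternate")
        for link in root.findall(ATOM_NS + "link"):
            if link.get("rel", "alternate") == "alternate":
                return link.get("href")
    elif root.tag == "rss":
        # RSS 2.0: the channel-level <link> holds the HTML page URL
        link = root.find("channel/link")
        if link is not None:
            return link.text
    return None

atom = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example blog</title>
  <link rel="alternate" type="text/html" href="http://example.com/blog/"/>
  <link rel="self" href="http://example.com/feed.atom"/>
</feed>"""

print(html_url_from_feed(atom))  # http://example.com/blog/
```

A few lines like these are all it would take for a search engine to show the HTML link instead of the raw XML on a SERP.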
Update: 24 hours later, Google published the announcement in English, too.