No more RSS feeds in Google’s search results

Folks try all sorts of naughty things when a blog’s feed accidentally outranks the HTML version of a post. That usually happens to less popular blogs, or with very old posts and category feeds that contain ancient articles.

The problem seems to be that Google’s Web search doesn’t understand the XML structure of feeds, so a feed’s textual contents get indexed like the contents of a plain text file. Thanks to “subscribe” buttons and other links, feeds can gather more PageRank than some HTML pages. Interestingly, .xml is considered an unknown file type, and advanced search doesn’t provide a way to search within XML files.

Now that has changed [1]. Googler Bogdan Stănescu posts on the German Webmaster blog [2], under the title “We remove feeds from our search results”:

As Webmasters many of you were probably worried that your RSS or Atom feeds could outrank the accompanying HTML pages in Google’s search results. The emergence of feeds in our search results could be a poor user experience:

1. Feeds increase the probability that the user gets the same search result twice.

2. Users who click on the feed link on a SERP may miss out on valuable content, which is only available on the HTML page referenced in the XML file.

For these reasons, we have removed feeds from our Web search results - with the exception of podcasts (feeds with media files).

[…] We are aware that in addition to the podcasts out there some feeds exist that are not linked with an HTML page, and that is why it is not quite ideal to remove all feeds from the search results. We’re still open for feedback and suggestions for improvements to the handling of feeds. We look forward to your comments and questions in the crawling, indexing and ranking section of our discussion forum for Webmasters. [Translation mine]

I’m not yet sure whether this will end in a ban of all or most XML documents. I hope they suppress RSS/Atom feeds only, and provide improved ways to search for and within other XML resources.

So what does that mean for blog SEO? Unless Google provides a procedure to prevent feeds from accumulating PageRank whilst allowing access for blog search crawlers that request feeds (I believe something like that is in the works), it’s still a good idea to nofollow all feed links, but there’s absolutely no reason to block them in robots.txt any more.

I think that’s a great move in the right direction, but only a preliminary solution. The XML structure of feeds isn’t that hard to parse, and there are only so many ways to extract the URL of the HTML page. So when a relevant feed lands in a raw result set, Google should display a link to the HTML version on the SERP. What do you think?
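
To illustrate how little parsing that takes, here is a minimal Python sketch that pulls the HTML page URLs out of an RSS 2.0 or Atom feed. It is only an illustration under stated assumptions, not what Google does: the function name html_links is made up for this example, RSS 1.0 (RDF) feeds are not handled, and the feed is assumed to be well-formed XML.

```python
import xml.etree.ElementTree as ET

# Namespace used by Atom 1.0 elements.
ATOM_NS = "{http://www.w3.org/2005/Atom}"


def html_links(feed_xml):
    """Return the HTML page URLs referenced by a feed's items or entries.

    Hypothetical helper for illustration; handles RSS 2.0 and Atom only.
    """
    root = ET.fromstring(feed_xml)
    links = []

    # RSS 2.0: <rss><channel><item><link>URL</link></item></channel></rss>
    for item in root.iter("item"):
        link = item.findtext("link")
        if link:
            links.append(link.strip())

    # Atom: <feed><entry><link rel="alternate" href="URL"/></entry></feed>
    # A missing rel attribute means "alternate", i.e. the HTML version.
    for entry in root.iter(ATOM_NS + "entry"):
        for link in entry.findall(ATOM_NS + "link"):
            if link.get("rel", "alternate") == "alternate" and link.get("href"):
                links.append(link.get("href"))
                break

    return links
```

Calling html_links on a feed's source returns one URL per post, which is the link a search engine could show instead of the feed itself.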


[1] Danny reminded me that, according to Matt Cutts, this has been going on for a few months now.

[2] 24 hours later Google published the announcement in English, too.




10 Comments to "No more RSS feeds in Google's search results"

  1. Michael VanDeMar on 18 December, 2007  #link

    Sebastian, are you sure that nofollowing them is a good idea? I have a site that gets most of its spidering done from Gbot following the news feed on it. And wouldn’t that also impact Google Blog Search as well?

  2. Igor The Troll on 18 December, 2007  #link

    Great tip Sebastian, I am just getting my blog primed to go live and it was something I was concerned with.

    So I will disallow feeds in my robots.txt but only for Googlebot. Probably Google has another spider for indexing feeds, do you know its name?

    I mean we do not want to leak PR to content that will not be indexed anyway. Maybe it may be better to do a rel=”nofollow” on the feeds as well, or only do rel=”nofollow”. I am just wondering why even bother leaving loose feeds to Googlebot if they will be removed. “We remove feeds from our search results!” To me this could mean they index them, and then remove them from the search results?

    So pretty much the question is which Google bot indexes feeds!

    Thank you,
    Igor

  3. Sebastian on 18 December, 2007  #link

    Michael, unlike Technorati, Google Blog Search concentrates on the feeds (that doesn’t mean they ignore the surrounding stuff, blogrolls and such). Those feed URLs can be gathered from ping services and auto-discovery links in the HEAD section. Google Blog Search doesn’t need on-page links to feeds to discover and process them.

    As for crawling, you could submit your RSS feeds as sitemaps to Google, Yahoo, and MSN, and add their URLs to your robots.txt for Ask.

    Feeds that are indexed but suppressed by the query engine are plain PageRank leaks; that’s the sole reason to nofollow them.

  4. Sebastian on 18 December, 2007  #link

    Igor, you shouldn’t block your feeds in robots.txt, but rather add rel-nofollow to the on-page links to them.

    If you really want to block them in robots.txt, then try something like

    User-agent: Googlebot
    # block RSS feeds:
    Disallow: */feed$
    Disallow: */feed/$
    # block totally and utterly useless archives
    # (provided your permalink URIs don’t begin with the year):
    Disallow: /2007
    Disallow: /2008
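
To see which URLs rules like these would catch, here is a small Python sketch of Google-style pattern matching, where “*” matches any run of characters and a trailing “$” anchors the end of the URL path. The function rule_matches is a made-up name for this illustration and only approximates the matching; it ignores Allow rules and rule precedence entirely.

```python
import re


def rule_matches(pattern, path):
    """Check a URL path against one robots.txt Disallow pattern.

    Hypothetical helper: '*' matches any characters, a trailing '$'
    anchors the end of the path, everything else is a literal prefix.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal parts and turn each '*' into '.*'.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        regex += r"\Z"
    # re.match anchors at the start of the path, like a prefix rule.
    return re.match(regex, path) is not None


print(rule_matches("*/feed$", "/my-post/feed"))    # feed URL without trailing slash
print(rule_matches("*/feed$", "/my-post/feed/"))   # same URL with trailing slash
print(rule_matches("/2007", "/2007/12/some-post")) # date archive
```

Note that under these assumptions a pattern ending in /feed$ does not cover feed URLs that end with a slash, so both URL forms need their own rule.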

  5. Igor The Troll on 18 December, 2007  #link

    Thanks. Actually Googlebot has nothing to do with feeds, so might as well disallow him and rel=”nofollow” to him.

    Here is one back @You
    http://blogsearch.google.com/ping

  6. Sebastian on 18 December, 2007  #link

    Don’t mix up Feedfetcher and Googlebot. Feedfetcher doesn’t obey robots.txt, hence it’s worthless to add a Feedfetcher section.

  7. JLH on 18 December, 2007  #link

    I blocked and nofollowed mine just because I hate seeing a search result for something, clicking on it, only to find out it’s a feed. I didn’t want that to happen to my visitors. Actually Matt Cutts’ blog is what inspired me to do it; often I use Google to search his site to find something I thought I remembered him writing about, only to hit a feed. I’m glad they made a move on it. Since my German is about as good as, well, any other language than English (I know, I’m an ugly American), thanks for pointing this out.

  8. SEO Canada on 18 December, 2007  #link

    Now if only they would remove Wiki from the SERPs, things just might be perfect :)

  9. SearchCap: The Day In Search, December 18, 2007…

    Below is what happened in search today, as reported on Search Engine Land and from other places across the web…….

  10. […] recently said that they are going to stop indexing RSS feeds (first seen here), but I’ve still seen RSS feeds in the search results as recently as a few days […]
