Matt Cutts asks us How should Google handle NOINDEX? That’s a tough question worth thinking twice before you submit a comment to Matt’s post. Here is Matt’s question, all the background information you need, and my opinion.
What is NOINDEX?
Noindex is an indexer directive defined in the Robots Exclusion Protocol (REP) from 1996 for use in robots meta tags. Putting a NOINDEX value in a page’s robots meta tag or X-Robots-Tag tells search engines that they shall not index the page content, but may follow links provided on the page.
To get a grip on NOINDEX’s role in the REP please read my Robots Exclusion Protocol summary at SEOmoz. Also, Google experiments with NOINDEX as crawler directive in robots.txt, more on that later.
How major search engines treat NOINDEX
[Matt Cutts on August 30, 2006]
- Google doesn’t show the page in any way.
- Ask doesn’t show the page in any way.
- MSN shows a URL reference and cached link, but no snippet. Clicking the cached link doesn’t return anything.
- Yahoo! shows a URL reference and cached link, but no snippet. Clicking on the cached link returns the cached page.
Personally, I’d prefer it if every search engine treated the noindex meta tag by not showing a page in the search results at all. [Meanwhile Matt might have a slightly different opinion.]
Google’s experimental support of NOINDEX as crawler directive in robots.txt also includes the DISALLOW functionality (an instruction that forbids crawling), and most probably URIs tagged with NOINDEX in robots.txt cannot accumulate PageRank. In my humble opinion the DISALLOW behavior of NOINDEX in robots.txt is completely wrong, and without any doubt in no way compliant to the Robots Exclusion Protocol.
Matt’s question: How should Google handle NOINDEX in the future?
To simplify Matt’s poll, lets assume he’s talking about NOINDEX as indexer directive, regardless where a Webmaster has put it (robots meta tag, X-Robots-Tag, or robots.txt).
The question is whether Google should completely drop a NOINDEX’ed page from our search results vs. show a reference to the page, or something in between?
Here are the arguments, or pros and cons, for each variant:
- Google should completely drop a NOINDEX’ed page from their search results
Obviously that’s what most Webmasters would prefer:
This is the behavior that we’ve done for the last several years, and webmasters are used to it. The NOINDEX meta tag gives a good way — in fact, one of the only ways — to completely remove all traces of a site from Google (another way is our url removal tool). That’s incredibly useful for webmasters.
The corner case that Google discovers a link and lists it on their SERPs before the page that carries a NOINDEX directive is crawled and deindexed isn’t crucial, and could be avoided by a (new) NOINDEX indexer directive in robots.txt, which is requested by search engines quite frequently. Ok, maybe Google’s BlitzCrawler™ has to request robots.txt more often then.
- Google should show a reference to NOINDEX’ed pages on their SERPs
Search quality and user experience are strong arguments:
Our highest duty has to be to our users, not to an individual webmaster. When a user does a navigational query and we don’t return the right link because of a NOINDEX tag, it hurts the user experience (plus it looks like a Google issue). If a webmaster really wants to be out of Google without even a single trace, they can use Google’s url removal tool. The numbers are small, but we definitely see some sites accidentally remove themselves from Google. For example, if a webmaster adds a NOINDEX meta tag to finish a site and then forgets to remove the tag, the site will stay out of Google until the webmaster realizes what the problem is. In addition, we recently saw a spate of high-profile Korean sites not returned in Google because they all have a NOINDEX meta tag. If high-profile sites like [3 linked examples] aren’t showing up in Google because of the NOINDEX meta tag, that’s bad for users (and thus for Google).
Search quality and searchers’ user experience is also a strong argument for totally delisting NOINDEX’ed pages, because most Webmasters use this indexer directive to keep stuff that doesn’t provide value for searchers out of the search indexes. <polemic>I mean, how much weight have a few Korean sites when it comes to decisions that affect the whole Web?</polemic>
If a Webmaster puts a NOINDEX directive by accident, that’s easy to spot in the site’s stats, considering the volume of traffic that Google controls. I highly doubt that a simple URI reference with an anchor text scrubbed from external links on Google SERPs would heal such a mistake. Also, Matt said that Google could add a NOINDEX check to the Webmaster Console.
The reference to the URI removal tools is out of context, because these tools remove an URI only for a short period of time and all removal requests have to be resubmitted repeatedly every few weeks. NOINDEX on the other hand is a way to keep an URI out of the index as long as this crawler directive is provided.
I’d say the sole argument for listing references to NOINDEX’ed pages that counts is misleading navigational searches. Of course that does not mean that Google may ignore the NOINDEX directive to show –with a linked reference– that they know a resource, despite the fact that the site owner has strictly forbidden such references on SERPs.
- Something in between, Google should find a reasonable way to please both Webmasters and searchers
Quoting Matt again:
The vast majority of webmasters who use NOINDEX do so deliberately and use the meta tag correctly (e.g. for parked domains that they don’t want to show up in Google). Users are most discouraged when they search for a well-known site and can’t find it. What if Google treated NOINDEX differently if the site was well-known? For example, if the site was in the Open Directory, then show a reference to the page even if the site used the NOINDEX meta tag. Otherwise, don’t show the site at all. The majority of webmasters could remove their site from Google, but Google would still return higher-profile sites when users searched for them.
Whether or not a site is popular must not impact a search engine’s respect for a Webmaster’s decision to keep search engines, and their users, out of her realm. That reads like “Hey, Google is popular, so we’ve the right to go to Mountain View to pillage the Googleplex, acquiring everything we can steal for the public domain”. Neither Webmasters nor search engines should mimic Robin Hood. Also, lots of Webmasters highly doubt that Google’s idea of (link) popularity should rule the Web.
Whether or not a site is listed in the ODP directory is definitely not an indicator that can be applied here. Last time I looked the majority of the Web’s content wasn’t listed at DMOZ due to the lack of editors and various other reasons, and that includes gazillions of great and useful resources. I’m not bashing DMOZ here, but as a matter of fact it’s not comprehensive enough to serve as indicator for anything, especially not importance and popularity.
I strongly believe that there’s no such thing as a criterion suitable to mark out a two class Web.
My take: Yes, No, Depends
Google could enhance navigational queries –and even “I feel lucky” queries– that lead to a NOINDEX’ed page with a message like “The best matching result for this query was blocked by the site”. I wouldn’t mind if they mention the URI as long as it’s not linked.
In fact, the problem is the granularity of the existing indexer directives. NOINDEX is neither meant for nor capable of serving that many purposes. It is wrong to assign DISALLOW semantics to NOINDEX, and it is wrong to create two classes of NOINDEX support. Fortunately, we’ve more REP indexer directives that could play a role in this discussion.
NOODP, NOYDIR, NOARCHIVE and/or NOSNIPPET in combination with NOINDEX on a site’s home page, that is either a domain or subdomain, could indicate that search engines must not show references to the URI in question. Otherwise, if no other indexer directives elaborate NOINDEX, search engines could show references to NOINDEX’ed main pages. The majority of navigational search queries should lead to main pages, so that would solve the search quality issues.
Of course that’s not precise enough due to the lack of a specific directive that deals with references to forbidden URIs, but it’s way better than ignoring NOINDEX in its current meaning.
A fair solution: NOREFERENCE
If I’d make the decision at Google and couldn’t live with a best matching search result blocked message, I’d go for a new REP tag:
“NOINDEX, NOREFERENCE” in a robots meta tag –respectively Googlebot meta tag– or X-Robots-Tag forbids search engines to show a reference on their SERPs. In robots.txt this would look like
Search engines would crawl these URIs, and follow their links as long as there’s no NOFOLLOW directive either in robots.txt or a page specific instruction.
NOINDEX without a NOREFERENCE directive would instruct search engines not to index a page, but allows references on SERPs. Supporting this indexer directive both in robots.txt as well as on-the-page (respectively in the HTTP header for X-Robots-Tags) makes it easy to add NOREFERENCE on sites that hate search engine traffic. Also, a syntax variant like
NOINDEX=NOREFERENCE for robots.txt could tell search eniges how they have to treat NOINDEX statements on site level, or even on site area level.
Even more appealing would be
NOINDEX=REFERENCE, because only the very few Webmasters that would like to see their NOINDEX’ed URIs on Google’s SERPs would have to add a directive to their robots.txt at all. Unfortunately, that’s not doable for Google unless they can convice three well known Korean sites to edit their robots.txt.
By the way, don’t miss out on my draft asking for REP tag support in robots.txt!
Anyway: Dear Google, please don’t touch NOINDEX!
Share/bookmark this: del.icio.us • Google • ma.gnolia • Mixx • Netscape • reddit • Sphinn • Squidoo • StumbleUpon • Yahoo MyWeb
Subscribe to Entries Comments All Comments