NOPREVIEW - The missing X-Robots-Tag

Google provides previews of non-HTML resources listed on their SERPs: [screenshot: View PDF as HTML document]
These “view as text” and “view as HTML” links are pretty useful when, for example, you want to scan a PDF document before you clutter your machine’s RAM with 30 megs of useless digital rights management (aka Adobe Reader). You can view contents even when the corresponding application is not installed, and Google’s transformed previews should not stuff your maiden box with unwanted malware, etcetera. Under some circumstances, however, a NOPREVIEW X-Robots-Tag would make sound sense, and unfortunately Google hasn’t introduced one yet.

Google is rightfully proud of their capability to transform various file formats to readable HTML or plain text: Adobe Portable Document Format (pdf), Adobe PostScript (ps), Lotus 1-2-3 (wk1, wk2, wk3, wk4, wk5, wki, wks, wku), Lotus WordPro (lwp), MacWrite (mw), Microsoft Excel (xls), Microsoft PowerPoint (ppt), Microsoft Word (doc), Microsoft Works (wks, wps, wdb), Microsoft Write (wri), Rich Text Format (rtf), Shockwave Flash (swf), and of course Text (ans, txt), plus a couple of “unrecognized” file types like XML. New formats are added from time to time.

According to Adam Lasnik, there is currently no way for Webmasters to tell Google not to include the “View as HTML” option. You can try to fool Google’s converters by messing up the non-HTML resource in a way that a sane parser can’t interpret it. Actually, if you search for a few minutes you’ll find, e.g., PDF files without the preview links on Google’s SERPs. I wouldn’t consider this a bullet-proof or future-proof tactic though, because Google is pretty intent on improving their conversion/interpretation process.

I like the previews, and not only because they sometimes allow me to read documents behind a login screen. That’s a loophole Google should close as soon as possible. When, for example, PDF documents or Excel sheets are crawlable but not viewable for searchers (at least not with the second click), that’s plain annoying for both the site and the search engine user.

With HTML documents, the Webmaster can apply a NOARCHIVE crawler directive to prevent non-paying visitors from lurking via Google’s cached page copies. Thanks to the newish REP header tags one can do that with non-HTML resources too, but neither NOARCHIVE nor NOSNIPPET removes the “View as HTML” link.
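To make the REP header approach a bit more concrete, here is a minimal sketch of how a server-side script might decide which responses get an X-Robots-Tag header. The helper name `robots_headers` and the extension list are assumptions made up for illustration; only the header name and the NOARCHIVE/NOSNIPPET values come from the post. As said above, these directives keep the resource out of the cache and suppress snippets, but they don't touch the “View as HTML” link.

```python
import posixpath

# Illustrative subset of the non-HTML file types mentioned in this post.
# The set itself is an assumption, not an official list.
NON_HTML_EXTENSIONS = {".pdf", ".ps", ".xls", ".ppt", ".doc", ".rtf", ".swf"}


def robots_headers(path, directives=("noarchive", "nosnippet")):
    """Return extra HTTP response headers for the requested URL path.

    Hypothetical helper: non-HTML resources get an X-Robots-Tag header
    carrying the given directives, everything else gets no extra header
    (HTML pages can use a robots META element instead).
    """
    extension = posixpath.splitext(path)[1].lower()
    if extension in NON_HTML_EXTENSIONS:
        return {"X-Robots-Tag": ", ".join(directives)}
    return {}


if __name__ == "__main__":
    print(robots_headers("/members/report.pdf"))
    print(robots_headers("/index.html"))
```

If Google ever introduced the directive this post asks for, it would presumably be just one more value in that comma-separated list, but that part is pure wishful thinking for now.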

<speculation>Is the lack of a NOPREVIEW crawler directive just an oversight, or is it stuck in the pipeline because Google is working on supplemental components and concepts? Google’s still inconsistent handling of subscription content comes to mind as an ideal playground for such a robots directive in combination with a policy change.</speculation>

Anyways, there is a need for a NOPREVIEW robots tag, so why not implement it now? Thanks in advance.




4 Comments to "NOPREVIEW - The missing X-Robots-Tag"

  1. […] addition to the robots exclusion protocol. Of course we’d then need Noarchive:, Nofollow: and Nopreview: too, probably more but I’m not really in a greedy mood […]

  2. […] suppress these helpful preview links, which can be pretty annoying in some cases. See NOPREVIEW: The missing X-Robots-Tag. Google’s robots.txt validator: Line 33: Nopreview: /repstuff/nopreview.pdf Syntax not […]

  3. […] Don’t create/link an HTML preview of this resource. That’s interesting for subscriptions sites and applies mostly to PDFs, Word documents, spread sheets, presentations, and other non-HTML resources. More information here. […]

  4. Sebastian on 14 September, 2009

    Microsoft supports the NOPREVIEW directive now, both in META elements and in X-Robots-Tag headers, to disable the “hover preview” feature on Bing SERPs.
