URI canonicalization with an X-Canonical-URI HTTP header

Dear search engines, you owe me one for persistently nagging you about your bugs, flaws and faults. In other words, I’m desperately in need of a good reason to praise your wisdom and whatnot. From this year’s x-mas wish list:

All search engines obey the X-Canonical-URI HTTP header

The rel=canonical link element is a great tool, at least if applied properly, but sometimes it’s a royal pain in the ass.

Inserting rel=canonical link elements into huge conglomerates of cluttered scripts and static files is a nightmare. Sometimes the scripts creating the most URI clutter are compiled, and there’s no way to get hold of the source code to change them.

Also, lots of resources can’t be stuffed with HTML’s link elements, for example dynamically created PDFs, plain text files, or images.

It’s not always possible to revamp old scripts; some projects just lack a suitable budget. And in some cases 301 redirects aren’t a viable option, for example when the destination URI is #5 in a redirect chain that can’t be shortened because the redirects are performed by a third party that doesn’t cooperate.

This one, on the other hand, is elegant and scalable:

if (messedUp($_SERVER["REQUEST_URI"])) {
    // messedUp() and $canonicalUri stand in for your own canonicalization rules
    header("X-Canonical-URI: $canonicalUri");
}

Or:

header('Link: <http://example.com/canonical-uri/>; rel="canonical"');

Coding an HTTP request handler that takes care of URI canonicalization before any script gets invoked, and before any static file gets served, is the way to go for such fuddy-duddy sites.
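For static files such as PDFs, such a handler could look like the following rough PHP sketch. It assumes the web server hands requests for .pdf files to this script instead of serving them directly (for example via a rewrite rule), that the PDFs sit below the script's directory, and that each PDF's canonical version is an HTML page with the same base name. canonicalForPdf() and the example.com host are hypothetical.

```php
<?php
/**
 * Hypothetical rule: /docs/white-paper.pdf is canonicalized to
 * /docs/white-paper.html. Returns null for any other path.
 * Only letters, digits, '/', '_' and '-' are accepted before ".pdf".
 */
function canonicalForPdf(string $path, string $base = 'http://example.com'): ?string
{
    if (preg_match('#^(/[A-Za-z0-9/_-]+)\.pdf$#', $path, $m) !== 1) {
        return null;
    }
    return $base . $m[1] . '.html';
}

// Runs before the static file is served.
$path = (string) parse_url($_SERVER['REQUEST_URI'] ?? '/', PHP_URL_PATH);
$canonical = canonicalForPdf($path);
$file = __DIR__ . $path; // assumed: PDFs live below this script's directory

if ($canonical !== null && PHP_SAPI !== 'cli' && is_file($file)) {
    header('Content-Type: application/pdf');
    header("Link: <$canonical>; rel=\"canonical\"");
    readfile($file);
}
// Anything else would go to the site's regular 404 handling (omitted).
```

The character whitelist in the pattern is deliberate: it rejects paths containing dots other than the .pdf extension, so a request like /../secret.pdf never reaches readfile().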

By the way, having all URI canonicalization routines in one piece of code is way more transparent, and far easier to manage, than a bazillion isolated link elements spread over tons of resources. So that might be a feasible approach for non-ancient sites, too.
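A minimal PHP sketch of such a single piece of code, assuming a front controller receives every request first (for example via a rewrite rule). The function name canonicalUriFor(), the example.com host and both rules (a list of ignorable tracking parameters, and index.php versus trailing slash) are hypothetical placeholders for a site's real clutter. X-Canonical-URI is only the header wished for in this post, so the sketch also sends the Link header syntax that Google later adopted (RFC 5988).

```php
<?php
// Hypothetical central canonicalization rules: one place instead of a
// bazillion rel=canonical link elements. All rules below are assumptions.

// Assumed query parameters that never change the content (illustrative only).
const IGNORED_PARAMS = ['sessionid', 'utm_source', 'utm_medium', 'utm_campaign'];
// Assumed canonical scheme and host (illustrative only).
const CANONICAL_BASE = 'http://example.com';

/**
 * Returns the canonical absolute URI for a request URI,
 * or null when the request URI is already canonical.
 */
function canonicalUriFor(string $requestUri): ?string
{
    $path = (string) parse_url($requestUri, PHP_URL_PATH);
    if ($path === '') {
        $path = '/';
    }
    $query = (string) parse_url($requestUri, PHP_URL_QUERY);

    // Assumed rule 1: /foo/index.php and /foo/ are the same resource.
    $cleanPath = preg_replace('#/index\.php$#', '/', $path);

    // Assumed rule 2: drop ignored parameters and sort the rest,
    // so ?b=2&a=1 and ?a=1&b=2 collapse into one URI.
    parse_str($query, $params);
    $kept = array_diff_key($params, array_flip(IGNORED_PARAMS));
    ksort($kept);
    $cleanQuery = http_build_query($kept);

    $canonical = $cleanPath . ($cleanQuery === '' ? '' : '?' . $cleanQuery);
    return $canonical === $requestUri ? null : CANONICAL_BASE . $canonical;
}

// Runs before any script or static file handler gets invoked.
$canonicalUri = canonicalUriFor($_SERVER['REQUEST_URI'] ?? '/');
if ($canonicalUri !== null && PHP_SAPI !== 'cli') {
    header("X-Canonical-URI: $canonicalUri");            // the header wished for in this post
    header("Link: <$canonicalUri>; rel=\"canonical\""); // RFC 5988 Link header
}
```

Since nothing is redirected, visitors still get the page they requested; only the response headers point to the canonical URI.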

Dear search engines, if you make that happen, I promise not to tweet about your products with a “#crap” hashtag for the rest of this year. Deal?

And yes, I know I’m somewhat late, two days before x-mas, but you’ve got smart developers, haven’t you? So please, go get your ‘code monkeys’ to work and surprise me. Thanks.




11 Comments to "URI canonicalization with an X-Canonical-URI HTTP header"

  1. Clint Lenard on 22 December, 2009

    That’s an awesome idea, actually! I’m not quite sure how long it would take them to implement it, seeing as they’re much too busy adding more garbage to their SERPs lately ;)

  2. Vitorio on 22 December, 2009

    Since the canonical URL was implemented as a LINK element, you can use the already-existing LINK header instead of making a new one.

    The latest spec is still in draft, but it’s been around since before HTTP/1.1.

    With the LINK header, a canonical link could look like this if the current URL was the canonical version:

    Link: <http://example.com/canonical-uri/>; rel=canonical

    Or like this if it wasn’t:

    Link: <http://example.com/canonical-uri/>; rel="canonical"

    In theory, this should already work (for example, you can include style sheets via your headers in this fashion).

    (P.S. Your comment form doesn’t support plus signs in email addresses.)

  3. mohsin on 22 December, 2009

    You are right, especially about “dynamically created PDFs, plain text files, or images”: that issue is very common in CMSs like Joomla that offer buttons to view content as a PDF or in print format. And most CMS users are not technically savvy enough to implement these “best practices” on their own. I think search engines, especially Google, are getting tired of crawling the web, and they want to limit things through useless technical implementations.

  4. […] URI canonicalization with an X-Canonical-URI HTTP header, sebastians-pamphlets.com […]

  5. Tinus on 29 December, 2009

    Want.

  6. Sebastian on 18 May, 2010

    FYI: Google is considering rel-canonical HTTP headers, but is not eager to implement them anytime soon. Please read more on Maile Ohye’s blog and speak out!

  7. Sebastian on 9 September, 2010

    Another candidate: rel-alternate (unifying content under multilingual templates):

    If you have a global site containing pages where the:
      - template (i.e. side navigation, footer) is machine-translated into various languages,
      - main content remains unchanged, creating largely duplicate pages,
      - and sometimes search results direct users to the wrong language,
    we’d like to help you better target your international/multilingual audience through:

    <link rel="alternate" hreflang="a-different-language" href="http://url-of-the-different-language-page" />

  8. Sebastian on 17 June, 2011

    SUCCESS

    Google Web Search supporting rel="canonical" HTTP Headers
    Friday, June 17, 2011 at 1:05 AM
    Webmaster level: Advanced
    Posted by Pierre Far, Webmaster Trends Analyst on the Google Webmaster Blog

    Based on your feedback, we’re happy to announce that Google web search now supports link rel="canonical" relationships specified in HTTP headers as per the syntax described in section 5 of IETF RFC 5988.

    Webmasters can use rel="canonical" HTTP headers to signal the canonical URL for both HTML documents and other types of content such as PDF files.
    To see the rel="canonical" HTTP header in action, let’s look at the scenario of a website offering a white paper both as an HTML page and as a downloadable PDF alternative, under these two URLs:

    http://www.example.com/white-paper.html

    http://www.example.com/white-paper.pdf

    In this case, the webmaster can signal to Google that the canonical URL for the PDF download is the HTML document by using a rel="canonical" HTTP header when the PDF file is requested; for example:

    Request Header

    GET /white-paper.pdf HTTP/1.1
    Host: www.example.com
    (...rest of HTTP request headers...)

    Response Header

    HTTP/1.1 200 OK
    Content-Type: application/pdf
    Link: <http://www.example.com/white-paper.html>; rel="canonical"
    Content-Length: 785710
    (... rest of HTTP response headers...)

    Another common situation in which rel="canonical" HTTP headers may help is when a website serves the same file from multiple URLs (for example when using a content distribution network) and the webmaster wishes to signal to Google the preferred URL.

    We currently support these link header elements for web search only. As we see how webmasters are using these elements, we’re hoping to add support for them in our other properties. For more information, please see our Help Center articles about canonicalization and the rel="canonical" element. If you have any questions, please ask in our Webmaster Help Forum.


    Thank You!

  9. Doc Sheldon on 17 June, 2011

    Maybe the squeaky wheel really DOES get the oil, eh, Sebastian?

    Anyhow… Merry Christmas early… or late… whatever! At least they did it!

  10. […] request for this feature had, in a way, been voiced by the good Sebastian Pamphlets on his blog back in December 2009, where he asked Google for a similar feature as […]

  11. […] after introducing the tag in 2009 there was some feedback (including this one) saying that it would have been a better idea to also allow HTTP headers as a way of implementing […]
