URI canonicalization with an X-Canonical-URI HTTP header
Dear search engines, you owe me one for persistently nagging you on your bugs, flaws and faults. In other words, I’m desperately in need of a good reason to praise your wisdom and whatnot. From this year’s x-mas wish list:
All search engines obey the X-Canonical-URI HTTP header
The rel=canonical link element is a great tool, at least if applied properly, but sometimes it’s a royal pain in the ass.
Inserting rel=canonical link elements into huge conglomerates of cluttered scripts and static files is a nightmare. Sometimes the scripts creating the most URI clutter are compiled, and there’s no way to get a hand on the source code to change them.
Also, lots of resources can’t be stuffed with HTML’s link elements, for example dynamically created PDFs, plain text files, or images.
It’s not always possible to revamp old scripts; some projects just lack a suitable budget. And in some cases 301 redirects aren’t a doable option, for example when the destination URI is #5 in a redirect chain that can’t be shortened because the redirects are performed by a 3rd party that doesn’t cooperate.
This one, on the other hand, is elegant and scalable:
if (messedUp($_SERVER["REQUEST_URI"])) {
    header("X-Canonical-URI: $canonicalUri");
}
Or:
header("Link: <http://example.com/canonical-uri/>; rel=canonical");
Coding an HTTP request handler that takes care of URI canonicalization before any script gets invoked, and before any static file gets served, is the way to go for such fuddy-duddy sites.
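As a rough illustration of such a request handler, here is a minimal sketch written as Python WSGI middleware rather than PHP. Everything in it is an assumption made for the example: the names `NOISE_PARAMS`, `canonical_uri`, `requested_uri` and `CanonicalLinkMiddleware` are hypothetical, and the canonicalization rule (lowercase the host, drop a few noise parameters, sort the rest) is only a stand-in for whatever a real site needs.

```python
from urllib.parse import parse_qsl, urlencode

# Hypothetical query parameters that clutter URIs without changing the content.
NOISE_PARAMS = {"sid", "ref"}


def requested_uri(environ):
    """Rebuild the absolute URI exactly as it was requested."""
    scheme = environ.get("wsgi.url_scheme", "http")
    host = environ.get("HTTP_HOST", "example.com")
    path = environ.get("PATH_INFO", "/")
    query = environ.get("QUERY_STRING", "")
    return f"{scheme}://{host}{path}" + (f"?{query}" if query else "")


def canonical_uri(environ):
    """Apply the illustrative rule: lowercase host, drop noise, sort the rest."""
    scheme = environ.get("wsgi.url_scheme", "http")
    host = environ.get("HTTP_HOST", "example.com").lower()
    path = environ.get("PATH_INFO", "/")
    pairs = parse_qsl(environ.get("QUERY_STRING", ""), keep_blank_values=True)
    kept = sorted((k, v) for k, v in pairs if k not in NOISE_PARAMS)
    query = urlencode(kept)
    return f"{scheme}://{host}{path}" + (f"?{query}" if query else "")


class CanonicalLinkMiddleware:
    """Wrap any WSGI app and add a Link header when the request URI is messed up."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        canonical = canonical_uri(environ)
        messed_up = canonical != requested_uri(environ)

        def patched_start_response(status, headers, exc_info=None):
            if messed_up:
                # Append the header without touching the wrapped app's own list.
                headers = list(headers) + [
                    ("Link", f'<{canonical}>; rel="canonical"')
                ]
            return start_response(status, headers, exc_info)

        return self.app(environ, patched_start_response)
```

Because the middleware wraps the whole application, the scripts and static file handlers behind it stay untouched, which is the point of doing canonicalization in one place.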
By the way, having all URI canonicalization routines in one piece of code is way more transparent, and far easier to manage, than a bazillion isolated link elements spread over tons of resources. So that might be a feasible procedure for non-ancient sites, too.
Dear search engines, if you make that happen, I promise that I won’t tweet your products with a “#crap” hashtag for the rest of this year. Deal?
And yes, I know I’m somewhat late, two days before x-mas, but you’ve got smart developers, haven’t you? So please, go get your ‘code monkeys’ to work and surprise me. Thanks.
Sebastian | Web development, MSN, Crawler Directives, SEO, Yahoo, Google
That’s an awesome idea, actually! I’m not quite sure how long it would take them to implement it, seeing as they’re much too busy adding more garbage to their SERPs lately.
Since the canonical URL was implemented as a LINK element, you can use the already-existing LINK header instead of making a new one.
The latest spec is still in draft, but it’s been around since before HTTP/1.1.
With the LINK header, a canonical link could look like this if the current URL was the canonical version:
Link: <http://example.com/canonical-uri/>; rel=canonical
Or like this if it wasn’t:
Link: <http://example.com/canonical-uri/>; rel="canonical"
In theory, this should already work (for example, you can include style sheets via your headers in this fashion).
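To make the header syntax a bit more concrete, here is a small Python sketch of how a consumer might pull the canonical target out of a Link header value. It is deliberately not a complete Link header parser: the function name `canonical_from_link_header` and both regular expressions are assumptions made for this example, and parameters containing commas inside quoted strings are not handled.

```python
import re

# One link-value: a target in angle brackets, then parameters up to the next comma.
LINK_VALUE = re.compile(r'<([^>]*)>([^,]*)')
# The rel parameter, either quoted (rel="canonical") or bare (rel=canonical).
REL_PARAM = re.compile(r';\s*rel\s*=\s*(?:"([^"]*)"|([^\s;,"]+))', re.IGNORECASE)


def canonical_from_link_header(value):
    """Return the target of the first rel=canonical link, or None if there is none."""
    for target, params in LINK_VALUE.findall(value):
        match = REL_PARAM.search(params)
        if match is None:
            continue
        rel_value = match.group(1) if match.group(1) is not None else match.group(2)
        # rel may carry several space-separated relation types.
        if "canonical" in rel_value.lower().split():
            return target
    return None
```

Both spellings shown above, with and without quotes around the relation type, resolve to the same target with this sketch.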
(P.S. Your comment form doesn’t support plus signs in email addresses.)
You are right, especially the issue of “dynamically created PDFs, plain text files, or images” is very common in various CMSs like Joomla that offer buttons to view the content as PDF or in print format etc. And most CMS users are not technically savvy enough to implement these “best practices” on their own. I think search engines, especially Google, are getting tired of crawling the web and want to limit things with useless technical implementations.
Want.
FYI: Google considers rel-canonical HTTP headers, but is not eager to implement them anytime soon. Please learn more at Maile Ohye’s blog and speak out!
Another candidate: rel-alternate (unifying content under multilingual templates):
SUCCESS
Google Web Search supporting rel="canonical" HTTP Headers
Friday, June 17, 2011 at 1:05 AM
Webmaster level: Advanced
Posted by Pierre Far, Webmaster Trends Analyst on the Google Webmaster Blog
Based on your feedback, we’re happy to announce that Google web search now supports link rel="canonical" relationships specified in HTTP headers as per the syntax described in section 5 of IETF RFC 5988.
Webmasters can use rel="canonical" HTTP headers to signal the canonical URL for both HTML documents and other types of content such as PDF files.
To see the rel="canonical" HTTP header in action, let’s look at the scenario of a website offering a white paper both as an HTML page and as a downloadable PDF alternative, under these two URLs:
http://www.example.com/white-paper.html
http://www.example.com/white-paper.pdf
In this case, the webmaster can signal to Google that the canonical URL for the PDF download is the HTML document by using a rel="canonical" HTTP header when the PDF file is requested; for example:
Request Header
GET /white-paper.pdf HTTP/1.1
Host: www.example.com
(...rest of HTTP request headers...)
Response Header
HTTP/1.1 200 OK
Content-Type: application/pdf
Link: <http://www.example.com/white-paper.html>; rel="canonical"
Content-Length: 785710
(... rest of HTTP response headers...)
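For illustration only, the response headers above could be produced by something like the following Python sketch. The mapping `CANONICAL_FOR` and the helper `response_headers` are hypothetical names invented for this example; they are not part of Google's announcement or of any particular server.

```python
# Hypothetical lookup table: alternate path -> canonical URL.
CANONICAL_FOR = {
    "/white-paper.pdf": "http://www.example.com/white-paper.html",
}


def response_headers(path, content_type, content_length):
    """Build the header list for a response, adding rel="canonical" when mapped."""
    headers = [("Content-Type", content_type)]
    canonical = CANONICAL_FOR.get(path)
    if canonical is not None:
        headers.append(("Link", f'<{canonical}>; rel="canonical"'))
    headers.append(("Content-Length", str(content_length)))
    return headers
```

Calling `response_headers("/white-paper.pdf", "application/pdf", 785710)` yields the three headers shown in the example response, while a path without an entry in the table simply gets no Link header.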
Another common situation in which rel="canonical" HTTP headers may help is when a website serves the same file from multiple URLs (for example when using a content distribution network) and the webmaster wishes to signal to Google the preferred URL.
We currently support these link header elements for web search only. As we see how webmasters are using these elements, we’re hoping to add support for them in our other properties. For more information, please see our Help Center articles about canonicalization and the rel="canonical" element. If you have any questions, please ask in our Webmaster Help Forum.
Thank You!
Maybe the squeaky wheel really DOES get the oil, eh, Sebastian?
Anyhow… Merry Christmas early… or late… whatever! At least they did it!
[…] request for this feature had been voiced in some way by the good Sebastian Pamphlets on his blog back in December 2009, where he asked Google for a similar feature as […]
[…] after introducing the tag in 2009 there was some feedback (including this one) saying that it would have been a better idea to also allow HTTP headers as a way of implementing […]