Prepare for a dose of techy tin foil hattery. [Skip rant] Again, I’m going to rant about a nightmare that Twitter & Co created with their crappy, thoughtless and shortsighted software designs: URI shorteners (yup, it’s URI, not URL).
Recap: Each and every 3rd party URI shortener is evil by design. Those questionable services do/will steal your traffic and your Google juice, mislead and piss off your potential
visitors customers, and hurt you in countless other ways. If you consider yourself south of sanity, do not make use of shortened URIs you don’t own.
Actually, this pamphlet is not about sloppy social media users who shoot themselves in both feet, and it’s not about unscrupulous micro blogging platforms that force their users to hand over their assets to felonious traffic thieves. It’s about search engines that, in my humble opinion, handle the sURL dilemma totally wrong.
Some of my claims are based on experiments that I’m not willing to reveal (yet). For example I won’t explain sneaky URI hijacking or how I stole a portion of tinyurl.com’s search engine traffic with a shortened URI, passing searchers to a charity site, although it seems the search engine I’ve gamed has closed this particular loophole now. There’re still way too much playgrounds for deceptive tactics involving shortened URIs …
How should a search engine handle a shortened URI?
Handling an URI as shortened URL requires a bullet proof method to detect shortened URIs. That’s a breeze.
- Redirect patterns: URI shorteners receive lots of external inbound links that get redirected to 3rd party sites. Linking pages, stopovers and destination pages usually reside on different domains. The method of redirection can vary. Most URI shorteners perform 301 redirects, some use 302 or 307 HTTP response codes, some frame the destination page displaying ads on the top frame, and I’ve seen even a few of them making use of meta refreshs and client sided redirects. Search engines can detect all those procedures.
- Link appearance: redirecting URIs that belong to URI shorteners often appear on pages and in feeds hosted by social media services (Twitter, Facebook & Co).
- Seed: trusted sources like LongURL.org provide lists of domains owned by URI shortening services. Social media outlets providing their own URI shorteners don’t hide server name patterns (like su.pr …).
- Self exposure: the root index pages of URI shorteners, as well as other pages on those domains that serve a 200 response code, usually mention explicit terms like “shorten your URL” et cetera.
- URI length: the length of an URI string, if less or equal 20 characters, is an indicator at most, because some URI shortening services offer keyword rich short URIs, and many sites provide natural URIs this short.
Search engine crawlers bouncing at short URIs should do a lookup, following the complete chain of redirects. (Some whacky services shorten everything that looks like an URI, even shortened URIs, or do a lookup themselves replacing the original short URI with another short URI that they can track. Yup, that’s some crazy insanity.)
Each and every stopover (shortened URI) should get indexed as an alias of the destination page, but must not appear on SERPs unless the search query contains the short URI or the destination URI (that means not on [site:tinyurl.com] SERPs, but on a [site:tinyurl.com shortURI] or a [destinationURI] search result page). 3rd party stopovers mustn’t gain reputation (PageRank™, anchor text, or whatever), regardless the method of redirection. All the link juice belongs to the destination page.
In other words: search engines should make use of their knowledge of shortened URIs in response to navigational search queries. In fact, search engines could even solve the problem of vanished and abused short URIs.
Now let’s see how major search engines handle shortened URIs, and how they could improve their SERPs.
Bing doesn’t get redirects at all
Oh what a mess. The candidate from Redmond fails totally on understanding the HTTP protocol. Their search index is flooded with a bazillion of URI-only listings that all do a 301 redirect, more than 200,000 from tinyurl.com alone. Also, you’ll find URIs that do a permanent redirect and have nothing to do with URI shortening in their index, too.
I can’t be bothered with checking what Bing does in response to other redirects, since the 301 test fails so badly. Clicking on their first results for [site:tinyurl.com], I’ve noticed that many lead to
mailto://working-email-addy type of destinations. Dear Bing, please remove those search results as soon as possible, before anyone figures out how to use your SERPs/APIs to launch massive email spam campaigns. As for tips on how to improve your short-URI-SERPs, please learn more under Yahoo and Google.
Yahoo does an awesome job, with a tiny exception
Yahoo has done a better job. They index short URIs and show the destination page, at least via their site explorer. When I search for a tinyURL, the SERP link points to the URI shortener, that could get improved by linking to the destination page.
By the way, Yahoo is the only search engine that handles abusive short-URIs totally right (I will not elaborate on this issue, so please don’t ask for detailled information if you’re not a SE engineer). Yahoo bravely passed the 301 test, as well as others (including pretty evil tactics). I so hope that MSN will adopt Yahoo’s bright logic before Bing overtakes Yahoo search. By the way, that can be accomplished without sending out spammy bots (hint2bing).
Google does it by the book, but there’s room for improvements
As for tinyURLs, Google indexes only pages on the tinyurl.com domain, including previews. Unfortunately, the snippets don’t provide a link to the destination page. Although that’s the expected behavior (those URIs aren’t linked on the crawled page), that’s sad. At least Google didn’t fail on the 301 test.
As for the somewhat evil tactis I’ve applied in my tests so far, Google fell in love with some abusive short-URIs. Google –under particular circumstances– indexes shortened URIs that game Googlebot, having sent SERP traffic to sneakily shortened URIs (that face the searcher with huge ads) instead of the destination page. Since I’ve begun to deploy sneaky sURLs, Google greatly improved their spam filters, but they’re not yet perfect.
Since Google is responsible for most of this planet’s SERP traffic, I’ve put better sURL handling at the very top of my xmas wish list.
About abusive short URIs
Shortened URIs do poison the Internet. They vanish, alter their destination, mislead surfers … in other words they are abusive by definition. There’s no such thing as a persistent short URI!
Long time ago Tim Berners-Lee told you that
URI shorteners are evil fucking with URIs is a very bad habit. Did you listen? Do you make use of shortened URIs? If you post URIs that get shortened at Twitter, or if you make use of 3rd party URI shorteners elsewhere, consider yourself trapped into a low-life traffic theft scam. Shame on you, and shame on Twitter & Co.
Besides my somewhat shady experiments that hijacked URIs, stole SERP positions, and converted “borrowed” SERP traffic, there are so many other ways to abuse shortened URIs. Many of them are outright evil. Many of them do hurt your kids, and mine. Basically, that’s not any search engine’s problem, but search engines could help us getting rid of the root of all sURL evil by handling shortened URIs with common sense, even when the last short URI has vanished.
Fight shortened URIs!
It’s up to you. Go stop it. As long as you can’t avoid URI shortening, roll your own URI shortener and make sure it can’t get abused. For the sake of our children, do not use or support 3rd party URI shorteners. Deprive the livelihood of these utterly useless scumbags.
Unfortunately, as a father and as a webmaster, I don’t believe in common sense applied by social media services. Hence, I see a “Twitter actively bypasses safe-search filters tricking my children into viewing hardcore porn” post coming. Dear Twitter & Co. — and that addresses all services that make use of or transport shortened URIs — put and end to shortened URIs. Now!
Share/bookmark this: del.icio.us • Google • ma.gnolia • Mixx • Netscape • reddit • Sphinn • Squidoo • StumbleUpon • Yahoo MyWeb
Subscribe to Entries Comments All Comments