SEO Bullshit: Mimicking a file system in URIs

Posted on 5 February, 2010

Way back in the WWW’s early Jurassic, micro computer based Web development tools sneakily began poisoning the formerly ideal world of the Internet. All of a sudden we saw ‘.htm’ URIs, because CP/M and later on PC-DOS file extensions were limited to 3 characters. Truncating the ‘language’ part of HTML was bad enough. Actually, fucking with well established naming conventions wasn’t just a malady, but a symptom of a worse world wide pandemic.

Unfortunately, in order to bring Web publishing to the mere mortals (folks who could afford a micro computer), software developers invented DOS-like restrictions the Web wasn’t designed for. Web design tools maintained files on DOS file systems. FTP clients managed to convert backslashes originating from DOS file systems to slashes on UNIX servers, and vice versa (long before NT 3.51 and IIS). Directory names / file names equalled URIs. Most Web sites were static.

None of those cheap but fancy PC based Web design tools came with a mapping of objects (locally stored as files back then) to URIs pointing to Web resources, despite Tim Berners-Lee’s warnings (like “It is the duty of a Webmaster to allocate URIs which you will be able to stand by in 2 years, in 20 years, in 200 years. This needs thought, and organization, and commitment.”). Instead, the technology used to create a resource named its unique identifier (URI). That’s as absurd as wearing diapers a whole life long.

Newbie Web designers grew up with this flawed concept, and never bothered to research the Web’s fundamentals. In their limited view of the Web, a URI was a mirrored version of a file name and its location on their local machine, and everything served from /cgi-bin/ had to be blocked in robots.txt, because all dynamic stuff was evil.

Today, those former newbies consider themselves oldtimers. Actually, they’re still greenhorns, because they’ve never learned that URIs have nothing to do with files, directories, or a Web resource’s (current) underlying technology (as in .php3 for PHP version 3.x, .shtml for SSI, …).

Technology evolves, even changes, but (valuable) contents tend to stay. URIs should solely address a piece of content; they must not change when the technology used to serve those contents changes. That means strings like ‘.html’ or folder names must not be used in URIs.
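To make that concrete, here is a minimal sketch in Python. It assumes a hypothetical routing table (`ROUTES`), a made-up URI and a made-up storage path; none of this comes from a real site or framework. The URI names the content, the table decides where and how it is stored, and old technology-flavoured URIs get a permanent redirect to the stable one.

```python
# Hypothetical routing table: the URI identifies the content,
# the value says where it happens to be stored today.
ROUTES = {
    "/widget-care-guide": "content/guides/widget_care.md",
}

# Suffixes that leak the (former) technology into a URI.
LEGACY_SUFFIXES = (".html", ".htm", ".php3", ".shtml")


def resolve(path):
    """Return (status, target) for a requested URI path."""
    if path in ROUTES:
        # Stable URI: serve whatever currently backs it.
        return 200, ROUTES[path]
    for suffix in LEGACY_SUFFIXES:
        if path.endswith(suffix):
            stable = path[: -len(suffix)]
            if stable in ROUTES:
                # Legacy URI: redirect permanently to the stable one.
                return 301, stable
    return 404, None
```

Moving the content to another directory, a database, or a different CMS changes only the values in the table, never the URIs.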

Many of those notorious greenhorns offer their equally ignorant clients Web development and SEO services today. They might have managed to handle dynamic contents by now (thanks to osCommerce, WordPress and other CMSs), but they’re still stuck with ancient paradigms that were never meant to exist on the Internet.

They might have discovered that search engines are capable of crawling and indexing dynamic contents (URIs with query strings) nowadays, but they still treat them as dumb bots — as if Googlebot or Slurp weren’t more sophisticated than Altavista’s Scooter of 1998.

They might even develop trendy crap (version 2.0 with nifty rounded corners) today, but they still don’t get IT. Whatever IT is, it doesn’t deserve a URI like /category/vendor/product/color/size/crap.htm.

Why hierarchical URIs (expressing breadcrumbs or whatnot) are utter crap (SEO-wise as well as from a developer’s POV) is explained here:

SEO Toxin

I’ve published my rant “Directory-Like URI Structures Are SEO Bullshit” on SEO Bullshit dot com for a reason.

You should keep an eye on this new blog. Subscribe to its RSS feed. Watch its Twitter account.

If it’s about SEO and it’s there, it’s most probably bullshit. If it’s bullshit, avoid it.

If you plan to spam the SEO blogosphere with your half-assed newbie thoughts (especially when you’re an unconvinceable ‘oldtimer’), consider obeying this rule of thumb:

The top minus one reason to publish SEO stupidity is: You’ll end up here.

Of course that doesn’t mean newbies shouldn’t speak out. I’m just sick of newbies who sell their half-assed brain farts as SEO advice to anyone. Noobs should read, ask, listen, learn, practice, evolve. Until they become pros. As a plain Web developer, I can tell from my own experience that listening to SEO professionals is worth every minute of your time.


Sebastian | Web development, Crap, SEO

17 Comments to "SEO Bullshit: Mimicking a file system in URIs"

SEO Mofo on 5 February, 2010 #link

As a plain Web developer, I can tell from my own experience that listening to SEO professionals is worth every minute of your time.

Therein lies the problem: I’ve never met an SEO who hesitates to call themselves a professional. The newbs need a way to distinguish between the SEO professionals they should listen to, and the “SEO professionals” they should ignore.

Hopefully SEObullshit.com will help them see the difference.
Sebastian on 6 February, 2010 #link

Darren, given the huge amount of crap produced by the SEO blogosphere, on webmaster hangouts, and wherenot, most probably the only way to separate bullshit from wisdom and worthy advice would be a white list.
SearchCap: The Day In Search, February 8, 2010 on 8 February, 2010 #link

[…] SEO Bullshit: Mimicking a file system in URIs, sebastians-pamphlets.com […]
jason on 8 February, 2010 #link

and here i thought i was going to read something about virtual folders… instead i read that /category/vendor/product/color/size/crap.htm is crap, yet i’ll argue that such is ok for something truly static and temporary while something dynamic and permanent gets built, with thanks for some knowledge gained here

just here to say your content is the shit, care to weigh in on the “best practice” for categories found here?

http://www.seomoz.org/img/upload/Cheat Sheet - SE Indexing Limits.png

[I’d say that’s a fucking boring 404 page]
cliff on 18 February, 2010 #link

This looks like it will become a new fav of mine. I have been a fan of Sebastian for a good while now and SEOBS is now up there as well.

While I have been doing some amateur league programming for years, mostly demo type stuff to sell a concept and then hire a pro to do the real work, I’m not new to SEO as a concept but very new to it in practice.

Sebastian you hit it right on the head in this article, I as a user don’t want to traverse more than 2-3 layers of a site before I get where I want to be, but as a designer find myself time and again creating categorization hierarchies that are many layers deep and totally unnecessary.

Google definitely promotes this problem still, even in the most recent version of the Google SEO Starter Guide, v1.1 November 2008, under the heading of ‘Make your site easier to navigate’ a categorized hierarchical structure is illustrated up to 4 layers deep and described as “The directory structure for our small website…”

While the text is clearly speaking about page flow and navigation, calling the illustration a directory structure and implying that the URI implementation should match only further propagates this problem.

Keep up the good work guys, this article has certainly opened my eyes to a problem I had never really considered.
You’re right, Sebastian, I never really dug into the intended usage docs, time to go see what else I missed.
cliff on 18 February, 2010 #link

Even the most recent version of Google’s SEO Starter Guide promotes matching the directory structure to URI implementation under the heading “Make your site easier to navigate”

The text is clearly speaking about page flow and leading to a sitemap topic, but the image shows and is titled “the directory structure for our small website…”

Thanks for the work guys, I’m guilty myself of not researching as deeply as I should have, time to go find out exactly what was intended, not just what Google said to do.
jason on 18 February, 2010 #link

i’m blaming their formatting model:

so what do you think about the best practice?
mark rushworth on 11 March, 2010 #link

I don’t get it. Google, in their recently leaked document, go on and on about folder based structures and even point to it being a way of determining site links?

Are you advocating everything being flat, 1 tier?

What are the benefits of this?

There’s no doubt that sites like the BBC make good use of folder like structures??
Sebastian on 12 March, 2010 #link

Mark, I didn’t say you can’t use subdirectories to store files. I said that storage location and URIs have nothing to do with each other. You can even store your stuff outside the Web server’s reach and provide meaningful URIs. Just because a webmaster manages content in a hierarchical directory structure, that doesn’t mean that using this hierarchy in URIs is a good idea.

Having everything “flat”, that is the URI’s path identifies the resource without using slashes, is an option, and just that. You can do it, combine it with slash delimited paths, query strings, … whatever. Always do what’s best for the actual site, and don’t listen to crappy advice that tells you otherwise, even when it comes from a major search engine and is beginner level stuff trying to bring a first understanding of what information architecture is to the noobs.

Just bear in mind that neither search engines nor human visitors care much about your URIs - they follow links, and only links. So if it’s difficult to create a hierarchical structure (usually it’s raping common sense at some point), then don’t do it, or do it just for parts of your content where it makes sense.
mark rushworth on 12 March, 2010 #link

LOL thanks for clearing that up… must have been the time in the morning and maybe i needed a coffee to fully comprehend the post lol.
Jonah Stein on 13 March, 2010 #link

Sebastian

I have to disagree with you for several reasons.
1. Google is now displaying breadcrumbs in SERP and they are more likely to do so based on hierarchies.
2. Hierarchically designed IA makes it MUCH easier to analyze indexation and web traffic.
3. While internal linking structures are clearly driving SERP, it is much easier to understand and visualize your internal linking structures if you have a hierarchical structure.
4. Someday, you may need to move some content around or do other tricky and inconvenient things. You will thank yourself if you took the time to build out a really solid IA first.
5. Believe it or not, some users do see the URL and it helps establish information scent.
Sebastian on 15 March, 2010 #link
Jonah, you don’t need to disagree. Just get familiar with the concept, and think a bit further.
1. Google scrapes breadcrumbs from links, not from directory structures in URIs. For example Google can perfectly understand a path to the root on a page with a query string like ?category=widgets&state=ny&country=us, or even /item=11, provided there’s a meaningful breadcrumb navigation given on this page. Breadcrumbs describe a default navigation path (back to the root, usually), regardless of whether there’s a hierarchy or not. Often there’s more than one logical way from the root to a particular page. That’s why setting one of many possible paths in stone is a bad idea.
2. When you put that as a SOP, it’s simply not true. As long as there’s a hierarchy in real life, then it can make sense, but there are more ways to implement it than pseudo file system hierarchies. Lots of things in real life aren’t organized in natural hierarchies, though. Artificial hierarchies aren’t helpful, esp. not in analytics.
3. If that were true, you could provide me with a model to organize all Twitter users in a hierarchy. You can’t do that.
4. The opposite is true. When you have to move content, it helps when the content isn’t organized in a hierarchy that has impact on navigation and whatnot. I agree that a solid IA helps, but a solid IA is an IA that doesn’t rely on (artificial) hierarchies, but works with networks of intermeshed nodes and similar concepts instead.
5. A URI like /tom-clancy-all-titles is way more meaningful, more useful, more bookmarker friendly… than /books/authors/us/clancy-tom/titles/all. The page served from /tom-clancy-all-titles can very well have a breadcrumb navigation like Books:Authors:USA:Tom Clancy:All titles.
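
Points 1 and 5 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `PAGES` table, its flat URIs (apart from /tom-clancy-all-titles) and the `parent` field are hypothetical. The breadcrumb comes from the content model, while every URI stays flat.

```python
# Hypothetical content records: every page has a flat URI and
# knows its default parent. No hierarchy is encoded in the URI.
PAGES = {
    "/books": {"title": "Books", "parent": None},
    "/authors": {"title": "Authors", "parent": "/books"},
    "/us-authors": {"title": "USA", "parent": "/authors"},
    "/tom-clancy": {"title": "Tom Clancy", "parent": "/us-authors"},
    "/tom-clancy-all-titles": {"title": "All titles", "parent": "/tom-clancy"},
}


def breadcrumb(uri):
    """Build the default navigation path by walking parent links."""
    trail = []
    while uri is not None:
        page = PAGES[uri]
        trail.append(page["title"])
        uri = page["parent"]
    # The walk goes from the page up to the root, so reverse it.
    return ":".join(reversed(trail))


print(breadcrumb("/tom-clancy-all-titles"))
# Books:Authors:USA:Tom Clancy:All titles
```

Changing the default path means editing one `parent` value. No URI changes, and no redirects are needed.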
Jonah Stein on 15 March, 2010 #link

Whether or not you actually have “/” to mimic a file structure, if your parameters don’t have a hierarchy and a consistent order, you are asking for trouble. If some urls say state=ca&city=san-francisco and others say city=san-francisco&state=ca, you are going to create havoc and canonicalization nightmares. Meanwhile, many tools are much easier to use when you have hierarchies in place and it is certainly easier to handle things like robots.txt, htaccess, etc.

While I agree that a site like twitter doesn’t really conform to this approach, it is still very solid SEO for 90% of larger sites to build URLs based on a hierarchical IA.
Sebastian on 15 March, 2010 #link

Consistent order, yes, absolutely. Hierarchy? Nope, technically you can do it right using just one query string parameter with totally meaningless values. As for tools only working with URI hierarchies … well, if I’d build IAs based on the requirements of crappy reporting tools, I’d play in the wrong game.
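
Consistent order is cheap to enforce. A minimal Python sketch, assuming a hypothetical helper `canonical_url` and the example.com host; it is one way to do it, not the only one. It sorts the query string parameters so that both variants from the comment above collapse into a single URI.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def canonical_url(url):
    """Return the URL with its query parameters in sorted order."""
    parts = urlsplit(url)
    # Keep blank values so "?a=&b=1" does not silently lose "a".
    params = sorted(parse_qsl(parts.query, keep_blank_values=True))
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(params), parts.fragment)
    )


a = canonical_url("http://example.com/find?state=ca&city=san-francisco")
b = canonical_url("http://example.com/find?city=san-francisco&state=ca")
print(a == b)  # True
```

Requests that arrive with a non-canonical order can then be redirected permanently to the canonical form. No hierarchy is involved at any point.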
Sean Ruiz on 9 April, 2010 #link

April 9, 2010
Sebastian,
I respect the insight you have on a variety of meaningful topics. The point the article made about URIs not needing to match underlying technology is irrefutable. My opinion is also that URIs must be concise*. I think otherwise about hierarchical URIs.

-Sean Ruiz

* http://www.google.com/support/webmasters/bin/answer.py?hl=en&answer=76329
Chris Whyles on 21 July, 2010 #link

Hmmm. I’m a big believer in the clickthrough power of a “friendly” URL in the SERPs vs. a messy dynamic/query string URL.

Would it not be best to have a dynamic system, but one that rewrites URLs to “show” a friendlier structure? The architecture can be as flat as you like but she’d still look pretty.

Chris Whyles
While doing evil, reluctantly: Size, er trust matters. on 3 November, 2010 #link

[…] shit happen is by no means a dogma. We shouldn’t throw away common sense […]

Sebastian’s Pamphlets