.htaccess

Archived posts from the '.htaccess' Category

Please don’t run your counter on my servers

Posted on 18 April, 2007

I deeply understand that sharing other people’s resources makes sense sometimes. I just ask you to rethink your technical approach. Running your page view stats on my server comes with a serious disadvantage: my server logs and referrer reports are protected, hence you can’t read your stats. Rest assured I’m really not eager to know who views your pages.

So please: when you copy my HTML code, be so kind and steal the invisible 1×1px images too. It’s really not that hard to upload them to your server and edit my HTML in a way that your visitors’ user agents request these images from your server.

Signing up with a free counter service that doesn’t add hidden links to all your pages causes less hassle than my reaction when I get annoyed.

Disclaimer: I don’t like it when you steal my code, because for some reason it’s often crappy enough to break your layout. Also, copying code without permission is as bad as content theft. So don’t copy, but feel free to ask.

Go to HTML Basix to figure out how you can block hotlinking with .htaccess:
RewriteEngine on
RewriteCond %{HTTP_REFERER} !^http://(www\.)?sebastianx\.blogspot\.com(/)?.*$ [NC]
RewriteRule .*\.(gif|jpg|jpeg|bmp|png)$ http://www.smart-it-consulting.com/img/misc/do-not-hotlink-beauty.jpg [R,NC]
But please don’t steal or hotlink the offensive blonde beauty
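
To see which requests that rule pair would catch, here is a minimal Python sketch of the same decision. The function and constant names are made up for illustration, and the dots in the host name are escaped in the pattern. It only mimics the matching logic; it is not how mod_rewrite works internally.

```python
import re

# Hypothetical names; the patterns mirror the RewriteCond/RewriteRule pair above.
ALLOWED_REFERER = re.compile(
    r"^http://(www\.)?sebastianx\.blogspot\.com(/)?.*$", re.IGNORECASE
)
IMAGE_PATH = re.compile(r".*\.(gif|jpg|jpeg|bmp|png)$", re.IGNORECASE)
REPLACEMENT = "http://www.smart-it-consulting.com/img/misc/do-not-hotlink-beauty.jpg"


def hotlink_redirect(path, referer):
    """Return the replacement image URL if the rule would fire, else None."""
    # The rule only applies to image requests.
    if not IMAGE_PATH.match(path):
        return None
    # The condition is negated: it fires when the referrer is NOT the allowed blog.
    if ALLOWED_REFERER.match(referer or ""):
        return None
    return REPLACEMENT
```

Note that an empty referrer does not match the allowed pattern either, so visitors whose user agents send no referrer get the replacement image too.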

8 comments Sebastian | Copyrights, Copy+Paste-Penalties, Plagiarism, Hotlinking, .htaccess

Why proper error handling is important

Posted on 28 February, 2007

Misconfigured servers can prevent search engines from crawling and indexing. I admit that’s yesterday’s news. However, standard setups and code copied from low-quality resources are underestimated (but very popular) points of failure. According to Google, a missing robots.txt file in combination with amateurish error handling can result in invisibility on Google’s SERPs. That’s a very common setup, by the way.

Googler Jonathon Simon said:

This way [correct setup] when the Google crawler or other search engine checks for a robots.txt file, they get a 200 response if the file is found and a 404 response if it is not found. If they get a 200 response for both cases then it is ambiguous if your site has blocked search engines or not, reducing the likelihood your site will be fully crawled and indexed.

That’s a very carefully written warning, so I’ll try to spell out the message between the lines:

If you have no robots.txt and your server responds “Ok” when Googlebot tries to fetch it (or answers the request for robots.txt with a 302, followed by a 200 response for the error page), Googlebot might not be willing to crawl your stuff further, hence your pages will not make it into Google’s search index.
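
You can check this yourself by requesting a file that surely doesn’t exist and looking at the status codes. The following Python sketch uses only the standard library; the function names and the example URL are made up for illustration, and it assumes plain GET requests are enough to reproduce what a crawler sees.

```python
import http.client
from urllib.parse import urljoin, urlsplit

REDIRECTS = (301, 302, 303, 307, 308)


def fetch_status_chain(url, max_hops=5):
    """Request url and return the list of status codes seen, following redirects by hand."""
    statuses = []
    for _ in range(max_hops):
        parts = urlsplit(url)
        if parts.scheme == "https":
            conn = http.client.HTTPSConnection(parts.netloc, timeout=10)
        else:
            conn = http.client.HTTPConnection(parts.netloc, timeout=10)
        try:
            target = parts.path or "/"
            if parts.query:
                target += "?" + parts.query
            conn.request("GET", target)
            resp = conn.getresponse()
            status = resp.status
            location = resp.getheader("Location")
        finally:
            conn.close()
        statuses.append(status)
        if status not in REDIRECTS or not location:
            break
        # The Location header may be relative or absolute.
        url = urljoin(url, location)
    return statuses


def missing_file_is_ambiguous(statuses):
    """True if a request for a file that does not exist ends in 200."""
    return bool(statuses) and statuses[-1] == 200
```

For example, `missing_file_is_ambiguous(fetch_status_chain("http://www.example.com/no-such-file-12345.txt"))` should be False on a properly configured server. A chain like 302 followed by 200 is exactly the case described above.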

If you don’t suffer from IIS (Windows hosting is a horrible nightmare coming with more pitfalls than countable objects in the universe: go find a reliable host) here is a bullet-proof setup.

If you don’t have a robots.txt file yet, create one and upload it today:

User-agent: *
Disallow:

This tells crawlers that your whole domain is spiderable. If you want to exclude particular pages, file-types or areas of your site, refer to the robots.txt manual.
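
If you want to double-check what a robots.txt allows before you upload it, Python’s standard `urllib.robotparser` module can evaluate the rules locally. The helper name, the example URL and the default user agent below are assumptions for illustration; real crawlers may interpret edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser


def can_crawl(robots_lines, url, agent="Googlebot"):
    """Parse robots.txt lines and report whether agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return parser.can_fetch(agent, url)


# The two-line file from above: an empty Disallow excludes nothing.
allow_all = ["User-agent: *", "Disallow:"]
print(can_crawl(allow_all, "http://www.example.com/any/page.html"))
```

For comparison, `Disallow: /` in place of the empty `Disallow:` blocks the whole site.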

Next look at the .htaccess file in your server’s Web root directory. If your FTP client doesn’t show it, add “-a” to “external mask” in the settings and reconnect. If you find complete URLs in lines starting with “ErrorDocument”, your error handling is screwed up. What happens is that your server does a soft redirect to the given URL, which probably responds with “200-Ok”, and the actual error code gets lost in cyberspace. Sending 401 errors to absolute URLs will slow your server down to the performance of a single IBM-XT hosting Google.com; all other error directives pointing to absolute URLs result in crap. Here is a well-formed .htaccess sample:

ErrorDocument 401 /get-the-fuck-outta-here.html
ErrorDocument 403 /get-the-fudge-outta-here.html
ErrorDocument 404 /404-not-found.html
ErrorDocument 410 /410-gone-forever.html
Options -Indexes
<Files ".ht*">
deny from all
</Files>
RewriteEngine On
RewriteCond %{HTTP_HOST} !^www\.canonical-server-name\.com [NC]
RewriteRule (.*) http://www.canonical-server-name.com/$1 [R=301,L]

With “ErrorDocument” directives you can capture other clumsiness as well, for example 500 errors with /server-too-buzzy.html or so. Or make the error handling comfortable using /error.php?errno=[insert err#]. In any case avoid relative URLs (src attribute in IMG elements, CSS/feed links, href attributes of A elements …) on all error landing pages: the server delivers them under the URL that was originally requested, so relative references resolve against the wrong directory. You can test actual HTTP response codes with online header checkers.
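
Hunting for complete URLs in “ErrorDocument” lines is easy to automate. Here is a minimal Python sketch, assuming you have the .htaccess content as a string; the function name is made up, and the pattern only covers the simple one-directive-per-line form used in the sample above.

```python
import re

# Matches "ErrorDocument <code> <target>" at the start of a line.
# Commented-out lines start with "#" and therefore do not match.
ERROR_DOC = re.compile(
    r"^\s*ErrorDocument\s+(\d{3})\s+(\S.*)$", re.IGNORECASE | re.MULTILINE
)


def absolute_error_documents(htaccess_text):
    """Return (status, target) pairs whose target is a complete URL."""
    hits = []
    for code, target in ERROR_DOC.findall(htaccess_text):
        target = target.strip()
        if re.match(r"^https?://", target, re.IGNORECASE):
            hits.append((int(code), target))
    return hits
```

An empty result means every error directive points to a local path like /404-not-found.html, which is what you want.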

The other statements above do different neat things. Options -Indexes disallows directory browsing, the next block makes sure that nobody can read your server directives, and the last three lines redirect invalid server names to your canonical server address.
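
The canonical server name redirect can be sketched in a few lines of Python to show which host names it catches. The names below are made up for illustration, and the function only imitates the condition and the rule; it says nothing about how the server evaluates them internally.

```python
import re

CANONICAL_HOST = "www.canonical-server-name.com"
# Same pattern as in the RewriteCond: anchored at the start only, case-insensitive.
HOST_OK = re.compile(r"^www\.canonical-server-name\.com", re.IGNORECASE)


def canonical_redirect(host, path):
    """Return (301, url) if the requested host is not canonical, else None."""
    if HOST_OK.match(host):
        return None
    return 301, "http://" + CANONICAL_HOST + "/" + path.lstrip("/")
```

So a request for canonical-server-name.com/page.html gets a 301 to the www address, while requests to the canonical name pass through. Because the pattern is not anchored at the end, a host name that merely starts with the canonical name (for example with a port number appended) passes as well.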

.htaccess is a plain ASCII file; it can get screwed up when you upload it in binary mode or when you change it with a word processor. Best edit it with an ASCII/ANSI editor (vi, notepad) as htaccess.txt on your local machine (most FTP clients choose ASCII mode for text files) and rename it to “.htaccess” on the server. Keep in mind that file names are case sensitive.
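
A quick way to spot word processor damage is to look for bytes outside the ASCII range, such as curly quotes or a byte order mark. The following Python sketch takes the raw file content as bytes; the function name and the wording of the messages are assumptions for illustration, and it checks only these two symptoms.

```python
def htaccess_byte_problems(data):
    """Return a list of human-readable problems found in raw .htaccess bytes."""
    problems = []
    # Some editors silently prepend a UTF-8 byte order mark.
    if data.startswith(b"\xef\xbb\xbf"):
        problems.append("UTF-8 byte order mark at start of file")
    for lineno, line in enumerate(data.split(b"\n"), start=1):
        # Iterating over bytes yields integers; anything above 0x7F is not ASCII.
        bad = [b for b in line if b > 0x7F]
        if bad:
            problems.append(f"line {lineno}: {len(bad)} non-ASCII byte(s)")
    return problems
```

Read the file with `open("htaccess.txt", "rb").read()` and pass the result in. An empty list means the file is pure ASCII.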

Tags: Search Engine Optimization (SEO) robots.txt .htaccess Error handling Web development

14 comments Sebastian | .htaccess, robots.txt, SEO
