Upgrading from IIS/ASP to Apache/PHP

Once you’re sick of IIS/ASP maladies, you want to upgrade your Web site to standardized technologies and reliable OpenSource software. On an Apache Web server with PHP your .asp scripts won’t work, and you can’t run MS-Access “databases” and such stuff under Apache.

Here is my idea of a smooth migration from IIS/ASP to Apache/PHP. Grab any Unix box from your hoster’s portfolio and start over.

(Recently I got a tiny IIS/ASP site about uses & abuses of link condoms and moved it to an Apache server. I’m well known for brutal IIS rants, but so far I hadn’t discussed a way out of such a dilemma, so I thought blogging this move could be a good idea.)

I don’t want to make this piece too complex, so I’ll skip database and code migration strategies. Read Mike Hillyer’s article Migrating from Microsoft Access/MS-SQL to MySQL, and try tools like ASP to PHP. (With my tiny link condom site I overwrote the ASP code with PHP statements in my primitive text editor.)

From an SEO perspective such an upgrade comes with pitfalls:

  • Changing file extensions from .asp to .php is not an option, because we want to keep the number of unavoidable redirects as low as possible.
  • Default.asp is usually not configured as a valid default document under Apache, hence requests for http://example.com/ don’t reach the home page (depending on the Options setting, visitors get a 403 error or a directory listing).
  • Basic server name canonicalization routines (www vs. non-www) from ASP scripts are not convertible.
  • IIS URIs are not case-sensitive, whereas Apache’s are: /Default.asp will 404 on Apache when the file is named /default.asp. Usually there are lowercase/uppercase issues with query string variables and values as well.
  • Most probably search engines have URL variants in their indexes, so we want to adapt their URL canonicalization, at least where possible.
  • HTML editors like Microsoft Visual Studio tend to duplicate the HTML code of templated page areas. Instead of editing menus or footers in all scripts we want to encapsulate them.
  • If the navigation makes use of relative links, we need to convert those to absolute URLs.
  • Error handling isn’t convertible, and improper error handling can cost you search engine traffic.

Running /default.asp, /home.asp etc. as PHP scripts

When you upload an .asp file to an Apache Web server, most user agents can’t handle it: browsers treat it as an unknown file type and force a download instead of rendering it. Also, Apache doesn’t parse those files for PHP statements, even after you’ve rewritten the ASP code.

To tell Apache that .asp files are valid PHP scripts outputting X/HTML, add this code to your server config or your .htaccess file in the root:
AddType text/html .asp
AddHandler application/x-httpd-php .asp

The first line says that .asp files shall be treated as HTML documents, and should force the server to send a Content-Type: text/html HTTP header. The second line tells Apache that it must parse .asp files for PHP code.

Just in case the AddType statement above doesn’t produce a Content-Type: text/html header, here is another way to tell all user agents requesting .asp files from your server that their content type is text/html. If mod_headers is available, you can accomplish that with this .htaccess code:
<IfModule mod_headers.c>
SetEnvIf Request_URI \.asp$ is_asp=is_asp
Header set "Content-type" "text/html" env=is_asp
Header set imagetoolbar "no"
</IfModule>

(The imagetoolbar: no header tells Internet Explorer not to display its image toolbar over images; you can put this directive in a meta tag too.)
If for some reason mod_headers doesn’t work well with mod_setenvif (for example, you get 500 errors), you can set the content type with PHP too. Add this line to a PHP file that is included at the very top of all your scripts:
@header("Content-type: text/html", TRUE);

Instead of “text/html” alone, you can define the character set too: “text/html; charset=UTF-8”

Sanitizing the home page URL by eliminating “default.asp”

Instead of slowing down Apache by defining just another default document name (DirectoryIndex index.html index.shtml index.htm index.php [...] default.asp), we get rid of “/default.asp” with this “/index.php” script:
<?php
@require("default.asp");
?>

Now every request for http://example.com/ executes /index.php, which includes /default.asp. This works in subdirectories too, as long as each of them gets its own index.php.

Just in case someone requests /default.asp directly (search engines keep forgotten links!), we perform a permanent redirect in .htaccess:
Redirect 301 /default.asp http://example.com/
Redirect 301 /Default.asp http://example.com/

Converting the ASP code for server name canonicalization

If you find ASP canonicalization routines like
<%@ Language=VBScript %>
<%
if strcomp(Request.ServerVariables("SERVER_NAME"), "www.example.com", vbCompareText) = 0 then
Response.Clear
Response.Status = "301 Moved Permanently"
strNewUrl = Request.ServerVariables("URL")
if instr(1,strNewUrl, "/default.asp", vbCompareText) > 0 then
strNewUrl = replace(strNewUrl, "/Default.asp", "/")
strNewUrl = replace(strNewUrl, "/default.asp", "/")
end if
if Request.QueryString <> "" then
Response.AddHeader "Location","http://example.com" & strNewUrl & "?" & Request.QueryString
else
Response.AddHeader "Location","http://example.com" & strNewUrl
end if
Response.End
end if
%>

(or the other way round) at the top of all scripts, just select and delete. This .htaccess code works way better, because it takes care of other server name garbage too:
RewriteEngine On
RewriteCond %{HTTP_HOST} !^example\.com$ [NC]
RewriteRule (.*) http://example.com/$1 [R=301,L]

(You need mod_rewrite, which is usually enabled with the default configuration of Apache Web servers.)
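If mod_rewrite isn’t available on your host, the same redirect can be done in PHP, in a file included at the very top of every script. This is a minimal sketch, not a drop-in replacement for the .htaccess rules: it assumes example.com is the canonical host name, and the function name canonical_host_redirect() is made up for illustration.

```php
<?php
// Sketch: server name canonicalization in PHP, for hosts without mod_rewrite.
// Assumption: "example.com" is the canonical host name (change this).
const CANONICAL_HOST = 'example.com';

/**
 * Returns the URL to 301-redirect to, or null when the request
 * already uses the canonical host name.
 */
function canonical_host_redirect(string $host, string $requestUri): ?string
{
    // Host headers may carry a port and mixed case, e.g. "WWW.Example.com:80".
    $hostname = strtolower(preg_replace('/:\d+$/', '', $host));
    if ($hostname === CANONICAL_HOST) {
        return null;
    }
    // REQUEST_URI includes the query string, so it survives the redirect.
    return 'http://' . CANONICAL_HOST . $requestUri;
}

// Only act when running under the Web server, not on the command line.
if (isset($_SERVER['HTTP_HOST'], $_SERVER['REQUEST_URI'])) {
    $target = canonical_host_redirect($_SERVER['HTTP_HOST'], $_SERVER['REQUEST_URI']);
    if ($target !== null) {
        header('Location: ' . $target, true, 301);
        exit;
    }
}
```

Unlike the .htaccess rule, this only covers requests that reach a PHP script; images and other static files keep answering under any host name.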

Fixing case issues like /script.asp?id=value vs. /Script.asp?ID=Value

Probably a M$ developer didn’t read more than the scheme and server name chapter of the URL/URI standards, at least I’ve no better explanation for the fact that these clowns made the path and query string segment of URIs case-insensitive. (Ok, I have an idea, but nobody wants to read about M$ world domination plans.)

Just because M$, contrary to Web standards, finds it funny to serve the same content for /Home.asp as well as /home.ASP, such crap doesn’t fly on the World Wide Web. Search engines, and other Web services that store URLs, treat them as different URLs and consider every version except one duplicate content.

Creating hyperlinks in HTML editors by picking the script files from the Windows Explorer can result in HREF values like “/Script.asp”, although the file itself is stored with an all-lowercase name, and the FTP client uploads “/script.asp” to the Web server. There are more ways to fuck up file names with improper use of (leading) uppercase characters. Typos like that are practically undetectable with IIS, because the developer surfing the site won’t get 404-Not Found responses.

Don’t misunderstand me, you’re free to camel-case file names for improved readability, but then make sure that the file system’s notation matches the URIs in HREF/SRC values. (Of course hyphenated file names like “buy-cheap-viagra.asp” top the CamelCased version “BuyCheapViagra.asp” when it comes to search engine rankings, but don’t freak out about keywords in URLs, that’s ranking factor #202 or so.)

Technically speaking, converting all file names, variable names and values to all-lowercase is the simplest solution. This way it’s quite easy to 301-redirect all invalid requests to the canonical URLs.

However, each redirect puts search engine traffic at risk. Not all search engines process 301 redirects as they should (MSN Live Search for example doesn’t follow permanent redirects and doesn’t pass the reputation earned by the old URL over to the new URL). So if you’ve good SERP positions for “misspelled” URLs, it might make sense to stick with ugly directory/file names. Check your search engine rankings, perform [site:example.com] search queries on all major engines, and read the SERP referrer reports from the old site’s server stats to identify all URLs you don’t want to redirect. By the way, the link reports in Google’s Webmaster Console and Yahoo’s Site Explorer reveal invalid URLs with (internal as well as external) inbound links too.

Whatever strategy fits your needs best, you have to call a script that handles invalid URLs from your .htaccess file. You can do that with the ErrorDocument directive:
ErrorDocument 404 /404handler.php

That’s safe with static URLs without parameters and should work with dynamic URIs too. When you deal with query strings and/or virtual URIs, the .htaccess code becomes more complex, and handling virtual paths and query string parameters in the PHP script might be easier. This mod_rewrite code sends every request for a non-existing file or directory to the script:
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /404handler.php [L]
</IfModule>

In both cases Apache will process /404handler.php if the requested URI is invalid, that is if the path segment (/directory/file.extension) points to a file that doesn’t exist.

And here is the PHP script /404handler.php:
(Edit the values in all lines marked with “// change this”.)

This script doesn’t handle case issues with query string variables and values. Query string canonicalization must be developed for each individual site. Also, capturing misspelled URLs with nice search engine rankings should be implemented utilizing a database table when you’ve more than a dozen or so.
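As a starting point for such a site-specific routine, here is a minimal sketch that lowercases all parameter names and, for a hypothetical list of parameters, their values too. It assumes that your canonical URLs use lowercase names throughout; the function name canonical_query(), the LOWERCASE_VALUES list and the hard-coded host are made up for illustration. It splits the query string by hand instead of using parse_str(), because parse_str() rewrites some names (dots and spaces become underscores).

```php
<?php
// Sketch: query string canonicalization (lowercase names, some lowercase values).
// Assumption: the parameters listed here take case-insensitive values (change this).
const LOWERCASE_VALUES = ['cat', 'sort'];

function canonical_query(string $query): string
{
    $pairs = [];
    foreach (explode('&', $query) as $pair) {
        if ($pair === '') {
            continue; // drop empty segments such as in "a=1&&b=2"
        }
        $parts = explode('=', $pair, 2);
        $name = strtolower($parts[0]);
        $value = $parts[1] ?? null;
        if ($value !== null && in_array($name, LOWERCASE_VALUES, true)) {
            $value = strtolower($value);
        }
        $pairs[] = ($value === null) ? $name : $name . '=' . $value;
    }
    return implode('&', $pairs);
}

// Only act when running under the Web server: 301-redirect non-canonical query strings.
if (isset($_SERVER['REQUEST_URI'])) {
    $query = $_SERVER['QUERY_STRING'] ?? '';
    $canonical = canonical_query($query);
    if ($canonical !== $query) {
        $path = parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH);
        // Assumption: canonical host (change this).
        $location = 'http://example.com' . $path . ($canonical === '' ? '' : '?' . $canonical);
        header('Location: ' . $location, true, 301);
        exit;
    }
}
```

Values of parameters not in the list keep their case, so /script.asp?ID=Value would redirect to /script.asp?id=Value.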

Let’s see what the /404handler.php script does with requests for non-existing files.

First we test the requested URI against a list of invalid URLs that rank nicely at search engines. We don’t care much about duplicate content issues when the engines deliver targeted traffic. Here is an example (which admittedly doesn’t rank for anything but illustrates the functionality): both /sample.asp and /Sample.asp deliver the same content, although there’s no /Sample.asp script. Of course a better procedure would be renaming /sample.asp to /Sample.asp, permanently redirecting /sample.asp to /Sample.asp in .htaccess, and changing all internal links accordingly.

Next we look up the all-lowercase version of the requested path. If such a file exists, we perform a permanent redirect to it. Example: /About.asp 301-redirects to /about.asp, which is the file that exists.

Finally, if all attempts to find a suitable URI for the request have failed, we send the client a 404 error code and output the error page. Example: /gimme404.asp doesn’t exist, hence /404handler.php responds with a 404-Not Found header and displays /error.asp, whereas a direct request for /error.asp gets a 200-OK.

You can easily refine the script with other algorithms and mappings to adapt its somewhat primitive functionality to your project’s needs.
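Here is a minimal sketch of the three steps described above. It is not the original script; the keep-list entry, the base URL, the error page path and the function name resolve_missing() are hypothetical examples. The decision logic is kept apart from the HTTP output, so you can test it on the command line.

```php
<?php
// /404handler.php: a minimal sketch of the steps described above, not the original script.
// Assumptions (change this): canonical base URL, keep-list and error page.
const BASE_URL = 'http://example.com';
const ERROR_PAGE = '/error.asp';
// Nicely ranked "misspelled" URLs that keep serving content: requested path => existing file.
const KEEP = ['/Sample.asp' => '/sample.asp'];

/**
 * Decides how to answer a request for a path that doesn't exist.
 * $exists tells whether a path exists below the document root.
 * Returns [status code, file to include or redirect target].
 */
function resolve_missing(string $path, callable $exists): array
{
    // 1. Well-ranked invalid URL: serve the existing file with a 200.
    if (array_key_exists($path, KEEP)) {
        return [200, KEEP[$path]];
    }
    // 2. The all-lowercase version exists: 301-redirect to it.
    $lower = strtolower($path);
    if ($lower !== $path && $exists($lower)) {
        return [301, BASE_URL . $lower];
    }
    // 3. Nothing suitable found: 404 plus the error page.
    return [404, ERROR_PAGE];
}

// Only act when running under the Web server, not on the command line.
if (isset($_SERVER['REQUEST_URI'])) {
    $docRoot = $_SERVER['DOCUMENT_ROOT'];
    // REQUEST_URI still holds the originally requested URI when Apache calls this script.
    $path = parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH) ?: '/';
    $query = parse_url($_SERVER['REQUEST_URI'], PHP_URL_QUERY);
    $exists = function (string $p) use ($docRoot): bool {
        return is_file($docRoot . $p);
    };
    [$status, $target] = resolve_missing($path, $exists);
    if ($status === 301) {
        header('Location: ' . $target . ($query ? '?' . $query : ''), true, 301);
        exit;
    }
    http_response_code($status);
    header('Content-Type: text/html');
    // include parses the .asp file for PHP code, whatever its extension.
    include $docRoot . $target;
}
```

Because the keep-list check comes first, /Sample.asp keeps serving content while /About.asp gets consolidated. With more than a dozen entries, move KEEP into a database table.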

Tweaking code for future maintenance

Legacy code comes with repetition, redundancy and duplication caused by developers who love copy+paste or copy+paste+modify, or by Web design software that generates static files from templates. Even if you’re not willing to do a complete revamp by shoving your contents into a CMS, you must replace the ASP code anyway, which gives you the opportunity to encapsulate all templated page areas.

Say your design tool created a bunch of .asp files which all contain the same sidebars, headers and footers. When you move those files to your new server, create PHP include files from each templated page area, then replace the duplicated HTML code with <?php @include("header.php"); ?>, <?php @include("sidebar.php"); ?>, <?php @include("footer.php"); ?> and so on. Note that when you’ve HTML code in a PHP include file, you must add <?php ?> before the first line of HTML code or contents in included files. Also, leading spaces, empty lines and such which don’t hurt in HTML, can result in errors with PHP statements like header(), because those fail when the server has sent anything to the user agent (even a single space, new line or tab is too much).

It’s a good idea to use PHP scripts that are included at the very top and bottom of all scripts, even when you currently have no idea what to put into those. Trust me and create top.php and bottom.php, then add the calls (<?php @include("top.php"); ?> […] <?php @include("bottom.php"); ?>) to all scripts. Tomorrow you’ll write a generic routine that you must have in all scripts, and you’ll happily do that in top.php. The day after tomorrow you’ll paste the Google Analytics tracking code into bottom.php. With complex sites you need more hooks.

Using absolute URLs on different systems

Another weak point is the use of relative URIs in links, image sources or references to feeds or external scripts. The lame excuse of most developers is that they need to test the site on their local machine, and that doesn’t work with absolute URLs. Crap. Of course it works. The first statement in top.php is
@require($_SERVER["SERVER_NAME"] .".php");

This way you can set the base URL for each environment and your code runs everywhere. For development on a subdomain you have a “dev.example.com.php” include file; on the production system example.com the file name resolves to “example.com.php”:
<?php
$baseUrl = "http://example.com";
?>
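If you’d rather not build an include file name from SERVER_NAME (with Apache’s default UseCanonicalName Off, that value comes from the client’s Host header), a lookup table of known hosts does the same job. This is an alternative technique, not the method described above; the host names, URLs and the function name base_url_for() are hypothetical.

```php
<?php
// Alternative sketch for top.php: map known host names to base URLs.
// Assumption: these hosts and URLs are examples (change this).
const BASE_URLS = [
    'example.com'     => 'http://example.com',
    'dev.example.com' => 'http://dev.example.com',
];

/** Returns the base URL for a known host, or null for any other host. */
function base_url_for(string $serverName): ?string
{
    $host = strtolower($serverName);
    return array_key_exists($host, BASE_URLS) ? BASE_URLS[$host] : null;
}

// Only act when running under the Web server, not on the command line.
if (isset($_SERVER['SERVER_NAME'])) {
    // Unknown hosts fall back to the production base URL (assumption).
    $baseUrl = base_url_for($_SERVER['SERVER_NAME']) ?? BASE_URLS['example.com'];
}
```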

Then the menu in sidebar.php looks like:
<?php
$classVMenu = "vmenu";
print "
<img src=\"$baseUrl/vmenuheader.png\" width=\"128\" height=\"16\" alt=\"MENU\" />
<ul>
<li><a class=\"$classVMenu\" href=\"$baseUrl/\">Home</a></li>
<li><a class=\"$classVMenu\" href=\"$baseUrl/contact.asp\">Contact</a></li>
<li><a class=\"$classVMenu\" href=\"$baseUrl/sitemap.asp\">Sitemap</a></li>

</ul>
";
?>

Mixing X/HTML with server-side scripting languages is fault-prone and makes maintenance a nightmare. Don’t make the same mistake as WordPress. Avoid crap like this:
<li><a class="<?php print $classVMenu; ?>" href="<?php print $baseUrl; ?>/contact.asp"></a></li>
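One way to keep markup out of the logic, and still avoid hand-maintained HTML in every script, is to build the menu from data. A minimal sketch, assuming the $baseUrl variable from the per-environment include; render_menu() and the path => label layout of $items are hypothetical. htmlspecialchars() escapes URLs and labels, so an ampersand in a query string or label can’t break the markup.

```php
<?php
// Sketch for sidebar.php: render the vertical menu from an array of path => label.
function render_menu(array $items, string $baseUrl, string $class = 'vmenu'): string
{
    $cls = htmlspecialchars($class, ENT_QUOTES);
    $html = "<ul>\n";
    foreach ($items as $path => $label) {
        // Escape URLs and labels so "&" in a query string or label can't break the HTML.
        $href = htmlspecialchars($baseUrl . $path, ENT_QUOTES);
        $text = htmlspecialchars($label, ENT_QUOTES);
        $html .= "<li><a class=\"$cls\" href=\"$href\">$text</a></li>\n";
    }
    return $html . "</ul>\n";
}

// Usage in sidebar.php, with $baseUrl set by the per-environment include:
// print render_menu(['/' => 'Home', '/contact.asp' => 'Contact', '/sitemap.asp' => 'Sitemap'], $baseUrl);
```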

Error handling

I refuse to discuss IIS error handling. On Apache servers you simply put ErrorDocument directives in your root’s .htaccess file:
ErrorDocument 401 /get-the-fuck-outta-here.asp
ErrorDocument 403 /get-the-fudge-outta-here.asp
ErrorDocument 404 /404handler.php
ErrorDocument 410 /410-gone-forever.asp
ErrorDocument 503 /503-down-for-maintenance.asp
# …
Options -Indexes

Then create neat pages for each HTTP response code which explain the error to the visitor and offer alternatives. Of course you can handle all response codes with one single script:
ErrorDocument 401 /error.php?errno=401
ErrorDocument 403 /error.php?errno=403
ErrorDocument 404 /404handler.php
ErrorDocument 410 /error.php?errno=410
ErrorDocument 503 /error.php?errno=503
# …
Options -Indexes

Note that relative URLs in pages or scripts called by ErrorDocument directives don’t work, because browsers resolve them against the URL that caused the error. Don’t use absolute URLs in the ErrorDocument directives themselves, because then Apache redirects the client, so you get 302 response codes for 404 errors and crap like that. If you cover the 401 response code with a fully qualified URL, your server will explode. (Ok, it will just hang but that’s bad enough.) For more information please read my pamphlet Why error handling is important.
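For the single-script variant (/error.php?errno=…), here is a minimal sketch of an error page that sends the status code matching the errno parameter. The titles, texts, the Retry-After value and the function name error_status() are placeholders; unknown or missing values fall back to 404, so a direct request for /error.php doesn’t return a 200-OK.

```php
<?php
// /error.php: sketch of one script answering several ErrorDocument codes,
// called as /error.php?errno=NNN. Titles and texts are placeholders (change this).
const ERROR_PAGES = [
    401 => ['Unauthorized', 'Please log in to view this page.'],
    403 => ['Forbidden', 'You are not allowed to view this page.'],
    404 => ['Not Found', 'This page does not exist. Try the sitemap.'],
    410 => ['Gone', 'This page has been removed permanently.'],
    503 => ['Service Unavailable', 'We are down for maintenance. Please come back later.'],
];

/** Maps the errno parameter to a known status code; anything else becomes 404. */
function error_status($errno): int
{
    $code = filter_var($errno, FILTER_VALIDATE_INT);
    return ($code !== false && array_key_exists($code, ERROR_PAGES)) ? $code : 404;
}

// Only act when running under the Web server, not on the command line.
if (isset($_SERVER['REQUEST_URI'])) {
    $status = error_status($_GET['errno'] ?? null);
    [$title, $text] = ERROR_PAGES[$status];
    http_response_code($status);
    header('Content-Type: text/html; charset=UTF-8');
    if ($status === 503) {
        header('Retry-After: 3600'); // assumption: back within an hour
    }
    print "<!DOCTYPE html>\n<html><head><title>$status $title</title></head>\n"
        . "<body><h1>$title</h1><p>$text</p></body></html>\n";
}
```

Setting the code explicitly with http_response_code() also covers direct requests for /error.php, which would otherwise be answered with 200-OK.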

Last but not least create a robots.txt file in the root. If you’ve nothing to hide from search engine crawlers, this one will suffice:
User-agent: *
Disallow:
Allow: /

I’m aware that this tiny guide can’t cover everything. It should give you an idea of the pitfalls and possible solutions. If you’re somewhat code-savvy my code snippets will get you started, but hire an expert when you plan to migrate a large site. And don’t view the source code of link-condom.com pages where I didn’t implement all tips from this tutorial. ;)




22 Comments to "Upgrading from IIS/ASP to Apache/PHP"

  1. […] Upgrading from IIS/ASP to Apache/PHP By Sebastian Read Mike Hillyer’s article Migrating from Microsoft Access/MS-SQL to MySQL, and try tools like ASP to PHP. (With my tiny link condom site I overwrote the ASP code with PHP statements in my primitive text editor.) … Sebastian’s Pamphlets - http://sebastians-pamphlets.com […]

  2. JLH on 11 December, 2007  #link

    Nice work. Again. I’ll be using some of this one of these days, but I’ve got to convert the whole shopping system first, which I’ve just about done.

  3. g1smd on 11 December, 2007  #link

    Important warning for the ErrorDocument directives.

    Do not put the domain name in those, only put the server path and filename, otherwise those error conditions will not return the correct HTTP status codes.

    The above examples are correct. This was just a reminder.

  4. Hamlet Batista on 11 December, 2007  #link

    As usual, excellent post! You just reminded me of a large scale migration I did from Windows to Linux several years ago. It was not from asp to php, but from a proprietary Java servlet container to Apache Tomcat. The case sensitivity issue was the biggest problem. Updating the source files, database contents, etc. was a real pain. I am glad I haven’t touched Windows servers again in many years!

  5. Sebastian on 12 December, 2007  #link

    Thanks guys :)
    @John Did you try a tool like ASP2PHP?

    @g1smd Good point! I’ve bolded my warning in the post.

    @Hamlet Yup, don’t touch them, not even with a pitchfork, except when you help someone to escape. ;)

  6. Marc on 12 December, 2007  #link

    A lot of these rules apply to every kind of migration under the sun. The 404 handler is amazing! I’ve been looking for a method for this for a long time.

    When you send the, “301 moved permanently” header does that tell search engines to update their crawlers and database accordingly? What about the incoming links that are pointing to bad URLs - can you send all of those links to the same URL and up that page’s registered incoming links and search engine rankings?

  7. Sebastian on 12 December, 2007  #link

    Marc, 301 redirects tell most search engines to update their indexes, and they’re supposed to transfer SE brownie points from the redirecting URLs to the redirect’s destination. From the major engines only MSN can’t handle them properly. For more information on 301 response codes and their handling by search engines try my tiny book on HTTP redirects. Also, most user agents understand those redirects, so that visitors coming from outdated links land on the right pages. I wouldn’t redirect a bunch of unrelated pages to one URL, better do that page by page and try to find a good match for each outdated URL.

  8. Kimber on 12 December, 2007  #link

    wow! you make my head spin with these “over my head” posts! i did however find a gem in here that i think will fix a canonicalization problem i’ve been having on IIS. The ASP canonicalization routines example you provided may be exactly what i’ve been looking for to fix a redirect loop with default.asp. thanks!

  9. Sebastian on 12 December, 2007  #link

    Thanks Kimber, I’ve stolen the ASP code, so I’ll pass your comment to the author.

  10. JLH on 12 December, 2007  #link

    Kimber, That won’t fix the default.asp loop, it only makes sure that if it’s a home page redirect it doesn’t send it to /default.asp but rather “/” when correcting the www/non-www thing. If you have the canonical URL with default.asp it will still show up with the /default.asp. It’s tough to do without double redirects as Request.ServerVariables("URL") always returns default.asp on the home page.

  11. CVOS man on 13 December, 2007  #link

    you can leave a .asp extension on pages moved to a linux server but it’s better to bite the bullet and change the extension to .php or .html to prevent future confusion.

  12. Sebastian on 13 December, 2007  #link

    Kimber, John beats me, respectively the author of the ASP code. Here is what he contributed:

    The only problem is that it doesn’t work :-(

    This:

    if instr(1,strNewUrl, "/default.asp", vbCompareText) > 0 then
    strNewUrl = replace(strNewUrl, "/Default.asp", "/")
    strNewUrl = replace(strNewUrl, "/default.asp", "/")
    end if

    does absolutely nothing. IIS always acts like it has “/default.asp” on the end of the URL if the script is called that and acts as the default handler of the subdirectory. You can’t 301 from /default.asp to / on IIS without doing really crazy tricks (eg set up xzy.asp as the default handler and keep default.asp as just another script; never link xyz.asp of course).

    There are only 2 things that you can do to get it close:
    1. use a javascript redirect to make sure people never see the other URL if they try it on accident
    2. don’t link to it ;-)

    Better don’t run a website under Windows.

  13. Sebastian on 13 December, 2007  #link

    CVOS Man, I disagree. When search engines have indexed lots of .asp URLs it’s better to stick with this extension. Redirecting all pages comes with traffic losses and although 301 redirects are supposed to transfer gained link juice, that’s not guaranteed, hence avoid redirects where you can. If all files of a site have an .asp extension, why is this confusing? I’d even go so far and stick with .asp for new pages.

  14. Kimber on 13 December, 2007  #link

    thanks, john and sebastian. sorry if i got off topic there. i appreciate your feedback. i’ve been going crazy trying to figure how to fix this but it looks like i’m out of luck. thanks for letting me know i wasn’t actually crazy for not being able to get it to work. :)

  15. JLH on 13 December, 2007  #link

    I have to agree with sebastian. The handling of 301s is flaky at best. If it’s a large site I wouldn’t switch the extensions all at once if you really need to, do it slowly, let the engines catch up, then do another batch. Changing the whole site from .asp to .php all at once could leave you with nothing in the indexes for a while (except those old supplemental pages that won’t be crawled for eternity).

    In my preparation for converting an ecommerce store with about 40,000 products, I’ve added some logic to let me switch the products over to php one category at a time, starting with the least popular first to see how it works.

    This is on my to-do list when everything else is done, so I won’t be giving any updates too soon. :)

  16. Igor The Troll on 19 December, 2007  #link

    Yeah, starting over, that’s got to hurt!
    I started on Apache 7 years ago, doing Perl, and never looked back.
    Now part of the Apache developers team…

  17. […] One such example can be found on Sebastian’s Pamphlets blog where he discusses how to change your IIS/ASP pages to Apache/PHP. […]

  18. Yyrkoon on 2 February, 2009  #link

    Why would i leave a working ground to start learning a new ground?

    Maybe i want to know 2 different worlds. But know this.. these are 2 worlds who have the same sickness.. none are better than the other. Instead of making peace they still war with each other with arguments like “my penis is longer than yours”.

    When someone actually works for a free world where i, the developer can create my own platform to work i will gladly come back and try out different languages and servers. I am not happy with using a linux platform with php and mysql just coz i cant use those functions satisfying on my windows platform where i have ASP and IIS.

    When it’s possible to use PHP, ASP and MySQL on 1 single platform without slicing my wrists every time there´s a patch/upgrade i will celebrate.

    But until then dont come and brag about your penis coz you´re still impotent enough to try to change things.

  19. Sebastian on 27 April, 2009  #link

    Yyrkoon, since you’ve decided to remain an anon guy and I can’t check your properties, here’s my general reply: I’ve never met a $MS developer with a big dick. Hence there’s no need for willy whacking. $MS stuff comes with way too many pitfalls, so savvy Web developers wouldn’t touch it with a barge pole.

  20. Guide Media SEO on 7 August, 2009  #link

    I’m trying to move a website from ASP/windows to PHP/apache, different hosting, the works (I know, a pain). My problem is that I have about a dozen or so dynamic URL strings that I would like to keep since they have a pretty good amount of links pointing into them. Here is one of the dynamic asp strings /linkcreate.asp?catid=10019 and I’d like to have it redirected to /whatever.html….Is there a way to do this or am I out of luck? Any help would be appreciated. Thanks all.

  21. Guide Media SEO on 7 August, 2009  #link

    Found the answer here…. Help 301 Redirect old asp site to PHP ….Seems to work

    Here is the code in the .htaccess file

    RewriteCond %{QUERY_STRING} (^|&)catid=10019(&|$)
    RewriteRule ^linkcreate\.asp$ http://www.example.com/new-page.html? [R=301,L]

    Hope this helps someone. I’ve been working on this for 4 hours now. Yessss. Going to bed.

  22. g1smd on 22 February, 2011  #link

    This is a great article but there’s one “gotcha” still lurking.

    Directives in the .htaccess file are not processed in the order they are listed, but instead are processed on a per-module basis, one module at a time.

    The module loading order is decided elsewhere in the Apache configuration.

    Since you cannot guarantee the order that Apache modules will be processed, do not mix Redirect and RewriteRule directives in the same site configuration.

    If a Redirect directive should happen to be processed after a RewriteRule directive, then those redirects will expose previously rewritten internal file paths back out on to the web as URLs.

    It is best to use RewriteRule for all of the rules and ensure that:
    - all external redirects are listed in order of precedence, from most selective to least selective,
    - all internal rewrites are listed in order of precedence, from most selective to least selective,
    - all external redirects are listed before any of the internal rewrites.
