Welcome to part three of our guide on migrating your website! This five-part series describes some high-level techniques that are common to website and server migrations. Don't forget to check out part two of this guide for information on website log monitoring using Linux.
This is part three of a five-part guide to website and server migration. In this part, we discuss creating rewrite rules and tracking the URLs that changed between the old site and the new site.
Migrating from one CMS to another has become easier as many of these applications converge on the same ideas: friendly URLs and a database-driven architecture. But if you are migrating from an older CMS that relied on URL parameters in the query string, or if the friendly URL formats differ between the two CMSs, you will need to make sure the old URLs are routed to the correct new pages. The same applies if your new website has a different menu structure or if filenames have changed. I will go over how to write rewrite rules for query string matching and full filename matching.
There are a couple of ways to find the URLs that need attention, and using a combination of the two is probably a good plan regardless. If your site has been on the web for more than a week, you probably have existing entries in search engine caches or links from other websites. Links that lead to 404 pages will hurt your page rank and other SEO factors, as well as customer loyalty and retention.
We have directions for the two main search engines, Google and Bing.
Log in to your Google Webmaster Tools account.
If you do not have a Google Webmaster Tools account, sign up for one now or skip to the next option.
Select your site from the site listings.
Click "Your Site on the Web", then click "Internal Links". This screen shows you a list of links that are used internally on your website and are also found in the Google search engine.
Go to Google and, using the Advanced Search, search for all links related to your website:
site:atws.ca
Log in to your Bing Webmaster Tools account. If you do not have one, sign up for one now or skip to the next option.
Select your site from the site listings.
Click on the "Index" tab, then click "Index Explorer". This screen shows you a list of links that are used internally on your website and are also found in the Bing search engine.
Go to http://www.bing.com. Using the advanced search keywords, search for all links related to your website:
site:atws.ca -index
Because of browser caching, search engine caching, transparent proxies, caching proxy servers and a host of related technologies, there may be servers that still hold old information. We need to make sure that people are redirected to the proper new pages on our website even if they request an old URL. There are more reasons why this is important (SEO, etc.), but we are not going to get into them in this post.
There are two places you can put Rewrite Rules in Apache:
We are going to work with our rules in the virtual host entries for the purpose of this guide.
First we need to make sure the rewrite engine is loaded in Apache:
sudo a2enmod rewrite
sudo apache2ctl graceful
The first command enables the rewrite module. The second gracefully restarts Apache so that the module is loaded into the running configuration.
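Before writing any rules, it can help to confirm that the module really is active. The following is a minimal sketch, assuming a Debian or Ubuntu style install where `apache2ctl` is on the PATH; the helper name `has_rewrite_module` is made up for this example.

```shell
# Hypothetical helper: succeeds if the module list on stdin mentions mod_rewrite.
has_rewrite_module() {
    grep -q 'rewrite_module'
}

# Only run the check where apache2ctl is actually installed.
if command -v apache2ctl >/dev/null 2>&1; then
    # "apache2ctl -M" lists the modules Apache has loaded.
    if apache2ctl -M 2>/dev/null | has_rewrite_module; then
        echo "mod_rewrite is loaded"
    else
        echo "mod_rewrite is NOT loaded"
    fi
fi
```

If the check reports that the module is not loaded, run the two commands above again and look at the Apache error log.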
The good thing is that Apache only sees what is sent to it, so we do not have to guess what could be happening in the background. Also, keep in mind that these rewrite rules can be used together, or you may have to come up with your own combination to work with your previous setup.
This rewrite rule is probably one of the easiest to start with, because Apache sees only the path that was requested, so we can match against the complete file name to get exact matches.
In this setup, we have these URLs to match against:
We want it to go to: http://www.atws.ca/
Creating the Rule
Turn On Rewrite Engine if it has not already been started in this Virtual Host Entry.
This needs to always be above all of the rules.
RewriteEngine On
Match against the base URL to make sure we are getting Exactly the URL we want to rewrite.
RewriteRule ^/(index|home)\.(htm|html)$ /? [NC,R=301,L]
The Results:
RewriteEngine On
RewriteRule ^/(index|home)\.(htm|html)$ /? [NC,R=301,L]
Using the () grouping with the pipe (|) allows us to match against several different values that should all go to the same destination. This way, you don't need to write a separate rewrite rule for each filename. Keep in mind that packing too many alternatives into one rule can make it hard to read and understand.
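To get a feel for which requests the pattern `^/(index|home)\.(htm|html)$` catches, you can try it outside Apache first. This is only a rough sketch: it uses `grep -E` (POSIX extended regular expressions) in place of Apache's PCRE engine, which behaves the same for this simple pattern, and `-i` stands in for the [NC] flag. The function name and the sample paths are made up for the example.

```shell
# Hypothetical helper: succeeds if the path would match the file-based rule.
matches_file_rule() {
    printf '%s\n' "$1" | grep -Eiq '^/(index|home)\.(htm|html)$'
}

# Sample request paths (assumptions, not taken from a real log).
for path in /index.html /home.htm /INDEX.HTML /index.php /about/index.html; do
    if matches_file_rule "$path"; then
        echo "$path -> would redirect to /"
    else
        echo "$path -> not matched"
    fi
done
```

Because the pattern is anchored with ^ and $, only the exact file names at the top level match; `/about/index.html` and `/index.php` are left alone.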
Rewrite Rules for Friendly URLs based on Directories
Matching against directories is very similar to the file-based matching, but without the filename. URLs like these are typically the result of an existing rewrite rule that creates friendly URLs.
In this setup, we have the URLs:
With our new CMS, we want it to go to: http://www.atws.ca/services
Creating the Rule
Turn On Rewrite Engine if it has not already been started in this Virtual Host Entry.
This needs to always be above all of the rules.
RewriteEngine On
Match against the base url to make sure we are getting Exactly the URL we want to rewrite.
RewriteRule ^/(old-services|new-services)(/)?$ /services? [NC,R=301,L]
The Results
RewriteEngine On
RewriteRule ^/(old-services|new-services)(/)?$ /services? [NC,R=301,L]
These rewrite rules are typically used together with the flat file matching, but they contain some additional conditions. Query string parameters are not part of the base URL that RewriteRule matches against, so you need to read them from Apache's QUERY_STRING variable using a RewriteCond.
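The split between the path and the query string can be illustrated with a few lines of shell. This is a sketch of the idea only, not of Apache's internals; the function name `split_request` and the sample request are assumptions chosen to fit this section.

```shell
# Hypothetical helper: split a request into the part RewriteRule matches
# (the path) and the part %{QUERY_STRING} holds (everything after the "?").
split_request() {
    request=$1
    path=${request%%\?*}            # drop the first "?" and everything after it
    case $request in
        *\?*) query=${request#*\?} ;;   # keep what follows the first "?"
        *)    query= ;;                 # no query string at all
    esac
    printf 'path=%s query=%s\n' "$path" "$query"
}

split_request '/index.php?p=services'
# prints: path=/index.php query=p=services
```

A RewriteRule pattern only ever sees the `path` part, which is why the `p=...` value has to be tested in a separate condition.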
On our old setup, we have the URLs:
With our new CMS, we want it to go to: http://www.atws.ca/services
Creating the Rule
Turn On Rewrite Engine if it has not already been started in this Virtual Host Entry.
This needs to always be above all of the rules.
RewriteEngine On
Create the QUERY_STRING Condition for the Rule.
RewriteCond %{QUERY_STRING} ^p=(services|old-services)$
Match against the base URL to make sure we are getting exactly the URL we want to rewrite.
RewriteRule ^/(index\.php)?$ /services? [R=301,L]
The Results
RewriteEngine On
RewriteCond %{QUERY_STRING} ^p=(services|old-services)$
RewriteRule ^/(index\.php)?$ /services? [R=301,L]
Fire up your web browser and test all of the rewrite rules you just created. Watch the logs for errors; however, most problems will be very apparent in the web browser, because the request will not end up where you want it to go.
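If you have more than a handful of rules, checking them from the command line may be quicker than clicking through a browser. Here is a minimal sketch using curl's `--write-out` variables `%{http_code}` and `%{redirect_url}`; the function names and the example URL in the usage comment are assumptions for illustration.

```shell
# Hypothetical helper: succeeds when the status is 301 and the redirect
# target is exactly the URL we expected.
is_expected_redirect() {
    status=$1; location=$2; expected=$3
    [ "$status" = "301" ] && [ "$location" = "$expected" ]
}

# Request one old URL without following the redirect and report the result.
check_redirect() {
    old_url=$1; expected=$2
    result=$(curl -s -o /dev/null -w '%{http_code} %{redirect_url}' "$old_url")
    status=${result%% *}      # text before the first space
    location=${result#* }     # text after the first space (may be empty)
    if is_expected_redirect "$status" "$location" "$expected"; then
        echo "OK   $old_url -> $location"
    else
        echo "FAIL $old_url (status $status, location $location)"
    fi
}

# Example call (hypothetical old URL):
# check_redirect 'http://www.atws.ca/index.php?p=services' 'http://www.atws.ca/services'
```

A FAIL line with status 404 usually means no rule matched, while a 302 suggests the R=301 flag is missing from the rule.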
To follow the progress the search engines are making on re-indexing your changes, we can use their webmaster tools.
Using Google's Webmaster Tools to Check for Crawl Errors
Diagnostics > Crawl errors
Using Bing's Webmaster Tools to Check for Crawl Errors
Crawl > Crawl Details > Click on the HTTP Code you want to view.
If you find any broken URLs, go back to your rewrite rules and add or update rules to cover them.
Next Week on Site Migration ... updating the big "3" ...