URL Canonicalization

December 9, 2008

Search engines use URLs to tell pages apart. This can be a problem, because multiple URLs can return the same page. For example, www.example.com/directory, www.example.com/directory/, and www.example.com/directory/index.html could all return the same page from your website, yet search engines would treat each of those URLs as a separate page. Search engines try to identify URLs that point to the same page and pick the best one to show in search results, a process known as canonicalization. If a search engine cannot determine that the pages are the same, the PageRank the page receives is divided among the different URL variants. Using the same URL format consistently across your website lessens the risk of this happening.
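To make the idea concrete, here is a minimal Python sketch that maps the three directory variants above onto a single URL. It is only an illustration: search engines do not publish their exact canonicalization rules, and the names `canonicalize` and `DEFAULT_DOCUMENTS` are hypothetical. It assumes that a trailing slash and a trailing index.html or index.htm do not change the page, and it arbitrarily picks the form without a trailing slash.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical list of default documents treated as equivalent to the
# directory itself (an assumption; servers can be configured differently).
DEFAULT_DOCUMENTS = ("index.html", "index.htm")


def canonicalize(url: str) -> str:
    """Map common variants of the same page URL onto one form (a sketch)."""
    if "://" not in url:
        url = "http://" + url  # assume http when no scheme is given
    parts = urlsplit(url)
    host = parts.netloc.lower()  # host names are case-insensitive
    path = parts.path or "/"
    # Drop a trailing default document: /directory/index.html -> /directory/
    last = path.rsplit("/", 1)[-1]
    if last in DEFAULT_DOCUMENTS:
        path = path[: -len(last)]
    # Drop trailing slashes, except for the site root
    if path != "/":
        path = path.rstrip("/")
    # The fragment (#...) is never sent to the server, so it is discarded
    return urlunsplit((parts.scheme.lower(), host, path, parts.query, ""))
```

With these assumptions, all three example URLs collapse to http://www.example.com/directory. A real site might reasonably prefer the trailing-slash form instead; what matters is choosing one and using it everywhere.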

www vs. non-www

Most websites are configured so that they can be viewed with or without the www before the domain name in the URL (www.example.com vs. example.com). Because the content could differ between those two versions, search engines treat them as separate sites. To ensure this does not cause problems for your website, choose one version, use it consistently, and permanently (301) redirect the other version to it. Neither version has an advantage over the other, so the choice comes down to personal preference.

In Apache, to redirect requests for the non-www version to the www version, place the following code in your .htaccess file (replace example with your domain name):

# Send requests for example.com to the same path on www.example.com
RewriteEngine On
RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
RewriteRule ^.*$ http://www.example.com%{REQUEST_URI} [R=301,L]

To do the opposite and redirect the www version to the non-www version, place the following code in your .htaccess file (replace example with your domain name):

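# Send requests for www.example.com to the same path on example.com
RewriteEngine On
RewriteCond %{HTTP_HOST} ^www\.example\.com$ [NC]
RewriteRule ^.*$ http://example.com%{REQUEST_URI} [R=301,L]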
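The www redirect is meant to behave like a simple function: given the requested host and path, it either returns a 301 target on the preferred host or leaves the request alone. The Python sketch below models that intended behavior. It is not how Apache evaluates mod_rewrite internally, and the name `preferred_host_redirect` is hypothetical. For simplicity it takes the path and any query string as one string. (mod_rewrite carries the query string over on its own.) Like the `[NC]` flag, it compares host names case-insensitively.

```python
from typing import Optional


def preferred_host_redirect(host: str, path: str,
                            preferred: str = "www.example.com") -> Optional[str]:
    """Return the 301 redirect target, or None if no redirect is needed.

    Requests for the other form of the domain (with or without www) are
    sent to the same path on the preferred host.
    """
    host = host.lower()  # case-insensitive, like the [NC] flag
    bare = preferred[4:] if preferred.startswith("www.") else preferred
    # The "other" host is whichever form the preferred host is not
    alternate = bare if preferred.startswith("www.") else "www." + bare
    if host == alternate:
        return "http://" + preferred + path
    return None
```

For example, with the default preferred host, a request for example.com/directory/ maps to http://www.example.com/directory/. A request that already uses www.example.com returns None, so it is served normally and cannot loop.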