Anyone developing websites that have pretensions to generate long-term traffic, beyond the lifespan of any initial marketing activity, needs to know a little bit about Search Engine Optimisation (SEO). It’s really not a difficult set of rules to follow: generate relevant content, modify your HTML structure a bit, cultivate inbound links and get indexed.
Let’s examine the ‘cultivate inbound links’ part of the equation. If you are starting a website from scratch, your task is made easier, because you don’t have to worry about the legacy of old URLs stored in search engines, affiliate sites, link-exchange programs and the like that lead to your content. But what happens when you already have plenty of inbound traffic to a site, your pages are highly ranked on Google, but you want to move to a new blog or CMS? If you just change the URLs, you might lose all the hard-earned PageRank that you have accrued over months or years, and have a very unhappy client when their site drops out of their Top 10 targeted searches.
You could try to redevelop the site and reuse the existing URL structure, but frequently the choice of blog or CMS limits that possibility. Typically you end up creating ‘mod-rewrite’ rules on your web server that say “if someone wants this old URL, show them this nice shiny new URL.” And this is where we need to know about HTTP status codes. Those rewrite rules also need to tell Google to update its index to point at our nice shiny new URLs, and to do this we send a 301 Moved Permanently header along with the new URL. This says to Google “that URL you requested has moved to this new address, please remember that in future.”
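As a sketch, here’s what such a rule might look like in an Apache `.htaccess` file (the old and new URL patterns are hypothetical, you’d substitute your own site’s structure):

```apache
# Hypothetical example: permanently redirect old flat-file article
# URLs to their new home in the CMS, sending a 301 status so that
# search engines update their index to the new address.
RewriteEngine On
RewriteRule ^old-articles/(.+)\.html$ /blog/$1/ [R=301,L]
```

The `R=301` flag is the important bit: without it, mod_rewrite serves the new content silently and Google keeps the old URL in its index.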
What about when someone visits from a really old link that you haven’t managed to set up redirect rules for? In that instance you send a 404 Not Found header and, hopefully, some useful HTML to help the visitor find what they were looking for. Cool 404 pages were all the rage a few months ago, but the critical bit is not the content that visitors can see, it’s the 404 header itself. Why? Because search engines won’t index pages that return a 404 status, which stops your Google search results filling up with unhelpful “Page not found” entries.
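In Apache this is a one-liner (the `/404.html` path is a hypothetical page of your own making):

```apache
# Serve a friendly, helpful page for missing URLs while still
# sending the 404 status. Note: use a local path here; a full
# http:// URL would make Apache redirect instead, losing the
# 404 header that keeps the page out of search indexes.
ErrorDocument 404 /404.html
```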
And how about the final part of the equation? Get indexed. Imagine a scenario where you are uploading significant changes to the website and GoogleBot just happens to choose that moment to try to re-index your site. The worst case is that Google’s index of your site fills up with page after page of broken or mangled content. A much better solution is to make a site-wide change so that, during updating and/or testing, the server sends a 503 Service Unavailable header, so that Google knows to come back in a few minutes once the site is back up and running.
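One way to sketch this, again with Apache (the maintenance page and the IP address are hypothetical placeholders):

```apache
# Hypothetical maintenance mode: answer every request with
# 503 Service Unavailable while the new files are uploaded.
RewriteEngine On
# Let our own IP through so we can test the half-finished site
RewriteCond %{REMOTE_ADDR} !^127\.0\.0\.1$
RewriteRule .* - [R=503,L]
ErrorDocument 503 /maintenance.html
# Requires mod_headers; hints to crawlers when to retry (seconds)
Header always set Retry-After "3600"
```

Drop the block in when you start uploading, remove it when you’re done, and GoogleBot keeps its existing index of your site intact in the meantime.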
P.S. None of this is rocket science, but actually implementing it can be a royal pain in the arse. Sniffing out the response codes is the first port of call, and I’m rather fond of HTTP Client on my Mac.