Introduction to Affordable, Scalable Websites
One morning you wake up, grab some coffee, and check how many people visited your site yesterday. (C’mon, admit it! I know I do it every day!) You look at the numbers and they just don’t look right. Maybe your coffee isn’t strong enough -- there has to be a decimal point in there somewhere! They’re all too big! You received ten thousand new visitors yesterday… and the number is climbing rapidly. You’re finally making it big!
Next, you check your email, and find a note from your ISP telling you that you need to either upgrade to a more expensive plan, or pay a fortune for bandwidth. Oh no! What to do? Well, if you designed your site properly, you can smile with the knowledge that you have everything under control.
You’ve reached the point where you need multiple servers. It’s time to figure out how to keep your site up and running without breaking the bank. It’s important to note that this article is also relevant to you even if you don’t have a dedicated server -- it’s often more beneficial to have several smaller accounts than one large one.
This article will first discuss the example of a budget scalable Website, then briefly talk about the proper way to design a site of your own. Unfortunately, this method can be prohibitively expensive for small Webmasters, but it is important to know. Next, we’ll discuss how you can design a Website so that it will be easy to scale in the future, and won’t force you to radically revamp your current way of working. Finally, I’ll talk about how to convert an existing Website into a scalable site, and we’ll explore some of the technical hurdles that need to be overcome.
An Introduction to Scalable Servers
Why does one need to specifically design a site to be scalable? Well, let me give you an example.
I’m sure you’ve heard of the ever-popular “Am I Hot or Not” Website. Two college students had some free time, and one day they came up with a great idea. They thought, wouldn’t it be cool to make a site where visitors can upload their own picture and have other people vote on whether they’re hot or not? It’s an addictive combination, and it’s no surprise that, virtually overnight, the site went from receiving no hits to several million an hour. They were suddenly faced with the challenge of keeping their site running.
They quickly made a key observation: every page view is independent of every other. So, if you’re viewing one page of the site, it makes no difference from which server the next page is served. This is because there are only two parts to the Website: viewing pages and recording votes. This situation greatly simplified matters for these particular Webmasters. They simply used one main SQL database, and had other servers calling it for each page view. This way, they could distribute the load without having to significantly change the way their program worked.
So they upgraded to seven servers in seven days and, not surprisingly, their database server started choking. They noticed that almost every page simply made a SELECT query against the database, and each vote was written right back to it. However, they decided there was no real reason for votes to be recorded right away. After all, who would notice if the number of votes displayed was a few hours old? So, at first they simply cached the vote data and updated it periodically.
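One way to read "cached the vote data and updated it periodically" is as a write buffer: votes are counted in memory and applied to the database in batches. The sketch below is only an illustration of that idea, not the site's actual code. The `VoteBuffer` class, the photo IDs, and the plain dictionary standing in for the database are all assumptions made for the example.

```python
from collections import Counter


class VoteBuffer:
    """Collects votes in memory and applies them to storage in batches."""

    def __init__(self, store):
        # `store` is a stand-in for the database: a dict of photo_id -> votes.
        self.store = store
        self.pending = Counter()

    def record(self, photo_id):
        # Cheap in-memory increment; no database write per vote.
        self.pending[photo_id] += 1

    def flush(self):
        # Meant to be called periodically (for example from a scheduled job).
        # Applies every buffered vote in one pass and returns how many
        # votes were written.
        for photo_id, count in self.pending.items():
            self.store[photo_id] = self.store.get(photo_id, 0) + count
        flushed = sum(self.pending.values())
        self.pending.clear()
        return flushed


database = {"photo-1": 10}
buffer = VoteBuffer(database)
for _ in range(3):
    buffer.record("photo-1")
buffer.record("photo-2")
```

Until `flush()` runs, the database still shows the old totals, which is exactly the "few hours old" trade-off described above. Note that votes held only in memory are lost if the process crashes before the next flush.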
Then, after a few more days, their server started choking again, and they upgraded to running multiple database servers. From that point, they were able to scale to any size they needed, eventually reaching more than 17 servers! Now let’s see how you can do the same, on an even smaller budget.
The Ideal Scalable Website Architecture
Building a scalable Website can be a very tricky proposition. It involves a lot of redundancy, load balancing, multiple Webservers, a separate database server, and backup servers. For example, in the figure below, all connections to the Internet first go through a load balancer.
A load balancer will distribute traffic according to a procedure that you, as Webmaster, set up. For example, it may send most of the traffic to the server with the lowest load average. This helps keep response times nice and low, even if your site is linked to from CNN.com. Multiple servers are used to help keep the system fault-tolerant. If one server crashes, the traffic can be diverted to the other servers. The same applies to the database—there is a replicated backup in case the main database goes offline.
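To make the "lowest load average" rule concrete, here is a minimal sketch of the selection step only. Real load balancers are dedicated hardware or software with far more to them; the server names and load figures below are made up for illustration, and `pick_server` and `pick_healthy_server` are hypothetical helper names.

```python
def pick_server(load_averages):
    """Return the name of the server reporting the lowest load average."""
    if not load_averages:
        raise ValueError("no servers available")
    return min(load_averages, key=load_averages.get)


def pick_healthy_server(load_averages, down):
    """Same choice, but skip servers known to be offline (fault tolerance)."""
    healthy = {
        name: load for name, load in load_averages.items() if name not in down
    }
    return pick_server(healthy)


# Hypothetical load averages reported by three Web servers.
loads = {"web1": 0.80, "web2": 0.35, "web3": 1.90}
```

With these figures, `pick_server(loads)` chooses `web2`. If `web2` is marked as down, `pick_healthy_server(loads, {"web2"})` falls back to `web1`, which is the diversion behavior described above.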

As you’ve probably guessed by now, such a setup can be very expensive. Nonetheless, I urge you to look into such a system if you have the time, money, and ambition. The exact details of a complete architecture are beyond the scope of this article, but there are numerous books on this topic.
Fortunately for the rest of us, there’s a higher-risk, but cheaper and easier method.
Budget Scalable Architecture
For those of us who can barely afford a single server, much less seven, there is an incremental approach you can follow. Simply design your site so it can run on one or more servers. If you’re doing well, why not rent a new server, set up your software on it, and tell your old servers about the new one? While this is not as easy as it sounds, it’s quite doable. Simply keep a few key ideas in mind when you design your site, and should it ever dramatically grow in popularity, you won’t even break a sweat (unless it’s from a victory dance).
As we’re on a budget, there are a few constraints to keep in mind. We can’t afford to run servers from our office -- instead we’re going to use the many inexpensive dedicated Web hosts out there. We want to be able to add or remove servers at will, depending on traffic, and we want to minimize bandwidth usage to save costs.
Unfortunately, most budget dedicated Internet hosts are not very helpful when it comes to setting up the kind of tiered server discussed above -- they will charge you a lot of money to connect several servers together. So, we want to avoid having a separate database server if at all possible. Please note that if only occasional lookups or writes are made to the database, it’s perfectly reasonable to send the requests over the Internet—this scenario won’t use up a lot of bandwidth.
Master and Slave Sites
First off, there are many different ways to design a scalable site. The easiest, though not necessarily the best, is simply to create a master site, and several slave sites: one main site from which all the others are replicated. Everything is copied—the database, HTML pages, everything (though you may want to use one database and have all the servers link to it in order to maintain consistency).
For example, on sitepoint.com it does not matter from which server an article is served. You may view page one of this article from a server in Sydney, and the second from a server in Detroit, and you’d never know the difference, nor would it particularly matter. However, they have a nifty little voting mechanism at the bottom of the article (you’ll want to press the button below the number ten on this article to see what happens!). Clicking this button could hit the master server instead of a slave and update the database appropriately. Then, if you use multiple databases, once a day (or instantly if you use database replication) the data will be copied to each of the slave servers.
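The split between "read from anywhere" and "write to the master" can be sketched as a small routing rule. This is only an illustration under some assumptions: the host names are invented, and it assumes that page views arrive as GET or HEAD requests while votes and other updates arrive as POST requests.

```python
import random

# Hypothetical host names, used only for this example.
MASTER = "master.mysite.com"
REPLICAS = ["sydney.mysite.com", "detroit.mysite.com"]


def server_for(method, rng=random):
    """Pick a server for a request based on its HTTP method."""
    if method.upper() in {"GET", "HEAD"}:
        # Reads do not change anything, so any copy of the site will do.
        return rng.choice(REPLICAS)
    # Anything that changes data (such as a vote) goes to the master,
    # which later copies the update out to the other servers.
    return MASTER
```

A page view such as `server_for("GET")` may be answered by either copy, while `server_for("POST")` always returns the master.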
The easiest way to spread users across servers is to use round robin DNS. A round robin setup hands out your servers’ addresses in rotation, so successive requests can land on different servers. However, because consecutive requests from the same user may go to different servers, it is hard to track session data (such as the user’s login details). To counter this, you can carry the session data in cookies or hidden HTML fields.
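Both ideas can be sketched in a few lines. The rotation below only imitates what a round robin DNS setup does; it is not a DNS server. The cookie part shows one common way to make cookie-based sessions work across servers: sign the value with a secret that every server shares, so any server can verify it without a shared session store. The addresses, the secret, and the function names are all assumptions for the example.

```python
import hashlib
import hmac
from itertools import cycle

# Hypothetical shared secret; every server would need the same value.
SECRET = b"shared-secret-known-to-every-server"

# Example addresses from a range reserved for documentation.
servers = cycle(["192.0.2.10", "192.0.2.11", "192.0.2.12"])


def next_server():
    # Round robin: hand out addresses in a fixed rotation.
    return next(servers)


def make_cookie(username):
    # Sign the username so that any server can trust the cookie.
    signature = hmac.new(SECRET, username.encode(), hashlib.sha256).hexdigest()
    return f"{username}:{signature}"


def read_cookie(cookie):
    # Return the username if the signature checks out, otherwise None.
    username, _, signature = cookie.rpartition(":")
    expected = hmac.new(SECRET, username.encode(), hashlib.sha256).hexdigest()
    if hmac.compare_digest(signature.encode(), expected.encode()):
        return username
    return None
```

Whichever server receives the next request can call `read_cookie()` on the cookie it was sent and recover the login name, while a cookie whose name or signature has been altered is rejected.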
One Server Per Visit
A variation of this method is to use multiple servers, but have the user view pages served from the same server during their entire visit. You could have www.mysite.com, www2.mysite.com, www3.mysite.com, and so on. When a user visits your site, they are randomly sent to one server (or, if you store the user’s data on a particular server, you’d always send them to that particular server).
This is very simple, and requires virtually no changes to your site. It works well with persistent data and is easy to understand and implement. But it has a very bad Achilles heel: what if everyone starts linking to a page on the www3 server? Or what if www3 crashes? It could be flooded or down, leaving the other servers sitting idle. Then again, this method is extraordinarily simple, very effective, and works well on a budget.
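As a rough sketch of how a visitor might be assigned to a host, the example below picks a random host for a new visitor and a stable host for a known user. Hashing the user ID is just one possible way to "always send them to that particular server"; it is an assumption of this sketch, as are the function names.

```python
import hashlib
import random

HOSTS = ["www.mysite.com", "www2.mysite.com", "www3.mysite.com"]


def host_for_new_visitor(rng=random):
    # Anonymous visitors are simply sent to a random host.
    return rng.choice(HOSTS)


def host_for_user(user_id):
    # Known users always map to the same host, because the hash of
    # their ID never changes.
    digest = hashlib.sha256(user_id.encode()).digest()
    index = int.from_bytes(digest[:4], "big") % len(HOSTS)
    return HOSTS[index]


def redirect_url(user_id, path):
    # The URL the entry page would redirect this user to.
    return f"http://{host_for_user(user_id)}{path}"
```

One caveat of the hashing approach: adding or removing a host changes the result of the modulo for most users, so their stored data would have to move with them.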
Serve Content By Type
Another method is to serve one type of content from one server, and a different type from another server. A good example of this might be to put graphics on one server, and dynamic content on another. This allows you to use a slow server and a high bandwidth provider for the graphics, and a higher quality but more expensive server for the content. This works well when you have outgrown the bandwidth allocation of your regular Web hosting account. The hitch? If either server goes down, your entire Website would be, for all practical purposes, dead in the water.
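In practice, this split often comes down to how your pages build their URLs. Here is a minimal sketch, assuming a hypothetical `images.mysite.com` host for graphics and `www.mysite.com` for everything else; the list of extensions is also just an example.

```python
import posixpath

# Hypothetical host names for this example.
GRAPHICS_HOST = "images.mysite.com"
DYNAMIC_HOST = "www.mysite.com"

GRAPHICS_EXTENSIONS = {".gif", ".jpg", ".jpeg", ".png"}


def url_for(path):
    """Build a full URL, sending graphics to the high-bandwidth host."""
    extension = posixpath.splitext(path)[1].lower()
    host = GRAPHICS_HOST if extension in GRAPHICS_EXTENSIONS else DYNAMIC_HOST
    return f"http://{host}{path}"
```

If every image tag in your templates goes through a helper like `url_for()`, moving the graphics to a different provider later means changing one constant rather than editing every page.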
Caching
Caching is also an excellent way to scale Websites. Instead of having the user directly view your main site, they view a cached copy on a different server. In this situation, instead of serving dynamic data, you’re serving static data. One or more main servers generate all the pages and push them out to the other servers, from which users view your site (for a more automatic and advanced method of doing this, look into using the excellent open-source Squid caching program).
This solution requires only one or two expensive servers, along with several inexpensive auxiliary servers. This technique is simple and effective, though it’s not suitable for sites that are customized for each user.
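The "generate all the pages" step can be as simple as rendering each page once and saving it as a plain HTML file. The sketch below shows that step only; copying the files out to the auxiliary servers is left out. The article data, `render_article()`, and `publish_static()` are all stand-ins invented for the example.

```python
import html
import tempfile
from pathlib import Path


def render_article(article):
    # Stand-in for whatever dynamic code normally builds the page.
    title = html.escape(article["title"])
    body = html.escape(article["body"])
    return f"<html><body><h1>{title}</h1><p>{body}</p></body></html>"


def publish_static(articles, output_dir):
    """Render every article once and save it as a static .html file."""
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for article in articles:
        target = output_dir / f"{article['slug']}.html"
        target.write_text(render_article(article), encoding="utf-8")
        written.append(target)
    return written


articles = [
    {"slug": "scalable-sites", "title": "Scalable Sites", "body": "Plan ahead."},
]
with tempfile.TemporaryDirectory() as tmp:
    pages = publish_static(articles, tmp)
    print(pages[0].read_text(encoding="utf-8"))
```

The resulting files can be served by any inexpensive Web server, since no database or scripting is needed to deliver them. You would rerun the generation step whenever the underlying content changes.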
Owen currently runs a small Web design company,