Article

Cost-Effective Website Acceleration

Page: 1 2 3 4 5 6 Next

Taking Charge of Caching

There are three methods we can use to set cache control rules for the Web:

  1. Specify cache control headers via a <meta> tag

  2. Set HTTP headers programmatically

  3. Set HTTP headers through Web server settings

Each of these approaches has its pros and cons.

<meta> Tags for Basic Caching

The simplest way to implement cache control is to use the <meta> tag. For example, we could set the Expires header to a date in the future:

<meta http-equiv="Expires" content="Sun, 31 Oct 2004 23:59:00 GMT" />

In this case, a browser parsing this HTML will assume that this page does not expire until October 2004 and will add it to its cache. Because the page will be stamped with this Expires header, the browser won't re-request the page until after this date or until the user modifies the browser's caching preferences or clears the cache manually.

Of course, while it is often advantageous to cache page data, as we mentioned above, there are instances in which you would not want to cache data at all. In these cases, you might set the Expires value to a date in the past:

<meta http-equiv="Expires" content="Sat, 13 Dec 2003 12:59:00 GMT" />

You might be concerned about clock variations on the user's system and therefore choose a date that has long since passed, but in reality, this is rarely an issue since it is the server's Date response header response that matters for cache control.

The use of Expires with a past date in a <meta> tag should work for both HTTP 1.0- and 1.1-compliant browsers. There are two more <meta> tags that are often used to make sure that a page is not cached. The Pragma tag is used to talk to HTTP 1.0 browser caches, while the Cache-Control tag is used for HTTP 1.1 clients. It never hurts to include both of these if you want to make sure that a page is never cached, regardless of browser type or version:

<meta http-equiv="Pragma" content="no-cache" />

<meta http-equiv="Cache-Control" content="no-cache" />

As easy as <meta> tags might appear, they suffer from one major problem -- they are not able to be read by intermediary proxy caches, which generally do not parse HTML data, but instead rely directly on HTTP headers to control caching policy. Because of this lost potential value, and given the fact that browsers will readily use HTTP headers, <meta>-based cached control really should not be a developer's primary approach to cache control.

Programming Cache Control

Most server-side programming environments, such as PHP, ASP, and ColdFusion, allow you to add or modify the HTTP headers that accompany a particular response. To do this in ASP, for example, you would use properties of the built-in Response object by including code such as this at the top of your page:

<%    
Response.Expires = "1440"    
Response.CacheControl = "max-age=86400,private"    
%>

Here, you're asking ASP to create both an Expires header (for HTTP 1.0-compliant caches) and a Cache-Control header (for HTTP 1.1 caches). You are also specifying a freshness lifetime for this cached object of twenty-four hours (note that the Expires property requires a value in minutes while CacheControl uses seconds). As a result, the following headers would be added to the HTTP response (assuming "now" is 8:46 PM on Friday, February 13th, 2004 Greenwich Mean Time):

Expires: Sat, 14 Feb 2004 20:46:04 GMT    
Cache-control: max-age=86400,private

By and large, it's much more efficient to rely on these kinds of server-side mechanisms, rather than <meta> tags, to communicate with caches. So, when you have a choice between implementing cache-control policies using the <Meta> element, and doing so using a server-side programming environment like ASP, always choose the latter.

However, there is a different issue about which the server-side programming environment can do nothing. Imagine that the ASP file for which we created the above code links to several images that, according to your cache control policies, have freshness lifetimes of a full year. How would you implement the HTTP headers to tell caches that they can store those images for that long? You could try to use a server-side script to return the images programmatically, but this is both complex and wasteful. A better approach to setting cache control information for static externals like CSS, JavaScript, and binary objects like images, is to set cache control information on the server itself.

Web Server-Based Cache Settings

Both Microsoft IIS and Apache provide a variety of facilities for cache control. Unfortunately, each has a different approach to caching, and delegation of cache control policies is not cleanly in the hands of those who are most familiar with a site's resources -- developers!

Apache Cache Control

When it comes to implementing cache control policies easily, users of the Apache Web server are somewhat better off than those running IIS, provided that the Apache module http://httpd.apache.org/docs-2.0/mod/mod_expires.html mod_expires is installed.

With mod_expires, a server administrator can set expiration lifetimes for the different objects on a site in the main server configuration file (usually httpd.conf). As is often the case for Apache modules, both the Virtual Host and Directory containers can be used to specify different directives for different sites, or even for different directories within a given site. This is much more convenient than having to use IIS's graphical user interface or metabase scripting objects.

Even handier is mod_expires's ExpiresByType directive, which allows you to set the expiration lifetime for all files of a given MIME type with a single line of code. This directive allows you to easily set cache policies for all the scripts, style sheets, or images in your site. Of course, you often need to craft more fine-grained policies based upon object type or directory. In this case, the settings specified in the primary configuration file can be overridden (at the server administrator's discretion) by directives in a .htaccess file for a given directory and its children. In this way, developers can write and maintain their own cache control directives without requiring administrative access to the server, even in a shared hosting environment.

If your Apache server was not built with mod_expires, the best way to enable it is to build it as a shared object (using aspx is generally easiest) and then to include the following line in your httpd.conf file:

LoadModule expires_module modules/mod_expires.so

You can then put your configuration directives right into httpd.conf. However, many administrators will want to locate these in an external configuration file to keep things neat. We will follow this practice in our example, using Apache's Include directive in httpd.conf (the IfModule container is an optional, but traditional, safety measure):

<IfModule mod_expires.c>    
    Include conf/expires.conf    
</IfModule>

We can now locate the directives that control how mod_expires behaves in a module-specific configuration file called expires.conf. Here is a sample set of such directives:

ExpiresActive On    
ExpiresDefault "access 1 month"    
ExpiresByType image/png "access 3 months"

The ExpiresActive directive simply enables mod_expires, while the ExpiresDefault directive sets up a default expiration lifetime that will be used to create Expires and Cache-Control headers for any files to which more specific rules do not apply. Note the syntax used for specifying the expiration lifetime; the time unit can be anything from seconds to years, while the base time can be specified as modification, as well as access.

Next is the very useful ExpiresByType directive mentioned earlier, here applied to all .png image files on the server:

<Directory "/usr/local/apache/htdocs/static">    
AllowOverride Indexes    
ExpiresDefault "access 6 months"    
</Directory>

Finally, we have a Directory container that overrides all other rules for anything in the directory /static. This directory has its own ExpiresDefault directive, as well as an AllowOverride directive that allows the settings for itself and its children to be overridden by means of .htaccess files. The .htaccess file, in turn, might look like this:

ExpiresByType text/html "access 1 week"

Note that this overrides any directives that would otherwise have applied to files in /static and its children that have the MIME type text/html. Using a combination of configuration file directives, and then overriding those directives using .htaccess, almost any set of cache control policies, no matter how complex, can be easily implemented by both administrators and, if properly delegated, developers.

IIS Cache Control

If you're setting cache control rules on Microsoft's Internet Information Service (IIS), you need access to the IIS Metabase. This is typically accessed via the Internet Service Manager (ISM), the Microsoft management console application that controls IIS administrative settings.

To set an expiration time in IIS, simply open the ISM, bring up the property sheet for the file or directory that you want to configure, and click on the "HTTP Headers" tab. Next, put a check in the box labeled "Enable Content Expiration" and, using the radio buttons, choose one of the options provided. You can choose to expire the content immediately, set a relative expiration time (in minutes, hours, or days), or set an absolute expiration time. Note that both Expires and Cache-Control headers will be inserted into the affected responses. The basic idea is shown here.

IIS Cache Control Settings

1305_Fig5

Despite the user-friendliness of the GUI, it is actually rather clumsy to set different cache control policies for different categories of files on IIS. It gets much worse if the site's files are not neatly segregated into directories based on their desired freshness lifetimes. If you happen to have designed your site with caching in mind, and put different classes of files in different directories, such as /images/dynamic, images/static, images/navigation, and so on, it's easy to set up caching policies via the MMC. But, if you haven't designed your site this way, or you're working to optimize an existing site, you may literally have to set the policy on each and every file, which would be quite a chore.

Even more troubling is that, unlike Apache, IIS offers no easy way to delegate to developers the authority to set cache policy, as modification of the required settings necessitates MMC access. Fortunately, these gaps in functionality between IIS and Apache with mod_expires enabled can be closed fairly easily with a third-party tool such as CacheRight from Port80 Software.

Modeled after mod_expires, CacheRight creates a single, text-based rules file that lives in each Website's document root and allows both administrators and developers to set expiration directives for an entire site. In addition, CacheRight goes beyond mod_expires by adding an ExpiresByPath directive to complement the ExpiresByType directive. This functionality makes it trivially easy to set both a general cache control policy for files of a given type, and to override that rule with a more specific one for a subset of files of that type, as in this example:

ExpiresByType image/* : 6 months after access public    
ExpiresByPath /navimgs/*, /logos/* : 1 year after modification public

Here, all images will have a freshness lifetime of six months, except those located in the navimgs and logos directories. Like mod_expires, CacheRight lets you set the expiration times relative to the modification time of the file(s), as well as to the user's first access. This flexibility can be very useful when publication or update schedules are not set in stone, which is, of course, all too common in Web development.

Regardless of which server you use, it's well worth the time to figure out how to manage cache control at the server level. As with programmatic cache control, the directives will be respected by well-behaved intermediary caches and not just by browser caches, as in the case of the <meta> tag approach. Furthermore, unlike programmatic or <meta> tag-based cache control, server-based cache control makes it easy to set caching policies for all the really heavy objects in a site, such as images or Flash files. It is in the caching of these objects that performance gains are most obvious.

Conclusion

While we've only scratched the surface of the complex topic of caching, hopefully we have shed some light on this under-appreciated facet of Website performance. In particular, we hope that you better understand why it is vital to have a set of cache control policies for your site, and that you have gained some ideas about how to go about effectively implementing those policies.

The results of properly applied cache rules will be obvious -- dramatically faster page loads for your end users, especially repeat visitors. In addition, you will make more efficient use of your bandwidth and your server resources. The fact that all these enhancements can be achieved through little more than a heightened attention to HTTP headers makes effective, expiration-based cache-control one of the most cost-effective performance optimizations you will ever make to your site.

Next week, we'll wrap up this series on cost-effective Website acceleration by focusing on Web server modifications, most notably standard HTTP compression. We'll then present a case study that will illustrate the concrete performance gains that can be achieved by applying all the techniques together.

If you liked this article, share the love:
Print-Friendly Version Suggest an Article

Sponsored Links