Article

Home » Server-side Coding » PHP & MySQL Tutorials » The PHP Anthology Volume 2, Chapter 5 - Caching

About the Author

Harry Fuecks

author_HarryF Harry has been working in corporate IT since 1994, with everything from start-ups to Fortune 100 companies. Outside of office hours he runs phpPatterns: a site dedicated to software design with PHP that aims to raise standards of PHP development. He also maintains Dynamically Typed: SitePoint's PHP blog.

View all articles by Harry Fuecks...

The PHP Anthology Volume 2, Chapter 5 - Caching

By Harry Fuecks

February 16th, 2004

Reader Rating: 9

Page: 1 2 3 4 Next

In the good old days, back when building Websites was as easy as knocking up a few HTML pages, the delivery of a Web page to a browser was a simple matter of having the Web server fetch a file. A site's visitors would see its small, text-only pages almost immediately, unless they were using particularly slow modems. Once the page was downloaded, the browser would cache it somewhere on the local computer so that, should the page be requested again, after performing a quick check with the server to ensure the page hadn't been updated, the browser could display the locally cached version. Pages were served as quickly and efficiently as possible, and everyone was happy (except those using 9600 bps modems).

The advent of dynamic Web pages spoiled the party, effectively "breaking" this model of serving pages by introducing two problems:

  • When a request for a dynamic Web page is received by the server, some intermediate processing, such as the parsing of scripts by the PHP engine, must be completed. This introduces a delay before the Web server begins to deliver the output to the browser. For simple PHP scripts this may not be significant, but for a more complex application, the PHP engine may have a lot of work to do before the page is finally ready for delivery. This extra work results in a noticeable lag between the users' requests and the actual display of pages in their browsers.
  • A typical Web server, such as Apache, uses the time of file modification to correctly inform a Web browser of a requested page's cache status. With dynamic Web pages, the actual PHP script may change only occasionally, while the content it displays, which is perhaps fetched from a database, will change frequently. The Web server has no way of knowing about updates to the database, however, so it doesn't send a last modified date. If the client (browser) has no indication of how long the data is valid, it will take a guess, which usually means it will request the same page again. The Web server will always respond with a fresh version of the page, regardless of whether the data has changed. To avoid this shortcoming, most Web developers use a meta tag or HTTP headers to tell the browser never to use a cached version of the page. However, this negates the Web browser's natural ability to cache Web pages, and involves some serious disadvantages. For example, the content delivered by a dynamic page may change once a day, so there's certainly a benefit to be gained by having the browser cache a page—even if only for twenty four hours.

It's usually possible to live with both problems given a small PHP application, but as the complexity of, and traffic to, your site increases, you may run into difficulties. However, both these issues can be solved, the first with server side caching, the second, by taking control of client side caching from within your application. The exact approach you use to solve the problem will depend on your application, but in this chapter, we'll see how you can solve both using PHP and a number of class libraries from PEAR.

Note that in this chapter's discussions of caching, we'll look at only those solutions implemented in PHP. These should not be confused with some of the script caching solutions that work on the basis of optimizing and caching compiled PHP scripts. Included in this group are the Zend Accelerator, iconCube PHP Accelerator, and Turck MMCache, the latter being the only accelerator that's ready for use with Windows based PHP installations today.

How do I prevent Web browsers caching a page?

Before we look at the approaches you can take to client and server side caching, the first thing we need to understand is how to prevent Web browsers (and proxy servers) from caching pages in the first place. The most basic approach to doing this utilizes HTML meta tags:

<meta http-equiv="Expires" content="Mon, 26 Jul 1997 05:00:00 GMT"
/>
<meta http-equiv="Pragma" content="no-cache" />

By inserting a past date into the Expires meta tag, we can tell the browser that the cached copy of the page is always out of date. This means the browser should never cache the page. The Pragma: no-cache meta tag is a fairly well-supported convention that most Web browsers follow. Upon encountering this tag, they usually won't cache the page (although there's no guarantee; this is just a convention).

It sounds good, but there are two problems associated with the use of meta tags:

  1. If a tag wasn't present when the page was first requested by a browser, but appears later (for example, you modified the included pageheader.php file, which contains the top of every Web page), the browser will remain blissfully ignorant and keep its cached copy of the original.

  2. Proxy servers that cache Web pages, such as those common to ISPs, generally will not examine the HTML documents themselves. Instead, they rely purely on the Web server from which the documents came, and the HTTP protocol. In other words, a Web browser might know that it shouldn't cache the page, but the proxy server between the browser and your Web server probably doesn't—it will continue to deliver the same out-of-date page to the client.

A better approach is to use the HTTP protocol itself, with the help of PHP's header function, to produce the equivalent of the two meta tags above:

<?php
header('Expires: Mon, 26 Jul 1997 05:00:00 GMT');
header('Pragma: no-cache');
?>

We can go one step further, using the Cache-Control header that's supported by HTTP 1.1 capable browsers:

<?php
header('Expires: Mon, 26 Jul 1997 05:00:00 GMT');
header('Cache-Control: no-store, no-cache, must-revalidate');
header('Cache-Control: post-check=0, pre-check=0', FALSE);
header('Pragma: no-cache');
?>

This essentially guarantees that no Web browser or intervening proxy server will cache the page, so visitors will always receive the latest content. In fact, the first header should accomplish this on its own; this is the best way to ensure a page is not cached. The Cache-Control and Pragma headers are added for "insurance" purposes. Though they don't work on all browsers or proxies, they will catch some cases in which the Expires header doesn't work as intended (e.g. if the client computer's date is set incorrectly).

Of course, to disallow caching entirely introduces the problems we discussed at the start of this chapter. We'll look at the solution to these issues in just a moment.

Internet Explorer and File Download Caching

Our discussion of PDF rendering in Chapter 3, Alternative Content Types explained that issues can arise when you're dealing with caching and file downloads. In serving a file download via a PHP script that uses headers such as Content-Disposition: attachment, filename=myFile.pdf or Content-Disposition: inline, filename=myFile.pdf, you'll have problems with Internet Explorer if you tell the browser not to cache the page.

Internet Explorer handles downloads in a rather unusual manner, making two requests to the Website. The first request downloads the file, and stores it in the cache before making a second request (without storing the response). This request invokes the process of delivering the file to the end user in accordance with the file's type (e.g. it starts Acrobat Reader if the file is a PDF document). This means that, if you send the cache headers that instruct the browser not to cache the page, Internet Explorer will delete the file between the first and second requests, with the result that the end user gets nothing. If the file you're serving through the PHP script will not change, one solution is simply to disable the "don't cache" headers for the download script.

If the file download will change regularly (i.e. you want the browser to download an up-to-date version), you'll need to use the last-modified header, discussed later in this chapter, and ensure that the time of modification remains the same across the two consecutive requests. You should be able to do this without affecting users of browsers that handle downloads correctly. One final solution is to write the file to your Web server and simply provide a link to it, leaving it to the Web server to report the cache headers for you. Of course, this may not be a viable option if the file is supposed to be secured by the PHP script, which requires a valid session in order to provide users access to the file; with this solution, the written file can be downloaded directly.

If you liked this article, share the love:
Print-Friendly Version Suggest an Article

Sponsored Links