Article
Cache it! Solve PHP Performance Problems
Acting on the Browser's Request Headers
A more useful approach to client-side cache control is to make use of the Last-Modified and If-Modified-Since headers, both of which are available in HTTP 1.0. This action is known technically as performing a conditional GET request; whether your script returns any content depends on the value of the incoming If-Modified-Since request header.
If you use PHP version 4.3.0 and above on Apache, the HTTP headers are accessible with the functions apache_request_headers and apache_response_headers. Note that the function getallheaders has become an alias for the new apache_request_headers function.
This approach requires that you send a Last-Modified header every time your PHP script is accessed. The next time the browser requests the page, it sends an If-Modified-Since header containing a time; your script can then identify whether the page has been updated since that time. If it hasn't, your script sends an HTTP 304 status code to indicate that the page hasn't been modified, and exits before sending the body of the page.
Let's see these headers in action. The example below uses the modification date of a text file. To simulate updates, we first need to create a way to randomly write to the file:
ifmodified.php (excerpt)
<?php
$file = 'ifmodified.txt';
$random = array (0,1,1);
shuffle($random);
if ( $random[0] == 0 ) {
$fp = fopen($file, 'w');
fwrite($fp, 'x');
fclose($fp);
}
$lastModified = filemtime($file);
Our simple randomizer provides a one-in-three chance that the file will be updated each time the page is requested. We also use the filemtime function to obtain the last modified time of the file.
Next, we send a Last-Modified header that uses the modification time of the text file. We need to send this header for every page we render, to cause visiting browsers to send us the If-Modifed-Since header upon every request:
ifmodified.php (excerpt)
header('Last-Modified: ' .
gmdate('D, d M Y H:i:s', $lastModified) . ' GMT');
Our use of the getallheaders function ensures that PHP gives us all the incoming request headers as an array. We then need to check that the If-Modified-Since header actually exists; if it does, we have to deal with a special case caused by older Mozilla browsers (earlier than version 6), which appended an illegal extra field to their If-Modified-Since headers. We use PHP's strtotime function to generate a timestamp from the date the browser sent us. If there's no such header, we set this timestamp to zero, which forces PHP to give the visitor an up-to-date copy of the page:
ifmodified.php (excerpt)
$request = getallheaders();
if (isset($request['If-Modified-Since']))
{
$modifiedSince = explode(';', $request['If-Modified-Since']);
$modifiedSince = strtotime($modifiedSince[0]);
}
else
{
$modifiedSince = 0;
}
Finally, we check to see whether or not the cache has been modified since the last time the visitor received this page. If it hasn't, we simply send a 304 Not Modified response header and exit the script, saving bandwidth and processing time by prompting the browser to display its cached copy of the page:
ifmodified.php (excerpt)
if ($lastModified <= $modifiedSince)
{
header('HTTP/1.1 304 Not Modified');
exit();
}
echo ( 'The GMT is now '.gmdate('H:i:s').'<br />' );
echo ( '<a href="'.$_SERVER['PHP_SELF'].'">View Again</a><br />' );
?>
Remember to use the "View Again" link when you run this example (clicking the Refresh button usually clears your browser's cache). If you click on the link repeatedly, the cache will eventually be updated; your browser will throw out its cached version and fetch a new page from the server.
If you combine the Last-Modified header approach with time values that are already available in your application--for example, the time of the most recent news article--you should be able to take advantage of web browser caches, saving bandwidth and improving your application's perceived performance in the process.
Be very careful to test any caching performed in this manner, though; if you get it wrong, you may cause your visitors to consistently see out-of-date copies of your site.
Discussion
HTTP dates are always calculated relative to Greenwich Mean Time (GMT). The PHP function gmdate is exactly the same as the date function, except that it automatically offsets the time to GMT based on your server's system clock and regional settings.
When a browser encounters an Expires header, it caches the page. All further requests for the page that are made before the specified expiry time use the cached version of the page--no request is sent to the web server. Of course, client-side caching is only truly effective if the system time on the computer is accurate. If the computer's time is out of sync with that of the web server, you run the risk of pages either being cached improperly, or never being updated.
The Expires header has the advantage that it's easy to implement; in most cases, however, unless you're a highly organized person, you won't know exactly when a given page on your site will be updated. Since the browser will only contact the server after the page has expired, there's no way to tell browsers that the page they've cached is out of date. In addition, you also lose some knowledge of the traffic visiting your web site, since the browser will not make contact with the server when it requests a page that's been cached.
How do I examine HTTP headers in my browser?
How can you actually check that your application is running as expected, or debug your code, if you can't actually see the HTTP headers? It's worth knowing exactly which headers your script is sending, particularly when you're dealing with HTTP cache headers.
Solution
Several worthy tools are available to help you get a closer look at your HTTP headers:
LiveHTTPHeaders
This add-on to the Firefox browser is a simple but very handy tool for examining request and response headers while you're browsing.
Firebug
Another useful Firefox add-on, Firebug is a tool whose interface offers a dedicated tab for examining HTTP request information.
HTTPWatch
This add-on to Internet Explorer for HTTP viewing and debugging is similar to LiveHTTPHeaders above.
Charles Web Debugging Proxy
Available for Windows, Mac OS X, and Linux or Unix, the Charles Web Debugging Proxy is a proxy server that allows developers to see all the HTTP traffic between their browsers and the web servers to which they connect.
Any of these tools will allow you to inspect the communication between the server and browser.
How do I cache file downloads with Internet Explorer?
If you're developing file download scripts for Internet Explorer users, you might notice a few issues with the download process. In particular, when you're serving a file download through a PHP script that uses headers such as Content-Disposition: attachment, filename=myFile.pdf or Content-Disposition: inline, filename=myFile.pdf, and that tells the browser not to cache pages, Internet Explorer won't deliver that file to the user.
Solutions
Internet Explorer handles downloads in a rather unusual manner: it makes two requests to the web site. The first request downloads the file and stores it in the cache before making a second request, the response to which is not stored. The second request invokes the process of delivering the file to the end user in accordance with the file's type--for instance, it starts Acrobat Reader if the file is a PDF document. Therefore, if you send the cache headers that instruct the browser not to cache the page, Internet Explorer will delete the file between the first and second requests, with the unfortunate result that the end user receives nothing!
If the file you're serving through the PHP script won't change, one solution to this problem is simply to disable the "don't cache" headers, pragma and cache-control, which we discussed in "How do I prevent web browsers from caching a page?", for the download script.
If the file download will change regularly, and you want the browser to download an up-to-date version of it, you'll need to use the Last-Modified header that we met in "How do I control client-side caching?", and ensure that the time of modification remains the same across the two consecutive requests. You should be able to achieve this goal without affecting users of browsers that handle downloads correctly.
One final solution is to write the file to the file system of your web server and simply provide a link to it, leaving it to the web server to report the cache headers for you. Of course, this may not be a viable option if the file is supposed to be secured.
How do I use output buffering for server-side caching?
Server-side processing delay is one of the biggest bugbears of dynamic web pages. We can reduce server-side delay by caching output. The page is generated normally, performing database queries and so on with PHP; however, before sending it to the browser, we capture and store the finished page somewhere--in a file, for instance. The next time the page is requested, the PHP script first checks to see whether a cached version of the page exists. If it does, the script sends the cached version straight to the browser, avoiding the delay involved in rebuilding the page.
Solution
Here, we'll look at PHP's in-built caching mechanism, the output buffer, which can be used with whatever page rendering system you prefer (templates or no templates). Consider situations in which your script displays results using, for example, echo or print, rather than sending the data directly to the browser. In such cases, you can use PHP's output control functions to store the data in an in-memory buffer, which your PHP script has both access to and control over.
Here's a simple example that demonstrates how the output buffer works:
buffer.php (excerpt)
<?php
ob_start();
echo '1. Place this in the buffer<br />';
$buffer = ob_get_contents();
ob_end_clean();
echo '2. A normal echo<br />';
echo $buffer;
?>
The buffer itself stores the output as a string. So, in the above script, we commence buffering with the ob_startfunction, and use echo to display a piece of text which is stored in the output buffer automatically. We then use the ob_get_contents function to fetch the data the echo statement placed in the buffer, and store it in the $buffer variable. The ob_end_clean function stops the output buffer and empties the contents; the alternative approach is to use the ob_end_flushfunction, which displays the contents of the buffer.
The above script displays the following output:
2. A normal echo
1. Place this in the buffer
In other words, we captured the output of the first echo, then sent it to the browser after the second echo. As this simple example suggests, output buffering can be a very powerful tool when it comes to building your site; it provides a solution for caching, as we'll see in a moment, and is also an excellent way to hide errors from your site's visitors, as is discussed in Chapter 9. Output buffering even provides a possible alternative to browser redirection in situations such as user authentication.
In order to improve the performance of our site, we can store the output buffer contents in a file. We can then call on this file for the next request, rather than having to rebuild the output from scratch again. Let's look at a quick example of this technique. First, our example script checks for the presence of a cache file:
sscache.php (excerpt)
<?php
if (file_exists('./cache/page.cache'))
{
readfile('./cache/page.cache');
exit();
}
If the script finds the cache file, we simply output its contents and we're done! If the cache file is not found, we proceed to output the page using the output buffer:
sscache.php (excerpt)
ob_start();
?>
<!DOCTYPE html public "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Cached Page</title>
</head>
<body>
This page was cached with PHP's
<a href="http://www.php.net/outcontrol"
>Output Control Functions</a>
</body>
</html>
<?php
$buffer = ob_get_contents();
ob_end_flush();
Before we flush the output buffer to display our page, we make sure to store the buffer contents in the $buffer variable.
The final step is to store the saved buffer contents in a text file:
sscache.php (excerpt)
$fp = fopen('./cache/page.cache','w');
fwrite($fp,$buffer);
fclose($fp);
?>
The page.cache file contents are exactly same as the HTML that was rendered by the script:
cache/page.cache (excerpt)
<!DOCTYPE html public "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Cached Page</title>
</head>
<body>
This page was cached with PHP's
<a href="http://www.php.net/outcontrol"
>Output Control Functions</a>
</body>
</html>