
The PHP Anthology Volume 2, Chapter 5 - Caching


How do I capture server side output for caching?

It's time to look at how we can reduce server side delay by caching output. The general approach begins by rendering the page as normal, performing database queries and so on with PHP. However, before sending it to the browser, we capture and store the finished page somewhere, for instance, in a file. The next time the page is requested, the PHP script first checks to see whether a cached version of the page exists. If it does, the script sends the cached version straight to the browser, avoiding the delay involved in rebuilding the page.

What about Template Caching?

Template engines such as Smarty often talk about template caching. Usually, these engines offer an in-built mechanism for storing a compiled version of a template (i.e. the native PHP generated from the template), which prevents us having to recompile the template every time a page is requested. This should not be confused with output caching, which refers to the caching of the rendered HTML (or other output) that PHP sends to the browser. You can successfully use both types of caching together on the same site.

Here, we'll look at PHP's in-built caching mechanism, the output buffer, which can be used with whatever page rendering system you prefer (templates or no templates). Normally, anything your script displays using, for example, echo or print is sent directly to the browser. With PHP's output control functions, you can instead store that data in an in-memory buffer, which your PHP script has both access to and control over.

Here's a simple example:

Example 5.1. 1.php  
 
<?php  
// Start buffering the output  
ob_start();  
 
// Echo some text (which is stored in the buffer)  
echo '1. Place this in the buffer<br />';  
 
// Get the contents of the buffer  
$buffer = ob_get_contents();  
 
// Stop buffering and clean out the buffer  
ob_end_clean();  
 
// Echo some text normally  
echo '2. A normal echo<br />';  
 
// Echo the contents from the buffer  
echo $buffer;  
?>

The buffer itself stores the output as a string. So, in the above script, we commence buffering with ob_start and use echo to display something. We then use ob_get_contents to fetch the data the echo statement placed in the buffer, and store it in a string. The ob_end_clean function stops the output buffer and discards its contents; the alternative is ob_end_flush, which stops the buffer and sends its contents to the browser.

The above script displays:

2. A normal echo  
1. Place this in the buffer

In other words, we captured the output of the first echo, then sent it to the browser after the second echo. As this simple example suggests, output buffering can be a very powerful tool when it comes to building your site; it provides a solution for caching, as we'll see in a moment, and is an excellent way to hide errors from your site's visitors (see Chapter 10, Error Handling). It even provides a possible alternative to browser redirection in situations such as user authentication.
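The fetch-then-discard pair used in Example 5.1 can also be written as a single call. The following is a minimal sketch rather than part of the original example; it relies on ob_get_clean, which returns the buffer contents and ends buffering in one step:

```php
<?php
// Start buffering the output
ob_start();

// This echo goes into the buffer, not to the browser
echo '1. Place this in the buffer<br />';

// Fetch the buffer contents and stop buffering in one call
// (equivalent to ob_get_contents followed by ob_end_clean)
$buffer = ob_get_clean();

// Echo some text normally, then the captured text
echo '2. A normal echo<br />';
echo $buffer;
```

The two lines appear in the same order as they do for Example 5.1, since the first echo is still held back until the end of the script.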

HTTP Headers and Output Buffering

Output buffering can help solve the most common problem associated with the header function, not to mention session_start and setcookie. Normally, if you call any of these functions after page output has begun, you'll get a nasty error message. With output buffering turned on, the only output types that can escape the buffer are HTTP headers. Using ob_start at the very beginning of your application's execution, you can send headers at whichever point you like, without encountering the usual errors. You can then write out the buffered page content all at once, when you're sure there are no more HTTP headers required.
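As a rough illustration of the point above, the sketch below sends a header after page output has already begun. The header name and the variable names are hypothetical, chosen only for the example; headers_sent is used as a guard so the script behaves sensibly even if buffering was not in effect:

```php
<?php
// Start buffering before any output is produced
ob_start();

// Page output begins, but it is held in the buffer
echo '<p>Welcome back!</p>';

// Nothing has reached the browser yet, so headers can still be sent
if (!headers_sent()) {
    header('X-Example-Header: sent-after-output'); // hypothetical header name
}

// Collect the page and stop buffering, then write it out all at once
$page = ob_get_clean();
echo $page;
```

Without the call to ob_start, the echo would normally have forced PHP to send its headers first, and the later header call would fail with a "headers already sent" warning.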

Using Output Buffering for Server Side Caching

Now that you've seen a basic example of output buffering, here's the next step, in which the buffer is stored as a file:

Example 5.2. 2.php  
 
<?php  
// If a cached version exists use it...  
if (file_exists('./cache/2.cache')) {  
 
 // Read and display the file  
 readfile('./cache/2.cache');  
 exit();  
 
}  
 
// Start buffering the output  
ob_start();  
 
// Display some HTML  
?>  
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"  
 "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">  
<html xmlns="http://www.w3.org/1999/xhtml">  
<head>  
<title> Cached Page </title>  
<meta http-equiv="Content-Type"  
 content="text/html; charset=iso-8859-1" />  
</head>  
<body>  
This page was cached with PHP's  
<a href="http://www.php.net/outcontrol">Output Control  
Functions</a>  
</body>  
</html>  
 
<?php  
// Get the contents of the buffer  
$buffer = ob_get_contents();  
 
// Stop buffering and display the buffer  
ob_end_flush();  
 
// Write a cache file from the contents  
$fp = fopen('./cache/2.cache', 'w');  
fwrite($fp, $buffer);  
fclose($fp);  
?>

First, the above script checks to see if a cached version of the page exists and, if it does, the script reads and displays it. Otherwise, it uses output buffering to create a cached version of the page. It stores this as a file, while using ob_end_flush to display the page to the visitor.

The file 2.cache looks exactly like the HTML that was rendered by the script:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"  
 "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">  
<html xmlns="http://www.w3.org/1999/xhtml">  
<head>  
<title> Cached Page </title>  
<meta http-equiv="Content-Type"  
 content="text/html; charset=iso-8859-1" />  
</head>  
<body>  
This page was cached with PHP's  
<a href="http://www.php.net/outcontrol">Output Control  
Functions</a>  
</body>  
</html>

Chunked Buffering

The simplest approach to output buffering is to cache an entire page. However, this approach ignores the fact that different parts of a page have different lifetimes, and so forfeits much of the performance improvement that PHP's output control functions can offer.

No doubt, some parts of the page you send to visitors change very rarely, such as the page's header, menus and footer. But other parts, such as the table containing a forum discussion, may change quite often. Output buffering can be used to cache sections of a page in separate files, then rebuild the page from these—a solution that eliminates the need to repeat database queries, while loops, and so on. You might consider assigning each block of the page an expiry date after which the cache file is recreated, or alternatively, you may build into your application a mechanism that deletes the cache file every time the content it stores is changed.

Here's an example that demonstrates the principle:

Example 5.3. 3.php (excerpt)  
 
<?php  
/**  
* Writes a cache file  
* @param string contents of the buffer  
* @param string filename to use when creating cache file  
* @return void  
*/  
function writeCache($content, $filename)  
{  
 $fp = fopen('./cache/' . $filename, 'w');  
 fwrite($fp, $content);  
 fclose($fp);  
}  
 
/**  
* Checks for cache files  
* @param string filename of cache file to check for  
* @param int maximum age of the file in seconds  
* @return mixed either the contents of the cache or false  
*/  
function readCache($filename, $expiry)  
{  
 if (file_exists('./cache/' . $filename)) {  
   if ((time() - $expiry) > filemtime('./cache/' . $filename)) {  
     return FALSE;  
   }  
   return file_get_contents('./cache/' . $filename);  
 }  
 return FALSE;  
}

The first two functions we've defined, writeCache and readCache, are used to create cache files and to read them back, respectively. The writeCache function takes rendered output as its first argument, as well as a filename that should be used when creating the cache file. The readCache function takes the filename of a cache file as its first argument, along with the time in seconds after which the cache file should be regarded as having expired. If it finds a valid cache file, the function returns the file's contents; otherwise it returns FALSE to tell the calling script that either no cache file exists, or it's out of date.
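To make the expiry arithmetic in readCache easier to follow, here is a small sketch that isolates the comparison. The function isCacheFresh is a hypothetical helper, not part of 3.php; it takes the current time as an argument so that the result doesn't depend on the clock:

```php
<?php
/**
* Hypothetical helper: the same comparison readCache makes
* @param int time the cache file was last modified (Unix timestamp)
* @param int maximum age of the file in seconds
* @param int current time (Unix timestamp)
* @return bool TRUE if the cache is still valid
*/
function isCacheFresh($modified, $expiry, $now)
{
    // The cache has expired if it was modified before ($now - $expiry)
    return !(($now - $expiry) > $modified);
}

$now = 1000000;

// Written 4 seconds ago with a 5 second lifetime: still valid
var_dump(isCacheFresh($now - 4, 5, $now));

// Written 6 seconds ago with a 5 second lifetime: expired
var_dump(isCacheFresh($now - 6, 5, $now));
```

Note that a file exactly as old as the expiry time still counts as valid, because the comparison uses > rather than >=.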

For the purposes of this example, I used a procedural approach. However, I wouldn't recommend doing this in practice, as it will result in very messy code (see later solutions for better alternatives) and is likely to cause issues with file locking (e.g. what happens when someone accesses the cache at the exact moment it's being updated?).
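One common way to reduce the risk of a visitor reading a half-written cache file is to write to a temporary file and then rename it into place. The sketch below shows the idea under that assumption; writeCacheAtomic is a hypothetical name, the function takes a full path rather than a bare filename, and the rename is only a single-step replacement when both files are on the same filesystem:

```php
<?php
/**
* Hypothetical variant of writeCache that avoids half-written cache files
* @param string contents of the buffer
* @param string full path of the cache file
* @return bool TRUE on success
*/
function writeCacheAtomic($content, $path)
{
    // Write to a temporary file in the same directory first
    $tmp = tempnam(dirname($path), 'tmp');
    if ($tmp === false) {
        return false;
    }
    if (file_put_contents($tmp, $content) === false) {
        unlink($tmp);
        return false;
    }
    // Move the finished file into place; readers see the old or the new file
    if (!rename($tmp, $path)) {
        unlink($tmp);
        return false;
    }
    return true;
}

// Illustrative location; any writable directory would do
$path = sys_get_temp_dir() . '/example_atomic.cache';
writeCacheAtomic('<p>Cached</p>', $path);
echo file_get_contents($path);
```

A simpler alternative is to pass the LOCK_EX flag to file_put_contents, although that only helps if readers also respect the lock.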

Let's continue this example. After the output buffer is started, processing begins. First, the script calls readCache to see whether the file 3_header.cache exists; this contains the top of the page—the HTML head section and the start of the body. We've used PHP's date function to display the time at which the page was actually rendered, so you'll be able to see the different cache files at work when the page is displayed.

Example 5.4. 3.php (excerpt)  
 
// Start buffering the output  
ob_start();  
 
// Handle the page header  
if (!$header = readCache('3_header.cache', 604800)) {  
 // Display the header  
 ?>  
 
 <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"  
   "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">  
 <html xmlns="http://www.w3.org/1999/xhtml">  
 <head>  
 <title> Chunked Cached Page </title>  
 <meta http-equiv="Content-Type"  
   content="text/html; charset=iso-8859-1" />  
 </head>  
 <body>  
 The header time is now: <?php echo date('H:i:s'); ?><br />  
 
 <?php  
 $header = ob_get_contents();  
 ob_clean();  
 writeCache($header,'3_header.cache');  
}

Note what happens when a cache file isn't found. Some content is output and assigned to a variable with ob_get_contents, after which the ob_clean function empties the buffer. This allows us to capture the output in "chunks" and assign it to individual cache files with writeCache. The header of the page is now stored as a file, which can be reused without our needing to re-render the page. Look back to the start of the if condition for a moment. When we called readCache, we gave it an expiry time of 604800 seconds (one week); readCache uses the file modification time of the cache file to determine whether the cache is still valid.

For the body of the page, we'll use the same process as before. However, this time, when we call readCache, we'll use an expiry time of five seconds; the cache file will be updated whenever it's more than five seconds old:

Example 5.5. 3.php (excerpt)  
 
// Handle body of the page  
if (!$body = readCache('3_body.cache', 5)) {  
 echo 'The body time is now: ' . date('H:i:s') . '<br />';  
 $body = ob_get_contents();  
 ob_clean();  
 writeCache($body, '3_body.cache');  
}

The page footer is effectively the same as the header. After this, the output buffering is stopped and the content of the three variables that hold the page data is displayed:

Example 5.6. 3.php (excerpt)  
 
// Handle the footer of the page  
if (!$footer = readCache('3_footer.cache', 604800)) {  
 ?>  
 
 The footer time is now: <?php echo date('H:i:s'); ?><br />  
 </body>  
 </html>  
 
 <?php  
 $footer = ob_get_contents();  
 ob_clean();  
 writeCache($footer, '3_footer.cache');  
}  
// Stop buffering  
ob_end_clean();  
 
// Display the contents of the page  
echo $header . $body . $footer;  
?>

The end result looks like this:

The header time is now: 17:10:42  
The body time is now: 18:07:40  
The footer time is now: 17:10:42

The header and footer are updated on a weekly basis, while the body is updated whenever it is more than five seconds old.
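The three if blocks in 3.php all follow the same pattern: try the cache, otherwise render, capture and store. As a sketch only, that pattern could be folded into one helper. The name cachedChunk is hypothetical, the helper assumes a PHP version that supports anonymous functions, and it writes to the system temporary directory purely so that the example runs anywhere:

```php
<?php
/**
* Hypothetical helper: returns a chunk from cache, or renders and caches it
* @param string full path of the cache file
* @param int maximum age of the file in seconds
* @param callable function that echoes the chunk
* @return string the chunk's content
*/
function cachedChunk($path, $expiry, $render)
{
    // Reuse the cache file if it exists and has not expired
    if (file_exists($path) && (time() - $expiry) <= filemtime($path)) {
        return file_get_contents($path);
    }

    // Otherwise capture the rendered output in a buffer of its own
    ob_start();
    $render();
    $content = ob_get_clean();

    file_put_contents($path, $content);
    return $content;
}

$dir = sys_get_temp_dir(); // assumption: any writable directory will do

$header = cachedChunk($dir . '/sketch_header.cache', 604800, function () {
    echo 'The header time is now: ' . date('H:i:s') . '<br />';
});
$body = cachedChunk($dir . '/sketch_body.cache', 5, function () {
    echo 'The body time is now: ' . date('H:i:s') . '<br />';
});

// Display the contents of the page
echo $header . $body;
```

Because each chunk gets its own buffer, there's no need to call ob_clean between chunks.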

The diagram in Figure 5.1 summarizes the chunked buffering methodology.

Figure 5.1. Chunked Buffering Flow Diagram
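The other strategy mentioned earlier, deleting a cache file whenever the content it stores is changed, needs very little code. The following is a minimal sketch; clearCache is a hypothetical helper, and the file path is illustrative:

```php
<?php
/**
* Hypothetical helper: removes a cache file so it's rebuilt on the next request
* @param string full path of the cache file
* @return bool TRUE if the cache file is gone
*/
function clearCache($path)
{
    if (file_exists($path)) {
        return unlink($path);
    }
    // Nothing to delete
    return true;
}

// Illustrative location; any writable directory would do
$path = sys_get_temp_dir() . '/sketch_forum.cache';
file_put_contents($path, '<p>Old forum table</p>');

// For example, call this right after a new forum post is saved
clearCache($path);
```

The next request for the page finds no cache file for that chunk, so it renders the chunk again and stores a fresh copy.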

Nesting Buffers

You can nest one buffer within another practically ad infinitum simply by calling ob_start more than once. This can be useful if you have multiple operations that use the output buffer, such as one that catches the PHP error messages, and another that deals with caching. Care needs to be taken to make sure that ob_end_flush or ob_end_clean is called every time ob_start is used.
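Here's a short sketch of two nested buffers, with each call to ob_start matched by a call that ends that buffer. The variable names and the text being echoed are illustrative only:

```php
<?php
// Outer buffer, for example one that collects the whole page
ob_start();
echo 'outer start | ';

// Inner buffer, for example one that captures a single chunk for caching
ob_start();
echo 'inner chunk';
$inner = ob_get_clean(); // ends the inner buffer only

// Output goes to the outer buffer again
echo 'outer end';
$outer = ob_get_clean(); // ends the outer buffer

echo $outer . ' | ' . $inner;
```

The text echoed while the inner buffer was active never reaches the outer buffer, because ob_get_clean discards it after returning it.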
