PHP - Output buffering

Practical PHP Programming

(Original version written by Paul Hudson for Linux Format magazine issue 46.)


One feature we've consistently been asked to cover is output buffering, so in this tutorial we're looking at it in some depth.


Output buffering was introduced in the original release of PHP 4, yet for one reason or another it has yet to become commonly used in websites. This is a great shame, because without output buffering, PHP sends the output of your scripts to your web server as soon as it's ready - this might be line by line or code block by code block.

As you can imagine, sending lots of little bits of data is slow. Much more annoying, however, is the fact that you're restricted in the order you can send data. Output buffering solves both problems by enabling you to buffer up your output and send it when you're ready to do so, or even to not send it at all, if you so decide.

We had lots of requests for PHP tutorials on particular topics, and this was the most common. Many people didn't seem to understand the basic OB concepts, so I've broken this tutorial down into bite-size chunks to make each point clear by itself before moving on.


OB Advantages

As most people who work with cookies and other HTTP headers will know, it's often quite a pain to order your output properly. In HTTP, you always need to send header data before content data, which means that if you want to set a cookie half-way through a script, you're in trouble. Luckily, output buffering comes to the rescue by letting you "send" cookies at any point in your script - because the HTML is held back in the buffer, the headers are still unsent when you set the cookie, and PHP then sends headers and content together at the end, in the correct order.

The bulking together of data can also provide a performance improvement because there's no longer any need to send it a few kilobytes at a time - PHP stores up all the output of your script until you instruct it to send, at which point all data is sent in one chunk.

The most popular advantage to output buffering is that you can compress content before you send it. Because HTML is made up of lots of very simple, repeating text elements, and because normal written text on a web site is also very easy to compress, compressing your pages can make a big dent in the amount of bandwidth your site (and your visitor!) uses. Compression requires full knowledge of what it is compressing, which is why you need output buffering.

One of the least-used but most powerful advantages is that output buffers are stackable, meaning that you can have several buffers working on top of each other, allowing you to build your output up over multiple buffers.
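To make the cookie point concrete, here is a minimal sketch of setting a cookie after content has already been printed. It relies on ob_start() and ob_end_flush(), which are explained properly in "Getting started" below; the cookie name 'visited' is made up purely for illustration.

```php
<?php
// Start buffering so that body output is held back instead of being sent.
ob_start();

print "<p>Page content that would normally force the headers out.</p>\n";

// Nothing has actually been sent yet, so headers can still be changed.
// 'visited' is a hypothetical cookie name used only for this illustration.
$headersStillOpen = !headers_sent();
if ($headersStillOpen) {
    setcookie('visited', '1');
}

// Send the buffered body; PHP sends the headers (cookie included) first.
ob_end_flush();
```

Without the ob_start() call, the print statement would normally push the headers out straight away, and the later setcookie() call would fail with a "headers already sent" warning.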


Performance considerations

If you're not using content compression, output buffering is very unlikely to affect the speed of your web server by any great amount - if anything, it should help it serve pages /faster/ because of the optimised data sending. Content compression does take up a little CPU time on both the server and on clients visiting your site, but it's pretty small. On the up-side, content compression should decrease the amount of bandwidth you use by 40-60%, which means your server will spend less time sending data across the network. The compression level you achieve depends entirely on the kind of content you serve up - if you have lots of pictures, which content compression won't affect, your compression level will be lower; if you're sending lots of XML, which is a naturally repeating format that is very easy to compress, your compression level will be much higher. It's important to remember that only the output of your PHP script will be compressed - images, CSS files, etc, are all served as normal.


Getting started

You can enable output buffering in one of two ways, one of which seems easier at first but is likely to cause hassle in the long term. The "easy" option is to edit your php.ini file to enable output buffering for all scripts - this might sound great, but scripts that come to rely on it may break on other PHP installations, and it also makes it awkward to run a particular script without output buffering. The second option, which is much smarter, is to use the set of output buffering function calls on a script-by-script basis.
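If you can't be sure how a server's php.ini is set up, one defensive approach is to check whether a buffer is already open and only start one if it isn't. This is just a sketch: it uses ob_get_level(), which is described later in this tutorial, and the variable name is made up for the example.

```php
<?php
// ob_get_level() is 0 when no buffer is open, so only start one if needed.
if (ob_get_level() === 0) {
    ob_start();
}

print "This text is buffered whichever way the server is configured.\n";

// Remember how deep the buffer stack was while we were printing.
$levelWhilePrinting = ob_get_level();

// Close the current buffer and send its contents on.
ob_end_flush();
```

Written this way, the script behaves the same whether or not buffering has been switched on globally.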

The ob_start() function is used to create a new output buffer, and you can immediately start writing to it by printing out content as normal. Once you have a buffer open, there are two ways to close it: ob_end_flush() and ob_end_clean(), both of which end the buffer, but do so in slightly different ways. The former ends the buffer and sends all data to output, and the latter ends the buffer without sending it to output, effectively wiping out any information you saved in there. Every piece of text output while an output buffer is open is placed into that buffer as opposed to being sent to output. Consider the following script:

<?php
  ob_start();
  print "In first buffer!\n";
  ob_end_flush();
  ob_start();
  print "In second buffer!\n";
  ob_end_clean();
  ob_start();
  print "In third buffer!\n";
?>

That script will output "In first buffer" because the first text is placed into a buffer then flushed with ob_end_flush(). The "In second buffer" won't be printed out, though, because it's placed into a buffer which is cleaned using ob_end_clean() and not sent to output. Finally, the script will print out "In third buffer" because PHP automatically flushes open output buffers when it reaches the end of a script.


Stacking buffers

The functions ob_end_flush() and ob_end_clean() are complemented by ob_flush() and ob_clean() - these do the same jobs as their longer cousins, with the difference that they don't end the output buffer. Instead, these functions send the content to output or clean the buffer (respectively), leaving it open for more text. We'll be looking at how you can use these functions to re-use your buffers later on, but for now it's important to understand that you can stack buffers on top of each other to make them even more useful.

Consider the following script:

<?php
  ob_start();
  print "In first buffer!\n";
  ob_start();
  print "In second buffer!\n";
  ob_clean();
?>

In that script, we call ob_start() twice without closing either of the buffers, and so the end result is that "In first buffer" is printed out by itself. If you thought that "In second buffer" would be printed out too or that neither line of text would appear, you haven't grasped quite how buffer stacking works!

The first buffer is started and filled with "In first buffer", then a second buffer is started on top of the first buffer, leaving the first buffer still intact and containing "In first buffer". At this point, we can no longer write to the first buffer, because the second buffer is at the top of the stack. The new buffer is filled with "In second buffer", and finally ob_clean() is called, wiping the second buffer, /but leaving the first one intact/. When the script ends, PHP flushes what is left, which is just the contents of the first buffer.


Flushing stacked buffers

When you stack your output buffers up, data you flush is passed to the next buffer in the stack (the parent of the one you're flushing) as opposed to being sent directly to output. This makes more sense with some code, so here you go:

<?php
  ob_start();
  print "In first buffer\n";
  ob_start();
  print "In second buffer\n";
  ob_end_flush();
  print "In first buffer\n";
  ob_end_flush();
?>

That script will output the following:

In first buffer
In second buffer
In first buffer

What happens there is that the second buffer gets flushed into the first buffer where it left off, as opposed to directly to output - it literally gets copied into the parent buffer. The first buffer then gets "In first buffer" added to it, then is flushed to output. Now take a look at the following script:

<?php
  ob_start();
  print "In first buffer\n";
  ob_start();
  print "In second buffer\n";
  ob_end_flush();
  print "In first buffer\n";
  ob_end_clean();
?>

It's the same as the previous script, with the only difference being the last line - ob_end_clean() is used rather than ob_end_flush(). This time the output is nothing at all, because the second buffer gets flushed into the first buffer, then the first buffer gets cleaned, which means the client receives none of the text.

As long as you keep in mind that output buffers are stacked up like blocks, which means you can only ever write to the buffer at the top of the stack, this functionality will work in your favour. Using this method it's very easy to progressively build up your content by opening up new buffers as needed, flushing content into a parent buffer as you go.
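Here's a rough sketch of what building a page up progressively might look like: an optional section is written into its own child buffer, then either flushed into the parent or thrown away. The $orders data is made up for the illustration, and ob_get_contents() (covered in "Reading buffers" below) is used only so the finished page can be inspected.

```php
<?php
ob_start();                          // parent buffer: the whole page
print "<h1>Orders</h1>\n";

$orders = array('Widget', 'Gadget'); // hypothetical data for the illustration

ob_start();                          // child buffer: one optional section
print "<ul>\n";
foreach ($orders as $order) {
    print "  <li>$order</li>\n";
}
print "</ul>\n";

if (count($orders) > 0) {
    ob_end_flush();                  // keep it: contents move into the parent
} else {
    ob_end_clean();                  // drop it: the parent never sees the list
}

print "<p>End of page</p>\n";
$page = ob_get_contents();           // copy of the parent buffer, for inspection only
ob_end_flush();                      // send the finished page on to output
```

The decision to keep or drop the section can be made after it has been generated, which is awkward to do when output is sent as soon as it is printed.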


Reusing buffers

Given that ob_flush() and ob_clean() leave the current output buffer open for further writing, there's a potentially big performance boost just waiting to be taken advantage of - by not closing and re-opening buffers all the time, this next script could be rewritten...

<?php
  ob_start();
  print "In first buffer!\n";
  ob_end_flush();
  ob_start();
  print "In second buffer!\n";
  ob_end_clean();
  ob_start();
  print "In third buffer!\n";
?>

...like this...

<?php
  ob_start();
  print "In first buffer!\n";
  ob_flush();
  print "In second buffer!\n";
  ob_clean();
  print "In third buffer!\n";
?>

In the new script, the buffer is first flushed and left open, then cleaned and still left open, until finally being automatically closed and flushed by PHP as the script terminates. By reusing the same buffer each time rather than creating and ending output buffers as it goes, that script executes about 60% faster - a substantial difference, as I'm sure you'll agree, although the exact figure will depend on your PHP version and server.


Reading buffers

While writing to and flushing buffers is a boon by itself, you can also /read back/ the contents of output buffers, effectively receiving a copy of all the output they hold. This clever functionality is contained in one simple call to ob_get_contents(), which takes no parameters and returns the contents of the current buffer as a string (or false if no buffer is open). Reading your output back from a buffer is more useful than you might at first think, but it does take a little experimenting to get quite right.

Last issue, for example, we used output buffering and ob_get_contents() to write a static page cache - the modified date of the PHP script was compared against the modified date of the cached page, and if the script was newer, it would execute and output its content into an output buffer, which was then retrieved and written to a file.
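As a reminder of the idea, here is a minimal sketch of that kind of cache. It is not the code from last issue: the serve_with_cache() helper, the cache file location and the page content are all made up for the example, and the anonymous function assumes PHP 5.3 or later.

```php
<?php
// Hypothetical helper: serve $cacheFile if it is at least as new as $scriptFile,
// otherwise run $render, save what it printed, and send it.
function serve_with_cache($scriptFile, $cacheFile, $render)
{
    if (file_exists($cacheFile) && filemtime($cacheFile) >= filemtime($scriptFile)) {
        readfile($cacheFile);             // cache is fresh: send it as-is
        return false;                     // nothing was regenerated
    }

    ob_start();                           // capture the page instead of sending it
    $render();
    $html = ob_get_contents();            // read the page back as a string
    file_put_contents($cacheFile, $html); // store it for the next visitor
    ob_end_flush();                       // now send the page to this visitor too
    return true;                          // the cache was rebuilt
}

// Made-up cache location, used only for this demonstration.
$cacheFile = sys_get_temp_dir() . '/ob_cache_demo.html';

$rebuilt = serve_with_cache(__FILE__, $cacheFile, function () {
    print "<p>Expensive page content</p>\n";
});
```

The first request pays the cost of generating the page; later requests are served straight from the file until the script itself is modified.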

The key advantage of reading back your output is that you can make one script do many things with almost no change. For example, if you have a script that tracks the location of a package while it's being shipped around the world, the default configuration might have it send its data directly to output for web browsers. However, by using output buffering it is literally a tiny change to make that same script send its output to email, or to an SMS number - the possibilities are endless.

Combining reading output buffering with flushing means that you can save your output to a buffer, read it back in, pass it through various functions to alter the data, then send it back to output - you really have much, much more flexibility, and there are many clever ways you can take advantage of this.
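A small sketch of that read-alter-send cycle might look like the following. The tracking message is invented for the example, and the particular functions used to alter the text are just stand-ins for whatever processing you need.

```php
<?php
ob_start();
print "tracking update: parcel has left the depot\n";

$text = ob_get_contents();      // read back what has been printed so far
ob_clean();                     // empty the buffer but keep it open

// Alter the data in any way you like before it goes out.
$text = ucfirst(trim($text));
$text = htmlspecialchars($text);

print "<p>$text</p>\n";         // write the altered version into the same buffer
$final = ob_get_contents();     // copy kept only so the result can be inspected
ob_end_flush();                 // send it to output
```

The original text never reaches the client; only the altered version, wrapped in a paragraph tag, is sent.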


Other OB functions

There are two utility functions that give you information on your current output buffering situation: ob_get_level() and ob_get_length(). The ob_get_level() function is particularly useful as it tells you the buffer stack level you're at - literally how many buffers you currently have open. By default this will return 0 because you have no buffers open (unless output buffering has been switched on in php.ini), and the number increases as you add more buffers. It is also helpful if you want to work through and close all your open buffers, because you can loop down from ob_get_level() to 0.

On the other hand, there's ob_get_length(), which returns the size in bytes of the current output buffer. Note that this is not the total length of all buffers, but only the length of the current buffer - you need to use this in combination with ob_get_level() to get the total buffer lengths while flushing.
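The two functions can be combined along these lines. This sketch remembers the starting level rather than assuming it is 0, and it cleans each buffer rather than flushing it, so that a child's contents aren't copied into its parent and counted twice; the variable names are made up for the example.

```php
<?php
$baseLevel = ob_get_level();    // 0 unless something has already opened a buffer

ob_start();
print "12345";                  // 5 bytes in the first buffer
ob_start();
print "1234567";                // 7 bytes in the second buffer

$openedHere = ob_get_level() - $baseLevel;  // how many buffers this script opened
$total = 0;

// Close the buffers opened above, innermost first, adding up their sizes.
while (ob_get_level() > $baseLevel) {
    $total += ob_get_length();  // size of the current (top) buffer only
    ob_end_clean();             // discard it and move down to its parent
}

print "Closed $openedHere buffers holding $total bytes in total\n";
```

That script prints "Closed 2 buffers holding 12 bytes in total".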


Compressing output

When visitors come to your site, the HTTP request they send for a page also includes a lot of other information about that visitor. For example, it sends the name of the web browser they are using, the last page they were at, and what kind of content encoding they can accept. The content encoding is what we're interested in, because browsers that support compressed (gzipped) HTML say so in the HTTP request, which means that the web server can gzip the content before it sends it. A key feature of this system is that if a client /doesn't/ say that it supports gzip encoding, the web server sends back plain text - this process is all entirely transparent to the user.

Output compression, being just gzip wrapped up nicely, requires that you have all the content ready before you compress it. Naturally this is very close to output buffering, which collects all its content up in buffers before sending it out. As such, it's an easy jump from output buffering to content compression - you store your data up in a buffer, compress it at the last moment then send it out.

Passing the parameter 'ob_gzhandler' to ob_start() enables output buffering compression - PHP automatically takes care of checking whether the client supports it or not, and only sends compressed output if it is supported. By calling ob_start() with 'ob_gzhandler', you're effectively saying "if compression is supported, send content compressed; otherwise, send plain text".

At the client end, the compressed content is automatically uncompressed and displayed normally, because the process is transparent. If you want to check whether your output compression is working, you can telnet into your server. First, create a very basic script like the one below:

<?php
  ob_start('ob_gzhandler');
  print "Goodbye, Perl!\n";
?>

Remember that PHP automatically closes and flushes open output buffers, so the above script simply outputs the compressed text, "Goodbye, Perl!". Save the script into your public HTML directory as gztest.php. To check that the ob_gzhandler parameter is working, you need to telnet into your server on port 80, so, from a console window, enter this command: telnet <your server> 80. This will connect to your web server on the HTTP port - you now get to pretend to be a web browser.

Once you're connected, enter "GET /gztest.php HTTP/1.0" and press enter twice. This forms your complete HTTP request, and you should get your response quite quickly - this should be several lines of HTTP headers, followed by the content of the page, "Goodbye, Perl!".

Here's what I got:

paul@hud-lxf:~$ telnet localhost 80
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
GET /gztest.php HTTP/1.0

HTTP/1.1 200 OK
Date: Mon, 04 Aug 2003 10:43:31 GMT
Server: Apache/1.3.28 (Unix) PHP/4.3.3
X-Powered-By: PHP/4.3.3
Connection: close
Content-Type: text/html

Goodbye, Perl!
Connection closed by foreign host.

After the usual collection of HTTP headers coming back in the response, "Goodbye, Perl!" is there - it's not compressed though. This is because, as mentioned, web servers will only send compressed output if browsers say that they support compression. So, to get compressed content back, we need to mimic a browser that supports compression. Open up the telnet connection again with the same command as last time, but this time enter "GET /gztest.php HTTP/1.0", press enter /once/, then type "ACCEPT-ENCODING: gzip" and press enter twice. Here's what I got this time around:
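If you'd rather not type requests by hand, the same two requests can be sent from a PHP script. This is only a sketch: build_request() and fetch_raw() are made-up helper names, and it assumes gztest.php is being served on port 80 of whatever host you pass in.

```php
<?php
// Hypothetical helper: build the same request that was typed into telnet.
function build_request($path, $acceptGzip)
{
    $request = "GET $path HTTP/1.0\r\n";
    if ($acceptGzip) {
        $request .= "Accept-Encoding: gzip\r\n";
    }
    return $request . "\r\n";       // blank line ends the request headers
}

// Hypothetical helper: send the request and return the raw response, headers included.
function fetch_raw($host, $path, $acceptGzip)
{
    $socket = fsockopen($host, 80, $errno, $errstr, 5);
    if (!$socket) {
        return false;               // could not connect
    }
    fwrite($socket, build_request($path, $acceptGzip));
    $response = stream_get_contents($socket);
    fclose($socket);
    return $response;
}

// Example, assuming gztest.php is being served on localhost:
// $plain = fetch_raw('localhost', '/gztest.php', false);
// $gzip  = fetch_raw('localhost', '/gztest.php', true);
// print strlen($plain) . " bytes plain, " . strlen($gzip) . " bytes gzipped\n";
```

Header names are not case sensitive in HTTP, so "Accept-Encoding" here is equivalent to the upper-case version typed into telnet.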

paul@hud-lxf:~$ telnet localhost 80
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
GET /gztest.php HTTP/1.0
ACCEPT-ENCODING: gzip

HTTP/1.1 200 OK
Date: Mon, 04 Aug 2003 10:43:31 GMT
Server: Apache/1.3.28 (Unix) PHP/4.3.3
X-Powered-By: PHP/4.3.3
Content-Encoding: gzip
Vary: Accept-Encoding
Connection: close
Content-Type: text/html

OI(tm)L'H- QI.
               Connection closed by foreign host.

Note that our content has now been compressed - it's almost certainly not going to come out properly printed in the magazine, but if you try it yourself you'll see that it outputs various obscure binary characters instead of "Goodbye, Perl!". You'll also see that the compressed version is actually /longer/ than the uncompressed version - this is because compressing very small amounts of text rarely has any gain, and usually actually works out worse. However, it's rare your site will contain such a small amount of text - once you reach about 100 bytes, the compression will start to work in your favour.
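You can see the effect of size for yourself without a web server by compressing strings directly. This sketch uses gzencode() from PHP's zlib extension as a stand-in for what ob_gzhandler sends when gzip is accepted, which is an assumption rather than a byte-for-byte guarantee; the longer sample string is made up for the example.

```php
<?php
// gzencode() produces gzip-format data and needs PHP's zlib extension.
$short = "Goodbye, Perl!\n";
$long  = str_repeat("<li class=\"item\">Goodbye, Perl!</li>\n", 50);

$shortPacked = gzencode($short);
$longPacked  = gzencode($long);

// Print the before and after sizes for each string.
printf("short: %d bytes -> %d bytes\n", strlen($short), strlen($shortPacked));
printf("long:  %d bytes -> %d bytes\n", strlen($long), strlen($longPacked));
```

The short string grows because the gzip wrapper adds a fixed overhead to every piece of data, while the long, repetitive string shrinks dramatically.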

As you've seen, the same PHP code can send two different pages depending on whether the client signals that it supports content compression. As such, there really is little reason why you shouldn't use compression unless you're particularly short of CPU time and have bandwidth to spare.


Conclusion

Hopefully this excursion into the world of output buffering has cleared up any questions you had, and also hopefully expanded your horizons a little. Output buffering still hasn't got a big following in the developer community, which is a shame - it's powerful, fast, and opens up so many new possibilities for programming that would simply not be possible otherwise.

For many, output buffering is a "neat feature" that's rarely used, simply because they often don't understand quite how big an impact it can have. So, I encourage you to throw your inhibitions to the wind and give it a try - I'm almost certain you won't regret it!