PHP - Forking processes

From LXF Wiki

Practical PHP

(Original version written by Paul Hudson for Linux Format magazine issue 54.)

Using Unix means you get a slew of cool PHP functions that can really empower your command-line scripts. We show you how...


Command-line PHP programming brings with it a number of cool possibilities that are either not possible in the Apache module or just not sensible - that's part of the fun. As CLI SAPI programming is gaining more and more use in the community, people are starting to experiment with its potential and finding it's capable of some really cool stuff. In order to give you a big head start in that arena, we're going to be looking at two particularly cool ways to use PHP on the command line: process control and the POSIX functions. That way you can start experimenting yourself.


What are processes?

One process can be thought of as a single unique running instance of a binary file or script. That is, while the vim binary usually exists only once on your computer, you can launch it many times and have many vim processes. Each of these processes is wholly standalone - you can quit one, kill one, halt or background one, and the others remain unaffected. Linux (as well as any half-way decent OS) automatically does process scheduling so that each process gets a slice of the CPU's time (a /timeslice/, no less) and everyone's happy.

In some ways, processes can be thought of like threads, but they are quite different beasts once you start getting deep. On Linux, and also most other Unixes, creating a process is very, very quick. This needs to be the case because so much of Unix is about inter-process communication (IPC) - we create a ps process, pass its output through grep, then finally through wc to find out how many instances of one program are running. Running three programs as if they were one combined program is absolutely the norm in Unix, so it would be terrible for performance if starting a process was slow. However, on other OSs (most notably Windows) starting a process /isn't/ very fast, and so programs rely on threads, which are essentially virtual processes, to mimic processes.

The whole point of having processes (or threads) is that they are functionally separate parts of a program. If you take an average PHP script, it runs from top to bottom (with occasional interludes when calling functions), but it always, always, /always/ only executes one line at a time. However, when you have two scripts running, you have two processes each doing their own thing, and the OS automatically balances the two so that they both appear to be running at the same time. Of course, if you have a single CPU machine the two processes aren't running at the same /exact/ time because your CPU can only do one thing at once, but if you have a multi-CPU server it is able to run more than one process at once, which allows your two scripts to run side-by-side.

Take the LXFBench 2004 benchmark, which has some elements written in PHP. If that were to run entirely in one process, a 4-way 3.2GHz Xeon server would return the same score as a 1-way 3.2GHz Xeon, which is of course unrealistic. However, by spawning processes, the OS automatically balances the processes out over the other CPUs, and so it returns a score in line with the number of CPUs.

Hopefully you should now have a firm grasp of what processes are, and also why you'd want to use them. However, if not, here's a quick nutshell-type round up: processes let you run more than one script at once, and create these child scripts from inside another script.


Spawning children

In order to get process control working on your PHP machine, you need to configure it with "--enable-pcntl", re-build it, and re-install it.

Once that's done, you can get started writing process control scripts. The first thing we're going to do is look at the pcntl_fork() function, which takes no parameters and returns an integer. If you get back -1, the forking failed - we couldn't create a new process. If 0 is returned, the process has now been forked and you're now in the child process. If anything else is returned, the process has now been forked and you're now in the parent - the return value is the PID (process ID) of the child that was created.

Now, this is where people usually get confused - if 0 is returned, you're in the child, if anything larger than 0 is returned, you're in the parent. Yes, you can be in two places at once now because we have multiple processes. When you fork, what happens is that a complete copy of the existing process is made - including the values of all the variables previously used, and also the current position the PHP script is up to in terms of execution - and that copy becomes the child process. As a result, the child process starts executing from the line after the call to pcntl_fork().
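The "complete copy" behaviour can be checked directly. What follows is a minimal sketch, assuming a PHP CLI build with the pcntl extension; the variable names $counter and $childCounter are made up for illustration. It sets a variable before forking, changes it in the child only, and has the child hand its value back as an exit code (it borrows pcntl_waitpid() and pcntl_wexitstatus(), which are explained later in this article).

```php
<?php
// Assumption: PHP CLI with the pcntl extension enabled.
$counter = 10;                 // set before the fork, so both processes start with it

$pid = pcntl_fork();
if ($pid == -1) {
    echo "Fork failed!\n";
    exit(1);
}

if ($pid == 0) {
    // Child: this changes only the child's private copy of $counter.
    $counter += 5;
    exit($counter);            // pass the child's value back as its exit code
}

// Parent: wait for the child, then compare the two copies.
pcntl_waitpid($pid, $status);
$childCounter = pcntl_wexitstatus($status);
echo "Child saw $childCounter, parent still has $counter\n";
```

The parent's $counter is untouched by the child's addition, because after the fork each process works on its own copy of every variable.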

As always, this makes more sense when you see it in code, so try this one out:

<?php
  $pid = pcntl_fork();
  if ($pid != -1) {
    if ($pid) {
      print "In the parent: child PID is $pid\n";
    } else {
      print "In the child\n";
    }
  } else {
    echo "Fork failed!\n";
  }
?>

When run, you should get something like this as output:

In the child
In the parent: child PID is 8616

But then running it again you might get this instead:

In the parent: child PID is 8618
In the child

As you can see, the order of the parent and child output is variable - it's down to whichever of them hits its print statement first, and that is down to how much time they are given by the OS. The scheduler in Linux (particularly in 2.6) is very smart, so it's not really worth trying to outguess it unless you have a very firm grasp of kernel semantics. Heck, even then don't rely on it - the scheduler may change at any time, and you don't want to get into the mess of relying on exact timings.

Now, try editing the script to this:

<?php
  $pid = pcntl_fork();
  if ($pid != -1) {
    if ($pid) {
      print "In the parent: child PID is $pid\n";
    } else {
      sleep(1);
      print "In the child\n";
    }
  } else {
    echo "Fork failed!\n";
  }
?>

Running that back you'll now always find the parent's message is printed out before the child's, because as the child process is sleeping the parent continues on. However, there's more of interest in that script - here's the output I get on my machine:

[paul@hud-lxf lxf54]$ php fork2.php
In the parent: child PID is 8666
[paul@hud-lxf lxf54]$ In the child

Note that we get our command prompt back immediately after the parent's message is printed out, and that the child's message is printed after the command prompt. This is because the command prompt waits for the parent process to terminate before it re-appears, and the parent process terminates before the child process has printed its message out. So, the parent prints out its message, terminates, then the child process wakes up (after the command prompt re-appears) and prints out another message to the screen. This should amply demonstrate that even after the parent process is dead, the children live on.


Quadruplets

Forking one child is a cinch, as you can see, but how do you fork more than one? The problem here is that you need to be very careful about which process does the spawning - if you slip up, you might end up in a recursive spawning loop where hundreds of thousands of processes try to spawn and will probably drag your machine down pretty sharpish! The easiest way to avoid problems is to have the parent process do all the spawning, and this can be done using a simple loop. Try this out for size:

<?php
  for ($i = 0; $i < 4; ++$i) {
    $pid = pcntl_fork();
    if ($pid != -1) {
      if ($pid) {
        print "In the parent: child PID is $pid\n";
      } else {
        sleep(1);
        ++$i;
        print "In child $i\n";
        exit;
      }
    } else {
      echo "Fork failed!\n";
    }
  }
?>

This time the code loops around four times in total, but note there are three minor tweaks in the child code. The first two are that we increment $i by one in the child so that each child has a number from 1 to 4 and it prints it out, and the third is that "exit" is called immediately after the child process prints out its message. This last part is absolutely crucial: if the child process isn't terminated, it will go back through the loop itself and spawn more children, which will spawn more children, and more children. It's not infinite, though, because each child inherits the loop from its parent as well as the value of $i, so the loop doesn't last too long. If the ++$i line was excluded, the loop would last a little longer, but would still not be infinite - that's a /good/ thing.
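To see how quickly things multiply when the child does not exit, here is a rough sketch (assuming the pcntl extension; $log, $root and $mine are illustrative names, not part of the script above). It deliberately leaves out both the exit and the ++$i in the child, uses a loop of three rather than four to keep the numbers small, and has every process append one line to a temporary file so the original process can count how many processes reached the end.

```php
<?php
// Assumption: PHP CLI with the pcntl extension enabled.
$log  = tempnam(sys_get_temp_dir(), 'fork');  // shared tally file, one line per process
$root = getmypid();                           // PID of the original process
$mine = [];                                   // children forked by *this* process

for ($i = 0; $i < 3; ++$i) {
    $pid = pcntl_fork();
    if ($pid == -1) {
        echo "Fork failed!\n";
        exit(1);
    }
    if ($pid == 0) {
        $mine = [];        // child: drop the inherited list, then keep looping (no exit!)
    } else {
        $mine[] = $pid;    // parent: remember the child just created
    }
}

// Every process waits for its own children, then records that it got here.
foreach ($mine as $child) {
    pcntl_waitpid($child, $status);
}
file_put_contents($log, getmypid() . "\n", FILE_APPEND | LOCK_EX);

if (getmypid() != $root) {
    exit(0);               // descendants stop here
}

$total = count(file($log));
unlink($log);
echo "$total processes reached the end of the script\n";
```

Each pass through the loop doubles the number of running processes, so three passes give 2 x 2 x 2 = 8 processes in total. It is finite, as described above, but a loop of 20 would already mean over a million processes, which is why the exit in the child matters so much.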

So, if you want to create four child processes at once, the easiest way to do it is by using a tightly controlled loop - unless you're really confident of your own abilities, it's almost certainly best not to try to spawn processes from spawned processes.


Waiting for the kids

As we've seen, child processes are able to outlive their parents, but what if you don't want that to happen? Well, parent processes are able to wait for children to finish through the pcntl_waitpid() function, which takes a process ID to wait for and a variable (passed by reference) where the exit status can be placed. The exit status of a child is made up of an integer value returned back as well as how the child exited. To extract the actual return value from a status code, you need to use the pcntl_wexitstatus() function, which takes a status code as its only parameter and returns the return value of the child. So, with that in mind we can rewrite our original forking script like this:

<?php
  $pid = pcntl_fork();
  if ($pid != -1) {
    if ($pid) {
      print "In the parent: child PID is $pid\n";

      pcntl_waitpid($pid, $status);
      echo "Back in parent\n";
      echo "Child exited with ", pcntl_wexitstatus($status), "\n";
    } else {
      sleep(1);
      print "In the child\n";
      exit(19);
    }
  } else {
    echo "Fork failed!\n";
  }
?>

Now, note that we are waiting for $pid and storing its exit status in $status. When the parent hits that line, it will pause indefinitely until the child with that PID has returned - if it has already returned, the parent will continue immediately. The child script waits for one second, prints out a message, then returns the value 19 (totally arbitrary, just to demonstrate how it works) and terminates. The parent, which was waiting for this child to terminate, wakes up again and stores the child's exit status in $status. Then another message is printed out ("Back in parent") and it finishes off by printing out the child's return value using pcntl_wexitstatus(). On my machine, here's the output:

[paul@hud-lxf lxf54]$ php fork3.php
In the parent: child PID is 8894
In the child
Back in parent
Child exited with 19
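The "how the child exited" half of the status can be inspected too. The sketch below is only an illustration, assuming both the pcntl and posix extensions are available; describe_status() is a hypothetical helper of our own, not a PHP function. It uses pcntl_wifexited(), pcntl_wifsignaled() and pcntl_wtermsig() to tell a child that exited by itself from one that was killed by a signal.

```php
<?php
// Assumption: PHP CLI with the pcntl and posix extensions enabled.

// Hypothetical helper: turn a wait status into a readable description.
function describe_status(int $status): string
{
    if (pcntl_wifexited($status)) {
        return "exited normally with code " . pcntl_wexitstatus($status);
    }
    if (pcntl_wifsignaled($status)) {
        return "killed by signal " . pcntl_wtermsig($status);
    }
    return "stopped or in an unknown state";
}

// First child: exits by itself with the arbitrary value 19.
$pid = pcntl_fork();
if ($pid == -1) {
    exit(1);
}
if ($pid == 0) {
    exit(19);
}
pcntl_waitpid($pid, $status);
$first = describe_status($status);
echo "First child $first\n";

// Second child: would sleep for a while, but the parent kills it straight away.
$pid = pcntl_fork();
if ($pid == -1) {
    exit(1);
}
if ($pid == 0) {
    sleep(30);
    exit(0);
}
posix_kill($pid, SIGKILL);
pcntl_waitpid($pid, $status);
$second = describe_status($status);
echo "Second child $second\n";
```

Calling pcntl_wexitstatus() only makes sense when pcntl_wifexited() is true; for a child that was killed by a signal there is no meaningful return value to extract.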

As the fork3.php output shows, the child executed fully while the parent was waiting, then returned data back successfully. Things get a little more difficult when working with multiple processes, as you will want to wait for all child processes to terminate before the parent can terminate. The easiest way to solve this problem is to count back from the loop used to spawn the children and use -1 as the first parameter to pcntl_waitpid() - that causes PHP to wait for any child to return, as opposed to specific PIDs.

Take a look at this script:

<?php
  for ($i = 0; $i < 4; ++$i) {
    $pid = pcntl_fork();
    if ($pid != -1) {
      if ($pid) {
        print "In the parent: child PID is $pid\n";
      } else {
        sleep(1);
        ++$i;
        print "In child $i\n";
        exit($i);
      }
    } else {
      echo "Fork failed!\n";
    }
  }

  if ($pid) {
    while ($i > 0) {
      pcntl_waitpid(-1, $status);
      $val = pcntl_wexitstatus($status);
      echo "Child $val returned\n";
      --$i;
    }
    echo "Parent complete!\n";
  }
?>

Note first that the children now return their child number, but the main block of new code is at the bottom. First we need to check $pid, as we only want to wait if we have actually spawned any processes. Then, the parent enters a loop: while $i is greater than zero, wait for children. As $i has already been incremented to the number of children we have as part of the loop, this essentially reverses the loop and waits for all children to complete. Note that the -1 is in there for pcntl_waitpid(), which means the loop will wait for any child and print out each child's value in the order the children return. As a result, and again this shows just how unusual the scheduler may appear to be, here's what I got out of a test run with this code:

[paul@hud-lxf lxf54]$ php fork.php
In the parent: child PID is 8963
In the parent: child PID is 8964
In the parent: child PID is 8965
In the parent: child PID is 8966
In child 1
In child 2
In child 3
Child 3 returned
Child 2 returned
Child 1 returned
In child 4
Child 4 returned
Parent complete!

As you can see, children 1, 2, and 3 had all executed and returned even before child 4 was able to print out its message.
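Counting back from $i is not the only option. As an alternative sketch (not the approach used in the script above), the parent can remember each child's PID in an array and rely on the fact that pcntl_waitpid() returns the PID of the child that finished. The array names $children and $returned are our own, and the pcntl extension is assumed.

```php
<?php
// Assumption: PHP CLI with the pcntl extension enabled.
$children = [];                        // child PID => child number

for ($i = 1; $i <= 4; ++$i) {
    $pid = pcntl_fork();
    if ($pid == -1) {
        echo "Fork failed!\n";
        continue;
    }
    if ($pid == 0) {
        exit($i);                      // child: return its number and stop
    }
    $children[$pid] = $i;              // parent: note which child has which PID
}

$returned = [];                        // child number => exit value
while (count($children) > 0) {
    $done = pcntl_waitpid(-1, $status);   // PID of whichever child finished
    if ($done <= 0) {
        break;                         // error, or no children left to wait for
    }
    $number = $children[$done];
    $returned[$number] = pcntl_wexitstatus($status);
    echo "Child $number (PID $done) returned {$returned[$number]}\n";
    unset($children[$done]);
}

ksort($returned);                      // arrival order varies, so sort for a stable view
echo "Parent complete!\n";
```

Because the bookkeeping is keyed on PIDs rather than on the loop counter, the parent only waits for children that were actually created, even if one of the forks failed.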


Smoke signals

A key aspect of IPC is the ability to handle signals sent from other processes, and this is fully available in the CLI SAPI. The primary function here is pcntl_signal(), which installs a callback function for a given signal, and takes two parameters: the signal to catch, and the function to call when that signal is received.

Now, if you're unsure what signals are, let me briefly explain. When you press Ctrl-C to halt a program, what is it that actually makes it halt? How about when you type "killall php" - what actually makes the PHP scripts halt? The answer is signals, and there are quite a few types. The most popular are SIGINT (interrupt), SIGHUP (hang-up), SIGTERM (terminate cleanly), and SIGKILL (terminate immediately, clean or otherwise), but there are many others. To give you an idea how the various signals are generated, try out this script:

<?php
  declare(ticks = 1);

  function signal_handler($signal) {
    switch ($signal) {
      case SIGTERM:
        echo "Caught SIGTERM\n";
        break;

      case SIGQUIT:
        echo "Caught SIGQUIT\n";
        break;

      case SIGINT:
        echo "Caught SIGINT\n";
        break;
    }
  }

  pcntl_signal(SIGTERM, "signal_handler");
  pcntl_signal(SIGQUIT, "signal_handler");
  pcntl_signal(SIGINT, "signal_handler");

  while (1) {

  }
?>

Before we go into the technicalities of how that script works, I first want you to run it. Now, try pressing Ctrl-C or Ctrl-\. Then try opening a new console up and typing "killall php". All being well you should have seen "Caught SIGINT" on the screen every time you pressed Ctrl-C, seen "Caught SIGQUIT" on the screen every time you hit Ctrl-\, and seen "Caught SIGTERM" on the screen every time you typed "killall php". Now try typing "ps aux | grep php" into the other terminal to get the PID of the PHP process, and typing "kill -9 <PHP PID>". This time you should see "Killed" on the screen, and the script should terminate.

Running that little test should have shown that Ctrl-C sends SIGINT to the running script, Ctrl-\ sends SIGQUIT, "kill <PID>" and "killall php" send SIGTERM, and "kill -9 <PID>" actually kills the script. The reason for the latter is that -9 sends the special signal SIGKILL. This signal cannot ever be overridden in our scripts - whereas SIGTERM means "I'd like you to terminate", SIGKILL means "you're getting killed whether you like it or not". You could, for example, use "kill -3 <PID>" to send the SIGQUIT signal, but pressing Ctrl-\ is much easier!

Now that you have an idea of what the script does, it's time to see how it works. The first line seems to call the declare() function - something we haven't covered before. Usage of declare is quite advanced, so we won't actually be covering it here beyond "you need this line at the top of your script if you want to use signals".

Next comes the signal callback function, signal_handler(). Note that it takes a signal as its only parameter, and when it gets called by PHP the signal that was received will get passed in. So, the contents of the callback function is simply a switch/case statement where we print out a relevant message according to the signal received - we'll be looking at this more in a moment, but for now continue on after the function. The three calls to pcntl_signal() set up SIGTERM, SIGQUIT, and SIGINT to call signal_handler() when the appropriate signals are received, and finally the program goes into an infinite loop to cause it to sit around waiting for signals.

Hopefully that should explain it all, but before you get comfortable thinking you're the master of signals, let me briefly re-iterate that SIGTERM and SIGQUIT are there as polite notices to your script that the user has asked it to shut down. As you saw earlier, our script currently prints out a message when the signals are received, but it doesn't actually do anything about it, which is incredibly frustrating if you're an end-user trying to shut a script down. Generally speaking, you should use SIGTERM and SIGQUIT to clean up - close database connections, save files, etc - then shut down.

SIGINT, however, is a little different. Although it can be used to halt a program (and often is), it is also often used to say "user wants to know what's going on" or "stop doing the big time-intensive function, but keep running the program". It's your call what you do, but by default we strongly recommend you exit on SIGTERM, SIGQUIT, and SIGINT.
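One common way to follow that advice is to have the handler do nothing except set a flag, and let the main loop notice the flag, clean up, and stop. The following is a minimal sketch of that idea rather than a definitive implementation: it assumes the pcntl and posix extensions, the names shutdown_handler(), $running and $caught are invented for the example, and the script sends itself SIGTERM with posix_kill() purely so that the demonstration ends without needing a second terminal.

```php
<?php
// Assumption: PHP CLI with the pcntl and posix extensions enabled.
declare(ticks = 1);

$running = true;    // main loop keeps going while this is true
$caught  = null;    // which signal asked us to stop

// Illustrative handler: record the signal and ask the main loop to finish.
function shutdown_handler($signal) {
    global $running, $caught;
    $caught  = $signal;
    $running = false;
}

pcntl_signal(SIGTERM, "shutdown_handler");
pcntl_signal(SIGQUIT, "shutdown_handler");
pcntl_signal(SIGINT, "shutdown_handler");

// Demonstration only: signal ourselves, standing in for "kill <PID>" typed by a user.
posix_kill(posix_getpid(), SIGTERM);

while ($running) {
    usleep(100000);  // stand-in for real work, done in short slices
}

// Clean-up would go here: close database connections, save files, and so on.
echo "Caught signal $caught, shutting down cleanly\n";
```

Keeping the handler tiny means the clean-up code runs at a known point in the script rather than in the middle of whatever statement happened to be executing when the signal arrived.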


Future echos

We've only managed to touch on the very beginning of the special Unix functions in PHP, but I hope you agree that it makes for a much more advanced (and much more /standard/) application. I hope you can also see why much of this functionality isn't really a good idea for an Apache module! There's more to come, however - next month we'll be looking at how the special alarm signal works and also how POSIX functions let you program in PHP in much the same way as C.