PHP - The Monty Hall Problem

From LXF Wiki

Table of contents

PHP: The Monty Hall Problem

(Original version written by Paul Hudson for LXF issue 70.)


Having problems with the last tutorial? We show you how to prove these problems using brute force, with just a dash of brain power and a highly trained team of goats.


Anyone who has studied advanced mathematics has a deep - if somewhat unconventional - appreciation of beauty. In fact, we lovingly bestow names on certain types of numbers that show our full appreciation for their magic, such as harmonic numbers, semi-perfect numbers, perfect numbers, hyperperfect numbers, amicable numbers and, believe it or not, even weird numbers. But we love them all equally - except perhaps for the golden ratio, which deserves special veneration. Anyway, numbers are well loved because they are predictable, unchangeable and exact, and therefore can be relied upon for logic, building complex proofs and pretty much anything else except paying the bill when you take one out to dinner.

However, in the odd times that I venture into the outside world, it has come to my attention that the unenlightened masses that call themselves "normal" do not have the same respect and love for numbers that the rest of us have. Numbers scare them, and, as we all know, fear causes anger, anger leads to hate and hate leads to suffering. So, these people consider numbers the path to the dark side. How wrong they are! However, it's possible to at least glimpse their irrational terror: early in your numerical romance you too may recall being overwhelmed by their immense strength, or bamboozled by their exotic attractions.

Take, for example, the Monty Hall Problem. This is a relatively simple question that has caused many people great fear, at least until they come to terms with the magic of numbers. The problem goes like this: you are on a game show, and asked to choose one of three doors. Behind one door is an expensive car - the star prize - but behind the other two doors are goats. Having picked a door randomly, the host (who knows which door leads to the prize) opens one of the other two doors to reveal a goat. He offers you the chance to stick with your current, unopened door, or change to the other unopened door. Do you switch, or do you stick with your original choice?

The answer - naturally - is that you should always change, as there is a greater chance that the other door contains the car. Many people will disagree with this, and of course they are wrong and should be mocked publicly. But if they are someone you actually care about, it is much better to show them the error of their ways and spread some number love. So, this issue we're going to write a PHP script that proves the correctness of the "switch" decision in the Monty Hall problem so that you can show your friends just how wrong they are - you know you want to! If you were interested, the problem is named after Monty Hall, the host on an old US game show called "Let's Make a Deal", where something similar to this problem was done on most shows. In the real game, there were minor variations - occasionally the contestant wouldn't get the chance to switch, in which case they would more often than not be going home with a goat. Shame!


Enter, Monty

Before we write any code, it's important to understand why the Monty Hall problem (MHP from now on) causes people such anguish. There are a few other things we should also nail down now to avoid confusion: "win" means to pick the door with the car, "lose" is to get one of the two doors with a goat, the doors are numbered 1, 2 and 3, "host" is the game show host and "contestant" is, well, the game show contestant - duh. With that out of the way, let's analyse the problem from the point of view of a student just starting out at high school. In MHP you get to pick a door to win a prize, and because there are three doors and cheating is ruled out we can safely assume that there is a one in three chance of winning.

So, our high school student chooses Door 1. At this point, there is a 1 in 3 (1/3) chance that they have the correct door. The host knows where the car is, and so opens Door 2 to reveal a goat. This leaves Door 1 and Door 3 closed, and it's at this point that our student goes wrong, because they will now believe that there's an equal chance of Door 1 and Door 3 containing the car. "After all," they will explain, "there are two doors, no clues or cheating, so we can just follow the basic rules of probability - it's just the same problem again with one fewer door."

Of course, that's a whopping great big fallacy - to answer this problem you don't need to know about the basic rules of probability. Instead, you need to know about the /rules of probability when something else has already happened/. If you pick Door 1, it is correct to say that there is a 1/3 chance of having picked the door that leads to the car. However, it is equally correct to say that there is a 2/3 chance that the car is behind one of the other two doors. Now, when the host opens one of the other two doors (we'll say he opens Door 2), you need to stop - don't try to re-evaluate your probability. We know that there was a 2/3 chance of the car being behind one of the two doors you didn't choose. Now that the host has opened Door 2, /there is still a 2/3 chance of the car being behind one of those two doors/, except now you know that one of them contains a goat. This effectively gives Door 3 a 2/3 chance of holding the car. So, if Door 1 has a 1/3 chance of leading to the car, and Door 3 has a 2/3 chance, what do you do? Switch, of course!

php70-goatsncars.png-thumb.png (http://www.linuxformat.co.uk/images/wiki/php70-goatsncars.png)
Outcomes.

This is shown in the picture. If we presume you pick door 3, then there are three possible outcomes (labelled A, B and C). In possibility A, Door 3 contains a goat, and Door 2 will be opened by the host. If we stick with Door 3 we lose, but if we switch to the other door (Door 1), we win. In possibility B, Door 3 contains a goat, and Door 1 will be opened by the host. If we stick with Door 3 we lose, but if we switch to the other door (Door 2), we win. In possibility C, Door 3 contains a car, and either Door 1 or Door 2 will be opened by the host - it doesn't matter, so let's say he opens Door 1. That leaves Door 3 (our choice) and Door 2. If we stick with Door 3 we win, but if we switch to Door 2 we lose. So in possibilities A and B, we need to switch to win; in possibility C we lose if we switch. Therefore switching has a 2/3 chance of winning, and sticking has a 1/3 chance of winning.


Cracking the code

The way to empirically prove that Monty Hall works is to write a script that opens doors thousands and thousands of times, switching and not switching, to see what results we get. This calls for a PHP script that accepts user input (whether to switch or not; how many attempts we get), runs the calculations, then prints the results. In terms of code, we need three distinct parts: a function that runs a single Monty Hall problem; something that parses user input, runs the problem multiple times, tracks results and prints them out; and some HTML that allows users to specify the test to run. The whole thing is under 100 lines, but we're going to take it in chunks, starting with the easy bit: the HTML. Here goes:

<form method="post">
<p>Number of iterations: <select name="Iterations">
<?php
   /// generate all these using a loop, rather than lots of typing
   for ($i = 50; $i <= 10000; $i += 50) {
      if (isset($_POST["Iterations"]) && ($_POST["Iterations"] == $i)) {
         print "<option value=\"$i\" selected>$i</option>";
      } else {
         print "<option value=\"$i\">$i</option>";
      }
   }
?>
</select></p>
<!-- This sets the box to "No" if that was its previous value -->
<p>Should the contestant switch? <select name="DoSwitch"><option value="Yes">Yes</option> <option value="No"
<?php if (isset($_POST["DoSwitch"]) && ($_POST["DoSwitch"] == "No")) print  "selected"; ?>>No</option></select></p>

<p><input type="submit" value="Prove it!" /></p>
</form>

Okay, so I lied a little bit: it's not just HTML in there, but it definitely is the easiest part of the script. There are two comments in there to explain what's going on, but it's all very easy: it prints out a form for the user to choose how many times we should execute the problem, and whether the contestant should switch doors or not.

Moving on, we need to handle that form using PHP, then run the problem a suitable number of times. Here's the code for that:

   if (isset($_POST["DoSwitch"])) {
      /// only run this if the form has been completed
      $wins = 0;
      $losses = 0;
      /// run the function many times. This returns either "Win" or "Lose"
      for ($i = 0; $i < $_POST["Iterations"]; ++$i) {
         $result = MontyHall($_POST["DoSwitch"]);
         /// keep track of the results
         if ($result == "Win") {
            ++$wins;
         } else {
            ++$losses;
         }
      }
      /// print out the results
      print "<h1>The Amazing Monty Hall Empirically Proof Device</h1>";
      print "<p>Number of iterations: {$_POST["Iterations"]}</p>";
      print "<p>Contestant switches: {$_POST["DoSwitch"]}</p>";
      print "<p>Number of wins: $wins</p>";
      print "<p>Number of losses: $losses</p>";
   }

Again, very easy stuff: if the form has been submitted, set the initial score to zero, then run the function multiple times (passing in the "should the contestant switch doors?" variable). Each time the function runs, we catch the return value and alter the scores appropriately. Still with me?

Now on to the serious code: the MontyHall() function itself. As you've just seen, we need to make this accept a parameter to store whether the user wants to switch or not, but that's the easy part. Here's the rest:

   function MontyHall($DoSwitch) {
      /// make an array of the doors, then randomise its order
      $doors = array(1=>"Goat", 2=>"Goat", 3=>"Car");
      shuffle($doors);
      /// pick a door, then remove it from the $doors array
      $our_choice = array_rand($doors);
      $our_door = $doors[$our_choice];
      unset($doors[$our_choice]);
      /// now to "open" a door
      /// get the first remaining door, and check whether it's a goat
      if (reset($doors) == "Goat") {
         // first door in the array contains a goat,
         // eliminate it from the list of choices!
         // get the current key
         $cur_key = key($doors);
         unset($doors[$cur_key]);
      } else { // first door not a goat, so remove the next door
         next($doors);
         $cur_key = key($doors);
         unset($doors[$cur_key]);
      }
      /// should the contestant switch?
      if ($DoSwitch == "Yes") {
         /// yes, switch. Is the remaining door the car?
         if (reset($doors) == "Car") {
            // yes - we won!  
            return "Win";
         } else {
            // a goat :(
            return "Lose";
         }
      } else {
         // contestant shouldn't switch
         if ($our_door == "Car") {
             // we picked the right door first time
            return "Win";
         } else {
            // another goat! :(
            return "Lose";
         }
      }
   }

Again, there are comments throughout that should explain everything - I even used meaningful variable names, so there's no excuse! The one area you might get confused is with the handling of the $doors array. In the code I've used reset(), next() and keys() - functions that aren't used all that often. The first of them moves the array cursor (the internal pointer that tells the array which element is being read) to the start of the array, and returns the value of that element. So, our line "if (reset($doors) == "Goat")" means "go to the start of the $doors array, get its value and, if it is "Goat", then the condition is true". Then we use the key() function, which returns the key of the element that is currently pointed to by the cursor (ie, the first one, because we just used reset() to move to the beginning), and use that value to remove the element from the array. This is quite a big hoop to jump through, but remember that we don't have any other way of indexing in to the array because we don't know which value we removed. If the value of the first $doors element is not "Goat", we remove the second element. This is done by calling next(), which moves the array cursor forward one place.

Once we have removed the opened door, we now have the door in $our_door, and one element in the $doors array. The remaining question is: should the contestant switch? Of course, this was passed in as a parameter, so we just check that and act accordingly. So, if we've selected to switch, we return "Win" if the value in $doors is "Car" and "Lose" otherwise; if we've selected not to switch then we return "Win" if the value in $our_door is "Car" and "Lose" otherwise. Simple, right?

You're not done just yet: you should add some top-and-tail HTML code to present it nicely, but more importantly you should add a call to srand() near the top of the script. Without this, PHP might not bother randomising things much, and your results are likely to be very predictable between runs. Now all that remains is to go ahead and run the thing using your web browser of choice - copy the script from your home directory to /var/www/html (although this may vary according to your distro), and go for it. You may want to compare your script against mine, just to make sure it's in the right order.

That's it for this month, but hopefully you have learned a bit about numbers and probability, a bit about PHP's array functions, and a lot about how PHP can be used to test all possibilities in a problem to find the right answer. What's more, we did all that without the need for me to goat to town on goat jokes. Until next time!


Home improvement

Okay, it's time to stop kidding around and separate the sheep from the goats: if this has been enough programming for you, then use tiredness as your scapegoat and go off to enjoy the summer of numbers with your pals. But if you're a real programmer and don't let tricky hacking sessions get your goat, then its time to think about how to improve the script.

There are at least three changes you should try out to flex your coding muscle:

FUN: Make the script run /iterations/ number of tests with switching turned on, then /iterations/ number without, to save the user having to toggle the select box. These results should be kept separate on the screen.

TRICKY: Our script makes no attempt to print out any sort of result percentages based on the number of wins and losses. For simple values of /iterations/ this is quite easy - if there are 1000 test runs, then you can see what's going on at a glance. But if you do 19350 runs the answer is less immediately obvious. Calculate the win and loss percentages and display them in the output.

TAXING: MHP extends above and beyond a three-door problem. We could theoretically have a hundred doors, with 99 goats and a car - the host could close 98 doors after we choose ours, and we would have a 99/100 chance of winning if we switched. This can't be tested with our current script, but by adding an extra form element to let people choose it, passing that element to the MontyHall() function and constructing the array with many goats and adding a single car to the end, it can be done.

MAYHEM: Assume the game show host doesn't know which door contains the car - what happens to the chances of winning now, and why? Answers on a postcard to the usual address!