PHP - Regular expressions

From LXF Wiki

Practical PHP Programming

(Original version written by Paul Hudson for LXF issue 61.)


You've had it easy so far, but now we're onto regular expressions - send all hate mail our way!


The curious thing about regular expressions is that their name describes them perfectly. A regular expression, you see, is an absolutely regular way to express a particular search and/or replacement algorithm, which might make you think that regular expressions are, ahem, obvious - cue raucous laughter. Sadly, nothing could be further from the truth: there's an art to reading regular expressions that's akin to reading the stream of bits and bytes in The Matrix, and it's very easy to write almost-illegible expressions that "just work".

I say this because we'll be using regular expressions this month, as we're going to be working solely on conditions. As it stands, every room shows the same information and the same links no matter what the player has done, which is pretty dull! The first thing we'll be doing this issue is to add regular expression parsing to our room information so that GMs have some control over what is shown depending on the current state of play.


Setting an example

Love 'em or hate 'em, regexes are powerful little things that can eliminate a lot of hard work. However, as I've mentioned already, it's very easy to rely on them so much that your code is no longer usable, so we're going to be using just one regular expression in our code and do the rest by hand. This is probably sub-optimal in terms of performance, but should boost maintainability a great deal!

Our current solution for handling room text calls the function parse_room(), which in turn just prints the text out raw. That probably seemed pointless at the time, but what it means to us right now is that to add special functionality for room information we just need to edit this one function - huzzah for code re-use!

First things first, though: we need to add some sort of example text that needs to be parsed so we've got an idea of what we need to tackle. So, execute these three SQL statements in the interfict database:

INSERT INTO rooms (Game, Name, Info, SafeToRest, SafeFromEncounters, SafeToAttack, CallTrigger,
CallTriggerOnce) VALUES (1, 'Small walk-way', 'At the side of the palace the weeds are overgrown and the
ground uneven.  [IFNOT VARIABLE PALACE_WINDOW_BROKEN]There\'s a large, old-looking stained-glass window
on the palace building to your right.[END] [IF VARIABLE PALACE_WINDOW_BROKEN]There\'s a large hole
in the wall to your right where the window was.[END]', 1, 1, 1, 0, 0);
INSERT INTO rooms (Game, Name, Info, SafeToRest, SafeFromEncounters, SafeToAttack, CallTrigger,
CallTriggerOnce) VALUES (1, 'The palace chapel', 'It\'s cool and dark here, with a faint smell of incense.
What used to be a fine example of classical stained-glass is shattered to the west, and glass is strewn on
the floor.', 1, 1, 1, 0, 0);
INSERT INTO rooms (Game, Name, Info, SafeToRest, SafeFromEncounters, SafeToAttack, CallTrigger,
CallTriggerOnce) VALUES (1, 'Small walk-way', 'You climb back out of the chapel and into the walk-way
alongside, only to be jumped on by some palace guards.  After a brief struggle, one of them stabs you.
[SET GAMESTATE DEAD]You are dead.[END]', 1, 1, 1, 0, 0);

Those three make up two new physical rooms, and three logical rooms. The difference? Physical rooms are actual places in the game world, whereas logical rooms may duplicate themselves according to the will of the GM. This was mentioned back in the original design document (Practical PHP Programming, LXF57), so I'll just quote what I said back then:

Each room can link to other rooms so that players can move to and fro freely, although note that a link is not bidirectional. That is, if room A links to room B, room B does not necessarily link back to room A. This is helpful because the link might be players going down a trap door that closes, or, in more advanced adventures, GMs might create a clone of room A that's subtly different from the original (we'll call the clone room C) so that players can move from room A to room B, see that they can go back to room A (it's actually room C, but it looks like room A), and thus fall into a trap if they go back.

In that scenario, A and C are both logical rooms - they both represent one physical room, but only one of the two actually exists at a time. That exact scenario is recreated in the SQL code above: note that the first and third new rooms are both the small walk-way, but they have different text. As a result, we can use cunning linking to make it look like the player moves back to the same room they left a moment before, while actually transporting them somewhere else entirely.

The next step is to put those links into SQL, so here you go:

INSERT INTO links (FromRoom, ToRoom, ConditionType) VALUES (5, 3, 0);
INSERT INTO links (FromRoom, ToRoom, ConditionType) VALUES (3, 5, 0);
INSERT INTO links (FromRoom, ToRoom, ConditionType, ConditionVar) VALUES (5, 6, 1, 'PALACE_WINDOW_BROKEN');
INSERT INTO links (FromRoom, ToRoom, ConditionType) VALUES (6, 7, 0);

So, room 5 (the small walk-way) unconditionally links to room 3 (the palace) and vice versa, allowing players to walk to-and-fro as much as they like. However, the small walk-way leads through to the palace chapel only when the PALACE_WINDOW_BROKEN variable is set - more on that later. Finally, note that the palace chapel links back to the alternative walk-way room, where the player will later get mauled by the guards.
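
To make the link rules concrete, here is a minimal sketch that mirrors the four link rows above in a plain PHP array rather than in MySQL. The function name visible_exits() and the array layout are assumptions made purely for illustration, not part of the game code; the meaning of ConditionType 2 (the variable must not be set) is taken from the link queries later in this article.

```php
<?php
// The four link rows from the SQL above, held in memory for illustration.
// ConditionType: 0 = always shown, 1 = variable must be set,
// 2 = variable must NOT be set.
$links = array(
    array('FromRoom' => 5, 'ToRoom' => 3, 'ConditionType' => 0, 'ConditionVar' => null),
    array('FromRoom' => 3, 'ToRoom' => 5, 'ConditionType' => 0, 'ConditionVar' => null),
    array('FromRoom' => 5, 'ToRoom' => 6, 'ConditionType' => 1, 'ConditionVar' => 'PALACE_WINDOW_BROKEN'),
    array('FromRoom' => 6, 'ToRoom' => 7, 'ConditionType' => 0, 'ConditionVar' => null),
);

// Hypothetical helper: list the room IDs a player can move to from $room,
// given the names of the variables currently set for their character.
function visible_exits($links, $room, $vars) {
    $exits = array();
    foreach ($links as $link) {
        if ($link['FromRoom'] != $room) {
            continue; // links are one-way, so only FromRoom matters
        }
        $isset = in_array($link['ConditionVar'], $vars, true);
        if ($link['ConditionType'] == 0
            || ($link['ConditionType'] == 1 && $isset)
            || ($link['ConditionType'] == 2 && !$isset)) {
            $exits[] = $link['ToRoom'];
        }
    }
    return $exits;
}

print_r(visible_exits($links, 5, array()));                       // window intact
print_r(visible_exits($links, 5, array('PALACE_WINDOW_BROKEN'))); // window broken
print_r(visible_exits($links, 7, array()));                       // no way out of room 7
```

Note that room 7 has no outgoing links at all, which is exactly what makes it a one-way trap.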

We'll come back to all this soon enough - let's look at parsing the room with some good ol' regular expressions!


The Pros and Cons of Regular Expressions

As you saw from the example, here is the kind of string we want to match:

[IFNOT VARIABLE PALACE_WINDOW_BROKEN]There\'s a large, old-looking stained-glass window on the palace
building to your right.[END] [IF VARIABLE PALACE_WINDOW_BROKEN]There\'s a large hole in the wall to
your right where the window was.[END]

Each "command" block is made up of an opening element surrounded by square brackets, [ and ], followed by any text, then terminated by [END]. The first command in that sample is IFNOT VARIABLE PALACE_WINDOW_BROKEN, so it's an IFNOT condition that examines whether the variable PALACE_WINDOW_BROKEN is set. There is no "else" statement in our little command language, which is why the code sample above has both IF and IFNOT for the same variable.
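
As a minimal sketch of that structure, the snippet below pulls a single command block apart using nothing but string functions. The helper name split_block() and the shortened sample text are assumptions for illustration only; it also assumes the block is well formed, with no stray square brackets inside the body.

```php
<?php
// Hypothetical helper, not part of the game code: split one command block
// into its instruction words and its body text.
function split_block($block) {
    // position just past the first ], which is where the body starts
    $bodystart = strpos($block, "]") + 1;
    // the instruction is everything between the opening [ and that ]
    $instruction = substr($block, 1, $bodystart - 2);
    // the body runs up to the [ that opens the closing [END]
    $bodyend = strpos($block, "[", $bodystart);
    $body = substr($block, $bodystart, $bodyend - $bodystart);

    return array('words' => explode(" ", $instruction), 'body' => $body);
}

$parts = split_block("[IFNOT VARIABLE PALACE_WINDOW_BROKEN]There's a window.[END]");
print_r($parts);
```

The first word is the command (IFNOT), the second says what kind of thing is being tested (VARIABLE), and the third is its name.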

Anyway, what we want to do is grab a list of all the processing directives in each room, which means we need to have a regular expression like this: "search for an opening square bracket, followed by one or more letters, digits, spaces or underscores, followed by a closing square bracket, then suck in everything up until the next opening square bracket, then look for [END]; failing that, match a run of plain text up to the next opening square bracket."

In regular expression language, that looks like this choice gem:

/\[[A-Za-z0-9 _]+\][^\[]*\[END\]|[^\[]*/

Working that into PHP, we get this:

preg_match_all("/\[[A-Za-z0-9 _]+\][^\[]*\[END\]|[^\[]*/", $room, $output);

That takes the $room text as input and stores the matches in $output. By default preg_match_all() will make that parameter an array made up of the full matches followed by the various subexpression matches in the regular expression, but as we have no subexpressions we only need the first element therein.
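
To see what ends up in that first element, here is a small sketch that runs the same expression over a shortened, made-up version of the walk-way text. Because the second half of the expression can match nothing at all, the sketch filters out empty strings before printing; the $chunks name is an assumption for illustration.

```php
<?php
// Shortened, made-up room text in the same format as the walk-way above.
$room = "The ground is uneven. "
      . "[IFNOT VARIABLE PALACE_WINDOW_BROKEN]A window.[END]"
      . " "
      . "[IF VARIABLE PALACE_WINDOW_BROKEN]A hole.[END]";

preg_match_all("/\[[A-Za-z0-9 _]+\][^\[]*\[END\]|[^\[]*/", $room, $output);

// $output[0] holds the full matches; keep only the non-empty ones.
$chunks = array();
foreach ($output[0] as $chunk) {
    if ($chunk !== "") {
        $chunks[] = $chunk;
    }
}

// Plain text and command blocks come out in document order.
print_r($chunks);
```

Each command block arrives as one chunk, opening instruction and [END] included, with the plain text between blocks as chunks of its own.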

So, it's time to take the results from that regular expression and parse it - are you ready for the monster size of the new function? Yes? Well, here goes!

function parse_room($room) {
    preg_match_all("/\[[A-Za-z0-9 _]+\][^\[]*\[END\]|[^\[]*/", $room, $output);
    $output = $output[0]; // we don't have any subexpressions!

    // loop through each match
    foreach($output as $textitem) {  

      if ($textitem !== "" && $textitem[0] == "[") { // this is a processing instruction! (the regex can also return empty matches)
        // find the end of the opening instruction
        $starttagend = strpos($textitem, "]") + 1; // this includes the terminating ]
        $instruction = substr($textitem, 0, $starttagend);
        $endtagstart = strpos($textitem, "[", $starttagend); // find the start of the ending instruction      

        // grab everything that is inbetween the start and end tags
        $body = substr($textitem, $starttagend, $endtagstart - $starttagend);

        // now, parse the instruction
        $instruction = substr($instruction, 1, -1);
        // note: starting at 1 skips the opening [, and the length of -1 stops one short of the end, dropping the closing ]

        // split the instructions up into parts
        $instruction = explode(" ", $instruction);
        $DOEXECUTE = false;
 
        switch ($instruction[0]) {
          case "IF":
            switch ($instruction[1]) {
              case "HOLDING":
                // if this person is holding an item, do nothing; this is left blank for now, as are quite a few others below
                break;

              case "VARIABLE":
                // this sets $DOEXECUTE to be true if there are any rows returned in a query checking whether a variable is
                // set for the character.  If the variable is set, one row will be returned, and so $DOEXECUTE will be true.
                $DOEXECUTE = mysql_num_rows(mysql_query("SELECT ID FROM variables WHERE CharacterID = {$_SESSION['IF_CURRENTCHAR']}
                  AND Variable = '{$instruction[2]}';"));
                break;
            }

            break;

          case "IFNOT":
            switch ($instruction[1]) {
              case "HOLDING":
                break;

              case "VARIABLE":
                // the opposite test: true only if no row exists for this variable
                $DOEXECUTE = !mysql_num_rows(mysql_query("SELECT ID FROM variables WHERE CharacterID =
                  {$_SESSION['IF_CURRENTCHAR']} AND Variable = '{$instruction[2]}';"));
                break;
            }

            break;

          case "SET":
            switch ($instruction[1]) {
              case "GAMESTATE":
                break;

              case "VARIABLE":
                break;
            }

            break;

          case "UNSET":
            switch ($instruction[1]) {
              case "VARIABLE":
                break;
            }

            break;

          case "GIVE":
            switch ($instruction[1]) {
              case "CASH":
                break;

              case "ITEM":
                break;
            }

            break;
        }

        if ($DOEXECUTE) {
          echo $body;
        }
      } else {
        echo $textitem;
      }
    }
}


Gulp!

Yeah, that's a huge code block - but I've commented it thoroughly so it should be easy to grasp. There are quite a few bits in there where the code is left out - you're welcome to add these bits yourself, as the hard work is already done. What we have now is a room parsing function that will only show the correct text for our room conditions. However, there are still a few tasks left to do:

  • If you haven't already done so, make sure you create the variables, triggers, and events tables.
  • We need to update the code that shows links between rooms.
  • We need to add an object that allows players to break the palace window.
  • We need to tie that object up to a trigger that calls events to set the variable PALACE_WINDOW_BROKEN.

Let's tackle them in that order, starting with #2. The current code for printing out movement options looks like this:

  $result = mysql_query("SELECT r.ID, r.Name FROM links l, rooms r WHERE fromroom =
  $sess_IF_CURRENTLOCATION AND r.ID = l.ToRoom AND l.ConditionType = 0;");
  while ($r = mysql_fetch_array($result)) {
    extract($r, EXTR_PREFIX_ALL, 'link');
    echo "<A HREF=\"game.php?RID=$link_ID\">$link_Name</A><BR />";
  }

So, it pulls out all links from the current room that have no condition attached to them. We want to extend that so that it also shows rooms that have a variable check condition attached, so add this code before that shown above:

  $result = mysql_query("SELECT r.ID, r.Name FROM links l, rooms r WHERE fromroom = $sess_IF_CURRENTLOCATION
  AND r.ID = l.ToRoom AND l.ConditionType = 1 AND l.ConditionVar IN ($IF_CURRENTVARS);");

  while ($r = mysql_fetch_array($result)) {
    extract($r, EXTR_PREFIX_ALL, 'link');
    echo "<A HREF=\"game.php?RID=$link_ID\">$link_Name</A><BR />";
  }

  $result = mysql_query("SELECT r.ID, r.Name FROM links l, rooms r WHERE fromroom = $sess_IF_CURRENTLOCATION
  AND r.ID = l.ToRoom AND l.ConditionType = 2 AND l.ConditionVar NOT IN ($IF_CURRENTVARS);");

  while ($r = mysql_fetch_array($result)) {
    extract($r, EXTR_PREFIX_ALL, 'link');
    echo "<A HREF=\"game.php?RID=$link_ID\">$link_Name</A><BR />";
  }

If you try that code straight away your screen will get flooded with errors because we're using a variable, $IF_CURRENTVARS, that isn't defined yet. As you can see, that variable makes the queries nice and small, and so it's well worth the little bit of extra code necessary to set it up. All we need to do is build an array of the variables that are set and then join it into a comma-separated list, so the code needs to look like this:

  $IF_CURRENTVARS = array("'-1'");

  $result = mysql_query("SELECT Variable FROM variables WHERE CharacterID = $sess_IF_CURRENTCHAR;");

  while ($r = mysql_fetch_array($result)) {
    extract($r, EXTR_PREFIX_ALL, 'var');
    $IF_CURRENTVARS[] = "'$var_Variable'";
  }

  $IF_CURRENTVARS = implode(",", $IF_CURRENTVARS);

So, what that does is set up an array with just one element in: -1. This is important, because it means it's a valid array that can be passed into the MySQL query looking for links. If we didn't have this, and there were no variables set, $IF_CURRENTVARS would be empty and the SQL query would be borked. Once we have a basic array set up, we then search through the database and add to the array each variable this character has set. Once we have a complete array of all the variables set, we then implode() it using commas so we get a single string separated by commas that can be put into the SQL query to get links.

Put that code just above the line "if $char_GameState == -1", save, and we'll continue.
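
To see why the '-1' placeholder matters, here is a database-free sketch of the same list-building step. The function name build_in_list() is an assumption for illustration, and the sketch assumes variable names are plain identifiers that need no escaping.

```php
<?php
// Hypothetical helper: turn a list of variable names into the quoted,
// comma-separated string that goes inside IN (...) or NOT IN (...).
function build_in_list($names) {
    // start with a placeholder so the list is never empty
    $quoted = array("'-1'");
    foreach ($names as $name) {
        $quoted[] = "'$name'";
    }
    return implode(",", $quoted);
}

// With no variables set we still get something valid to put inside IN (...).
echo build_in_list(array()), "\n";
// With one variable set, it is appended after the placeholder.
echo build_in_list(array('PALACE_WINDOW_BROKEN')), "\n";
```

Without the placeholder the first call would return an empty string, and the query would end in IN (), which MySQL rejects as a syntax error.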


Music with rocks in

Ask any cat burglar and they'll tell you there are dozens of ways to break into someone's house, but the way I've chosen to do it is with a humble rock. Players pick up the rock, use it next to the window, and that should smash it and leave a way through.

The first step in doing this is to create the trigger and event that handles breaking the window. If you recall, triggers are what are called by the game engine when something needs to happen, and each trigger can have multiple events. For example, we want to have a trigger that breaks the window when the rock is used, so we'd need to have an event that adds the variable PALACE_WINDOW_BROKEN. The idea of having triggers /and/ events is that sometimes you might want to set a variable, create an object, add some health to the player, etc, all with one trigger.
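
The one-trigger-many-events idea can be sketched without a database, as below. Everything in this snippet is hypothetical: the array keys, the event type names and the fire_trigger() function are made up for illustration and do not reflect the real columns of the triggers and events tables.

```php
<?php
// Illustrative in-memory events; these keys are NOT the real table columns.
$events = array(
    array('trigger' => 1, 'type' => 'set_variable', 'value' => 'PALACE_WINDOW_BROKEN'),
    array('trigger' => 1, 'type' => 'add_health',   'value' => 5),
    array('trigger' => 2, 'type' => 'add_health',   'value' => 100),
);

// Hypothetical dispatcher: run every event attached to one trigger and
// return the updated player state.
function fire_trigger($events, $trigger, $state) {
    foreach ($events as $event) {
        if ($event['trigger'] != $trigger) {
            continue; // this event belongs to a different trigger
        }
        switch ($event['type']) {
            case 'set_variable':
                $state['vars'][] = $event['value'];
                break;
            case 'add_health':
                $state['health'] += $event['value'];
                break;
        }
    }
    return $state;
}

$state = fire_trigger($events, 1, array('vars' => array(), 'health' => 10));
print_r($state);
```

Firing trigger 1 runs both of its events in one go and leaves the event for trigger 2 untouched.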

Here are the two SQL lines to add a new trigger and a new event that will break the window:

INSERT INTO triggers (Game, Name) VALUES (1, 'Break the palace window with a rock');

INSERT INTO events (`Trigger`, Action, EventType) VALUES (1, 1, 'PALACE_WINDOW_BROKEN');

The name of the trigger is essentially irrelevant: it will be used in the admin console, when we make one, so that people can easily recognise triggers when creating new events. As these are the first trigger and event you have added (if not, delete and recreate the tables to make them so), the Trigger field for the new event is 1, linking it to the new trigger. We can now use that in creating a new item, our rock:

INSERT INTO items (Game, Name, ShortDesc, Type, DamageMin, DamageMax, Info, UseWrong, UseRight, UseRightTrigger, LocationFound,
LocationUsed, DeleteOnUse, SellWorth, FightingAdjust, DefenceAdjust, MinLevel, CreateWhenState, AutoStart)
VALUES (1, 'Rock', 'A rock', 1, 1, 3, 'The rock is small, smooth, and roundish.', 'You throw the rock on the floor.',
'You glance furtively around you, and, seeing no one, you hurl the rock through the stained-glass window.
It shatters noisily and you hear the guards from around the front of the palace clattering this way.',
1, 2, 5, 1, 0, 0, 0, 1, 1, 0);

That is a truly illegible piece of SQL, but it's to be expected because we're doing all the SQL by hand right now. So, it's in game #1, it's a rock, it's a weapon (type 1), it does between 1 and 3 damage when used, then there's the description and some text for when the rock is used in the wrong place (that is, anywhere except in the little alley next to the palace). The next text string ("You glance furtively around") is for when the rock is used correctly, and it's followed by the ID number of our new trigger, 1. So, when the rock is used next to the window, that text is printed, and trigger 1 is executed. The LocationFound value is 2, which is the main street, the LocationUsed is 5, which is the small walk-way, and DeleteOnUse is set to 1, which means the rock is removed from the game after it's thrown. This is important, because to be realistic (ahem) we should recreate the rock inside the palace. The rest of the SQL is predictable enough.
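
How those item fields might drive the "use" action can be sketched as follows. This is not the game's actual code: use_item() is a hypothetical function, the UseRight text is shortened, and the sketch assumes that the item is only deleted when it is used in the right place.

```php
<?php
// The rock, reduced to the fields that matter when it is used.
$rock = array(
    'Name'            => 'Rock',
    'UseWrong'        => 'You throw the rock on the floor.',
    'UseRight'        => 'You hurl the rock through the stained-glass window.', // shortened
    'UseRightTrigger' => 1,
    'LocationUsed'    => 5,
    'DeleteOnUse'     => 1,
);

// Hypothetical helper: work out what happens when $item is used in room
// $location. Returns the text to show, the trigger to call (or null for
// none) and whether the item should then be removed from the game.
function use_item($item, $location) {
    if ($location == $item['LocationUsed']) {
        return array(
            'text'    => $item['UseRight'],
            'trigger' => $item['UseRightTrigger'],
            'delete'  => (bool) $item['DeleteOnUse'], // assumption: only on correct use
        );
    }
    return array('text' => $item['UseWrong'], 'trigger' => null, 'delete' => false);
}

print_r(use_item($rock, 2)); // main street: wrong place
print_r(use_item($rock, 5)); // small walk-way: right place
```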

So, we've got our trigger now, and we've got our rock. If you now go back into the game through your web browser and create a new character you should be able to see the rock. Pick it up, go to the palace, then the small walk-way, then use the rock - the window should smash, and a new link through to the chapel should appear. If you walk through you'll be in the new room, and if you go back to the walk-way again you should die. If you look at the code for the room parsing, there's nothing in there that handles game state setting - you're welcome to add something like that yourself.


Easing the pain of regular expressions

php61-shot-1.png-thumb.png (http://www.linuxformat.co.uk/images/wiki/php61-shot-1.png)
RegExplorer: good for quick and easy use, but limiting for harder expressions.
php61-shot-2.png-thumb.png (http://www.linuxformat.co.uk/images/wiki/php61-shot-2.png)
Regex Coach: Looks like the back of a bus, but power-packed for regex gurus.

Alright, so perhaps "/\[[A-Za-z0-9 _]+\][^\[]*\[END\]|[^\[]*/" doesn't just roll off the tongue, but there are quite a few regular expression helper tools around that can make your life easier.

One such tool is called RegExplorer (requires Qt), which is very simple to learn and does the job quickly and easily. More advanced users might want to try The Regex Coach (requires Motif) - it looks ugly, but is much more powerful than RegExplorer. Not only does the Coach tell you what the expression means in plain English (or, at least, slightly simpler English!), but it can also draw a tree of how the expression works - it's surprising how much easier even the hardest regex looks when it's been drawn out as a tree.
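
If you'd rather not install anything, PCRE's x modifier is another way to ease the pain: it ignores whitespace in the pattern outside character classes and allows # comments, so the expression can be laid out over several lines. The sketch below is one possible layout of the expression used in this article, with made-up sample text; the variable names are assumptions for illustration.

```php
<?php
// The expression from this article, spread out using the x modifier.
// Spaces inside a character class still count, so the class is unchanged.
$verbose = '/
    \[ [A-Za-z0-9 _]+ \]   # opening instruction in square brackets
    [^\[]*                 # body: anything up to the next opening bracket
    \[END\]                # the terminator
  |                        # or, failing that,
    [^\[]*                 # a run of plain text
/x';

$compact = "/\[[A-Za-z0-9 _]+\][^\[]*\[END\]|[^\[]*/";

// Made-up sample text in the room format.
$room = "Weeds everywhere. [IF VARIABLE PALACE_WINDOW_BROKEN]A hole.[END] Quiet.";

preg_match_all($verbose, $room, $a);
preg_match_all($compact, $room, $b);

// Both layouts should find exactly the same matches.
var_dump($a[0] === $b[0]);
```

One thing to watch: the comments must not contain the delimiter character (here, the forward slash), because PHP treats the first unescaped delimiter as the end of the pattern.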