Perl - Flow control

From LXF Wiki

Perl Tutorial part 3

(Original version written by Marco Fioretti for Linux Format magazine issue 71.)


We show you how to make Perl grow up and take on its own responsibilities.


No script is useful if it cannot choose among alternatives, grab its own data from the hard drive or leave a clear report of what it did in a file. For this reason, this time we will start by summing up Perl's flow control structures. File I/O, that is how to read and (in an orderly way!) write data from and to external files or programs, will be covered immediately after.

When Perl speaks like us

Perl syntax can be messy, but sometimes it is closer to real language than you'd think. By default, if-then statements must be written in a form very similar to that of the C language, with curly braces enclosing all the corresponding commands. Simple conditional commands, where only one instruction must be executed if something happens, can be written in almost the same format as English:

print "SORRY, no more money" if ($ACCOUNT_BALANCE <= 0);
print "Access Denied!\n"     unless ($WHO_IS_AT_THE_DOOR eq 'Me');
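
As a minimal sketch of these statement modifiers, the snippet below collects messages in an array instead of printing them directly, so the effect of each condition is easy to inspect. The values assigned to the two variables, the @MESSAGES array and the third message are assumptions made for illustration only.

```perl
use strict;
use warnings;

# Illustrative values, not taken from any real account
my $ACCOUNT_BALANCE    = 0;
my $WHO_IS_AT_THE_DOOR = 'Somebody else';
my @MESSAGES;

# Each statement runs only when its trailing condition allows it
push @MESSAGES, "SORRY, no more money" if ($ACCOUNT_BALANCE <= 0);
push @MESSAGES, "Access Denied!"       unless ($WHO_IS_AT_THE_DOOR eq 'Me');
push @MESSAGES, "Welcome home"         if ($WHO_IS_AT_THE_DOOR eq 'Me');

# The same trick works with foreach
print "$_\n" foreach @MESSAGES;
```

With these values only the first two messages are stored: the third condition is false, so its push never happens.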


Flow control

The short story is that Perl has almost all the flow control constructs of the C language, even if some of them have different names and syntax. If-else statements, for example, look just like you would expect:

if ($LEFT_MONEY == 0) {
    print "Find a job, or else!\n";
} else {
    print "Quick! Spend some money!\n";
}

They can be chained at will with the contracted elsif keyword (note the spelling: not elseif, and not else if):

if ($MONEY > 50) {
    print "Ask mum a little extra money\n";
} elsif (($MONEY > 0) && ($MONEY <= 50)) {
    print "Ask mum MORE money!!\n";
} else {
    print "MUM! I'm in trouble!!!\n";
}

An alternative conditional form is the one using unless:

unless ($MONEY > 1000000) {
    print "Better ask mum some money...\n";
}

which basically means the opposite of “if”, that is “ask for money unless you already have more than 1000000 pounds”.
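
The chain of conditions can be tried out with a small helper, sketched here under the assumption that the advice is returned as a string rather than printed; the advice function and the sample amounts are hypothetical. Since only the first branch whose condition is true gets executed, the second test does not need to repeat the upper limit.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the advice for a given amount of money
sub advice {
    my ($MONEY) = @_;
    if ($MONEY > 50) {
        return "Ask mum a little extra money";
    } elsif ($MONEY > 0) {
        # Reached only when $MONEY is 50 or less
        return "Ask mum MORE money!!";
    } else {
        return "MUM! I'm in trouble!!!";
    }
}

foreach my $MONEY (100, 20, 0) {
    print "$MONEY pounds: ", advice($MONEY), "\n";
}
```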

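The while statement can assume the following two forms:

while ($MONEY < 1000000) {
    print "Ask for more money...\n";
    $MONEY += 10;    # without this the loop would never end
}
# or, with the test at the end:
do {
    print "Ask for more money...\n";
    $MONEY += 10;
} while ($MONEY < 1000000);

The two forms are not quite equivalent. The first tests the condition before each iteration, so its body is never executed if you already had at least 1000000 pounds when the loop is reached. The do form tests the condition only after running the block, so whatever is in the curly braces happens at least one time. The complementary until keyword simply inverts the test, looping for as long as the condition is false:

do {
    print "Ask at least 10 more pounds...\n";
    $MONEY += 10;
} until ($MONEY > 1000000);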
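
Whether a block runs when its condition is false from the very start can be checked with a few counters. This is only a sketch: the three counter variables and the starting amount are made up for the occasion.

```perl
use strict;
use warnings;

my $MONEY = 2000000;   # already rich: "$MONEY < 1000000" is false from the start
my ($WHILE_RUNS, $DO_WHILE_RUNS, $DO_UNTIL_RUNS) = (0, 0, 0);

while ($MONEY < 1000000) {   # condition tested before the first pass
    $WHILE_RUNS++;
}

do {                         # the block runs first, the test comes afterwards
    $DO_WHILE_RUNS++;
} while ($MONEY < 1000000);

do {                         # same, with the test inverted
    $DO_UNTIL_RUNS++;
} until ($MONEY > 1000000);

print "while: $WHILE_RUNS, do-while: $DO_WHILE_RUNS, do-until: $DO_UNTIL_RUNS\n";
```

The plain while block is skipped altogether, while both do blocks run exactly once before their test stops them.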

Loops can also be built with the for and foreach keywords, which we already met in the first two parts of the tutorial. The two keywords are interchangeable. The main difference, as far as readability and common practice are concerned, is that foreach is normally preferred for looping over arrays or hashes:

for ($i = 0; $i < 100; $i++) {
    #do something 100 times
}

foreach $JEDI_KNIGHT ( keys %JEDI_DIRECTORY ) {
    print "The phone number of $JEDI_KNIGHT is $JEDI_DIRECTORY{$JEDI_KNIGHT}{'Phone number'}\n";
}

In the second case $JEDI_KNIGHT is loaded at each iteration with another key of the hash. If the loop variable is omitted, each key goes in turn into the ever-present $_ built-in variable.
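
Here is a minimal sketch of the same loop without an explicit loop variable. The directory content is invented, and the keys are sorted only to make the order of the output predictable, since keys returns them in no particular order.

```perl
use strict;
use warnings;

# Hypothetical directory: names and numbers are made up
my %JEDI_DIRECTORY = (
    'Yoda' => { 'Phone number' => '555-0900' },
    'Luke' => { 'Phone number' => '555-0023' },
);

my @LINES;
# No loop variable here: each key lands in $_
foreach (sort keys %JEDI_DIRECTORY) {
    push @LINES, "The phone number of $_ is $JEDI_DIRECTORY{$_}{'Phone number'}";
}

print "$_\n" foreach @LINES;
```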


File test operators

Scripts would be useless if they lived closed in themselves, without ever reading data from some files or leaving a trace in others of what they did. To avoid problems, however, before doing anything with a file it is very handy, when not downright necessary, to check that it has just the properties we expect from it. Perl provides several operators to perform this kind of check. The following is just a short summary; a complete list is available online at www.unix.org.ua/orelly/perl/prog3/ch03_10.htm:

if (-e $FILE) # if $FILE exists....
if (-r $FILE) # if $FILE exists and is readable
if (-d $FILE) # if $FILE exists and is a directory
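
The three tests can be combined into a small classifier, sketched below. The describe function is a hypothetical helper, and the second file name is simply assumed not to exist in the current directory.

```perl
use strict;
use warnings;

# Hypothetical helper that classifies a path with the file test operators
sub describe {
    my ($FILE) = @_;
    return "missing"       unless (-e $FILE);
    return "directory"     if (-d $FILE);
    return "readable file" if (-r $FILE);
    return "unreadable file";
}

print ". is: ", describe('.'), "\n";
print "no_such_file_here.txt is: ", describe('no_such_file_here.txt'), "\n";
```

The order of the tests matters: existence is checked first, so the later tests only ever run on something that is really there.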


Dealing with files

Reading from and writing to files in Perl is pretty simple. You must simply open and close every file you access in the proper way, that is with the functions of the same name:

open(FILE_HANDLE, $NAME_OF_FILE) || die "Oh my, what happened to my file!!!\n";
#do whatever you want with the file
close(FILE_HANDLE);

The die function terminates a script, printing on its standard error stream the string given as its argument. In the first instruction above, the OR operator (the double pipe character) makes the script actually die only if the file could not be opened. The usage of die is not mandatory, but forgetting it in these cases is one of the best ways known to man to waste countless hours wondering why nothing worked.

The first argument passed to open is a file handle, that is an identifier used by Perl as a pointer to the real file, whose name is contained in the second argument. This second string normally starts with a character specifying in which mode the file must be opened: reading, writing, or appending:

open(MY_FILE, "<  $SOME_FILE_TO_READ") or die("Could not read from $SOME_FILE_TO_READ\n") ;
open(MY_FILE, ">  /home/mylogin/some_file_to_write") or die("Could not write to your file\n") ;
open(MY_FILE, ">> $SOME_FOLDER/$SOME_FILE") or die("Could not append MORE data to $SOME_FOLDER/$SOME_FILE\n") ;

Be careful with write mode: if the file opened in this way already contained some data, it will be happily overwritten (of course, sometimes this might be just what you want). When previous data must be preserved, append mode (third instruction above) is the way to go. Should you omit the mode, Perl will just do the safest thing, that is open the file in read mode.
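
The difference between the modes is easy to verify on a scratch file. The sketch below assumes that the current directory is writable and that nothing important is called mode_demo.txt, since the file is created and then deleted. The $! variable added to the error messages is not used in the examples above: it is Perl's built-in variable holding the reason given by the system for the failure.

```perl
use strict;
use warnings;

my $DEMO_FILE = "mode_demo.txt";   # hypothetical scratch file

# Write mode: creates the file, or empties it if it already exists
open(OUT, "> $DEMO_FILE") or die("Could not write to $DEMO_FILE: $!\n");
print OUT "first line\n";
close(OUT);

# Write mode again: the previous content is lost
open(OUT, "> $DEMO_FILE") or die("Could not write to $DEMO_FILE: $!\n");
print OUT "second line\n";
close(OUT);

# Append mode: the previous content is kept
open(OUT, ">> $DEMO_FILE") or die("Could not append to $DEMO_FILE: $!\n");
print OUT "third line\n";
close(OUT);

# No mode character: the file is opened for reading
open(IN, $DEMO_FILE) or die("Could not read from $DEMO_FILE: $!\n");
my @CONTENT = <IN>;
close(IN);
unlink($DEMO_FILE);   # clean up the scratch file

print @CONTENT;
```

Only the second and third lines survive, because the second open in write mode wiped out what the first one had written.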

You can, of course, use scalar variables to build file names and paths on the fly, as shown in the first and last commands. Actually, just do it that way every time. If the file name is not known before the script starts (like when it's read from somewhere else) this is the only possibility. Even when it's fixed, however, if the file name is stored in a variable that is used many times in the script, you'll only need to change one line if the file must be changed.

Hey, we left the file open. How do we write to it or read from it? In the first case just use the print statement, which you already know, or its printf cousin if you need to format the output in a more controlled way. To see the difference between the two commands, save the short script below into a file:

#! /usr/bin/perl

$MIDICLORIAN_RATE = 4533233.434;
$JEDI_QUOTE = 'The Force is strong with this one!';
open(TEST, "> testfile.txt") || die "Cannot open testfile.txt\n";
print TEST "$MIDICLORIAN_RATE: $JEDI_QUOTE;\n";
printf TEST "%5.2f: %20.30s;\n", $MIDICLORIAN_RATE, $JEDI_QUOTE;
close (TEST);

then run it several times, using different values for $MIDICLORIAN_RATE and $JEDI_QUOTE each time, and check inside testfile.txt how the formatting changes. All those odd substrings starting with the % character tell printf how to format the text: %5.2f means “print $MIDICLORIAN_RATE as a floating point number at least 5 characters wide, with two decimals”. The other, %20.30s, stands for “print $JEDI_QUOTE as a string, in at least 20 characters but not more than 30, right justified”. The complete listing of printf options is included in the Perl documentation.
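
To look at single fields without opening testfile.txt every time, the sprintf function can be used: it takes the same format strings as printf but returns the result instead of printing it. The sketch below is not part of the script above. The variable names on the left and the 'Yoda' string are made up, and the square brackets are there only to make the padding visible.

```perl
use strict;
use warnings;

my $MIDICLORIAN_RATE = 4533233.434;
my $JEDI_QUOTE       = 'The Force is strong with this one!';

# sprintf formats exactly like printf, but returns the string
my $NUMBER_FIELD = sprintf("%5.2f", $MIDICLORIAN_RATE);
my $RIGHT_FIELD  = sprintf("%20.30s", $JEDI_QUOTE);   # cut down to 30 characters
my $SHORT_RIGHT  = sprintf("%20.30s", 'Yoda');        # padded on the left
my $SHORT_LEFT   = sprintf("%-20.30s", 'Yoda');       # a minus sign pads on the right

print "[$NUMBER_FIELD]\n[$RIGHT_FIELD]\n[$SHORT_RIGHT]\n[$SHORT_LEFT]\n";
```

The quote is 34 characters long, so its last four characters are cut off, while the short string is padded with spaces up to 20 characters.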

As far as reading is concerned, the two most common options are to load one line at a time and use it, or to slurp the whole text, with only one command, into an array, where it will remain available for later processing:

# load files one line at a time....!
open(TEST, "< testfile.txt") || die "Cannot open testfile.txt\n";
while (<TEST>) {
    # The current line is loaded in the $_ default variable
    $LINE = $_;
    # do something with $LINE
}
close (TEST);

# ...or save their content in @ALL_THE_LINES
open(TEST, "< testfile.txt") || die "Cannot open testfile.txt\n";
@ALL_THE_LINES = <TEST>;
close (TEST);
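
Both approaches can be compared on a small scratch file, as in the sketch below. The file name read_demo.txt and the $LONGEST variable are made up for the example, and the chomp function, not shown above, is the built-in that removes the trailing newline from a line.

```perl
use strict;
use warnings;

my $DEMO_FILE = "read_demo.txt";   # hypothetical scratch file
open(TEST, "> $DEMO_FILE") || die "Cannot write $DEMO_FILE\n";
print TEST "one\ntwo\nthree\n";
close(TEST);

# One line at a time: only the current line is held in memory
my $LONGEST = '';
open(TEST, "< $DEMO_FILE") || die "Cannot open $DEMO_FILE\n";
while (<TEST>) {
    chomp;   # strip the trailing newline from $_
    $LONGEST = $_ if (length($_) > length($LONGEST));
}
close(TEST);

# All at once: every line becomes one element of the array
open(TEST, "< $DEMO_FILE") || die "Cannot open $DEMO_FILE\n";
my @ALL_THE_LINES = <TEST>;
close(TEST);
unlink($DEMO_FILE);   # clean up the scratch file

print "Longest line: $LONGEST\n";
print "Number of lines: ", scalar(@ALL_THE_LINES), "\n";
```

The first form is the safer choice for very big files, the second is handy when the lines must be visited more than once.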


System pipes

It isn't a secret that the power of Unix also comes from the ability to combine many small tools in cascade, obtaining one powerful program. This can be accomplished even from within Perl scripts. The trick is to use, instead of a file name with an opening mode, a pipe symbol before or after the external program(s) to be executed:

# Use the output of program_1 (pipe after the command)
open(README, "program_1 |")  or die "Could not open program_1\n";

# Send your data to program_2 (pipe before the command)
open(WRITEME, "| program_2")  or die "Could not open program_2\n";

In a nutshell, the pipe makes you open _programs_ instead of _files_. Nice, uh?
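
A minimal sketch of both directions follows. It assumes a Unix-like system where the echo and sort commands are available; the handle names and the data are invented.

```perl
use strict;
use warnings;

# Read the output of an external command (echo is assumed to be installed)
open(README, "echo hello from a pipe |") or die "Could not run echo: $!\n";
my $FIRST_LINE = <README>;
close(README);
chomp($FIRST_LINE);
print "Got: $FIRST_LINE\n";

# Send data to an external command (sort is assumed to be installed)
open(WRITEME, "| sort") or die "Could not run sort: $!\n";
print WRITEME "Yoda\nAnakin\nLuke\n";
close(WRITEME);   # close waits until sort has finished printing its output
```

Once the pipe is open, the handle is read with the angle brackets or written with print exactly as if it pointed to an ordinary file.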


Formatting whole reports

The printf function is made on purpose to print a mix of variables and constant text formatted just as you want, but it has one serious limitation. Unless you really torture it (and yourself...) it is not really practical when you have to format lots of stuff on many lines, possibly a whole page. Luckily Perl, having been born as a Report Language, has a couple of solutions made to order just for these cases.

Imagine a generic phone book script, whose first part retrieves the $NAME, $STREET, $CITY, $POSTAL_CODE and $COUNTRY of each record from some database. The second part, the only one on which we will focus here, must print these variables to the terminal in the most compact and readable way.

The first solution is to use the Perl equivalent of shell “HERE documents”. You just place all the variables in a basic template, marking where it begins and ends with some constant string in this way:

print <<END_OF_RECORD;
Name: $NAME
Street: $STREET
$CITY           $POSTAL_CODE      $COUNTRY
END_OF_RECORD

This is as simple as it gets. Whenever this print statement is executed, it will print all the text up to END_OF_RECORD, inserting in each line the current values of the requested variables. The only problem is that no justification is possible, so every record will have $STREET, $COUNTRY and so on aligned in a different way and the result will look anything but pleasant.

This limitation can be overcome with two commands called format and write. The first is used to specify exactly where each variable must go and how much space it can fill. This is achieved by alternating so-called picture lines, which define variable fields, with argument lines listing which variables should be used:

format STDOUT_RECORD =
Name: @<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
$NAME
Street: @>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
$STREET
@<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<  @>>>>>> @>>>>>>>>>>>>>>
$CITY, $POSTAL_CODE, $COUNTRY
.

The bunch of hieroglyphics above defines a template, called STDOUT_RECORD, which is made of three picture lines, the only ones which will be printed. The difference with respect to the first case is that, this time, every variable comes with its own formatting instructions. The first line, for example, has one @ character followed by 39 < ones. Darn ugly? Maybe, but it simply means that the following variable ($NAME) must be written there, starting from the @ position, in no more than 40 characters, left justified. The $POSTAL_CODE, using the same syntax, will always go in a 7-character, right-justified field in the third row, always starting at the same column.

This is only the template definition. Think of it as drawing on a piece of paper the boxes reserved for each single variable. When it's time to use it, that is to actually print one record, this is the way to go:

$~ = "STDOUT_RECORD";
write;

or, in human language: “write to STDOUT using the whole template called STDOUT_RECORD”. The special variable $~ holds the name of the format used for the currently selected output, which is STDOUT unless you say otherwise. The Perl format command has many other fields available: a hash (#), for example, is used to define aligned numeric fields and a caret (^) multi-line text blocks within the template. Centred text is defined using vertical bars (|) instead of > or < signs. Again, study the online documentation to learn all the gory details.
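
Putting the pieces together, a complete script could look like the sketch below. The two records are invented and stand in for the database part of the phone book script. The fields are also narrower than in the template above, just to keep the listing short. The variables are declared with our because the format must be able to see them.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical records, standing in for the database lookup
my @RECORDS = (
    [ 'Luke Skywalker', '1 Moisture Farm Road', 'Anchorhead', '00001', 'Tatooine' ],
    [ 'Padme Amidala',  '7 Palace Plaza',       'Theed',      '00042', 'Naboo'    ],
);

# Package variables, visible to the format defined below
our ($NAME, $STREET, $CITY, $POSTAL_CODE, $COUNTRY);

format STDOUT_RECORD =
Name:   @<<<<<<<<<<<<<<<<<<<
$NAME
Street: @>>>>>>>>>>>>>>>>>>>>>>>
$STREET
@<<<<<<<<<<<<<< @>>>>>> @>>>>>>>>>
$CITY, $POSTAL_CODE, $COUNTRY
.

$~ = "STDOUT_RECORD";   # pick the template for the current output
foreach my $RECORD (@RECORDS) {
    # Load the variables named in the argument lines, then print one record
    ($NAME, $STREET, $CITY, $POSTAL_CODE, $COUNTRY) = @$RECORD;
    write;
}
```

Each call to write fills the picture lines with whatever the variables contain at that moment, so every record comes out with its fields starting at the same columns.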