PHP - Handling tar files part 2

From LXF Wiki

Practical PHP Programming

(Original version written by Paul Hudson for Linux Format magazine issue 43.)

We finish production of a PHP extension to handle tar files.

php43-screenshot1.png-thumb.png (http://www.linuxformat.co.uk/images/wiki/php43-screenshot1.png)
Syntax highlighting IDEs make the world go round. Not that it ever was any other shape...


Welcome back, PHPers! Part one of this mini-series saw us creating a basic extension using ext_skel, modifying the m4 file, and finally compiling smoothly with the rest of PHP. In part two, the extension was, uh, /extended/ to include support for the first new function - tar_list() - to return in an array the names of all files in a tar archive, and we also looked at what made up a zval. In this concluding part to the mini-series, we'll be adding the final two tar functions, tar_add() and tar_extract(), then reviewing how the PHP extension process works.

If you didn't quite grasp how tar_list() works, it is strongly recommended that you go back and re-read last issue's tutorial, as we'll be building on much of the same knowledge here. If you've made it this far you're on the home straight - you're already able to create and compile your own extensions, so we're just going to be looking at a few more advanced topics this issue.


Tarred and feathered

Of the two functions left for us to implement, tar_extract() is definitely the easier, so we'll be starting with that. The function prototype we'll be working to is:

bool tar_extract(string tarfile, string location)

which means that tar_extract() takes the name of a tar file as parameter one, where it should extract the tar archive to as parameter two, and returns boolean true if it was successful or boolean false if there was an error. As always with PHP, the first changes to make are to declare the functions inside the extension, so we'll get that out of the way first.

Open up php_tar.h (you may want to take backups of all your existing files, in case things go wrong!), and look for the line "PHP_FUNCTION(tar_list);". Below that, you need to add two more lines to define the new functions we'll be adding, so it should end up looking something like this:

PHP_FUNCTION(tar_list);
PHP_FUNCTION(tar_add);
PHP_FUNCTION(tar_extract);

You'll also need to edit tar.c, so that the tar_functions array looks like this:

function_entry tar_functions[] = {
	PHP_FE(tar_list,	NULL)
	PHP_FE(tar_add,	NULL)
	PHP_FE(tar_extract,	NULL)
	{NULL, NULL, NULL}
};

With those changes, we're all set to go ahead and program the tar_extract() function. You can use tar_list() as your base to save yourself quite a bit of typing, because large parts of the two functions are the same. Enter this function below tar_list() in tar.c:

PHP_FUNCTION(tar_extract)
{
	TAR *t;
	char *tarfile = NULL;
	int tarfile_len;
	char *location = NULL;
	int location_len;

	if (zend_parse_parameters(ZEND_NUM_ARGS() TSRMLS_CC, "ss", &tarfile, &tarfile_len, &location, &location_len) == FAILURE) {
		return;
	}

	if (tar_open(&t, tarfile, NULL, O_RDONLY, 0, TAR_GNU) == -1)
	{
		php_error(E_WARNING, "%s", strerror(errno));
		RETURN_FALSE;
	}

	if (tar_extract_all(t, location) != 0)
	{
		php_error(E_WARNING, "%s", strerror(errno));
		tar_close(t); /* don't leave the archive open on failure */
		RETURN_FALSE;
	}

	if (tar_close(t) != 0)
	{
		php_error(E_WARNING, "%s", strerror(errno));
		RETURN_FALSE;
	}

	RETURN_TRUE;
}

If you compare that to tar_list(), you'll see the changes are two new variables to hold the extra parameter, a minor change to the call to zend_parse_parameters() to handle that extra parameter, and the "while ((i = th_read(t)) == 0)" loop being replaced with an if statement. The two new variables are the char* /location/ and the integer /location_len/ - these store the second parameter string (where to extract the tar file contents to) and its length. The third parameter (not the second, remember that TSRMLS_CC is a macro that has a comma in) is now "ss" rather than "s", because we're expecting two string parameters to be passed in - hence the addition of &location and &location_len.
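To make the link between the format string and the pointers that follow it a little more concrete, here is a minimal sketch of a format-driven parser in plain C. It is emphatically not the Zend implementation: toy_parse() and its /inputs/ array are made up for illustration, and it only understands "s". The point is simply that every "s" in the format consumes two output pointers, one for the string and one for its length.

```c
#include <stdarg.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for a format-driven parameter parser. NOT the Zend API:
 * toy_parse() and the inputs array are hypothetical. Each 's' in fmt
 * consumes two output pointers: a char ** and an int *. */
int toy_parse(const char **inputs, size_t num_inputs, const char *fmt, ...)
{
	va_list ap;
	size_t i;
	size_t expected = strlen(fmt);
	char **out_str;
	int *out_len;

	if (expected != num_inputs) {
		return -1; /* wrong number of arguments passed in */
	}

	va_start(ap, fmt);
	for (i = 0; i < expected; i++) {
		if (fmt[i] != 's') {
			va_end(ap);
			return -1; /* this sketch only supports strings */
		}
		out_str = va_arg(ap, char **);
		out_len = va_arg(ap, int *);
		*out_str = (char *)inputs[i];
		*out_len = (int)strlen(inputs[i]);
	}
	va_end(ap);

	return 0;
}
```

Calling toy_parse(args, 2, "ss", &tarfile, &tarfile_len, &location, &location_len) fills all four variables, whereas passing "s" with two inputs fails - roughly the behaviour we rely on when zend_parse_parameters() returns FAILURE.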

The key change, of course, is the new if statement: if (tar_extract_all(t, location) != 0). Previously we called th_read() to loop through each file in the archive, but this is not necessary as libtar provides one function to extract all the files: tar_extract_all(). tar_extract_all() takes two parameters, which are the TAR* file that is an opened archive, and the location to extract the files to - we use the TAR* we declared earlier, and the char* location that, hopefully, our user passed in. If there was a fatal error extracting the files, tar_extract_all() will return -1, and if this happens we use the PHP macro RETURN_FALSE to return boolean false from the function and exit.

If the tar file is opened, extracted, and closed without a hitch, we use RETURN_TRUE to return boolean true from the function, and we're done - that was easy! It won't compile just yet, because we've added the declaration for tar_add() but have yet to add the implementation. Let's do that now...


Adding to the tar pit

We've now got tar_list() and tar_extract() working, which just leaves one function: tar_add(). If you recall last issue, you'll remember the prototype given was this:

bool tar_add(string tarfile, string file_to_add)

We'll be changing that just slightly, to this:

bool tar_add(string tarfile, array files_to_add)

This change is for three reasons: firstly, most tar files contain more than one file, simply because there's no point having it otherwise. Secondly, it gives us the chance to look at how arrays are handled in extensions. Thirdly, if users pass an array of one item, it works the same way anyway - everyone's a winner!

Here's the first draft of code we'll be working with:

1	PHP_FUNCTION(tar_add)
2	{
3		TAR *t;
4		char *tarfile = NULL;
5		int tarfile_len;
6		zval *arr;
7		zval **entry;
8		HashTable *harr;

9		if (zend_parse_parameters(ZEND_NUM_ARGS() TSRMLS_CC, "sa", &tarfile, &tarfile_len, &arr) == FAILURE) {
10			return;
11		}

12		if (tar_open(&t, tarfile, NULL, O_WRONLY | O_CREAT, 0644, TAR_GNU) == -1)
13		{
14			php_error(E_WARNING, "%s", strerror(errno));
15			RETURN_FALSE;
16		}

17		harr = HASH_OF(arr);

18		for (zend_hash_internal_pointer_reset(harr); zend_hash_get_current_data(harr, (void**)&entry) == SUCCESS; zend_hash_move_forward(harr)) {
19			convert_to_string_ex(entry);

20			if (tar_append_file(t, Z_STRVAL_PP(entry), Z_STRVAL_PP(entry)) != 0) {
21				php_error(E_WARNING, "tar_append_file: %s", strerror(errno));
22				RETURN_FALSE;
23			}
24		}

25		if (tar_close(t) != 0)
26		{
27			php_error(E_WARNING, "%s", strerror(errno));
28			RETURN_FALSE;
29		}
30		RETURN_TRUE;
31	}

As you can see, this code is quite a bit more complicated than both tar_list() and tar_extract(), because we need to deal with PHP's arrays. Internally, a PHP array is a hash - a set of entries, each with a key and a value. If you were curious, an object is also a hash internally, and works in much the same manner as arrays. The zend_hash_* functions all, unsurprisingly, manipulate hashes inside C code, and there's a massive collection of hash-related functions available for you to use - check out zend_hash.h for a list.

On lines 6 to 8, three variables are declared to hold the new information that will be passed in. The zval /arr/ will hold the array zval in its entirety (see the box in last issue, "Anatomy of a zval"), /entry/ will hold an individual value from inside the array, and /harr/ is declared to hold the hash table of our array. The hash table holds the actual array data - keys and values - of a zval, which is the key information we want.

Line 12 is slightly different to our previous calls to tar_open(), because this time we open the file as O_WRONLY (write only) ORed with O_CREAT (create if non-existent), and we also pass in 0644 as parameter five, which sets the file to read/write for us, and read-only for everyone else. If you find you have no permissions to modify the tar file created by PHP scripts, make sure you don't have 0 for this parameter!
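If the flags and the octal mode look a little cryptic, it may help to see them on their own, away from libtar. The sketch below uses the standard POSIX open() and fstat() calls directly; the helper names open_for_writing() and permission_bits() are made up for this example. Bear in mind that the process umask can strip bits from the mode you ask for, so 0644 is the most you'll get rather than a guarantee.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: open a file for writing, creating it if it does
 * not exist, and ask for rw-r--r-- (0644) permissions on creation. */
int open_for_writing(const char *path)
{
	return open(path, O_WRONLY | O_CREAT, 0644);
}

/* Hypothetical helper: return the permission bits of an open file
 * descriptor, or -1 if fstat() fails. */
int permission_bits(int fd)
{
	struct stat st;

	if (fstat(fd, &st) != 0) {
		return -1;
	}

	return (int)(st.st_mode & 0777);
}
```

Passing 0 instead of 0644 would create a file that nobody, its owner included, has permission to read or write afterwards, which is exactly the trap mentioned above.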

On line 17 we use the macro HASH_OF to take the hash from our array, /arr/, and place it into /harr/. HASH_OF, found in zend_API.h, simply returns the hash table of a given zval, whether that be an array or an object. From there, we hit the complicated part of the function: iterating through the array. We use three functions to form our loop: zend_hash_internal_pointer_reset(), zend_hash_get_current_data(), and zend_hash_move_forward(). As you know, /for/ loops take three expressions - initialisation, condition, and action. This loop starts by resetting the internal pointer of /harr/. Its condition sets /entry/ to the value the internal pointer is currently at, and keeps the loop running for as long as that call returns SUCCESS. Its action moves the pointer forward one place. The end result is that the loop goes from the start of the array to the end of the array, picking out the values as it goes.
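The reset / get current / move forward pattern is easier to see with the Zend types stripped away. Here's a toy version in plain C, assuming nothing more than an array of strings with a position counter - toy_table and the toy_* functions are invented for this sketch and are far simpler than the real HashTable, but the for loop has the same three-part shape as line 18.

```c
#include <stddef.h>
#include <string.h>

#define TOY_SUCCESS 0
#define TOY_FAILURE -1

/* Toy container with an internal position, loosely modelled on the way
 * the loop above walks a hash. All names here are hypothetical. */
typedef struct {
	const char **values;
	size_t count;
	size_t pos; /* the "internal pointer" */
} toy_table;

void toy_reset(toy_table *t)
{
	t->pos = 0;
}

/* Fetch the value at the internal pointer, or fail once past the end. */
int toy_get_current(toy_table *t, const char **out)
{
	if (t->pos >= t->count) {
		return TOY_FAILURE;
	}
	*out = t->values[t->pos];
	return TOY_SUCCESS;
}

void toy_move_forward(toy_table *t)
{
	if (t->pos < t->count) {
		t->pos++;
	}
}

/* Visit every value with the same three-part for loop, adding up the
 * lengths of the strings as a stand-in for real work. */
size_t toy_total_length(toy_table *t)
{
	const char *entry;
	size_t total = 0;

	for (toy_reset(t); toy_get_current(t, &entry) == TOY_SUCCESS; toy_move_forward(t)) {
		total += strlen(entry);
	}

	return total;
}
```

Because the fetch doubles as the loop condition, an empty table falls straight through without the body ever running, which is the behaviour you'd want if a user passed in an empty array.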

Inside the loop, the first thing we do is call convert_to_string_ex(), which is another Zend macro (although not prefixed with zend_ like many of the others) that makes sure a given zval has its type set to IS_STRING. It takes a pointer to a pointer to a zval, which is how we declared /entry/, so all is well.

Lines 20 to 23 handle the tar-related code, and use a new function, tar_append_file(), to do the hard work. tar_append_file() takes three parameters - the TAR* file to write to, the name of the file to add, and the name the file should have within the archive. As with the other two tar functions, we're using /t/ for our TAR* file, so parameter one is a doddle. Parameters two and three are the same as each other - we read files in and store them using the same name. Last issue I mentioned the macro Z_STRVAL, which returns the string value of a zval. Z_STRVAL_PP is a macro built on top of it: it uses Z_STRVAL, but also dereferences the pointer to a pointer for you, saving two characters of typing - sometimes programming with PHP can be quite confusing thanks to the extreme use of macros!
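If the macro-on-a-macro idea is hard to picture, the following sketch shows the general shape using a cut-down, made-up value type. toy_zval, TOY_STRVAL, TOY_STRVAL_PP and toy_convert_to_string() are all hypothetical names that only imitate the layering described above - the real zval and the real macros live in the Zend headers and do rather more.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef enum { TOY_IS_LONG, TOY_IS_STRING } toy_type;

/* Cut-down imitation of a typed value: a type tag plus a union. */
typedef struct {
	toy_type type;
	union {
		long lval;
		struct { char *val; int len; } str;
	} value;
} toy_zval;

/* The base macro works on a value; the _PP variant dereferences a
 * pointer to a pointer first, then hands over to the base macro. */
#define TOY_STRVAL(z)      ((z).value.str.val)
#define TOY_STRVAL_PP(zpp) TOY_STRVAL(**(zpp))

/* Make sure the value is a string, converting a long in place if need
 * be. Returns 0 on success or -1 if memory could not be allocated. */
int toy_convert_to_string(toy_zval **zpp)
{
	toy_zval *z = *zpp;
	char buf[32];
	size_t len;
	char *copy;

	if (z->type == TOY_IS_STRING) {
		return 0; /* nothing to do */
	}

	snprintf(buf, sizeof buf, "%ld", z->value.lval);
	len = strlen(buf);
	copy = malloc(len + 1);
	if (copy == NULL) {
		return -1;
	}
	memcpy(copy, buf, len + 1);

	z->value.str.val = copy;
	z->value.str.len = (int)len;
	z->type = TOY_IS_STRING;

	return 0;
}
```

With /entry/ declared as a toy_zval **, TOY_STRVAL_PP(entry) and TOY_STRVAL(**entry) give you exactly the same pointer - the _PP version just hides the two asterisks.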

If tar_append_file() fails, it will return -1 - we catch this and report the error. So, the loop as a whole iterates through the array passed in, adding each file in the array to our tar file - quite straightforward once deciphered. From line 25 onwards, the function runs the usual tar_close() routine to clear up before it exits.

That's the new function complete, so we can now go ahead and recompile PHP to take advantage of the new functions tar_add() and tar_extract(). Once it's recompiled, you can test the new functions out by creating a few dummy files, and running a basic script along the lines of this:

<?php
  $array[] = "index.html";
  $array[] = "index.html2";
  $array[] = "index.html3";

  tar_add("mytar.tar", $array);
  tar_extract("mytar.tar", "foobar");
?>

As you can see, we add three dummy HTML files to an array, then pass it into tar_add with parameter one being "mytar.tar" to create a new tar archive, then extract the new archive to the directory foobar. Quick, easy, and powerful - once all the hard work is done, it's easy to re-use your code.


Extensions Round up

OK, it did take three issues' worth of explanation, but the end result is that we've managed to create an all-new PHP extension that adds something valuable to the language, and it really wasn't that hard after all. Using ext_skel, it's a cinch to get started with extensions because all the drudgery is done for you - to get going, all you really need is an idea and a few man pages discussing how a library works. Once ext_skel has done its work, it's merely a matter of planning out the functions you want to write - most of the things that sound complicated, such as making sure variables are passed in in the right order, are done automagically by the Zend Engine. The more you write for PHP, the more you appreciate the power and freedom the Zend Engine gives you - as long as you're smart, there's little need to worry about memory leaks, user errors, and such.

Once you're able to get around the PHP source code easily, you'll also find it gives you a greater understanding of the language in general, increasing your skill level at writing PHP scripts. So much goes on behind the scenes in PHP that it's nearly always worthwhile checking how it works in the source code to make sure you're not calling code that's more complicated than you realise. Another good reason to be competent with the PHP source code is that you no longer have to worry about manual pages being out of date. Some pages still have "no description available" or flat out incorrect data - once you're able to jump into an extension and see for yourself what it does, this becomes a non-problem.

If you're stuck for an extension to write, just take a look through the lib directory on your PC to get some ideas. Debian users can type *apt-cache search ^lib* to get a good list of possibilities, or, if you're an RPMer, try something like *rpm -qa | grep ^lib*. As you can see looking through the PHP documentation, people have written extensions for all sorts of weird and wonderful things - from writing GUIs using GTK to writing Java apps with, well, Java.

You might find it easier to try making the libtar extension more powerful - there are several other functions in libtar that we haven't looked at at all here, such as using gzip to make tar.gz files, and this might be a good starting point for the slightly less adventurous. There's a lot that can be added to the extension to make it more flexible and user friendly - adding files that don't exist generates an unhelpful error message, and there's no checking to make sure a tar file doesn't already exist before it's created, or, worse, that it's been created by another user and we're likely to get a "permission denied" error back. Try to be flexible - libtar is capable of lots, yes, but there's even more you can do (adding entire directories in one call, anyone?) just by playing around with new ideas.
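As a starting point for those checks, here is a rough sketch of two helpers written with plain POSIX calls. Both check_input_file() and create_new_file() are hypothetical names, not part of libtar or PHP. The first lets you produce a friendlier message before ever calling tar_append_file(), and the second shows how O_EXCL makes creation fail if the file is already there. Whether the same flag can simply be ORed into the flags given to tar_open() is an assumption you'd want to check against the libtar documentation first.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: return 0 if path is a regular file we can read,
 * or -1 with errno set (ENOENT if missing, EINVAL if not a regular
 * file, EACCES if unreadable). */
int check_input_file(const char *path)
{
	struct stat st;

	if (stat(path, &st) != 0) {
		return -1; /* errno already set by stat() */
	}
	if (!S_ISREG(st.st_mode)) {
		errno = EINVAL;
		return -1;
	}

	return access(path, R_OK);
}

/* Hypothetical helper: create a brand new file for writing. Thanks to
 * O_EXCL this fails with errno set to EEXIST if the file exists. */
int create_new_file(const char *path)
{
	return open(path, O_WRONLY | O_CREAT | O_EXCL, 0644);
}
```

Inside the extension you could then turn each errno value into its own php_error() warning, rather than relying on one generic message for every kind of failure.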


Conclusion

Extension writing isn't hard, as I'm sure you can now see. The PHP system comes bundled with a massive array of macros that are there to make life easier for those wanting to jump into the language. Sometimes they make life a little hard - particularly when you get compilation errors that are nigh impossible to track down thanks to macros being used so extensively. However, on the whole adding to the PHP language can be fun and rewarding - you get to extend your favourite language with new features otherwise only available to locally compiled programs, and you get to learn a lot about the internals of your system to boot. Marvellous!

Delving into extensions is probably the hardest thing we'll be covering in this series - partially because there are few things more difficult than writing extensions, but partially because we only received a couple of tutorial suggestions, which suggests that I may have taught you all you need to know, young grasshopper!


Questions from the forum

There has been some discussion on the LXF forums regarding choosing a decent PHP editor - it might sound easy at first, but there are lots of options. Quanta is a good program, but it has quite a way to go - particularly with regards to performance. Zend Studio is the best out there, but not free - if you don't mind laying out some money, you can't do better.

I always recommend people give phpmole a try, because they're usually surprised that a PHP editor written in PHP works so well. vim and emacs work quite well, particularly for those who are seriously used to them. I myself have a free licence of Zend Studio, but (and I think this probably says a lot) I still use Kate ;)


New PHP version

PHP 4.3.2, the first "proper" patch to the 4.3 tree, is now available. 4.3.1 fixed just one bug, which had been fixed in earlier releases and broken in 4.3.0. So, 4.3.2 is the first meaningful patch, and it's a *big* difference - so big, in fact, that this is the first time since PHP 4.0 that the upgrade has been described as "strongly recommended" on the PHP homepage. This new release incorporates over 150 (yes, one hundred and fifty) key bug fixes in the language, which should make everything that much more stable and reliable, while at the same time not affecting backwards compatibility in the least. "Strongly recommended"? 100% - this thing went through 5 release candidates before being declared final, and so is a massive improvement over all existing versions.


PHP 5...?

PHP 4.3 may well be the last minor version from the PHP 4.x branch, which would be no surprise given that 4.0 was launched way back in May 2000 - it's served everyone well, to say the least! Between now and 5.0, there will almost certainly be a 4.3.3 release to add features that were left out in the long 4.3.2 feature freeze, and potentially even a 4.4 as a stop-gap release before 5.0. In the meantime, extensive amounts of work are being done on PHP 5 to make it as significant a redesign as PHP 4 was to PHP 3. If reading the php-dev mailing lists isn't your thing, consider visiting http://www.zend.com/zend/future.php to read a selection of features written by Zeev Suraski regarding the new functionality in PHP 5 - there are some /big/ changes coming, so it's best to get on the ball now!

We'll be covering PHP 5 in depth once it gets close to release. For now, it's best that you at least make sure you know what will change, how it will affect you, and whether it's best for you to try to bring your code in line now to save hassle later.