PHP - The Curl library

From LXF Wiki

Revision as of 02:25, 21 Jun 2007

PHP: Come Curling

(Original version written by Paul Hudson for LXF issue 68.)


Sockets are shiny, and FTP is efficient, but why must you treat them separately? We get on the ice to show off Curl.


PHP 5.1 looms on the horizon like the Super Star Destroyer in The Empire Strikes Back. Of course, it has the PHP development team at its helm rather than Darth Vader, but let's not quibble over details: it's coming, and should enter beta in the next few months. We'll be giving it a thorough examination once the beta process starts, but for now we recommend you visit the PHP snapshots site at http://snaps.php.net and compile the source yourself to see what's changing.

This issue, our main focus is the Curl library, whose name stands for Client URL Library. It is a unifying system that groups HTTP, FTP, Telnet and other protocols under one roof, meaning you needn't learn a separate set of PHP functions for each protocol. Basic Curl usage is easy, although there are quite a few special constants to learn - more on that later!


Getting started

The first thing to try with Curl is to initialise the library, download a URL, then close the library. Curl is really just a state machine, which means you need to set its configuration options completely before asking it to execute. Although the Curl extension has quite a few functions, the main four are:

  • curl_init() - creates the Curl instance and returns it for you to store in a variable
  • curl_setopt() - sets configuration options for a given Curl instance
  • curl_exec() - runs the Curl instance
  • curl_close() - closes the Curl instance and frees up the memory

We need all four to get one complete Curl script, like so:

<?php
   $curl = curl_init();
   curl_setopt($curl, CURLOPT_URL, "http://www.worldcurlingfederation.org");
   curl_exec($curl);
   curl_close($curl);
?>

So, we squirrel away the return value from curl_init() for use in later functions - this is crucial, as with most PHP extensions. Then curl_setopt() is called, passing in that Curl instance along with CURLOPT_URL and a URL. It's a bit of a no-brainer what that does: it simply tells Curl to use the third parameter as the URL to visit. The next line uses curl_exec() to execute the Curl request, then curl_close() frees up the memory. The important part, as you can see, is the curl_setopt() call: that's where you set all the options for your Curl request, and those options can drastically change what Curl does.
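The script above assumes the request always works. As a minimal sketch of checking for failure, the helper below relies on curl_exec() returning false when the transfer fails and on curl_error() describing what went wrong. The name run_request() is our own invention for illustration, not part of the Curl extension:

```php
<?php
// Hypothetical helper: runs one request and returns an error message,
// or an empty string if the transfer succeeded.
function run_request($url)
{
    $curl = curl_init();
    curl_setopt($curl, CURLOPT_URL, $url);

    // curl_exec() returns false if the transfer failed
    $ok = curl_exec($curl);

    // curl_error() must be read before the instance is closed
    $error = ($ok === false) ? curl_error($curl) : "";

    curl_close($curl);
    return $error;
}
```

Calling run_request("http://www.worldcurlingfederation.org") should print the page and return an empty string, while an unreachable address should return a description of the problem instead.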

By default, curl_exec() runs the request, then outputs whatever it received back directly to the screen. This can be changed with curl_setopt(), using the CURLOPT_RETURNTRANSFER setting - it's set to 0 by default, which prints to the screen, but changing it to 1 will cause curl_exec() to return the received data rather than just print it out. We can rewrite the above script like this:

<?php
   $curl = curl_init();
   curl_setopt($curl, CURLOPT_URL, "http://www.worldcurlingfederation.org");
   curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
   $return = curl_exec($curl);
   curl_close($curl);
   print $return;
?>

Although the two scripts are functionally identical, the latter allows us to post-process the page before printing it, if it gets printed at all - there's no reason it couldn't be saved to a file or sent back over the wire someplace else. If you do plan to save the data to a file, Curl can also handle that for you with a different constant: CURLOPT_FILE. Its value should be a file pointer opened for writing, like this:

<?php
   $curl = curl_init();
   $file = fopen("output.txt", "w");
   curl_setopt($curl, CURLOPT_URL, "http://www.worldcurlingfederation.org");
   curl_setopt($curl, CURLOPT_FILE, $file);
   curl_exec($curl);
   curl_close($curl);
   fclose($file);
?>

Now you have seen curl_setopt() taking a string, an integer, and a file pointer as its third parameter - its variability means you only need the one function for setting options.
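Because every option is just a constant paired with a value, several of them can also be handed over in one go. The sketch below assumes PHP 5.1.3 or later, where curl_setopt_array() is available. The helper name make_curl() is hypothetical, and it only configures the instance without running it:

```php
<?php
// Hypothetical helper: builds a configured Curl instance without running it.
function make_curl($url)
{
    $curl = curl_init();

    // Each array key is a CURLOPT_* constant, each value is its setting
    $ok = curl_setopt_array($curl, array(
        CURLOPT_URL            => $url, // a string
        CURLOPT_RETURNTRANSFER => 1,    // an integer
    ));

    // curl_setopt_array() returns false if any option could not be set
    return $ok ? $curl : false;
}
```

The returned instance can then be passed to curl_exec() and curl_close() exactly as in the scripts above.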

By default, Curl sends its request to the address in CURLOPT_URL as an HTTP GET. This can be changed through two other constants: CURLOPT_POST, which enables HTTP POST mode, and CURLOPT_POSTFIELDS, which is where you specify the fields you want to send over POST. To test this, we need to create a second script that will output the data we send to it with the first script, so save this code as postreceive.php:

<?php
  var_dump($_REQUEST);
?>
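The fields for CURLOPT_POSTFIELDS can be given as a URL-encoded string of name=value pairs joined by ampersands. As a small sketch of building such a string safely, PHP's http_build_query() does the encoding for you. The field names and values here are made up purely for illustration:

```php
<?php
// Hypothetical field names and values, purely for illustration
$fields = array("name" => "Paul Hudson", "topic" => "curl & php");

// http_build_query() URL-encodes each value and joins the pairs;
// the separator is passed explicitly so it does not depend on php.ini
$post = http_build_query($fields, "", "&");

print $post; // name=Paul+Hudson&topic=curl+%26+php
```

The resulting string is in the form CURLOPT_POSTFIELDS expects, with the space and ampersand inside the values escaped so they cannot be mistaken for separators.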

We then need to modify the original script to this:

<?php
   $curl = curl_init();
   curl_setopt($curl, CURLOPT_URL, "http://localhost/postreceive.php");
   curl_setopt($curl, CURLOPT_POST, 1);
   curl_setopt($curl, CURLOPT_POSTFIELDS, "This=Test