Subversion - Setting up

From LXF Wiki

Installing and configuring a Subversion server

(Original version written by Graham Morrison for Linux Format magazine issue 68.)


You don't need to use proprietary software like BitKeeper to manage your project development. We explain how to set up and configure an open-source equivalent, Subversion, in the first of a three-part tutorial.


This is the first of a short series of three tutorials covering the version control system known as Subversion. This tutorial is aimed at server administration, from importing a development project and migrating version history, to setting up remote access and Apache integration. Next month will cover the client side of Subversion, with the third instalment on more advanced features such as branching and tags. At no point will we cover printing your own copies of Das Kapital for distribution amongst friends and colleagues.

In the beginning, programmers used to hide in their laboratories tapping away on machines the size of the Cavaillé-Coll organ in the Sacre Coeur. Their results, along with the programs that generated them, could only be saved on strips of paper and this made collaboration a little difficult. The modern age has changed all that. Computers are networked together and development projects of hundreds or even thousands of people are actively sustaining themselves. Without online collaboration there would be no Linux.

The magic behind collaborative project development is called version control and, as the name suggests, it's software designed to keep track of every single change or revision. This is more complicated than it sounds. What do you do if there's more than one developer working on the same piece of code? Or if an addition breaks the project in some way? RCS, the original Revision Control System, was designed to address many of these problems, and in its turn the Concurrent Versions System (CVS) was designed to plug some RCS holes. Specifically, CVS was intended to handle a whole project rather than separate files.

CVS uses a simple concept. A project's source-tree is stored on the server, known as the repository. Each developer then needs to checkout their own copies of this tree by downloading the source to their local machine, which becomes their working copy. Any changes a developer makes to their working copy need to be committed back to the repository. If more than one developer has altered a piece of code, creating a conflict that CVS cannot resolve, the code needs to be manually edited before the server will allow the commit.


68_tut_svn_02.png-thumb.png (http://www.linuxformat.co.uk/images/wiki/68_tut_svn_02.png)
CVS and SVN share the same development cycle. [1] Original version of file in repository. [2] File is checked out by two developers. [3] Each developer edits a different part of the file. [4] Both files are committed into the repository, merging both changes.

As you might expect, CVS is built using a client/server model. The server handles the content while the client requests changes to be made to the server's document tree. The content doesn't necessarily need to be application source code. In fact, there are a few crazy people who attempt to use CVS for controlling their whole life, but that's a different story.

CVS is basically a multi-document extension to RCS. It has inherited many of the problems associated with its progenitor, and even managed to add a few of its own. This is where Subversion comes in. Subversion was developed as a direct replacement for CVS, built from the ground up to solve many of CVS's problems. Top of its features list is version control for directories, not just for files as with CVS. This makes a massive difference, as Subversion lets you manipulate directories in the same way you can files: moving, renaming, copying and deleting. The other big problem with CVS is also a direct result of its file-based structure. It's possible to get halfway through a commit and lose the connection. The result is that half the files from a developer may be updated while the other half aren't, leaving the administrator with no way of knowing which files were updated and which weren't.

The solution, as provided by Subversion, is to make all changes to the repository 'atomic'. This means that all the changes in a commit are applied as a single transaction, avoiding half-updated errors and dual commits. This, along with other essential additions (resource-cheap branching and tagging, excellent support for binary formats), makes upgrading to Subversion almost imperative for CVS users at some point in the not-too-distant future.


Setup

Once Subversion is installed, the first step in administering a repository is to create its structure and populate it with files. Subversion uses the Berkeley database, which restricts the repository to a local file system. This is because Berkeley relies on file system features that are not currently available over network equivalents (such as Samba or NFS). Once a suitable location has been selected, the repository can be created using svnadmin, a command Subversion reserves for dramatic repository changes.

To create your own repository called subres, execute the following from the directory in which you wish the repository to reside (I've used /usr/share):

$ svnadmin create subres

Managing a Subversion repository isn't all that different from managing a CVS one but as an administrator, the most important thing to understand is the Subversion file system. This isn't a file system in the same way you think of ext3 or Reiser, but instead, it's the file structure and organisation that SVN expects. The previous command generated this structure under the subres directory, which should now contain a couple of files and several directories.

There are five directories in total: conf, dav, db, hooks and locks. The conf directory contains the repository's own configuration file, called svnserve.conf. The dav directory contains the system's bookkeeping information, specifically used with the Apache access modules. The aforementioned Berkeley database system uses the db directory for all its files.

Hooks are scripts that can be triggered by a specific repository event. These events include commits and changes to revision properties, and the hooks directory contains an example script for each action that SVN implements. The locks directory contains the locking data, responsible for tracking accesses to the repository.

Subversion Hooks

Scripting added Subversion functionality

Hooks are executed when a specific event is triggered from the repository. Each event corresponds to one of the following scripts within the hooks directory:

start-commit: This is run before the transaction is created and is used to ensure the user has adequate privileges for the commit.

pre-commit: This script is executed once the transaction with the repository is complete but, importantly, before the final commit is made. This is to ensure that the entire transaction adheres to the repository's commit policy. This may be as simple as checking the destination location, but this script can also be pulled into authorisation requests or checking a commit against a bug tracking tool.

post-commit: This script runs after the transaction has been successfully committed, and is most commonly used to inform a project's development mailing list of the new revision. Subversion even includes an example script for exactly this purpose (see subversion/tools/hook-scripts/commit-email.pl).

pre-revprop-change: Subversion's own revision properties are the only data in an SVN repository that is not versioned. This means that the previous value of a property is lost when it changes, unless the administrator takes some action to store it, and that's where this script comes in. It is executed before the property change, making it easy to script the necessary operation.

post-revprop-change: Provided the pre-revprop-change script is present and allows the change, this one is executed after a revision property has been changed. Once again, the Subversion package includes an example for informing a development team of any changes to a revision property through email.
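As an illustration of the pre-commit stage described above, here is a minimal sketch of a hook that refuses commits with an empty log message. It assumes the script is saved as hooks/pre-commit inside the repository and made executable; the function name has_log_message is our own invention, not part of Subversion. Subversion passes the repository path and the transaction name as the two arguments.

```shell
#!/bin/sh
# Hypothetical pre-commit hook: reject commits that have no log message.

# Succeeds only if the text contains at least one non-whitespace character.
has_log_message() {
    [ -n "$(printf '%s' "$1" | tr -d '[:space:]')" ]
}

# Only act when called as a hook, with the repository path and transaction name.
if [ "$#" -ge 2 ]; then
    REPOS="$1"
    TXN="$2"
    # 'svnlook log -t' prints the log message of the pending transaction.
    MESSAGE=$(svnlook log -t "$TXN" "$REPOS")
    if ! has_log_message "$MESSAGE"; then
        # Anything written to stderr is passed back to the committing client.
        echo "Commit rejected: please supply a log message." >&2
        exit 1
    fi
fi
```

A non-zero exit status from pre-commit aborts the transaction, so nothing reaches the repository until the developer tries again with a proper message.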


That's enough theory for now. To make this repository useful, we need to import a source tree. As with CVS, this is accomplished by using the 'svn import' command. Subversion uses a URL to reference locations, and local directories are prefixed by file:// and then the absolute path, giving three forward slashes that indicate the lack of an internet address. Later on, when we've configured the Apache modules, we could use http and revert to a more familiar use of the forward slashes.

$ svn import helloworld file:///usr/share/subres -m "Initial project import"
Adding         helloworld/helloworld.cpp
Adding         helloworld/Makefile

Committed revision 1.

As you can see from the output, the import command takes the contents of the helloworld directory and inserts the two files into the repository. Any subdirectories would be included too. It's here that you can see one of SVN's conceptual differences from CVS: there is one revision number for the whole update rather than one for each file. The committed revision number reflects each atomic update.
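The file:/// form used in the import can look odd at first. The following sketch shows where the three slashes come from: two belong to the scheme and the third is the start of the absolute path, with the host name left empty. The helper name repo_file_url is hypothetical.

```shell
#!/bin/sh
# Build a file:// URL for a repository on the local file system.
# The host part is empty, so an absolute path yields three slashes.
repo_file_url() {
    case "$1" in
        /*) printf 'file://%s\n' "$1" ;;
        *)  echo "repository path must be absolute: $1" >&2
            return 1 ;;
    esac
}

repo_file_url /usr/share/subres   # prints file:///usr/share/subres
```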

To make sure that the files have been successfully added to the repository, you can list the contents using either the list or ls commands:

$ svn ls file:///usr/share/subres
Makefile
helloworld.cpp

To checkout a working copy of the project, you use the co, or checkout command, from the directory you want the project to be:

$ svn co file:///usr/share/subres
A  subres/helloworld.cpp
A  subres/Makefile
Checked out revision 1.


Migration

The only problem with importing a project in this way is that you will lose any revision history from your current versioning system. This can make migrating to Subversion more trouble than it's worth. There are a couple of options, however, that can bring your current versioning system's revision history into your new Subversion repository.

The simplest way to maintain some project history is to export each significant release from your old system into Subversion. You would first import the earliest version of the project that your current versioning system contains, after which you need to check out a working copy. For each release you want to export to Subversion, you need to copy the files onto your working copy, remove any files or directories that no longer exist in that release, and commit each version in turn.
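The release-by-release routine could be scripted along the following lines. This is only a sketch: the function names run and import_release, the DRY_RUN variable and the directory layout are all assumptions of ours, and files that disappeared between releases still need removing by hand with 'svn rm' before the commit.

```shell
#!/bin/sh
# Sketch: replay one exported release into an existing working copy.
# With DRY_RUN=1 the commands are printed instead of being executed.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# import_release RELEASE_DIR WORKING_COPY LABEL
import_release() {
    release=$1
    workdir=$2
    label=$3
    # Copy the release's files over the top of the working copy.
    run cp -R "$release/." "$workdir/"
    # --force lets 'svn add' descend into directories that are already versioned.
    run svn add --force "$workdir"
    # Record the release as a single revision.
    run svn commit -m "Import release $label" "$workdir"
}

# Example (hypothetical paths), previewing the commands first:
#   DRY_RUN=1 import_release /tmp/helloworld-1.0 /tmp/subres 1.0
```

Running with DRY_RUN=1 first is a cheap way of checking the paths before anything is committed.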

It's easy to change files and directories directly within your working copy, following the routine of adding them first to the repository, followed by update and commit. But, it's worth bearing in mind that Subversion differs from the way CVS handles files in an important way - it's able to maintain the revision history of renamed and copied files, as long as you use Subversion's own rename and copy commands.

Common file operations that have a Subversion equivalent include copy, rename and move, and are used in roughly the same way as their shell counterparts, as with:

$ svn cp helloworld.cpp welcome.cpp
A         welcome.cpp
$ svn rm helloworld.cpp
D         helloworld.cpp

The most essential Subversion command throughout all these file operations, and indeed, probably for Subversion use in general, is status. You can check the status of all the files in your working copy, which is especially useful before a commit, using the verbose option.

$ svn status -v
                3        3 graham       .
D               4        4 graham       helloworld.cpp
A  +            -        ?   ?          welcome.cpp
                3        1 graham       Makefile

The output shows the current status of the files contained within the working copy. The D and A in the first column show that helloworld.cpp has been deleted (D) while welcome.cpp has been added (A). The '+' symbol shows that the addition of welcome.cpp includes revision history for that file. This is a result of using the Subversion copy command, rather than copying the file manually. The two numbers are the working revision and the last committed revision: the first is the revision your working copy of the item is based on, and the second is the revision in which the item last changed, followed by the author of that change.
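Because the first column carries the status code, it is easy to summarise a large working copy before committing. The sketch below counts additions and deletions from status output fed to it on standard input; count_status is a hypothetical helper name.

```shell
#!/bin/sh
# Count added (A) and deleted (D) items in 'svn status' output read from stdin.
count_status() {
    awk '
        /^A/ { added++ }
        /^D/ { deleted++ }
        END  { printf "added=%d deleted=%d\n", added, deleted }
    '
}

# Typical use from inside a working copy:
#   svn status | count_status
```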

Committing the changes, followed by checking the repository status again should generate output similar to the following:

$ svn commit -m "Copied and removed cpp files"
Deleting       helloworld.cpp
Adding         welcome.cpp

Committed revision 6.

$ svn status -v
                3        3 graham       .
                6        6 graham       welcome.cpp
                3        1 graham       Makefile

Using the above commands, it should be possible to build a worthwhile revision history from a project's release history. It requires a fair amount of work though, and it would be much easier if there was some form of automation to the process, or even a tool for converting the revision history from another version control system. Luckily, there are a couple of scripts that have been developed to migrate your old source repository to a Subversion one.

68_tut_svn_03.png-thumb.png (http://www.linuxformat.co.uk/images/wiki/68_tut_svn_03.png)
Rather than import an entire project history into Subversion, it's sometimes easier to just select several significant releases, and import them.

As you might expect, CVS has received special attention when it comes to migrating to Subversion, and the cvs2svn Python script usually yields excellent results. The process works by first sniffing through all the revision control entries from the CVS system, which are then extracted and used to rebuild the version history within Subversion. There are several possible levels of complexity to the conversion, which affect not only the result but also the performance and storage requirements of your repository.

The lightest option is to only include the main trunk of the CVS repository (cvs2svn --trunk-only), ignoring all the historical data from any branches or tags you may have. If this is too restrictive, you can include only the branches and tags that are of most importance to your project (you could safely ignore nightly builds, for example). This is accomplished using the --exclude flag to specify which branches and tags to ignore. Finally, you could just import the whole shooting match and use the cvs2svn script to convert the complete repository to Subversion.
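The three levels of conversion might translate into command lines like those printed by the sketch below. The function name cvs2svn_command and all the paths are hypothetical, and the sketch assumes a cvs2svn release that accepts --svnrepos to name the new repository, so check cvs2svn --help for your version before relying on it.

```shell
#!/bin/sh
# Print the cvs2svn command line for one of three conversion levels.
# cvs2svn_command MODE SVN_REPOS CVS_REPOS [EXCLUDE_REGEXP]
cvs2svn_command() {
    mode=$1
    svn_repos=$2
    cvs_repos=$3
    pattern=${4:-}
    case "$mode" in
        # Trunk only: branches and tags are ignored.
        trunk)   printf '%s\n' "cvs2svn --trunk-only --svnrepos=$svn_repos $cvs_repos" ;;
        # Skip the branches and tags whose names match the pattern.
        exclude) printf '%s\n' "cvs2svn --exclude='$pattern' --svnrepos=$svn_repos $cvs_repos" ;;
        # Convert everything.
        full)    printf '%s\n' "cvs2svn --svnrepos=$svn_repos $cvs_repos" ;;
        *)       echo "unknown mode: $mode" >&2
                 return 1 ;;
    esac
}

# Example: leave out anything named like a nightly build.
cvs2svn_command exclude /usr/share/subres-cvs /var/cvsroot/helloworld 'nightly-.*'
```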


Remote access

Obviously, the vast majority of users accessing a Subversion repository are going to be spread throughout the known universe, and that's going to involve networking. Subversion uses Apache, and more specifically the WebDAV protocol, for providing repository access to networked clients. Installation and configuration is straightforward, with the only requirement being a working Apache 2 installation. Subversion won't work with any version earlier than 2.

As long as you've got a working Apache server, installation is as straightforward as adding a few lines to your distribution's Apache configuration. The two required modules are mod_dav and mod_dav_svn, and should be added automatically by your package management utility. Otherwise you need to make sure these modules are loaded in that order, usually from httpd.conf. This file also needs some specific Subversion configuration information:

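<Location /svn>
# Hand requests under /svn to the Subversion DAV module
DAV svn
# Location of the repository on the server's file system
SVNPath /usr/share/subres
# Let generic WebDAV clients commit with an automatic log message
SVNAutoversioning on
# Use HTTP basic authentication
AuthName "LXF Subversion Repository"
AuthType Basic
</Location>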

Any comment lines can be removed as necessary. DAV simply informs Apache as to which module to use (svn) and SVNPath points to the repository. Because Subversion uses a WebDAV-compliant web server, it is just about possible to use other DAV-compatible clients to access the Subversion repository. The theory is that such compliant clients can treat the repository as a kind of file system, in the same way you can access Samba or NFS through your file manager.

While there's a great deal yet to be implemented to get true compatibility, the SVNAutoversioning directive is a compromise that allows updates from generic clients without an associated log entry or comment; Subversion adds a generic one automatically. This is far from ideal, but it does provide an extra option.

The final setting in our example is the authentication type, currently set to use HTTP's own basic authentication. Write access can be restricted by giving general users read-only access to the repository: authentication is demanded for every HTTP method except those that Subversion relies on for reading:

<LimitExcept GET PROPFIND OPTIONS REPORT>

Using this example, users will be allowed write access only through the provision of separate user accounts and a corresponding htpasswd file, as you would with normal Apache configuration. Secure connections are also possible by accessing the repository through https rather than http and this relies on a preconfigured mod_ssl module.
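Put together, a read-for-all, write-for-members setup might look like the following sketch. The password file location and the user name are assumptions; adjust them to suit your own Apache layout.

```apache
<Location /svn>
DAV svn
SVNPath /usr/share/subres
AuthType Basic
AuthName "LXF Subversion Repository"
# Hypothetical password file, created with:
#   htpasswd -c /etc/apache2/svn.htpasswd graham
AuthUserFile /etc/apache2/svn.htpasswd
# Anyone may read; any other method needs a valid login
<LimitExcept GET PROPFIND OPTIONS REPORT>
Require valid-user
</LimitExcept>
</Location>
```

Apache needs to be restarted, or told to reload its configuration, before the change takes effect.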

After Apache has been configured, it should now be possible to access the repository remotely using your server's URL rather than the file one we were using before:

$ svn co http://localhost/svn


Maintenance

Once a repository is up and running, there are several commands that are essential to its maintenance. The most important of these is svnadmin, a single command that is roughly equivalent to cvs admin, and provides several vital subcommands. We've already seen 'svnadmin create', which was responsible for generating the repository earlier on in the tutorial.

As you can probably guess from the command name, 'svnadmin dump' can be used as part of a backup, but is more useful for migrating the contents of one repository to another. The command itself copies the contents of the repository to stdout, using a portable dump-file format. The dump command also allows you to specify a range of repository revisions to include, rather than the default option of including the whole lot.

$ svnadmin dump /usr/share/subres >dump.txt

The essential partner to the dump command is 'svnadmin load', which can import the output of the dump command. But neither command is ideal for backing up a repository. That task falls to another admin command, hotcopy. Thanks to Subversion using the Berkeley database system, it's perfectly possible to copy the contents of the repository without taking it offline. The standard Subversion installation also provides a Python script within the tools/backup directory that can be scripted to run automatically (via cron for example), rather than running 'svnadmin hotcopy' manually.
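If you would rather not use the bundled Python script, a nightly hotcopy can be sketched in a few lines of shell. The function name backup_dest, the script location and the backup directory are hypothetical; only 'svnadmin hotcopy' itself comes from Subversion.

```shell
#!/bin/sh
# Sketch of a dated backup using 'svnadmin hotcopy'.

# backup_dest BASE_DIR DATE: print the path used for that day's copy.
backup_dest() {
    printf '%s/subres-%s\n' "$1" "$2"
}

# Run the backup only when a destination directory is given, for example
# from a crontab entry such as:
#   0 2 * * * /usr/local/bin/svn-backup.sh /var/backups/svn
if [ "$#" -ge 1 ]; then
    DEST=$(backup_dest "$1" "$(date +%Y%m%d)")
    # hotcopy takes a consistent copy without taking the repository offline.
    svnadmin hotcopy /usr/share/subres "$DEST"
fi
```

Each run creates a fresh, dated copy, so old copies need pruning from time to time.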

The best way to examine the status of a working repository is by using the svnlook command. By default, svnlook examines the most recent revision, and the command is read-only, so there's never a danger that any potentially catastrophic changes could be made.

$ svnlook info /usr/share/subres --revision 1
graham
2005-04-15 16:33:44 +0100 (Fri, 15 Apr 2005)
22
Initial project import

'svnlook info' is great for getting a general overview of a repository, with the above command displaying the author, date, log message size and log message for the first revision made to the repository. Each field can be accessed independently by replacing info with author, date, or log, which lends itself particularly well to writing scripts for use with the hooks mentioned earlier.

At this point, you should have a good understanding of the basic principles of running a Subversion server. The next stage would be to either start a project using Subversion, or rescue a CVS project languishing on your hard drive. There's really no substitute for using Subversion in anger, and once you've got the repository up and running, the next stage is for you and your co-developers to use it, but that's for the next tutorial!
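As a small example of combining those individual fields, here is a sketch of a post-commit script that prints a one-glance summary of the new revision. The function name format_notice is hypothetical. Subversion calls post-commit with the repository path and the new revision number.

```shell
#!/bin/sh
# Sketch: summarise a revision using the individual svnlook subcommands.

# format_notice REV AUTHOR DATE LOG
format_notice() {
    printf 'Revision %s by %s on %s\n%s\n' "$1" "$2" "$3" "$4"
}

# Only act when called with a repository path and a revision number.
if [ "$#" -ge 2 ]; then
    REPOS="$1"
    REV="$2"
    format_notice "$REV" \
        "$(svnlook author -r "$REV" "$REPOS")" \
        "$(svnlook date -r "$REV" "$REPOS")" \
        "$(svnlook log -r "$REV" "$REPOS")"
fi
```

The output could just as easily be piped to a mail command or appended to a log file.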


Subversion versus CVS

Despite SVN and CVS sharing many of the same commands, there are still many differences.

It's no secret that Subversion was developed to replace CVS, and it's for this reason that both applications are so similar to use. For the client, many of the commands are identical, and you can often get away with replacing cvs with svn on the command line. It's the background processing, the server and the network protocols that are different, and that's largely transparent to the typical user. Here is a list of some of the commands that are slightly different:

CVS command     Subversion equivalent   Purpose
cvs admin       svnadmin                Access the administrative front end
cvs annotate    svn blame               Show revision information for files
cvs co -j       svn merge               Merge two sources into a working copy
cvs history     svn log                 Outputs the log messages
cvs init        svnadmin create         Create a new repository
cvs rdiff       svn diff                Generate list of differences for making a patch
cvs remove      svn delete              Removes an entry from the repository
cvs rtag        svn copy                Copies an item including its revision history


Quick tip 1

Help me!

The best way of getting to grips with Subversion is through its excellent online documentation. Rather than relying on the traditional man or help commands, svn provides an excellent overview of all its internal commands when you execute svn help from the command line. This provides a list of svn's internal commands, which can themselves be queried by adding them to the end of the help command:

$ svn help co

There is also an excellent online book entitled 'Version Control with Subversion' that can be either browsed online, or downloaded as a PDF or HTML document: http://svnbook.red-bean.com/


Quick tip 2

Use a window

There are plenty of people who use a graphical front end to access a CVS repository, and many are going to be disappointed that tools like KDE's Cervisia don't provide an SVN version. Luckily, there's an excellent graphical front end that can almost compete, called eSvn (available from http://esvn.umputun.com), a multi-platform application using the Qt library. From within eSvn you can accomplish nearly all the same tasks as from the command line, including checkout, merge, revert and diffs. You can also examine the current state of a repository in real time.

68_tut_svn_01.png-thumb.png (http://www.linuxformat.co.uk/images/wiki/68_tut_svn_01.png)
Check local revisions against a repository with eSvn