Subversion - Branches, tags and mergers

From LXF Wiki

(Original version written by Graham Morrison for Linux Format magazine issue 70.)


For the final installment in this three-part series, we look at how to manage your repository once development starts to expand.


This is the last part of our three-part tutorial on the version control system known as Subversion. We started by covering the basics of setting up and administering a Subversion server, before moving on to some of the more typical uses for Subversion from a client's perspective. You should now have a good idea of some of the potential benefits Subversion can bring to your project. If you do take that leap, it won't be long before you need what we cover this month, which deals mainly with 'branches'.

This part is something of a combination of the previous two. We'll be using some of the concepts from last month to implement some good server practices. These affect how you run your Subversion repository, but everything is done using the client. This month therefore involves a little of both the previous tutorials and makes a good conclusion to this short series.

Fire in the forest

Real trees, the kind that tend to grow in forests, have branches. Branches grow off the main trunk of the tree, and so do those of the Subversion variety. But there is a difference. Some Subversion branches can go on to become the main trunk of development, which would make a real tree fall over. Subversion branches can strengthen the development process rather than unbalance it.

There are numerous reasons for creating a branch from the main development trunk. The most common reason is to facilitate development of a new version, while still allowing for critical updates to be made to a previous one. To put this into context, every major release needs to be updated. The KDE 3.4 release, for example, was followed by an update called 3.4.1. This fixed many bugs and added several translations, but it didn't add any new features or functionality. Those are reserved for any upcoming major release.

Using two separate branches makes maintaining both a stable release version and a development version much easier. Bug fixes can be applied to both, while new features need only be applied to the development version; significant changes can also be merged back into the stable version. This means developers can forge ahead with new ideas using the development branch, safe in the knowledge that their work won't affect the stability of the release version.


Making a branch

You can look at a branch as nothing more than a copy of the main trunk made at a certain point. There's nothing stopping you using a simple local copy to implement this, but there's no need. Subversion treats a branch as nothing more than a copy, albeit with an initial history common to that of the trunk, and this is reflected in how you create a branch.

70_tut_svn_02.png-thumb.png (http://www.linuxformat.co.uk/images/wiki/70_tut_svn_02.png)


Branches and tags are taken from the main development trunk. A tag shouldn't be modified; it is a snapshot of the trunk at a certain moment, such as a 1.0 release. A branch, on the other hand, is a snapshot that can be worked on, either independently of the trunk or with the intention of merging parts back to the trunk.

For these examples, we're still using the simple 'helloworld' example from the previous two months. This simple project is nothing more than a small source file (helloworld.cpp) and a corresponding Makefile. Both files are currently in a single directory, but as we're moving to a branched development cycle, they need to be moved to their own branch.

$ svn mkdir branches
A         branches
$ svn mkdir branches/stable_1_0
A         branches/stable_1_0
$ svn mv helloworld.cpp branches/stable_1_0/
A         branches/stable_1_0/helloworld.cpp
D         helloworld.cpp
$ svn mv Makefile branches/stable_1_0/
A         branches/stable_1_0/Makefile
D         Makefile

What we just did was create a 'branches' directory in our local working copy, followed by another called 'stable_1_0', intended to hold the stable development branch. We then moved the two source files into the latter directory in preparation for taking a branch from the stable directory. The next stage is to copy the 'stable_1_0' branch directory to the directory we intend for the unstable development branch, but we first need to 'commit' the previous changes before Subversion will allow us to copy a directory:

$ svn commit -m "Created new branch structure."
Deleting       Makefile
Adding         branches
Adding         branches/stable_1_0
Adding         branches/stable_1_0/Makefile
Adding         branches/stable_1_0/helloworld.cpp
Deleting       helloworld.cpp
Committed revision 5.

The only new part in the previous example is the '-m' added to the 'commit' command. This means you can enter your comments without Subversion having to launch an editor. Now that the previous revisions have been committed, we can make a development branch by simply copying the stable_1_0 directory:

$ cd branches/
$ svn copy stable_1_0 HEAD
A         HEAD
$ svn commit
Adding         branches/HEAD
Committed revision 6.

The most important part is how the branch is managed, which is usually down to project policy. In our example, we've used a copy of the stable branch to create a development branch for adding features and fixing problems. As this is at the 'head' of the development effort, we've called the branch 'HEAD'.

Bubble Up

Subversion stores every single revision as a new file system tree, creating a logical copy of all the files within a repository. This isn't the same as having a physical copy of each file. Depending on the number of changes, the new tree is mostly a series of links to the previous version. This is what makes Subversion so efficient at making copies; it doesn't copy everything, only the changes from one revision to the next.

In Linux terminology, it's a similar concept to a hard link. A hard link looks and acts just like a file, but it's actually nothing but a connection to the location of the file on disk. In fact, filenames themselves are hard links, as they don't contain any document data - just a name and a link to where the file system can find the data.
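The hard-link idea is easy to see with ordinary shell tools. The following is a minimal sketch that works in a throwaway directory; the file names are made up for the illustration and have nothing to do with Subversion itself.

```shell
# Work in a scratch directory so nothing real is touched.
demo=$(mktemp -d)
cd "$demo"

echo "Hello World!" > original.txt
ln original.txt second_name.txt    # hard link: a second name for the same data

# Both names report the same inode number, i.e. the same data on disk.
inode_a=$(ls -i original.txt | awk '{print $1}')
inode_b=$(ls -i second_name.txt | awk '{print $1}')
echo "original.txt inode:    $inode_a"
echo "second_name.txt inode: $inode_b"

# Removing one name leaves the data reachable through the other.
rm original.txt
cat second_name.txt
```

The two inode numbers printed should be identical, and the final 'cat' still shows the text even though the first name has gone.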

Changes are managed through a process known technically as 'bubble-up'. What basically happens is that every change is copied to a new file - the start of the bubble. There's no need to physically copy anything else, because the other files stay the same, but they still need to be inserted into the file tree for the new revision. Each revision is actually a complete file tree, made up of copied versions of any changed files, and links to previous files that haven't changed.

Subversion creates a new revision link between the edited file and its parent directory. From here, the links continually 'bubble-up' through the file tree until the process reaches the root, at which point the new revision is complete. This is why Subversion is able to copy the source tree, and create branches efficiently as it's only the changed files that are copied.

By comparison, CVS needs to read each and every file in the directory tree whenever there's a branch. The result is that the time it takes for CVS to create a branch is directly proportional to the number of files in the repository, rather than the amount of change.
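To make the idea concrete, here is a rough imitation of bubble-up using plain directories and hard links. It is only an analogy, not how Subversion's storage is implemented: the 'r1' and 'r2' directories, the file contents and the 'inode' helper are all invented for the sketch, and real Subversion can also share whole unchanged directories, which hard links can't do.

```shell
# Scratch "repository" holding one directory per revision (illustration only).
store=$(mktemp -d)

# Helper (hypothetical): print the inode number of a file.
inode() { ls -i "$1" | awk '{print $1}'; }

# Revision 1: three files.
mkdir -p "$store/r1/src"
echo 'int main() { return 0; }' > "$store/r1/src/helloworld.cpp"
echo 'all: helloworld'          > "$store/r1/src/Makefile"
echo 'Hello project'            > "$store/r1/README"

# Revision 2: only src/helloworld.cpp changes.
mkdir -p "$store/r2/src"                                # new directories on the path to the change
ln "$store/r1/README"       "$store/r2/README"          # unchanged: link to r1's data
ln "$store/r1/src/Makefile" "$store/r2/src/Makefile"    # unchanged: link to r1's data
echo 'int main() { return 1; }' > "$store/r2/src/helloworld.cpp"   # changed: new data

# Unchanged files share an inode across revisions; the changed file does not.
ls -i "$store"/r1/src/* "$store"/r2/src/*
```

Both revisions look like complete trees, yet only one file's worth of new data was written for the second one.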


Now that the branches have been made, any developers joining the project would work on the stable branch for bug fixing, and the HEAD branch for adding new features.

If security is a concern, you may want to restrict access to specific branches to certain people, or only allow certain people to create branches in the first place. If you want only certain developers to be able to modify specific branches, you need to be using Apache for repository access. The reason for this is that the best way of providing per-user access rights is by installing the 'mod_authz_svn' Apache module.
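As a rough idea of what per-branch rights look like, the sketch below writes a small access file of the kind mod_authz_svn reads. The group names, the second user 'alice' and the temporary file location are assumptions for the example; in a real setup the file would live somewhere permanent and be named by the 'AuthzSVNAccessFile' directive in the Apache configuration for the repository.

```shell
# Write a sample access file. Location, groups and 'alice' are hypothetical.
authz_file=$(mktemp)
cat > "$authz_file" <<'EOF'
[groups]
maintainers = graham
developers = graham, alice

# Everyone may read the whole repository.
[/]
* = r

# Only maintainers may commit to the stable branch.
[/branches/stable_1_0]
@maintainers = rw

# All developers may commit to the development branch.
[/branches/HEAD]
@developers = rw
EOF

cat "$authz_file"
```

Paths in the section headers are relative to the repository root, so they match the 'branches' layout created earlier.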

A new working copy of the development branch can now be checked out by downloading from the HEAD directory:

$ svn checkout  file:///usr/share/subres/branches/HEAD
A  HEAD/helloworld.cpp
A  HEAD/Makefile
Checked out revision 6.

The most important aspect of working with branches in this way is that the history information from each file is maintained. The history for the HEAD directory will only go as far back as the branching. This makes sense when you consider that, to Subversion, this is simply another directory that files have been copied to. But you will find that the history information for the files within each branch will have been brought forward from their previous location.

The difference now is that 'helloworld.cpp' has been copied into two separate directories (the two branches). The history for each copy will also branch to reflect the changes made to each file individually. You can easily check this using the log command, and you will find that both files share a common history up to the point where the branch was made, after which the history will reflect only changes made to each specific copy. Here is a truncated version of the output from the 'svn log helloworld.cpp' command in the HEAD branch:

r7 | Added a cutting edge feature
r6 | Added HEAD development branch.
r5 | Moved project into a branch structure
r4 | Resolved conflict by incorporating both changes.

You can see that helloworld.cpp has inherited the history from before the creation of the branch at r6 (revision 6). Revision 7 refers to the addition of a 'cutting edge' feature added to the file after the branch was created. Depending on how a project may be organised, the process of adding new features to the development branch of a project can include bug and stability fixes that should be ported to the stable branch. For this, we need to merge the changes into the stable branch.

Merging one branch with another

Working on the development branch of a project often involves solving problems that should also be fixed in the stable branch, especially when working with security problems. In our simple helloworld.cpp example, the development branch called 'HEAD' has been modified to include another output line containing 'a cutting edge feature'. In the real world of course, changes will be significantly more complex, but the principle remains the same.

Despite sharing the same ancestry, Subversion sees the two files as totally separate. Last month we used 'svn diff' to check the differences between various revisions of the same file. This time we need to reconcile the differences between the same file now copied to two branches. For this, we need 'svn merge' to apply the differences between two sources. First, from the stable working directory, we need to check the differences against those changes made within the unstable HEAD branch.

If we examine the log of helloworld.cpp from the stable branch, it obviously contains none of the changes that have been made to the development branch:

r5 | Moved project into a branch structure
r4 | Resolved conflict by incorporating both changes.

We're missing revisions 6 and 7 from the HEAD branch. As you can see from the previous log messages from HEAD, revision 6 was the process of copying the files to the new branch, while revision 7 was the addition of a cutting edge new feature (r7 | Added a cutting edge feature). You can see the differences between the two revisions with the diff command:

$ svn diff -r 6:7 file:///usr/share/subres/branches/
Index: HEAD/helloworld.cpp
=================
--- HEAD/helloworld.cpp (revision 6)
+++ HEAD/helloworld.cpp (revision 7)
@@ -6,6 +6,7 @@
 {
   cout << "Hello World!" << endl;
   cout << "Both modified additions." << endl;
+  cout << "Cutting edge feature." <<endl;
   return(0);
 }

The only difference is the addition of the 'Cutting edge feature', signified by the '+' symbol at the beginning of the line. As we covered last month, we could use the 'diff' output to generate a patch. But, there's no need with Subversion, as you can use 'merge' to apply the differences immediately to your local working copy. From a local working copy of the stable branch 'svn merge' will add only the changes specified by the revision:

$ svn merge -r 6:7 file:///usr/share/subres/branches/HEAD/helloworld.cpp
U	helloworld.cpp
$ svn status
M	helloworld.cpp

As you can see from the above section of code, the changes made to the HEAD/helloworld.cpp file are merged into the local copy of the same file. This is shown by the 'U' (updated) in the output from 'svn merge', and by the 'M' (modified) in the first column of the output from 'svn status'. The developer is now free to examine the changes made to the helloworld.cpp file, and commit them to the stable branch. As with last month's section on committing changes, there is a chance of a conflict between the merged and original versions, so great care is needed when merging between branches.

One unconventional side effect of using revision numbers to specify which changes to merge is that you don't have to go from an older revision to a newer one. You can also turn them around, using 7:6 instead of 6:7 for example, which has the effect of winding any revision 7 changes back to revision 6. Using the previous example, we would type:

$ svn merge -r 7:6 file:///usr/share/subres/branches/HEAD/helloworld.cpp
G  helloworld.cpp

The 'G' in the output shows that Subversion has successfully merged the repository's changes with the changes already in the local file. This is a good moment to mention the 'svn revert' command, which is a much safer method of reverting any local changes back to the version held by the repository.

70_tut_svn_03.png-thumb.png (http://www.linuxformat.co.uk/images/wiki/70_tut_svn_03.png)


The code within a development tree is inherited by each successive version: [1] The original release (1.0). [2] Updates to the original release should just add fixes (1.1). [3] Features are reserved for a significant revision update (2.0).

Another neat trick you can perform with branches is switching the branch your local working copy references on the server to a different branch. Logically enough, the command for achieving this magic is 'svn switch'. In fact, it doesn't do anything all that clever: it changes the URL that your working copy references and updates the files to match. You can see the current URL for your working copy using 'svn info':

$ svn info
URL: file:///usr/share/subres/branches/stable_1_0

From the above example, we can change our working copy branch from 'stable_1_0' to 'HEAD' using the 'svn switch' command:

$ svn switch file:///usr/share/subres/branches/HEAD
U  helloworld.cpp
Updated to revision 7.
$ svn info
URL: file:///usr/share/subres/branches/HEAD

Vendor Branches and Tagging

Vendor branches enable third-party projects that your development depends on to be integrated into your project tree. An example would be an external library that provides functions upon which your application relies. By using a vendor branch, you can keep on top of any changes in these external projects, but more importantly, you can make sure all your developers are using the same version. CVS has specific support for vendor branches, but Subversion is versatile enough to integrate them without too much difficulty.

A vendor branch usually exists as a separate folder structure under the root of the Subversion repository. This is often under a directory called 'vendor'- hence the term vendor branch. You need to import the entire third-party project into this folder. Whenever the third-party project releases a new version, you need to merge these changes against a local working copy so that your own changes won't be lost. You can then commit these changes back to the repository in order to update the version of the third-party application for other developers to use.
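The workflow can be sketched as a small script. Everything named here is illustrative: 'libexample', its version numbers, the 'vendor/libexample/current' layout and the helper functions are assumptions, not part of our helloworld project. The script only prints the 'svn' commands unless DRY_RUN is set to 0, and it leaves out the step of getting each new release into the vendor area before it is merged.

```shell
# Dry-run sketch of a vendor branch workflow; every name here is illustrative.
REPO_URL="file:///usr/share/subres"    # the repository used in this tutorial
LIB="libexample"                       # hypothetical third-party library

# Print commands by default; set DRY_RUN=0 to really run them.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# First drop: import the unpacked source tree, then tag it so it is never edited.
vendor_first_import() {    # usage: vendor_first_import <source-dir> <version>
    run svn import "$1" "$REPO_URL/vendor/$LIB/current" -m "Import $LIB $2"
    run svn copy "$REPO_URL/vendor/$LIB/current" "$REPO_URL/vendor/$LIB/$2" -m "Tag $LIB $2"
}

# Later drop: apply the vendor's changes between two tagged versions to the
# working copy that holds our own (possibly patched) copy of the library.
vendor_merge() {           # usage: vendor_merge <old-version> <new-version> <wc-path>
    run svn merge "$REPO_URL/vendor/$LIB/$1" "$REPO_URL/vendor/$LIB/$2" "$3"
}

vendor_first_import ./libexample-1.0 1.0
vendor_merge 1.0 1.1 ./libexample
```

After a real merge you would check the result for conflicts and then commit, exactly as with the branch merge earlier.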

Quick Tip

It's entirely up to the project's maintainer how the file system is structured, but it's worth bearing a few things in mind. If you're developing different applications within the same repository, then you're obviously going to separate them at the top level into different directories. You also need to consider how you're going to manage branches and tags. Most projects use a folder called 'trunk' for the main development source tree, and create separate 'tags' and 'branches' directories at the same level. But of course, with Subversion, it doesn't need to be this way, and the file system structure you use is left entirely up to you.

To be able to use vendor branches effectively, you need to mark the third-party branch to show it can't be modified. This is known as 'tagging' in version control speak, and to Subversion, it's simply a branch that shouldn't be edited. Like a branch, a tag is a complete copy of the repository at a certain point in time. This may seem like any revision point within the repository, and it is, but creating a tag is often a good way of marking a specific point in the development cycle.

One of the main reasons for creating a tag is to mark a significant release version, such as the stable_1_0 in our example. Just like branches, tags are nothing more than copies of the repository and it's the 'svn copy' command we use to create them. The only differences are that the comment for the revision should reflect the creation of the tag, and that developers mustn't edit the tagged development branch.

To create a tag from our previous example:

$ svn mkdir file:///usr/share/subres/tags
Committed revision 8.
$ svn copy file:///usr/share/subres/branches/stable_1_0 file:///usr/share/subres/tags/release_1
Committed revision 9.

There's only one problem with this approach, and that is there's nothing stopping a developer from changing the contents of the tagged directory. Most of the time this won't be a problem; it's simply project policy to leave the contents of the 'tags' directory alone. But there is the option of hardening access if you need to.

This takes us right back to the first episode in our Subversion series (LXF68), because you can solve this problem using Subversion Hooks. If you can't remember, these are scripts that are executed when a specific Subversion event is triggered. Using a script in this way, it becomes trivial to 'revert' any changes that may accidentally be made to the tags directory. You can also restrict access using 'mod_authz_svn', as discussed earlier.
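One way to do this is with a 'pre-commit' hook that refuses the commit outright, rather than reverting it afterwards. The following is a minimal sketch, assuming the 'tags' directory sits at the top of the repository as in our example. The function names are our own, and the pattern is based on the two status columns that 'svnlook changed' prints before each path. It lets plain additions through, so new tags can still be created with 'svn copy'.

```shell
#!/bin/sh
# pre-commit hook sketch: refuse changes to anything already under tags/.
# Subversion runs the hook as: pre-commit REPOS-PATH TXN-NAME

# Read 'svnlook changed' style lines on stdin; succeed if any of them
# modifies or deletes a path under tags/. Plain additions are ignored.
touches_existing_tag() {
    grep -Eq '^(U.|.U|D.) +tags/'
}

pre_commit() {    # usage: pre_commit REPOS-PATH TXN-NAME
    if svnlook changed -t "$2" "$1" | touches_existing_tag; then
        echo "Tags are read-only; copy to a new tag or branch instead." >&2
        return 1
    fi
}

# In the installed hook, finish with:  pre_commit "$@"

# Quick self-check with canned input instead of a live repository:
printf 'A   tags/release_2/\n' | touches_existing_tag || echo "new tag: allowed"
printf 'U   tags/release_1/helloworld.cpp\n' | touches_existing_tag && echo "edit to tag: rejected"
```

A non-zero exit from the hook aborts the commit, and the message written to standard error is passed back to the developer's client.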

And that's it! The end. C'est tout. Je suis un sandwich! After reading through this brief series of three tutorials, I hope you now have the confidence to not only run your own server, but to also contribute to a project. If the project is a success, then you will no doubt need to merge and manage changes as we have covered in this tutorial, hopefully bringing the whole development cycle full circle.

Svnserve

70_tut_svn_01.png-thumb.png (http://www.linuxformat.co.uk/images/wiki/70_tut_svn_01.png)
Once svnserve is running you can access the server using Konqueror

We covered how to configure the Subversion server to use an Apache module for remote access in the first tutorial. But we've also used the 'svn://' protocol for accessing remote servers. The server application responsible for servicing this protocol is called 'svnserve' and is usually part of a typical Subversion installation. It's not used by default because it's meant for smaller and less demanding access, but it is a worthwhile alternative.

Getting the server running is actually incredibly straightforward; you just need to execute the command with a couple of parameters - operating mode and path to the repository. For example:

$ svnserve -d -r path_to_repository

This runs the server as a daemon (the other operating modes use the Internet Daemon, inetd), and points to the repository with the '-r' parameter. Once the server is running, you can access your repository directly using the 'svn' protocol, as with:

$ svn co svn://localhost

By default, svnserve only provides read-only anonymous access to the repository. This can be changed by editing the svnserve.conf file located within the repository's 'conf' directory. The configuration file is well documented, and it's under the [general] section that you can revoke or allow anonymous access. To add per-user permissions, set the auth-access field, point password-db to the location of a password file, and create the password file yourself. For example:

svnserve.conf:
[general]
anon-access = none
auth-access = write
password-db = passwd

passwd:
[users]
graham = grahampassword

You will then be able to access the repository as a user by specifying the username before the server address and entering the password, as in:

$ svn co svn://graham@localhost

The only problem with the above method is that the connection isn't encrypted and the passwords are stored as plain text in the password file, which poses a security risk. The answer is to use the ever-versatile ssh to harden the connection between the client and the server. All you need is to have OpenSSH installed, and a user account on the server machine. 'svnserve' is run as the connecting user, so there's no need to have it running beforehand, but the client needs to specify the location of the repository on the server filesystem. For example:

$ svn co svn+ssh://graham@localhost/usr/share/subres