Link into libraries

Build a super filesystem searcher in under an hour? Paul Hudson makes the impossible possible...

Your mission, should you choose to accept it, is to write a program that indexes a user's filesystem in the background, scanning emails, RSS feeds, browser history and the contents of files, then lets people search through all of them at lightning-fast speed. Got that? Okay, I'll see you in an hour - good luck!


The cheater's guide

This being only the fourth part of our tutorial series, I wouldn't be surprised if the thought of writing such a complex application gave you the shakes. Surely I couldn't possibly ask you to tame such a tricky beast all by yourself? Well, I am going to make you write such a program, but I'm not going to make you do it alone - I'm going to show you exactly how to do it in my customary suave(!) manner. After all, that's why you bought the magazine, right? Right.

We'll also be sticking quite closely to the time limit of an hour. Yes, we're going to build this amazing masterpiece in under 60 minutes, which might seem like we're in need of a miracle. But think about it - what Linux application do you know of that can search through all those data sources? Come on, think! The answer is Beagle, of course. And what platform is Beagle built on? Mono! And what is this tutorial about? Mono! And why the need for all these self-answered questions? Because Rhetorical is my middle name, baby!*

If there's one thing that .NET does well, it's let you stand on the shoulders of giants by re-using their libraries. Sure, C does this as well - the program Ark RPG has libarkrpg0c2a, which lets other C-based programs call its library routines rather than reimplement the functionality themselves. But this has problems: GStreamer is split up into multiple libraries, each of which has its own specific version number. Here's a fairly common example: libgstdataprotocol-0.10.so.0.8.1. C doesn't take well to these libraries being changed without the program being updated, which is why library version numbers are so specific - an app is likely to rely on libgstdataprotocol-0.10 exactly, and not 0.9 or 0.11.

Mono does away with all this by letting programmers say "give me Beagle" and they'll get whatever version is available. This doesn't always work too well, particularly in the case of Beagle and some other new Linux apps that have yet to reach 1.0. The reason they have such low version numbers (Beagle is just 0.2.14) is that they don't have stable Application Programming Interfaces (APIs), which is the term used to describe the function names, parameter types and return values that a given library has. A stable API means that if the function DoFoo() accepts an integer and returns a string in v1.0, any programs using it will also work with v1.01 of the library because the function hasn't changed.

The important thing in all this is the "I" in API: the interface itself (name, parameters, return value) might be fixed, but how the library goes about its business - how it actually accomplishes its task - is completely unknown, and might well change between versions even though the API is fixed.
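
To make that distinction concrete, here is a minimal sketch in which two versions of a made-up library keep the same DoFoo() interface but go about the work differently. The class names FooLibraryV1 and FooLibraryV2 are hypothetical and exist purely for illustration; only the DoFoo() name comes from the text above.

```csharp
using System;
using System.Text;

// Hypothetical "v1.0" of a library: DoFoo() takes an int and returns a string.
public static class FooLibraryV1
{
    public static string DoFoo(int count)
    {
        // Straightforward implementation: repeated string concatenation.
        string result = "";
        for (int i = 0; i < count; i++)
            result += "foo";
        return result;
    }
}

// Hypothetical "v1.01": same name, parameter and return type, different internals.
public static class FooLibraryV2
{
    public static string DoFoo(int count)
    {
        // Same result, built with a StringBuilder instead.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < count; i++)
            sb.Append("foo");
        return sb.ToString();
    }
}

public class ApiDemo
{
    public static void Main()
    {
        // The calling code is written the same way against either version.
        Console.WriteLine(FooLibraryV1.DoFoo(2));
        Console.WriteLine(FooLibraryV2.DoFoo(2));
    }
}
```

A program that only ever calls DoFoo() with an integer and reads back a string cannot tell which of the two it is talking to, and that is all a stable API promises.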

In summary:

  • Because we're using Mono and Beagle uses Mono, we can take advantage of its power.
  • Beagle has its own API that lets us call into its functionality, but it's not stable and might change in the future (it has changed quite drastically in the past!)
  • To use Beagle, we just need to know the names of its functions and how to use them.

If that still sounds hard, that's because it is hard. But you haven't exchanged your £6.49/$15/daughter for a copy of this magazine just so we can do easy stuff! Let's get coding...


Hit me one more time

Mono-based Gnome applications tend to avoid the G-naming frenzy that enveloped the desktop in the early years, so we need to name our application something Beagle-related and something that also has a weird geek in-joke. My suggestion is Poochy, which is the name of an obscure Nintendo character (even Mike hadn't heard of him!) and of course fits into the dog theme set by Beagle.

So, start up MonoDevelop, go to File > New Project, select C# > Console Project, uncheck the "Create separate Solution subdirectory" box, then enter the name "poochy" into the Name field and click the New button at the bottom. You'll get the same old "Hello World!" application that we "programmed" at the start of this series - just go ahead and remove the Console.WriteLine() line, leaving the Main() method empty.

We need to use Beagle and - confusingly for now, at least - Gtk for this project, so right-click on References in the left-hand pane (just above Resources, AssemblyInfo.cs and Main.cs) and choose Edit References. From the window that pops up, switch to the Packages tab, then select Beagle (version 0.0.0.0 - I mentioned it was far from stable, didn't I?) and gtk-sharp (version 2.10.0.0). These version numbers just tell us what version we have installed currently - if someone has a different version of Beagle, Mono will use that instead.

We've already asked for the Beagle and gtk-sharp libraries to be made available to our code, but to use them without typing their full names every time we need "using" lines at the top. The default using line is this one:

using System;

The "System" library includes all the basic tools to write a console application, including providing access to the console itself for reading and writing purposes. We need to include Beagle and GTK in that list, so - as you can probably guess - you need to add these two lines:

using Beagle;
using Gtk;

The Main() method is where we need to set up Beagle and tell it what to search for, which is all done using the special Beagle object called a Query. We can also use this Query object to attach our own methods that should be run when certain events happen, such as when it finds something that matches our search. Here's how it all looks in C# - place this code where the old Console.WriteLine() was:

// this creates a new query, and attaches two methods to it to handle events
Query q = new Query();
q.HitsAddedEvent += OnHitsAdded;
q.FinishedEvent += OnFinished;

// this tells Beagle to search for the user's parameter
q.AddText(args[0]);

// this makes Beagle start its search
q.SendAsync();

That's not complicated, but neither is it the whole story: Beagle will call the OnHitsAdded() method whenever it finds a file/RSS feed/etc matching our search, and it will call the OnFinished() method when it has finished searching - we need to implement both of these before the code will compile.

For now, these two empty stubs are enough to let the application compile:

static void OnHitsAdded (HitsAddedResponse response) { }
		
static void OnFinished(FinishedResponse response) { }

The HitsAddedResponse and FinishedResponse objects provide us with lots of interesting data, but for now we're just going to throw them away so that our program compiles cleanly. So, go ahead and hit F8 to compile this first release of Poochy (let's be brave and call it 0.1), then open up a terminal window and browse to /path/to/poochy/bin/Debug. Poochy needs to be run from the command line because it reads from args[0], which we need to provide on the command line to give it something to search for. Run this command:

mono poochy.exe foo

That searches the Beagle cache for "foo", and it should take a second or so to execute - you should replace "foo" with something you definitely know exists in your Beagle cache, such as a file called "todo" or something. But even then you'll notice it outputs absolutely nothing - no hint of any files or other results that contain "foo". Is it a problem with Beagle, or a problem with our code? Obviously Beagle is absolutely perfect, which means the problem must lie with our code. Engage sleuthing hat...


The race against time

There are two reasons for this, of which the most notable is the fact that our OnHitsAdded() method - which is what gets called whenever Beagle finds a result, and is where we should be printing things out - is empty, so Mono does nothing whenever it gets a hit back from Beagle. The easy fix to this is to have Poochy print out the URL of each hit Beagle finds, like this:

foreach(Hit hit in response.Hits)
{
	Console.WriteLine("Hit: " + hit.Uri);
}

If you run the program again, you'll see it still says nothing, even though Beagle should definitely have found the text you're searching for. The problem now is that our Beagle search is done using the SendAsync() method, which means that Beagle should perform the search asynchronously. Think for a moment how programs get executed - the computer starts at the first line, then goes to line two, then line three, and so on, until it hits the end. Sure, there's some element of jumping around as methods get executed, but the point is that right now only one line of code is executed at a time.

Now, there are two ways to make Beagle do its search, known as synchronous and asynchronous. The first means "go and execute your search; don't let me continue until you've finished" and the second means "go and execute your search, and I'll carry on executing my program; go ahead and interrupt me as you do things." This is also known as "blocking" and "nonblocking", because a synchronous method blocks your next line from executing until the method finishes, whereas an asynchronous method lets your code carry on executing while the method executes in parallel.

The reason you need to know all this is because of one possibility: what happens if your program finishes executing before the asynchronous method call has done anything? That is, what happens if our program finishes before Beagle has had chance to find anything? Well, the program terminates, and the method call is effectively killed before it can print anything out. That's why our program is ending without printing anything out, which means we need to have it hang around until Beagle has finished with its search.
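
The race is easy to reproduce without Beagle at all. The sketch below is a stand-in only: SlowSearch() is a hypothetical method that sleeps briefly to imitate a search, and it runs on a thread-pool thread, which is an assumption made for the demonstration rather than a description of how Beagle does its work internally.

```csharp
using System;
using System.Threading;

public class AsyncDemo
{
    // Stand-in for a slow asynchronous search; not Beagle's real code.
    public static void SlowSearch(object state)
    {
        Thread.Sleep(200);                  // pretend to search for a while
        Console.WriteLine("Hit: found something");
        ((ManualResetEvent) state).Set();   // signal that the search is done
    }

    public static void Main()
    {
        ManualResetEvent finished = new ManualResetEvent(false);

        // Queue the work on a background thread and carry on immediately
        // (nonblocking).
        ThreadPool.QueueUserWorkItem(SlowSearch, finished);
        Console.WriteLine("Search started");

        // Comment this line out and Main() returns straight away: the process
        // exits and the background thread is killed before it prints its hit.
        finished.WaitOne();
        Console.WriteLine("Search finished");
    }
}
```

Blocking on a single event like this is fine for a toy, but it is not the route this tutorial takes for Poochy.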

Here's where GTK makes its entrance. When you use GUI programs, the application sits and waits until you type something, click a menu item or even move your mouse over a button. In fact, any action you take sends a signal to the application that lets it respond accordingly, which means the application is really just waiting to receive these signals. Obviously these applications don't just terminate when they finish executing all their actions, otherwise you'd need to type continuously into AbiWord in order to stop it closing itself. Instead, they use something called a "main loop" that looks something like this:

while (true) {
   LookForSignals();
   ActOnSignals();
}

Yes, that's just an infinite loop, but it's an infinite loop that effectively consumes no CPU time, which lets other parts of the application (eg the GUI, or in our case Beagle) run at full speed. This is exactly what we want, which is why we're going to use the GTK main loop to have our program sit idle so that Beagle can perform its search and print out the results without being killed prematurely.
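
If you're wondering how an infinite loop can use no CPU time, the trick is that it sleeps until a signal arrives. Here is a minimal sketch of the idea. MiniLoop and its Post() and Run() methods are hypothetical names invented for this example; GTK's real main loop is far more capable and is not implemented this way line for line.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public class MiniLoop
{
    private readonly Queue<string> signals = new Queue<string>();
    private readonly object gate = new object();

    // Called from any thread to deliver a signal to the loop.
    public void Post(string signal)
    {
        lock (gate)
        {
            signals.Enqueue(signal);
            Monitor.Pulse(gate);    // wake the loop if it is sleeping
        }
    }

    // Runs until a "quit" signal arrives; returns how many signals it handled.
    public int Run()
    {
        int handled = 0;
        while (true)
        {
            string signal;
            lock (gate)
            {
                // Sleep, using no CPU, until there is something to do.
                while (signals.Count == 0)
                    Monitor.Wait(gate);
                signal = signals.Dequeue();
            }
            if (signal == "quit")
                return handled;
            Console.WriteLine("Handling: " + signal);
            handled++;
        }
    }

    public static void Main()
    {
        MiniLoop loop = new MiniLoop();

        // A second thread plays the part of Beagle, posting signals to us.
        Thread worker = new Thread(() =>
        {
            loop.Post("hit");
            loop.Post("hit");
            loop.Post("quit");
        });
        worker.Start();

        int handled = loop.Run();
        Console.WriteLine("Loop exited after " + handled + " signals");
    }
}
```

GTK already ships a loop that does this job properly, which is why Poochy borrows it rather than rolling its own.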

To do this, we just need to add two lines to the end of the Main() method, like this:

Gtk.Application.Init();			
Gtk.Application.Run();

That tells GTK to start running and continue to run, so go ahead and compile with F8 then run it with the same command line again and this time you should get a list of results similar to this:

Hit: file:///home/paul/foo.xml
Hit: file:///usr/share/doc/tzdata-2006m/tz-link.html
Hit: file:///usr/share/gtk-doc/html/libuser/libuser-config.html
Hit: file:///usr/share/doc/nant-0.85/releasenotes.html
Hit: file:///usr/share/gtk-doc/html/libgnome/libgnome-gnome-config.html
Hit: file:///usr/share/gtk-doc/html/gtk/gtk-question-index.html
Hit: file:///usr/share/gtk-doc/html/gtk/gtk-Resource-Files.html

...and you'll also see that in fixing one bug we've created another: the application doesn't actually exit. After Beagle has returned all the hits it can, Poochy will sit there dumbly waiting for more.


Two steps forward, one step back

You've already seen how application main loops are really just glorified infinite loops, which means that as soon as we call Application.Run() we're entering a state of permanent execution. As long as the program doesn't crash, the machine isn't rebooted, or we don't press Ctrl+C, Poochy will run forever in its current state.

This might be what you want - perhaps you want to have it check a resource frequently, or perhaps you want it to wait for other searches to be sent over a socket. But Poochy is destined to be a program that gets told to execute a search, then quits as soon as it finishes. We've already filled in the OnHitsAdded() method stub, so now we can turn our attention to the OnFinished() method, which is currently empty.

When Beagle has returned all the results it has, it looks to see whether you've registered a method in the FinishedEvent property. We already did that with our OnFinished() method, which means that method will be called as soon as Beagle finishes. At that point we need to tell GTK to exit its main loop so that our program can terminate properly, which is actually very simple. Here's the new OnFinished() method:

static void OnFinished(FinishedResponse response)
{
   Application.Quit();
}

Application.Quit() is a GTK method that exits the main loop, cleans up all the resources GTK was using (very little in our case, as we're not doing anything graphically), then returns control back to our application immediately after the Application.Run() line in our Main() method. So, effectively the code goes like this: Main() -> SendAsync() -> Application.Run() -> GTK main loop -> OnFinished() -> Application.Quit() -> Main().

Compile, run, and marvel at your expertise: your program does exactly what we hoped it would do, and only took ten lines of meaningful code!


But wait... there's more!

Poochy prints out all the URLs, RSS feeds, OpenOffice.org documents, MP3s, PowerPoint presentations, Tomboy notes, source code, JPEGs, Applications ... (deep breath) ... emails, videos, calendar appointments, Zip files and installable packages that it finds on your system, as long as they match the text we pass in with AddText(). But what if you only want files stored on your hard disk rather than absolutely everything?

Two issues ago we looked at the EndsWith() method of strings, which accepts a string as its only parameter and returns true if string A ends with string B. For example:

string foo = "bar";
bool baz = foo.EndsWith("r");

When that code has run, the baz variable will be set to true, because foo ends with "r" (the comparison is case-sensitive, so "R" would not match). .NET also has the StartsWith() method, which does pretty much the opposite - it returns true if one string starts with another. This means we can edit our code so that it only prints files by changing the OnHitsAdded() method to this:

static void OnHitsAdded (Beagle.HitsAddedResponse response)
{
   foreach(Hit hit in response.Hits)
   {
      // ToString() makes this work whether Uri is a plain string or a System.Uri object
      if (hit.Uri.ToString().StartsWith("file"))
      {
         Console.WriteLine("Hit: " + hit.Uri);
      }
   }
}
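
You can try the filtering idea on its own, without Beagle running. This is a minimal sketch: OnlyFiles() is a hypothetical helper written for this example, the URIs are plain strings, and the http address is made up. It tests for "file:" with the colon, on the assumption that this is a slightly safer match than "file" alone.

```csharp
using System;
using System.Collections.Generic;

public class FilterDemo
{
    // Keeps only the entries whose text begins with "file:".
    public static List<string> OnlyFiles(IEnumerable<string> uris)
    {
        List<string> files = new List<string>();
        foreach (string uri in uris)
        {
            if (uri.StartsWith("file:", StringComparison.Ordinal))
                files.Add(uri);
        }
        return files;
    }

    public static void Main()
    {
        string[] sample = {
            "file:///home/paul/foo.xml",
            "http://example.com/feed.rss",    // made-up non-file URI
            "file:///usr/share/doc/nant-0.85/releasenotes.html"
        };

        // Prints the two file:// entries and skips the http one.
        foreach (string uri in OnlyFiles(sample))
            Console.WriteLine("Hit: " + uri);
    }
}
```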

That's the brute force way of working: Beagle returns everything, and we filter before printing it out. But Beagle can be told to search for items of a specific type by using the AddHitType() method - this lets you specify in plain text the exact type of information you're looking for. Valid parameters include Application, Calendar, Contact, FeedItem (for RSS), Image, IMLog, MailMessage and - most important for us - File.

Going back to the Main() method again, look for the q.AddText(args[0]) line and add this line just before it:

q.AddHitType("File");

When you run the program now, the only hits that will be returned are files - no more chat logs and the like cluttering up our results!


The Mono FAQ, part 1

Q: I don't get it. Why do I have to set something as a reference, then as a using line?

A: Because Bill Gates said so, that's why. The technical answer is slightly trickier: adding something as a Reference lets you access its code; adding something with a "using" line lets you access its code with fewer keystrokes. To use Beagle, we absolutely have to add it as a reference - this makes all its objects and methods available to us. We can then immediately start using it without adding a "using Beagle;" line - we just need to say Beagle.Query rather than just Query. By saying "using Beagle;" we're telling .NET that when we say Query we mean Beagle.Query. This is much more useful in other places, such as when working with the SHA1 algorithm. In .NET this is hidden under the System.Security.Cryptography namespace, which means we need to say "System.Security.Cryptography.SHA1CryptoServiceProvider foo = new System.Security.Cryptography.SHA1CryptoServiceProvider()" - that's pretty ugly, I think you'll agree! Instead, we can put "using System.Security.Cryptography;" at the top, then just "SHA1CryptoServiceProvider foo = new SHA1CryptoServiceProvider()". Keep in mind that MonoDevelop adds "System" as a default reference.
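
Here is a small sketch of the long-hand and short-hand spellings side by side, using the SHA1 classes in the System.Security.Cryptography namespace. It calls SHA1.Create() rather than constructing a SHA1CryptoServiceProvider directly, on the assumption that this is the safer choice across .NET versions; the method names HashLongHand() and HashShortHand() are invented for the example.

```csharp
using System;
using System.Text;
using System.Security.Cryptography;

public class UsingDemo
{
    // Fully qualified: needs only the reference, no "using" lines.
    public static byte[] HashLongHand(string text)
    {
        System.Security.Cryptography.SHA1 sha =
            System.Security.Cryptography.SHA1.Create();
        return sha.ComputeHash(System.Text.Encoding.UTF8.GetBytes(text));
    }

    // Short form: relies on the "using" lines at the top of the file.
    public static byte[] HashShortHand(string text)
    {
        SHA1 sha = SHA1.Create();
        return sha.ComputeHash(Encoding.UTF8.GetBytes(text));
    }

    public static void Main()
    {
        string a = BitConverter.ToString(HashLongHand("poochy"));
        string b = BitConverter.ToString(HashShortHand("poochy"));

        // Both spellings name exactly the same class, so the hashes match.
        Console.WriteLine(a == b);   // prints True
    }
}
```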

Q: Why do we use += to attach a method to a Beagle event - I thought += was for adding to a variable?

A: += is used for adding to variables, yes, but if you extend that metaphor to events - things that happen as Beagle is working - then you'll see that it makes sense for += to be used to add methods to events. This is known as "subscribing", and by using += you can actually subscribe multiple methods to one event - Mono just runs them one by one in the order you subscribed them.
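
To see subscription order for yourself, here is a minimal sketch using an ordinary C# event. The Searcher class and its Found event are hypothetical stand-ins for a Beagle Query and its events; nothing here touches Beagle itself.

```csharp
using System;
using System.Collections.Generic;

public class Searcher
{
    // A made-up event, standing in for something like HitsAddedEvent.
    public event Action<string> Found;

    public void Report(string item)
    {
        // Calls every subscribed method, in the order they were subscribed.
        if (Found != null)
            Found(item);
    }
}

public class EventDemo
{
    public static List<string> Log = new List<string>();

    static void First(string item)  { Log.Add("first saw " + item); }
    static void Second(string item) { Log.Add("second saw " + item); }

    public static void Main()
    {
        Searcher s = new Searcher();
        s.Found += First;     // subscribed first, so it runs first
        s.Found += Second;    // subscribed second, so it runs second
        s.Report("foo.xml");

        foreach (string line in Log)
            Console.WriteLine(line);
    }
}
```

Running it prints "first saw foo.xml" followed by "second saw foo.xml", matching the order of the two += lines.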

Q: What is the AssemblyInfo.cs file for?

A: This might sound silly, but it really is for storing information about your assembly! In .NET terms, an "assembly" can either be a shared library (the equivalent of an SO file on Linux, although .NET libraries use the DLL extension on Linux as well as Windows) or an executable file. The version number of your executable, the name of who programmed it, etc, all gets combined into the final poochy binary file, and you set all this data in AssemblyInfo.cs.

Q: Do I really need to have all my { and }s on their own lines?

A: No - far from it! I personally like to have the opening brace ({) attached to the end of the line before it. This is known as the One True Brace Style, because it was the style adopted by Brian Kernighan and Dennis Ritchie (usually abbreviated to K&R) in their book on the C programming language. MonoDevelop uses the BSD style by default, which places the opening braces on a line by themselves, so I've stuck with it here for consistency. Choose whatever suits you most, but please do keep consistency within a program!

Q: Is there more Beagle documentation online?

A: Yes, but, to be honest, it's rubbish - you'll find it lists all the method names and variables for each object, but doesn't actually say what any of them do. Presumably this will be developed more in the future. You can read it all here: http://kubasik.net/beagledocs/BeagleClient/Beagle/index.html

Q: Do you have any more rhetorical questions for me?

A: What shall we do with a drunken sailor? Shall I compare thee to a summer's day? What's up, Doc? Right, enough of that - go do some coding!

Monkey business

If you followed part one of this tutorial series you should have a number of libraries available - Avahi, D-bus, Gtk-sharp, ipod-sharp and more. And now you know how to make use of Beagle, you can just transplant that knowledge to these other libraries and surprise yourself with your elite skills! But be warned: documentation is very thin on the ground; Beagle - weak as its documentation is - is actually one of the better-documented libraries around. I'll try and get onto some of these libraries in future issues, so just keep on reading...

Quick tip

+= lets you subscribe methods to events, but its opposite, -=, unsubscribes methods.
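
A short sketch of unsubscribing, using a made-up Tick event rather than anything from Beagle. One thing worth knowing: once the last method is removed the event is null, so check before raising it.

```csharp
using System;

public class UnsubscribeDemo
{
    public static event Action Tick;
    public static int Calls = 0;

    static void Count() { Calls++; }

    public static void Main()
    {
        Calls = 0;

        Tick += Count;
        Tick();             // Count() runs: Calls is now 1

        Tick -= Count;      // unsubscribe again

        // With no subscribers left the event is null, so test it first.
        if (Tick != null)
            Tick();

        Console.WriteLine(Calls);   // prints 1
    }
}
```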

Quick tip

"using" lines let you write shorter code, but sometimes they can cause collisions. For example, .NET has a Timer class in System.Timers, but SDL.NET has its own Timer class. If you had both of these namespaces as "using" statements, Mono wouldn't be sure which Timer you meant when you wrote "Timer foo = new Timer();", so you'd need to be specific and say System.Timers.Timer.

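You don't need SDL.NET installed to see a Timer collision, because the standard library has one of its own: both System.Timers and System.Threading contain a class called Timer. The sketch below uses that pair instead, and shows two ways out: spelling the name in full, or giving it an alias (TickTimer is a name made up for this example).

```csharp
using System;
using System.Threading;
using System.Timers;
using TickTimer = System.Timers.Timer;   // alias: a short, unambiguous name

public class TimerDemo
{
    public static string Describe()
    {
        // "Timer t = new Timer(1000);" would not compile here, because the
        // name Timer is ambiguous between the two namespaces imported above.

        // Option 1: spell the name out in full.
        System.Timers.Timer full = new System.Timers.Timer(1000);

        // Option 2: use the alias declared at the top of the file.
        TickTimer aliased = new TickTimer(1000);

        string name = full.GetType().FullName;
        bool same = full.GetType() == aliased.GetType();

        full.Dispose();
        aliased.Dispose();
        return name + " " + same;
    }

    public static void Main()
    {
        Console.WriteLine(Describe());   // prints System.Timers.Timer True
    }
}
```
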
Quick tip

After Application.Quit() is called, control gets passed back to your Main() method, where it carries on executing immediately after Application.Run(). The Quit() method just tells GTK to finish - you're free to carry on doing as much as you want.