Chris DiBona - interview

Power Broker

The mighty Google is bringing its wealth and expertise to open source. Paul Hudson talks to the man who makes it happen, Chris DiBona.


When Google was announced as the co-sponsor of O'Reilly's annual Open Source Awards last summer, there was no question of prizes going unpaid (as has happened in the past). This is a company that can afford to put money where its mouth is. Some in the community think Google's involvement is wonderful: as well as having deep pockets, it uses open source tools and is actively involved in development. But others are sceptical: it sees itself as special, they argue, and too much of its own software is closed. Chris DiBona, Google's enthusiastic open source program manager, has been tasked with the delicate job of opening up some of Google's applications. We met him to find out just how much code will be released...

Linux Format: Just so we can introduce you to our readers, could you give a rough overview of your job at Google?

Chris DiBona: Sure. It's actually quite simple. My job is to make sure that Google complies with open source licences, so that there's no question as to whether we're using the software itself correctly. My job is also to run open source outreach for the company, which includes making sure the open source community knows who we are, knows what we're doing with their software, how we're using it and how we can help. So we do a lot of patching, and other things to existing codebases, Apache... It's sort of my job to make sure that happens.

We also run programmes like the Summer of Code. This was where we incentivised 419 students to work on open source projects, working with the Apaches and Ubuntus of the world to ensure that our community keeps on getting more people developing.

LXF: You're an evangelist, almost?

CD: Not really. Google is not really an evangelist kind of company, we sort of let what we do speak for us. Rather than having us yammer about what we do, we let the yammering be done by the product.

LXF: Apache got quite a few students for the Summer of Code, KDE and Gnome got quite a few -

CD: Yes, Mono, that got 16.

LXF: How many did the kernel get?

CD: So this was the funny thing. It's hard to say, "Hey, kernel guy, do you want this?" Because it's not like the Apache organisation where there's a central body for that kind of thing. So the kernel didn't get any. The thing is that also, the kernel doesn't need any in the way that say KDE does. There's always an influx of people going into the kernel.

The other side of this is that I did not actually pick the organisations, I set a deadline for when organisations could ask to join, and no one from the kernel came to me.

LXF: But the end result was no actual kernel hackers!

CD: You know, it's not just about the kernel.

LXF: No, but it's probably the largest open source project?

CD: I don't think so. I think Apache is the largest. In terms of developers... it's just a much larger project. I would have liked to [include the kernel]... but who would I talk to? Who's the accountable party that's going to give me the reviews of these kids and approve their money, and who do you pay? Maybe there would be a way to do it if we do this again. It's tricky.

LXF: But Novell employs quite a number of kernel hackers who could have acted as mentors.

CD: Sure. They didn't come to me.

LXF: Red Hat.

CD: They didn't come to me. I mean, the Fedora Core people did, you know. So there were people working on some interesting Fedora Core work, something with Ubuntu, but no one listed really called about this, so...

LXF: It's just a bit of a shame really, because the Summer Of Code grants came out at the same time that OSDL laid off quite a few engineers.

CD: Well, we weren't hiring engineers, were we, we were bringing students into the fold as it were. If your readers want to do kernel development for a job, we have openings, Novell has openings. There is a lot of opportunity for people in Linux kernel development.

LXF: Summer Of Code wasn't done for PR, clearly.

CD: No!

LXF: So what was the prime motivation - was it a cunning recruitment drive?

CD: The way this started, actually, was our founders, Larry and Sergey, and our CEO Eric Schmidt, they worried actually for a number of years that people don't have the means to just go to school and just study and just become a better student, a better computer scientist... what do they end up doing over the summer? They end up dulling themselves, they end up working on things that have nothing to do with computers and not developing, thus retarding their development as computer scientists. [Google's founders] hated that. So after I got my sea legs at Google, the others said, "Can you fix this problem? We'll give you $x million a year to fix it - and it would be great if you could do it in an open sourcey kind of way." So I did, and I came up with this programme. That was the primary motivation for the founders.

Now, there are going to be side benefits - competitively, strategically, all sorts of stuff. Because a strong open source community means a better, more level playing field for all of us in the computer industry, I truly believe that. But it wasn't really the primary motivation. And it's just very good for open source and we love that. We use a ton of open source software and consider ourselves pals, you know? We like to work with the open source community, and this is a way of doing it.

LXF: In what other ways would you say that Google is sponsoring open source?

CD: Actually I don't like the word 'sponsoring'. I don't like 'sponsoring', I don't like 'subsidising', I don't like 'giving back'. The words I like are 'working with' them. We see them as our peers in computer science, we don't see them as people who need sponsoring, frankly. The funny thing about open source is that over the last five years, a lot of these organisations that you and I consider prominent have got a ton of support - financial, operational - through things like Tigris, SourceForge. They don't need sponsorship, what they need is more people developing code.

So what do we do? We have some code that we've released, on code.google.com, we have tried to spin up more patching activity from our teams at Google and in the local teams outside of Google, we have a number of people who are now working on GCC and working on improving that, and programmes like the Summer Of Code... just making more code.

Because what it comes down to is, you can go on and on and on a lot about helping and sponsoring but there are concrete benefits that projects get from receiving code itself. And those benefits heavily outweigh the transitory benefits of simple things like travel and so on.

So I don't see the word 'sponsorship' as being appropriate. Because sponsorship also implies stewardship. We don't want to run open source, that's not who we are. I have to tell you, I've admired how IBM has gone about this. They've for the most part not screwed up: they haven't taken things over, they haven't managed to break anything, they've done a lot of good work. We're not going to use that as a model for what we want to do, because we're different companies, but I really want to get code out there, I don't want just... money. Money's not enough.

LXF: OK then, can you list the ways in which Google is 'working with' open source? You have Summer of Code, Google APIs -

CD: We release code. So when we create software at Google we have a number of different tools that we use. We have our own malloc, for instance; we have a number of performance tools, we have our own hash table implementation, we have some libraries in Python, which sort of act in a functional way to the way that Elisp might...

And we've released these tools so that anyone can make better software. You could release just API examples and all the rest into open source, and that's great, but they can't exist without Google. We wanted to release things that would not just allow you to work better with us, but allow you to make better software, period.

LXF: What's the preferred Google licence?

CD: We usually use the BSD. We've released things under the LGPL as well. We try to make sure that people are able to use what we release, and GPL and LGPL is good for that, but the GPL keeps proprietary software vendors from using it too, so often our engineers don't select it. We do let the engineer choose the licence itself. If they feel passionately about it they can choose whatever they want; we're more interested in getting code out there than in any big religious dispute.

We also choose the BSD because it's a little easier from our perspective and with our staffing levels to keep track of it - you don't have to. With the GPL you have to have more of a stewardship role with that piece of software, whereas with the BSD you can say, "Here it is, take it, have fun", and forget about it in a lot of ways.

LXF: How much of Google's in-house developed software goes open source in the end?

CD: Oh, not much right now. We've only really got started on releasing code. But we release a bit more every month. I have a number of engineers who come to me every week, every month, whatever and say, "Hey, I'd really like to release this." I outline how we do it, you know, make sure there are author-style Readmes, the licence files, all that stuff is correct. We make sure that all the legal people are happy with what we are doing, and then we release it.

LXF: But in four years' time, will Google be a 100% open source company?

CD: Oh, no. There's no way. There are some things we can't open source, because they're either licensed and not open to people, or it would be wasteful to open source it.

For instance, there is some of our software that unless you have a data centre with, say, more than a hundred computers, you can't really use it. Or if you don't have a data centre that's exactly like ours in the way that we've architected it, you end up spending the whole time just relaying the software, and there's no real point to it. If it's not going to see some concerted broad use, there's no real point in releasing it. Because it takes time to release software, and if someone wants to release a piece of software I'd rather have them release something that people can use immediately rather than having to go out and buy a data centre that's just like ours and use hardware that's just like ours.

And then there are other things. We're never going to release PageRank [Google's trademark system for ranking web pages in its search index], we're not going to release things like that, because to release them would ruin them. If you release how you do the ranking function, suddenly every web scrambler in the world screws up the rank and Google search becomes useless. We don't want to do that.

LXF: So it's consigned to obscurity?

CD: You could put it that way, but it's not true. It's not. Here's the thing. It's very easy to disparage security, but it also shows you don't understand the web market. I hope that doesn't sound too rude, but the fact of the matter is that there are people out there who work every day to try to solve the index for spamming, and if you make their job easier you lose the war. It's just this ongoing battle that we have to fight so that we can continue to provide good quality search results. And we think we're doing a good job at it, but it would become that much harder if we released these functions.

I also don't like releasing code that isn't under active development at Google, because our engineers are so busy that if someone says they want to release this thing they're not actively working on, we're just going to be throwing it over the wall and it won't be very useful. I like to release code in such a way that it doesn't leave outside developers hanging.

LXF: You have to be able to provide some maintenance for it in the long term?

CD: Exactly.

LXF: You said previously that Google uses open source extensively throughout its search engine. What kind of thing?

CD: We use a lot of what you would consider Linux. The Linux kernel, binutils, the compiler chain, Perl chain, SSH, SSL, that kind of stuff. We don't use a lot of Apache, which is a surprise to a lot of people, because we have our own web servers, but there are other things we do use.

We get a lot of use out of the kernel. It's fantastic having an open source kernel, too, because if we want to make a change we just make a change, you know? If we're wondering why there's some strange behaviour going on we can figure it out because we just look at the code. It's fantastic.

Then we use Tomcat, we use a number of different projects all over.

LXF: Do your engineers contribute back as they need to?

CD: As they need to, and a bit of as they want to. We kind of do more of the as you want to, because we all like to see more patching happen. That's coming. It just takes time.

LXF: Do you find as it stands, though, that you end up maintaining quite a long patch list to the kernel?

CD: For the kernel, yes, because there are a lot of our patches that the mainstream kernel would never, ever want. I'm not kidding. But more importantly, you wouldn't want to run them. Because there are changes to some very fundamental things in the Linux kernel so that it works well within our environment, a lot of things in the networking stack, and that kind of stuff. We actually... as we're doing further development on it, Walt Drummond, who runs that group, and I are working together to see how many different ways we can get the patches out there, even the ones we consider to be dirty and awful, and maybe just posting them on Google Code even though nobody on the kernel team will ever, ever accept them. But we'd have them up there so that if people do want to do the kind of things that we're doing, they can apply those patches.

LXF: Or maybe do it properly, perhaps: make the same changes in a nice, clean -

CD: You're assuming there is a "properly", for us, that's the same as the properly for you. But that's the great thing about open source software, right? We can determine our own software destiny, our own software suitability. Because what's suitable for us is likely not suitable for you. You're not going to have the same kind of profile that we need from our kernel.

LXF: I guess that at least putting the patches out there gives someone like Red Hat saying, "Look, we've got the AS version that some customers could really use" - it gives them the option at least to turn it down.

CD: Yes, I would like to give them more options for sure. I also want to express what we do, so that people who are looking at kernels and looking at using Linux can say, "Oh, Google uses it", you know? There's some power to that, and it's nice - I want people to know that Google uses open software and loves it. Because we do. And maybe that will change one person's mind about open source software and they'll start using it too.

LXF: It's not open source, but there are some great hacks on the Google Maps API. The one where you can go to Chicago and map out the prostitution or whatever. You can see the crime in any city, anywhere ­ it's brilliant!

CD: Yeah, it's cool stuff.

LXF: Would you say that as Google extends its 'claws' -

CD: Google's a monster!

LXF: Sorry, as Google extends its reach into other areas, there'll be more software like PageRank that you won't open source? How about Google Maps?

CD: The thing that a lot of people don't realise is that Maps isn't free for us. We're not just using government data, we're using commercial databases, we have to pay for those. So we had to come up with a deal where we could let people use our maps on their own sites, which is very unusual - normally when people do maps APIs they are directing people to their sites. But we wanted an experience where people could take ownership of it, and really use it the way they want to use it.

So Maps is all contracts and deals and legal things, and it really does limit how you can use that. It can't ever really be open, in the way that other things could be.

And here's the thing a lot of people don't realise: when you build something at Google, we've got this terrific layer of infrastructural code that provides web services and all the rest. So for us to release these things that you consider Google, we have to release these things down here.


LXF: It's all quite inter-linked, is what you're saying.

CD: Extremely so. But that's also part of what makes Google so much fun to work for. Internally you can create a service and launch it very quickly if you want, this is why Google Labs is so fun.

If somebody said, "I really want to launch a Moon Maps version," they could, because we already have all that software. It's not hard to launch a new service if you have a really good idea. LXF

Chris DiBona talks about Summer of Code, server stacks and not prostitution in Chicago at http://www.linuxformat.co.uk/mag/dibona.html. Don't miss it!