Greg Kroah-Hartman - interview

INTERVIEW GREG KROAH-HARTMAN

THE KERNEL COLONEL

Who drives Linux driver development? Meet Greg Kroah-Hartman - kernel champion, devfs headsman and inexplicable lover of PCI Hotplug subsystems.

Greg Kroah-Hartman is of a rare breed - he actually enjoys developing hardware drivers. This is just as well, as he's responsible for many of them, as well as the underlying subsystems they connect to. USB, PCI, I2C and the virtual filesystem sysfs are all part of Greg's domain. Much of the transparent driver functionality we take for granted in the Linux kernel is thanks to Greg's stewardship of many of these technologies, and he's keen to demystify kernel and driver development. In the introduction to O'Reilly's Linux Device Drivers he wrote, "driver development is not a scary and forbidding place," and he does all he can to bring more developers into the inner sanctum of Linux development. Will you be his latest convert?

Linux Format: Along with Chris Wright you're pioneering the third, 2.6.x.y branch of the Linux kernel now, where there's the super tree, the unstable tree, then yours. But who needs this extra layer when we've all been getting along perfectly well without it?

Greg Kroah-Hartman: Well, it turns out that a lot of people actually run www.kernel.org kernels, and they rely on that, they don't rely on their distro. So we want to make it easy for them to have bugfixes, and we want our testers to have bugfixes. Security updates are a big problem: when we have a security patch, some people don't want to apply a patch, they just want to download a whole new kernel and go. So that's a big concession and we know we have to fix that. We have a set of rules that seem really nice and strict. It's working out so far - the tag team approach with Chris and I doing it, personally speaking, has worked out really well; ask Chris to see if it's working out for him. It seems that users like it. The distros like it because they can base their kernels on it. They don't have to carry those individual little patches.

As an example, your kernel is based on 2.6.11-4...

LXF: I switched to Robert Love's inotify kernel. That's probably why it's so stable actually! So, is the kernel going to be this way permanently?

GKH: As permanent as anything is: it's working out now, and if it doesn't work out in the future, we'll... change. It's not like anything is set in stone. You try to adapt.

LXF: You have mentioned before that other OSes are using Linux drivers. Syllable, IBM K42...

GKH: Hurd...

LXF: And Hurd, yes! Do you think that embodies the whole spirit of code sharing, or is it just really a bad thing in the long term?

GKH: No, I'm surprised that we're not sharing entirely. The IBM K42 guys don't want to write a driver - they want to work on whatever they're working on in their experimental kernel. I'm not sure what they're doing, but they want their machines to work, so they have to have a driver. Nobody likes writing drivers. Some of us do, but a lot of people who are researchers don't, yet they want to get up and running without worrying about playing with drivers.

LXF: If no one likes writing drivers, is that because it's so hard to do? Hard to debug?

GKH: I don't think so. I enjoy it, that's what I do. It's different; people traditionally look at drivers as low-end, bad, something you give to the new person coming into the company.

But a kernel is made up of three things: it will handle your memory, handle your I/O, then you're touching the hardware - that's the drivers. Everybody needs them: you have to have a driver to get your keyboard to work. They're very important, yet traditionally they've been very low on the pecking order of what you write.

Hopefully, over the years Linus has got a bunch of really good people who have changed that, and our drivers are known for stability overall, so we're known for some really good stuff. Networking has been really, really good; SCSI, really good; USB is excellent - we support more new devices quicker than any other operating system. We did USB 2.0 support before anybody else did. Lots of other odd things we got before any other OS. Bluetooth, for instance.

LXF: There was quite a curious situation in the gap between Windows XP Service Pack 1 and Service Pack 2. SATA came out and SUSE supported it, Fedora supported it, Mandriva supported it. Everyone was saying how hard Linux is to install, but of course when you tried it on Windows, before SP2 came out it wouldn't do anything - it said, 'No hard disks found.' You couldn't do anything, and suddenly Linux is actually easier to install than Windows - it will find all of your hardware for you.

GKH: Yeah - have you ever tried to install Windows?

LXF: I have to do it on this thing [indicates his laptop]. It's not easy.

GKH: Yeah, we support hardware faster. All the hardware developers use Linux to do the hardware bring-up. IA-64 was done on Linux, x86-64 was developed on Linux. They can do that, so the hardware system-level guys love Linux. They have the source code, they can see what's wrong with their hardware... The PowerPC guys have been doing some great work. They just released a paper on bringing up Linux on giant multi-processor PowerPCs with no firmware, no BIOS on there. They didn't have to wait for the BIOS guys, the hardware guys can get straight into it.

So anyway, drivers are important, and hopefully they'll keep stable, because that's what everybody complains about. The drivers on my machine are probably different to the drivers on your machine. I use different IDE drivers in it than you do, probably because we've got different hardware - you use a different mouse controller.

LXF: It seems like the kernel has undergone a lot of security changes recently, the way it's handled. Don't you have a security website where people can submit patches to or comments on without it being publicly reported just yet?

GKH: Oh, we have the security@kernel.org mailing list.

LXF: How many people get that? Not many, I'd guess?

GKH: No, there's like five people on the security team. It's private, but if you look at the rules on it, it's not to be kept private for very long. It's like: "You send it here, we know about it, we'll fix it as soon as possible and get it out." And that is new, because there's a group called vendor-sec, a mailing list with all the different distributions and a lot of people that get together to coordinate those security updates.

Red Hat, SUSE, Mandriva, all get the security update on the same day, so traditionally they've always sorted it out. In the past we've done security things for the kernel through that - now we've just made it easier. You find a security problem in the kernel, you can come here, make it easier for people to report things. All the other projects, Mozilla, Apache, have security mailing lists.

LXF: But how many are on the list?

GKH: The security list? Five or six.

LXF: So... you, Linus Torvalds, Andrew Morton, Alan Cox...

GKH: Chris Wright's in charge of it.

LXF: That's five already!

GKH: Maybe it's six. I don't know; it's very small.

LXF: Is Mark Cox, the security response guy at Red Hat, on it?

GKH: No, it's not for the distros. We will let the distros know when there's a problem coming out.

LXF: You've said that at the Kernel Summit in 2004 you touched a third of the kernel. I worked this out - that's 1.2 million lines added and 850,000 lines removed, which is extraordinary. That seems like a gigantic rewrite, almost.

GKH: You have to take those numbers with a grain of salt. They're metrics. They could be adding and replacing exactly the same lines - but generally they're not... They're adding new drivers, revising kernel APIs, making things better.

LXF: How long did that take?

GKH: Eight months.

LXF: Eight months? For 1.2 million lines of kernel?

GKH: I should tell you the number of changes. Each individual patch counted as one, so I had a number of those that went in each one of the new kernel releases, and the number just kept getting bigger and bigger - like 3,000 different changes. And that's a better way to look at it in terms of how logical change happens.

LXF: Certainly the changelog is getting pretty big - I think for 2.4.10 it was 1.5MB. That's huge.

GKH: Maybe we've been waiting too long between those releases. We realised that that's a load of stuff to happen at once.

LXF: My definition of stability is that things are not added and removed quite so drastically...

GKH: That's a very traditional view of software engineering, that things stabilise over time and nothing is going to change. That's not what we're doing. We have to support new hardware, we have to add new features, we have to fix things and we keep adding new things. You look at what happened between 2.6.0 and 2.6.8: big lists out there of all the different things that got added. We did a lot of new things, and arguably it's the best kernel that we had put out until that point in time, and everybody loves it, so it's worth it.

We're not having to backport stuff any more. That took up so much engineering time and energy from the kernel developers. I don't think people realise that. I don't ever want to do that again. The 2.4 issues really, really sapped people. We don't have to do that now: we're happy.

LXF: Although to be fair I think that Red Hat and SUSE have this five-year period - seven years from Red Hat I think it is now - that they'll support you for...

GKH: Yes, they will, they'll give you an enterprise kernel that will be guaranteed for x number of years not to change. They say that they're not going to break the kernel API there - and they guarantee that - and that's something that their customers want. So that's great.

LXF: But someone, somewhere will be backporting stuff for the next five years?

GKH: No... if you look at the rules and what they do, each one is different. Like, you can't add new features, for example. You can't support new hardware - it enters maintenance mode after x number of years, I don't know what it is. Other operating systems support old stuff with new hardware, but that's really hard. But that's up to those customers, and that's a real need, so that's great. There's also a need to always be on the leading edge, and other distros provide that.

LXF: It seems that some parts of Linux don't change very often though. Things like the init scripts are probably the slowest part of my bootup. There are projects coming out like InitNG, which are going to parallelise it, make it lightning fast. But at the moment, it's just really slow.

GKH: I have to disagree there. Init is doing some cool things. Fedora laid down: "Let's make our boot times faster, how do we chart this stuff?", and they had these boot charts so they could show where they're spending the time, and optimise that. Gentoo has a totally different rewrite of its init scripts, with parallel startup and dependency tracking, and it's been written differently before. The Red Hat and SUSE guys are working on something using D-BUS where everything's event-driven. Init is changing, and fast boot times are definitely something that people want.
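[Sidebar] The parallel, dependency-tracked startup mentioned above can be sketched in a few lines of Python. This is only an illustration of the general idea, not how Gentoo, InitNG or any distribution actually implements it; the service names and their dependencies are made up for the example. Services are grouped into "waves", and everything in one wave could be started at the same time.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical services, each mapped to the services it must wait for.
DEPENDENCIES = {
    "syslog": set(),
    "network": set(),
    "dbus": {"syslog"},
    "sshd": {"network", "syslog"},
    "desktop": {"dbus", "network"},
}


def boot_waves(dependencies):
    """Group services into waves; services in one wave can start in parallel."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    waves = []
    while sorter.is_active():
        # Everything whose dependencies have already finished starting.
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)  # mark the whole wave as started
    return waves


if __name__ == "__main__":
    for number, wave in enumerate(boot_waves(DEPENDENCIES), start=1):
        print(f"wave {number}: {', '.join(wave)}")
```

A strictly serial init would start these five services one after another; with dependency tracking, only three steps are needed.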

LXF: Devfs was disabled in 2.6.13. Was that a prelude to it being removed entirely?

GKH: That's right. You know, I started working on the driver model about three, four years ago with Pat Mochel. He wanted to get power management working properly, I wanted to get dynamic devices and persistent naming working, because devfs can't do persistent naming. [If] you plug in two USB printers, you power down and you power back up and they might come back on in different order. You start printing to the wrong colour printer - that's not a good thing. You want to have persistent naming, and Linux didn't have that - devfs didn't provide that.

So I did the driver model for a filesystem like udev to enable persistent naming. Now that we have udev, all the distros ship it, everybody's using it; we don't need devfs. And I have a series of patches that rip it out. It's like 8,000 lines of code removed from the kernel.

It was a great thing to be in there at the time - it prompted us to get some stuff done, but it had some incurable problems. I talked to the BSDs about theirs, and they have their issues too.

LXF: Are they yanking it also?

GKH: No, they're happy with theirs. They have other issues, but they like their devfs. Ours is written differently. It was never really maintained properly over time. The maintainer disappeared for three years - unmaintained code suffers a lot of bit-rot.
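[Sidebar] The two-printer problem Greg describes comes down to naming a device by something that belongs to the device, rather than by the order in which it was detected. The Python sketch below illustrates only that idea. It is not udev's rule syntax or implementation; the serial numbers, the rule table and the `lp0`/`lp1` style kernel names are assumptions made for the example.

```python
# Hypothetical rule table: device serial number -> stable, user-chosen name.
RULES = {
    "SN-COLOUR-001": "printer-colour",
    "SN-MONO-002": "printer-mono",
}


def name_devices(detected_serials, rules):
    """Map each stable name to the order-dependent name assigned at detection."""
    names = {}
    for index, serial in enumerate(detected_serials):
        kernel_name = f"lp{index}"  # depends on detection order, may differ per boot
        # Use the stable name if a rule matches, else fall back to the kernel name.
        stable_name = rules.get(serial, kernel_name)
        names[stable_name] = kernel_name
    return names


if __name__ == "__main__":
    # Same two printers, detected in a different order on two boots.
    print(name_devices(["SN-COLOUR-001", "SN-MONO-002"], RULES))
    print(name_devices(["SN-MONO-002", "SN-COLOUR-001"], RULES))
```

On both boots "printer-colour" points at the colour printer, even though the order-dependent name underneath it changes.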

LXF: One of the things you're very firm on is binary drivers. Could you just explain why binary drivers - say, the Nvidia driver - are illegal?

GKH: The Nvidia driver on its own is not illegal. It's very simple; talk to a lawyer. I'm not a lawyer. The GPL explicitly defines linking.

LXF: Merging GPL code with non-GPL code?

GKH: Yes, when it links - because you need to do that when you load a module - when you're linking code into the kernel you get one system image that's covered under the GPL. It's not a grey area.

LXF: And that's illegal?

GKH: It's illegal. In the past, Linus used to have some rules, like, if you wrote your code on a different operating system, we'd allow that. It was a kind of exception, it was never necessarily codified. And Linus has come out in the last couple of years and said that's not true any more.

LXF: So how do some drivers get around this?

GKH: You can do things that are illegal if nobody sees you.

LXF: Sure, but you said that the Nvidia driver isn't illegal itself.

GKH: Because they don't ship anything that's illegal. You as a user do all the compiling and linking together. You cannot pass that compiled object on to somebody else without breaking the GPL.

LXF: There's a huge lawsuit waiting for all these Linux users...

GKH: Not to pick on Nvidia. They're not the only one who does that: a lot of people do that.

LXF: What do you think about Ndiswrapper?

GKH: Ndiswrapper is an awesome hack. Legally, again, you're linking two bits of code together that are the wrong licence. But it's an awesome hack, and I'm amazed that it works. On the technical side, I'm thrilled by it.

LXF: Does it make your life easier?

GKH: No, it doesn't make my life easier at all. Binary drivers make our lives hell. Users report problems in their kernel and if they've got a binary driver in there we don't know what it is: it could be writing over any part of the kernel and cause it to oops [crash] or something bad to happen and we wouldn't know. Now, if you crash the kernel, the oops report shows if you're running a binary driver and we'll basically say that we can't support that... if you have a problem, you're on your own. It has been known for people to modify that oops message to show that they're not running a binary driver, because they know they won't get any support for it.

LXF: You need some sort of checksum in there - but I guess it's open source, so that kind of security wouldn't work.

GKH: No, and I wouldn't want it to. It's not there to keep you from doing it, it's there to let us know that we won't be able to support you.
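[Sidebar] The marker Greg is talking about is usually called the kernel's "taint" state, a term not used in the interview itself. On a running Linux system it is exposed as a number in /proc/sys/kernel/tainted, where each bit records one reason the kernel is considered unsupportable. The Python sketch below decodes just two of those bits as documented for current kernels (bit 0 for a proprietary module, bit 12 for an out-of-tree module); the function names are made up for the example and the other bits are ignored.

```python
# Two of the taint bits exposed in /proc/sys/kernel/tainted; others are ignored here.
TAINT_PROPRIETARY_MODULE = 1 << 0   # 'P': a proprietary module was loaded
TAINT_OOT_MODULE = 1 << 12          # 'O': an out-of-tree module was loaded


def describe_taint(value):
    """Return human-readable reasons for the taint bits this sketch knows about."""
    reasons = []
    if value & TAINT_PROPRIETARY_MODULE:
        reasons.append("proprietary module loaded")
    if value & TAINT_OOT_MODULE:
        reasons.append("out-of-tree module loaded")
    return reasons


def read_taint(path="/proc/sys/kernel/tainted"):
    """Read the current taint value (Linux only; the file holds one integer)."""
    with open(path) as handle:
        return int(handle.read().strip())


if __name__ == "__main__":
    print(describe_taint(read_taint()) or "no proprietary or out-of-tree modules seen")
```

As Greg says, the flag is informational: it tells developers what they can support, and does not stop anyone loading the driver.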

LXF: Apart from the potential 'signed off by' thing [for tracking patches and changes in the kernel], how do you think the SCO lawsuit has affected development, if at all?

GKH: It might have affected Linux as far as the way that people perceive it, and use it, and in enterprises and in embedded and everywhere else, but as far as development goes, it didn't stop us at all. Our names got put on a few lawsuits and we had to talk to a bunch of lawyers, but that's about it.

LXF: There were no few weeks of doubt, spent quickly checking through source code?

GKH: No. Linux is the best documented large codebase around. It's all known exactly where this stuff comes from. You can trace the history of all these public changes back forever. So it's not like we don't know where the stuff comes from, we've always known where it comes from, it's all been out in the open.

You could turn around the other way: all these closed source operating systems, where are they getting their code from? How do we know they're not taking our code? I'm not saying that they are, but you know. I don't fear for our stuff at all. LXF