Sunday, 8 September 2013

Choosing a Linux Distro for an Embedded System

Many years ago, when my old employer Digital Equipment Corporation was trying to stave off the advance of Unix, it came up with the slogan "Unix ain't Unix". In other words, there were a lot of systems at the time (around 1988) that all called themselves Unix, but which in fact were all different - in management, in commands, and in the APIs they supported.

Fast forward to 2013, and Linux is really the only Unix that matters any more. (Yes, I know about BSD - more later on that - and I stand by my statement). But the statement "Linux ain't Linux" applies with just as much truth.

I've been using Linux as my main development environment for a couple of years now. I started with Ubuntu 11.04, for no better reason than I happened to have a DVD of it. It was a pretty decent system - it stayed up for weeks at a time and had a usable Windows-ish GUI. If only that were still true. Recent releases of Ubuntu rarely stay up for more than a day or two at a time, typically before the window manager dies leaving the rest of the system ticking away but completely unusable. And of course they made the incomprehensible decision to replace Gnome, which is dull but functional, with Spirit.  (There's an old - c. 1780 - and rather delicious quote that "Englishwomen's shoes seem to have been made by someone who has heard shoes described but never actually seen any". Ditto with Spirit and the Mac). The first thing I do whenever I bring up a new Ubuntu system is to replace it with "Gnome Classic". (The latest version of Gnome in turn seems to have been developed by someone who has heard Spirit described but never actually seen it).

For the last year I've been developing an embedded system for Internet traffic management and monitoring. From the beginning we've taken for granted that it would run on Linux. The question is, which Linux distro should we use? There are numerous choices: Ubuntu, Fedora, Red Hat, Centos, Arch, Gentoo - and those are just the well-known ones.

For sure Ubuntu is a poor choice. It's desperately trying to be a replacement for Microsoft Windows, and has way too much clutter and extra stuff for an embedded system. We're trying to keep the footprint small, both memory and virtual disk, and we really don't need to have three different GUIs, LibreOffice, three different database systems... you get the picture. So Ubuntu was out from the beginning.

I worked with a company that had selected Gentoo. The advantage of Gentoo is that you get to choose absolutely everything about the system, down to the tiniest details like which implementation of cron you use. The disadvantage of Gentoo is that you have to... do all that. It's true that it will give you an absolutely minimal system, tailored exactly as you need it, but it's a lot of effort, not to mention the learning curve. It might make sense when we're bigger. but right now we need everyone focussed on stuff that will really differentiate us.

Somewhere along the line we looked at BSD - we were using something at the time whose support was much better there than on Linux. What a nightmare! Everything has to be built from source - they have a repository system but it is 'temporarily out of service'. It's truly a system for hobbyists, like Xen.

I looked at Centos, and got as far as installing it on a system. Then I realised that it is rooted so far in the past that I'd almost have to dig out my stock of IBM punch-cards. In particular, it supports a truly ancient version of GCC (4.4 I think). We make extensive use of features from C++11, which means we need at least 4.7. There are ways to have a development environment which is distinct from the system's own build environment, but they look pretty terrifying and weren't something I wanted to try and get my head around - for the same reason I didn't want to become a Gentoo expert.

That left Arch. I'd heard good things about it, and it also tries to be minimalist, so it seemed to be the way to go. I  installed an Arch system without too much trouble, and got our system up an running on it. The only problem was log4cxx, which isn't available as a package and which wouldn't build from source either. Like much Linux software out there, it has a bunch of outdated assumptions about implict include files which don't work with recent versions of gcc. But the changes were simple and we quickly had a version that would build.

Networking in Arch is very quirky. It starts with ethernet devices, which instead of being called eth0 and so on, have names which reflect the PCI heirarchy like 'ep5d3'. It's a nuisance but not a major problem. But then it turns out they selected a completely different way to manage networking than other Linuxes. User administration is completely different, too. I'm sure the answer would be "but you can always build whatever you want and do it your own way." True, but not especially helpful.

Anyway, we persevered with Arch, and got our systems running. It took the passage of time to realise that Arch is constantly changing - as in, every day. An Arch system installed and configured today won't be the same as one installed tomorrow. Anything and everything can change - the kernel, the utilities, the drivers. When Boost 1.53 came out, Arch had it a few days later. Switching Boost versions is not something to be undertaken lightly, and indeed our system wouldn't build - some incompatible change involving locales, themselves a completely incomprehensible feature of Linux.

Our biggest problem came from trying to integrate the Intel DPDK package for high-performance user-space networking. Now, DPDK is essential to what we're doing. But  it is hardly a model of stability either, with a new version coming out practically every week. The combination of this with the ever-changing sands of Arch, especially kernel changes, just made it impossible to keep up. If we got things working on Monday, they'd be broken again on Tuesday.

We looked into somehow selecting our own stable intercept of Arch. In a VM environment, it's easy enough to build a master VM and just use that. But our system also has to run on bare metal, which is not so easy. There is, supposedly, a way to take a snapshot and make a private repository. But once again, the investment in time is just not something a tiny group like ours can afford to make if we are to ship a product in a reasonable time.

And so, with great reluctance, I made the decision last week that we will ship our product on Ubuntu. I know that it is really not the right choice for an embedded system. But it works, and it doesn't change on a daily basis. We're used to its quirks, like yet another gratuitously incompatible set of network configuration tools. Hopefully we'll have the luxury of re-examining this later on when we have more people and more time to look at it.