Sunday, 8 September 2013

Choosing a Linux Distro for an Embedded System

Many years ago, when my old employer Digital Equipment Corporation was trying to stave off the advance of Unix, it came up with the slogan "Unix ain't Unix". In other words, there were a lot of systems at the time (around 1988) that all called themselves Unix, but which in fact were all different - in management, in commands, and in the APIs they supported.

Fast forward to 2013, and Linux is really the only Unix that matters any more. (Yes, I know about BSD - more later on that - and I stand by my statement). But the statement "Linux ain't Linux" applies with just as much truth.

I've been using Linux as my main development environment for a couple of years now. I started with Ubuntu 11.04, for no better reason than I happened to have a DVD of it. It was a pretty decent system - it stayed up for weeks at a time and had a usable Windows-ish GUI. If only that were still true. Recent releases of Ubuntu rarely stay up for more than a day or two at a time, typically before the window manager dies leaving the rest of the system ticking away but completely unusable. And of course they made the incomprehensible decision to replace Gnome, which is dull but functional, with Unity. (There's an old - c. 1780 - and rather delicious quote that "Englishwomen's shoes seem to have been made by someone who has heard shoes described but never actually seen any". Ditto with Unity and the Mac). The first thing I do whenever I bring up a new Ubuntu system is to replace it with "Gnome Classic". (The latest version of Gnome in turn seems to have been developed by someone who has heard Unity described but never actually seen it).

For the last year I've been developing an embedded system for Internet traffic management and monitoring. From the beginning we've taken for granted that it would run on Linux. The question is, which Linux distro should we use? There are numerous choices: Ubuntu, Fedora, Red Hat, CentOS, Arch, Gentoo - and those are just the well-known ones.

For sure Ubuntu is a poor choice. It's desperately trying to be a replacement for Microsoft Windows, and has way too much clutter and extra stuff for an embedded system. We're trying to keep the footprint small, both memory and virtual disk, and we really don't need to have three different GUIs, LibreOffice, three different database systems... you get the picture. So Ubuntu was out from the beginning.

I worked with a company that had selected Gentoo. The advantage of Gentoo is that you get to choose absolutely everything about the system, down to the tiniest details like which implementation of cron you use. The disadvantage of Gentoo is that you have to... do all that. It's true that it will give you an absolutely minimal system, tailored exactly as you need it, but it's a lot of effort, not to mention the learning curve. It might make sense when we're bigger, but right now we need everyone focussed on stuff that will really differentiate us.

Somewhere along the line we looked at BSD - we were using something at the time whose support was much better there than on Linux. What a nightmare! Everything has to be built from source - they have a repository system but it is 'temporarily out of service'. It's truly a system for hobbyists, like Xen.

I looked at CentOS, and got as far as installing it on a system. Then I realised that it is rooted so far in the past that I'd almost have to dig out my stock of IBM punch-cards. In particular, it supports a truly ancient version of GCC (4.4 I think). We make extensive use of features from C++11, which means we need at least 4.7. There are ways to have a development environment which is distinct from the system's own build environment, but they look pretty terrifying and weren't something I wanted to try and get my head around - for the same reason I didn't want to become a Gentoo expert.
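One cheap way to make the GCC requirement explicit is a guard at the top of a common header, so that an old compiler stops with a readable message instead of a page of template errors. This is only a sketch: the 4.7 threshold is the one mentioned above, and the function name gcc_at_least_4_7 is made up for illustration.

```cpp
#include <cassert>

// Build-time guard: stop with a clear message on a GCC older than 4.7.
// Clang also defines __GNUC__ for compatibility, so it is excluded here.
#if defined(__GNUC__) && !defined(__clang__)
#  if __GNUC__ < 4 || (__GNUC__ == 4 && __GNUC_MINOR__ < 7)
#    error "GCC 4.7 or later is required (C++11 features are used)"
#  endif
#endif

// The same comparison as an ordinary function, e.g. for logging the
// toolchain a binary was built with. Hypothetical name.
inline bool gcc_at_least_4_7(int major, int minor) {
    assert(major >= 0 && minor >= 0);  // version components are never negative
    return major > 4 || (major == 4 && minor >= 7);
}
```

The preprocessor form is deliberate: it works even on compilers too old to understand static_assert or constexpr, which is exactly the case it needs to catch.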

That left Arch. I'd heard good things about it, and it also tries to be minimalist, so it seemed to be the way to go. I installed an Arch system without too much trouble, and got our system up and running on it. The only problem was log4cxx, which isn't available as a package and which wouldn't build from source either. Like much Linux software out there, it has a bunch of outdated assumptions about implicit include files which don't work with recent versions of gcc. But the changes were simple and we quickly had a version that would build.
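To illustrate the kind of change involved (this is a sketch, not the actual log4cxx patch): older code often calls functions such as strlen or memcpy while relying on some other standard header to drag in their declarations. Newer GCC releases trimmed those indirect includes, so the fix is simply to name the header you use. The helper copy_prefix below is hypothetical.

```cpp
#include <cassert>
#include <cstring>   // std::strlen, std::memcpy: included explicitly,
                     // not assumed to arrive via <string> or <iostream>
#include <string>

// Hypothetical helper: copy at most max_len characters of a C string.
inline std::string copy_prefix(const char* src, std::size_t max_len) {
    assert(src != nullptr);
    std::size_t n = std::strlen(src);
    if (n > max_len) {
        n = max_len;
    }
    std::string out(n, '\0');
    std::memcpy(&out[0], src, n);  // copying zero bytes is fine when n == 0
    return out;
}
```

Without the <cstring> line this may or may not compile, depending on what the other headers happen to include in a given compiler release, which is why such code builds for years and then suddenly stops.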

Networking in Arch is very quirky. It starts with ethernet devices, which instead of being called eth0 and so on, have names which reflect the PCI hierarchy like 'ep5d3'. It's a nuisance but not a major problem. But then it turns out they selected a completely different way to manage networking than other Linuxes. User administration is completely different, too. I'm sure the answer would be "but you can always build whatever you want and do it your own way." True, but not especially helpful.
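One practical consequence of the naming change is that code can no longer assume an interface called eth0 exists. A minimal sketch of asking the system instead, using the POSIX if_nameindex() call; the wrapper name interface_names is hypothetical and error handling is kept to the bare minimum.

```cpp
#include <cassert>
#include <net/if.h>   // if_nameindex, if_freenameindex (POSIX)
#include <string>
#include <vector>

// Hypothetical wrapper: return the names of all network interfaces the
// kernel currently knows about, whatever naming scheme the distro uses.
inline std::vector<std::string> interface_names() {
    std::vector<std::string> names;
    struct if_nameindex* list = if_nameindex();
    if (list == nullptr) {
        return names;  // lookup failed; report no interfaces
    }
    // The array is terminated by an entry whose if_index is 0.
    for (struct if_nameindex* p = list; p->if_index != 0; ++p) {
        assert(p->if_name != nullptr);
        names.push_back(p->if_name);
    }
    if_freenameindex(list);
    return names;
}
```

A configuration file would then name the interface to use, and the application can check that name against this list at startup rather than failing later with a hard-coded guess.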

Anyway, we persevered with Arch, and got our systems running. It took the passage of time to realise that Arch is constantly changing - as in, every day. An Arch system installed and configured today won't be the same as one installed tomorrow. Anything and everything can change - the kernel, the utilities, the drivers. When Boost 1.53 came out, Arch had it a few days later. Switching Boost versions is not something to be undertaken lightly, and indeed our system wouldn't build - some incompatible change involving locales, themselves a completely incomprehensible feature of Linux.

Our biggest problem came from trying to integrate the Intel DPDK package for high-performance user-space networking. Now, DPDK is essential to what we're doing. But it is hardly a model of stability either, with a new version coming out practically every week. The combination of this with the ever-changing sands of Arch, especially kernel changes, just made it impossible to keep up. If we got things working on Monday, they'd be broken again on Tuesday.

We looked into somehow selecting our own stable snapshot of Arch. In a VM environment, it's easy enough to build a master VM and just use that. But our system also has to run on bare metal, which is not so easy. There is, supposedly, a way to take a snapshot and make a private repository. But once again, the investment in time is just not something a tiny group like ours can afford to make if we are to ship a product in a reasonable time.

And so, with great reluctance, I made the decision last week that we will ship our product on Ubuntu. I know that it is really not the right choice for an embedded system. But it works, and it doesn't change on a daily basis. We're used to its quirks, like yet another gratuitously incompatible set of network configuration tools. Hopefully we'll have the luxury of re-examining this later on when we have more people and more time to look at it.

5 comments:

Anonymous said...

All Debian-based distros support debootstrap, which will create a minimal file system for you (there are flavors, check the man pages and tutorials carefully!) which is quite small. Not 64 MB flash memory embedded but still quite small. And you get full apt-get support, of course.

Badger said...

This is a great post, and one I've been battling as well for a while
now...

I think you should also add Posix is dead and it's now
GNU/Linux. While there seem to be some niche areas where people are
still building to Posix spec, it's what GNU/Linux does that is
important... (I make the GNU point for glibc).

It used to be the case when my window manager died that I would switch
to a vty, kill it and it would restart. Of course at that point you
may as well reboot the whole system. But at least I could do an
orderly shutdown rather than risk trashing too much on the disk.

Ubuntu's Unity is indeed super-schizophrenic. I went far back in time
to fvwm! I've always been tempted to run ratpoison to be honest, but
probably still need a mouse for web browsing...

Ah the pick-your-linux question. It is indeed a great one. I have
begrudgingly begun to accept that there were some very useful people
solving this type of problem for me in a "large networking company".

Of course, if you're extending and reselling some existing box, you
really need to take the vendor's upstream and modify it.. If you're
custom building H/W though, that's painful...

For the latter I've been looking at :

https://www.yoctoproject.org/

- But that's really for a *real* embedded system, not one of these x86
boxen.

For the former, even for Ubuntu (or RHEL/CentOS), there's always the
"Server" option when you install, which seems to cut out all the GUI /
XWin crap.

But you're right though, going from Ubuntu's 3.5.0-40-generic, to
CentOS's 2.6.32-358.el6.x86_64, is a bit of a shock...

Did you know about:

https://fedoraproject.org/wiki/EPEL

e.g.:

https://dl.fedoraproject.org/pub/epel/6/x86_64/repoview/gcc-x86_64-linux-gnu.html

gcc-x86_64-linux-gnu-4.7.2-2.aa.20121114svn.el6.1.x86_64

- That is binary compatible with CentOS 6.

And there's this!

https://dl.fedoraproject.org/pub/epel/6/x86_64/repoview/log4cxx.html

Seems that CentOS is too cold, and Arch is too hot, are you looking
for the Goldilocks OS :-P ?


I've got a question though on this:

> There are ways to have a development environment which is distinct
> from the system's own build environment, but they look pretty
> terrifying and weren't something I wanted to try and get my head
> around.

I was wondering the same, so landed on :

http://www.shermann.name/2011/03/building-packages-for-centos-5-on.html
http://www.lucas-nussbaum.net/blog/?p=385

http://fedoraproject.org/wiki/Projects/Mock
https://wiki.ubuntu.com/PbuilderHowto


Of course my gut feeling is that it's much easier just to install a
"developer workstation" VM of the OS, and do the work in there...

That would mean 1 VM for each target of course eventually...

Perhaps fully embedded is easier than this type of packaging.

"Hopefully we'll have the luxury of re-examining this later on when we
have more people and more time to look at it."

HAHA Good one!

http://i.snag.gy/kdu77.jpg

Luigi said...

you indicate DPDK was essential for you, have you looked at netmap http://info.iet.unipi.it/~luigi/ which has similar features and is much less intrusive ?

cheers
luigi

Luigi said...

you mention DPDK is essential to you. Have you looked at netmap http://info.iet.unipi.it/~luigi/netmap/ which has similar features/performance and is much less intrusive on the kernel so it may give you less trouble with portability ?

cheers
luigi

n5296s said...

@Luigi: we did originally try Netmap - in fact you and I discussed it in the office of a well-known search company in Mountain View a few months back. In the end we went with DPDK because it is aimed at Linux rather than BSD and because it supported our use case of pipeline processing "out of the box". However it does impose quite a few constraints that I'd rather not have - it really wants to be your mini-OS for the app, which absolutely doesn't suit us and takes quite a bit of working around.