Saturday 9 February 2013

Ubuntu + Xen = Disaster!

I'm writing this as I reinstall Ubuntu 12.10 on my new test machine. My idea - to test some networking software I'm working on - was to buy a new high-power desktop machine (done), install Ubuntu on it (done), then run Xen to create a simulated network.

Xen is really the open source software from hell. Nothing works the way it's apparently supposed to, and actually in the end nothing works at all. You search around on Google looking for people who've had the same problem, you try the fixes they suggest. Sometimes they work, taking you a few milliseconds further towards failure, other times they have no effect. It's just incredibly frustrating.

So now I'm reinstalling the system and I'll run VMware instead. It's a shame, I liked the idea of Xen, being free and open source and all that. But unless you just have way too much time on your hands, it's hopeless. For me, this whole business of building systems is a distraction from the important matter at hand, which is writing code for my own software. The less time I spend on it, the better.

The first problem was the book I bought. I was on a business trip for a week, so I thought the ten hours or so on the plane, and lonely nights in a hotel, would be the perfect occasion to get myself up to speed. I bought "Running Xen" by Matthews et al. The "et al" turns out to be extremely important. The book is terribly written, by Matthews' entire class of students, none of whom can write very well, and for sure can't write consistently. It never really quite tells you how to do anything, constantly distracting itself with dire warnings about all the bad things that can go wrong. Actually, given my experience with Xen, maybe that's not completely inappropriate.

I bought another book, "The Book of Xen". This is at least well written, and it probably isn't the authors' fault that none of the recipes they give for how to do things actually turn out to work.

Anyway, let's go back to when I didn't know how bad this all was. I installed Xen, made the necessary configuration changes, and rebooted. Woohoo! There I was running a hypervisor. I could type "xm list" and see it, complete with my one and only virtual machine (Dom0 in Xen-speak). So now, all I had to do was follow the recipe from the book, and I'd have my software - which was already running on the bare machine - running on a VM.

Everybody tells you that if you want your VM to use a file as its virtual disk, you absolutely shouldn't use the "loopback driver", you should use something called blktap. So naturally that's what I tried to do. I copied magic incantations into various configuration files, created my virtual disk, and tried to mount it.

Nothing worked. I tried all sorts of combinations of things, and of course I Googled all the error messages and the like. I found lots of suggestions, but nothing that actually helped. That was when the curtain started to open on the fire and brimstone of open source hell. There were loads of responses along the lines of "ah, but you need to install xxx then edit /etc/yyy". But xxx doesn't work on Ubuntu. Google. "xxx doesn't work on Ubuntu, but you can do xxxzzz instead". Well yes, but "xxxzzz" doesn't actually do what "xxx" does. And so on. One article even says "the great thing about Xen is that there are so many different ways to do the same thing." Maybe they had their tongue in their cheek. In any case they were wrong, because the truth is that there are so many ways to fail to do the same thing.

I was about to give up when I suddenly thought to try the dreaded loopback driver, which is supposed to perform poorly. (I've heard that Butler Lampson, one of the great minds in computer science, once said "Performance is a characteristic of a correctly functioning system" - in other words, it doesn't matter what the performance is if it doesn't actually work. When I had the opportunity to ask him he denied having said it, but it's true anyway.)

And... it worked! Replacing "tap:aio:" with "file:" suddenly had everything working correctly. I could mount my virtual device, treat it just like a disk, and copy my host system onto it. Now I was ready for the next step: booting the VM with an exact copy of the Ubuntu 12.10 system that was running as the host.
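
As a rough illustration of that change, a minimal xm-style guest config might look like the sketch below. xm config files use Python syntax; the guest name, memory size, image path and device name are hypothetical placeholders, not the values from this setup.

```python
# Sketch of an xm guest config (xm config files are plain Python).
# Every value below is a hypothetical placeholder.
name = "testguest"
memory = 1024  # guest RAM in MB

# The blktap form that failed here would be written as:
#   disk = ["tap:aio:/srv/xen/testguest.img,xvda,w"]
# The loopback ("file:") form that worked:
disk = ["file:/srv/xen/testguest.img,xvda,w"]
```

Saved under a name such as testguest.cfg (again hypothetical), a config like this would be passed to "xm create".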

I even tried going back to blktap for the guest system, figuring that maybe there was some kind of conflict with running it in the host. Well, that of course got nowhere. But I edited the config file and... it got another five whole lines further in the console log before failing in a different way.

I Googled the error messages, as you do with open source (open source would just never work at all without Google). The problem was that the root file system wasn't getting mounted. This may (or may not) have something to do with the way Xen deals with booting guests. Instead of doing what the hardware does, and executing files from the guest's disk image, it boots using the host kernel, then somehow flips over to the guest in mid-boot. Why it does this I don't know, but it creates whole chapters in the textbooks explaining how to deal with kernel incompatibilities and the like. And evidently I'd just been bitten by one of these - even though I was using the same system for the host and for the guest.

To cut a long story short, I tried a lot of things - Pygrub, virt-manager - and none of them would work. They all failed in some incomprehensible way, and Googling just produced confusing, conflicting advice, which typically started with "rebuild the kernel..." or "install this completely different toolset".

Yet Xen is in widespread use by cloud hosting companies - for example, by Amazon. I can only suppose that if you have the resources to experiment with different distros, kernel builds, toolsets and configurations, you can eventually find a combination which works. But it certainly isn't for the casual user like me, who just wants to get something working in a day or so. It may be that Ubuntu, or its latest version, is part of the problem. The Ubuntu web pages seem to say it should all work, but there are also a lot of references to things that don't quite work the way they're supposed to under Ubuntu.

I suppose I should have known, really. I worked for a while with a company which had a virtualized software product. They'd started with VMware as the base, and then customer pressure had forced them to port to Xen - just before I joined and found myself responsible for it. It was a nightmare - nothing worked, Citrix (the owners of Xen) were incapable of providing support, and the Xen open source community just laughed when we asked them for advice. "Oh, you're using the xyz toolset - nobody uses that any more, everyone is using pqr. The latest release is pretty good, it mostly works and there are quite a few patches for the stuff that isn't really there yet." The project overran by months and only "worked" thanks to numerous hacks and workarounds.

So, here I am reinstalling the machine. I'm annoyed about the time I've spent trying to understand Xen, and about the time I'll now have to spend getting to grips with all the utilities for VMware. But it surely can't be worse... can it?

1 comment:

Anonymous said...

I tried installing Xen on Ubuntu 12.04 myself and found it quite difficult as well.
Then I discovered XCP
http://www.xen.org/products/cloudxen.html
which is the base on which the commercial Citrix XenServer is built. The nice thing is that it can be used through the same management tools (i.e., you can use Citrix XenCenter) and it exposes the same APIs.
The installable distribution is VERY easy to use, and it works! Give it a chance by installing it on a VM :-)