Tuesday 23 August 2011

Boost: a retrospective (part 1)

My love affair with Boost started with my first, self-appointed programming task at Anagran, the fan controller for our box. I wanted a table of functions, corresponding to each of the temperature sensors. Some of these were parameterless, corresponding to unique items, while others were indexed by interface card number. I wanted to be able to put a "partly cooked" function object in the table, with the interface number frozen but other parameters to be supplied through the ultimate call. This is called a "partial function application" or "partial closure" in computer science.

STL provides C++ with some glimmerings of functional programming, with "mem_fun", "bind1st" and so on. It seemed like it ought to be possible to write something appropriate, but making it usefully generalized also seemed like a lot of work. Surely someone must have done this already!

Searching for it led me to Boost, "one of the most highly regarded and expertly designed C++ library projects in the world" as they modestly say at the top of the front page. It is, however, true. It's a huge collection of highly generalized classes and functions for doing an amazingly large number of extremely useful things. It's an open-source project whose authors, while not anonymous, keep a very low profile. I can only assume they love a challenge (and have a lot of spare time), because they do some extremely tricky things under the covers. But for the user, the libraries are mostly very straightforward to use.

So over the last five years, I've discovered more and more that can be done with Boost. Although I've called this a "retrospective", I'm not planning to stop using it.

Boost makes extensive use of "template metaprogramming", which is a kind of compile-time computing. When C++ templates were invented, the idea was to allow simple compile-time parameterization of classes and functions, for example so you could write a "minimum" function to return the lowest of its arguments regardless of whether they were int, float, double or some user-defined class. As the concept evolved, it became possible to make very complex choices at compile time. In fact, you can write just about any program to produce its output directly from the compiler, without ever even running it, if you try hard enough. It's hard to get your head around, but fortunately you don't need to.
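To make those two ideas concrete, here is a minimal sketch. The names "minimum", "factorial" and "example_templates" are mine, purely for illustration, and the static_assert line assumes a C++11 compiler. The first template is the simple parameterization that templates were invented for; the second makes the compiler itself do the arithmetic.

```cpp
#include <cassert>

// Simple parameterization: one definition works for int, double,
// or any user-defined class that provides operator<.
template <typename T>
const T& minimum(const T& a, const T& b) {
    return (b < a) ? b : a;
}

// Template metaprogramming: the compiler instantiates factorial<5>,
// factorial<4>, ... down to the specialization for 0, multiplying as it goes.
template <unsigned N>
struct factorial {
    static const unsigned long value = N * factorial<N - 1>::value;
};

template <>
struct factorial<0> {
    static const unsigned long value = 1;
};

// Verified during compilation; nothing is computed at run time.
static_assert(factorial<5>::value == 120, "factorial computed by the compiler");

// Illustrative usage of the ordinary function template.
void example_templates() {
    assert(minimum(3, 7) == 3);
    assert(minimum(2.5, 1.5) == 1.5);
}
```

If the static_assert compiles, the answer was already known before the program ever ran.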

Function and Bind

These were the first Boost packages I discovered. Function defines a general, templatized function class. So you can define a variable as "function<int(foo*)>" and assign to it any suitable function. In particular, assign a member function of the foo class and all the right things will happen.
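As a rough sketch of what that looks like in practice: the example below uses std::function, the C++11 standard-library descendant of boost::function, only so that it compiles without Boost installed; the Boost version is used the same way. The "foo" class, its members and "twice_base" are invented for illustration.

```cpp
#include <cassert>
#include <functional>

// Hypothetical class standing in for a temperature sensor.
struct foo {
    int base;
    int reading() { return base + 1; }
};

// An ordinary free function with a matching signature.
int twice_base(foo* f) { return 2 * f->base; }

int example_function() {
    foo sensor = {20};

    // The same variable can hold a plain function...
    std::function<int(foo*)> fn = twice_base;
    assert(fn(&sensor) == 40);

    // ...or a member function of foo; the foo* argument becomes "this".
    fn = &foo::reading;
    assert(fn(&sensor) == 21);

    return fn(&sensor);
}
```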

The Function class is useful, but it is Bind that really transforms things. You can take any function, bind some or all of the parameters to specific values, and leave the others (if any) to be supplied by a subsequent call to the bound object. This is exactly what I was looking for in my fan controller. For example, suppose you have a function "int foo::get_temperature(double)". Then you can write:

  function<int(double)> fn =
    bind(&foo::get_temperature, my_foo, _1);

to store a function object that calls get_temperature on the "my_foo" instance of foo, passing along its argument. You can then use it, for example, as:

  printf("temperature at %f is %d\n", v, fn(v));

(Of course you shouldn't be using printf; you should be using boost::format, but that comes later.) The "_1" is a placeholder, whose meaning is "take the first parameter of the final call, and put it here". Bind takes care of types, making sure that the actual parameter is (in this case) a double, or something that can be converted to it. If you want to, you can even apply bind to previously bound functions, though you might want to ask yourself why you're doing it.
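Here is a self-contained sketch of the fragment above, plus the "interface number frozen" case from the fan controller. It uses std::function and std::bind (the C++11 descendants of the Boost versions) so that it builds without Boost; with Boost you would write boost::function and boost::bind instead. The body of get_temperature, the "card_temperature" function and all the numbers are made up for illustration, and I pass &my_foo rather than my_foo so that bind does not take a copy of the object.

```cpp
#include <cassert>
#include <functional>

struct foo {
    int offset;
    // Invented body: just enough to give a checkable result.
    int get_temperature(double v) { return offset + static_cast<int>(v); }
};

// Hypothetical sensor reader indexed by interface card number.
int card_temperature(int card, double v) {
    return 100 * card + static_cast<int>(v);
}

int example_bind() {
    using namespace std::placeholders;  // brings _1 into scope

    foo my_foo = {30};

    // Freeze the object, leave the double to be supplied by the final call.
    std::function<int(double)> fn =
        std::bind(&foo::get_temperature, &my_foo, _1);
    assert(fn(2.5) == 32);

    // Freeze the card number: a "partly cooked" function for card 3.
    std::function<int(double)> card3 = std::bind(card_temperature, 3, _1);
    assert(card3(2.5) == 302);

    // Binding a previously bound function: now nothing is left to supply.
    std::function<int()> fixed = std::bind(card3, 7.0);
    assert(fixed() == 307);

    return fn(2.5);
}
```

A table of sensor functions is then just a container of std::function<int(double)>, with each entry bound to its own object or card number.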

This is absolutely perfect, for example, for callback functions that need to keep hold of some context. In C you do it using void* arguments, which is unsafe and generally wretched. This can be avoided in C++ by defining a special-purpose class, but that requires the caller to know about it, which ties everybody's shoelaces together more than is healthy.
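A small sketch of the callback case, again with std::function and std::bind standing in for the Boost versions. The "notifier" and "logger" classes are hypothetical. The point is that notifier knows nothing about logger: it only sees a callable taking an int, while the context (which logger, and what weight) travels inside the bound object.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical event source. It has no idea who is listening.
class notifier {
public:
    typedef std::function<void(int)> callback;

    void subscribe(const callback& cb) { callbacks_.push_back(cb); }

    void fire(int event) {
        for (std::size_t i = 0; i < callbacks_.size(); ++i)
            callbacks_[i](event);
    }

private:
    std::vector<callback> callbacks_;
};

// Hypothetical client whose handler needs extra context (a weight).
struct logger {
    int total;
    void on_event(int weight, int event) { total += weight * event; }
};

int example_callback() {
    using namespace std::placeholders;

    logger log = {0};
    notifier n;

    // The object and the weight are frozen in; only the event is left open.
    n.subscribe(std::bind(&logger::on_event, &log, 10, _1));

    n.fire(1);
    n.fire(2);
    assert(log.total == 30);
    return log.total;
}
```

No void* in sight, and the caller needs no special-purpose class to carry the context.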

The only problem with function/bind, which is true of any code that makes heavy use of templates, is that compiler errors become incredibly verbose and just about useless. A single mistake, such as getting a parameter type wrong, results in pages of messages, none of which gives you the slightest clue as to what you actually did wrong. The first time you compile a new chunk of code that makes extensive use of bind, you will typically get thousands of lines of errors, corresponding to just a handful of typos and the like. The trick is to find the message line that gives you the actual source line, which is buried in there somewhere, then just go stare at that line until you figure out for yourself what you did wrong. The rest of the messages can be summarized as "you did something wrong on this line".

Part 2: The Good (things I just wouldn't live without)
Part 3: The Bad and the Ugly
