Thoughts from upside down: boost

Wednesday, 1 May 2013

Living without boost::has_member_function

I needed to write a function that would do one thing if its parameter class had a particular member function, and another if it didn't. It seemed like this should be straightforward with a bit of Template Metaprogramming (MPL). But it wasn't. Here's the story of how I found a solution.

In more detail... our software has lots of collections whose contents are all derived from the same class, named_object. There's a bunch of infrastructure which will dump objects and collections of them into various formats including highly abbreviated text for debugging, and JSON for our REST API. There are also a few collection classes which need to be used in the same way but aren't (and can't be) built out of the same class and collection structure.

The main collection base class provides a get_by_name function that does an efficient lookup, using the Boost intrusive set class, and either returns the desired object or throws an exception if it isn't there. The other classes are typically held as linked lists, aren't derived from a common base, and don't provide a get_by_name function of their own since the details vary. However, they have enough in common - in particular, they provide a get_name function - that the std::find_if function can be used to scan through the collection item by item, in O(n) time.

So what I wanted was a way to write a function that would see if the collection class had a get_by_name function, and use that if present. If not, it would use the fallback of std::find_if.

Boost provides an extensive library called type_traits which can be used to do all sorts of clever things like this. For example, it's easy to provide a function which does something different if its parameter is a pointer, or supports ranking. But I looked in vain for the has_member_function metafunction. And a web search didn't find anything obviously helpful either. So... how to do it all by myself?

The key to much of metaprogramming is something called SFINAE (Substitution Failure Is Not An Error). This means that if an error occurs while evaluating the eligibility of a function (or a class partial-specialization), it isn't an error, it just removes that particular function overload from the candidate list. The Wikipedia article gives some simple examples. Using SFINAE, it's easy to define a function which will only be eligible if the required member function is present. One way is to define a template argument with a default which uses the member function in question:

template<class C,
         typename C::value_type (C::*FN)(const string&) const = &C::get_by_name>
typename C::value_type __get_by_name(const C *c, const string &name)
{
    return c->get_by_name(name);
}

Note that the type of get_by_name is spelled out in excruciating detail. This wasn't a problem in this case, so I didn't try very hard to avoid it. It seems to me that it ought to be possible, using a list of classes where every parameter is a distinct template argument, but I didn't look into it. And maybe not, which may explain why there isn't a boost::has_member_function. Anyway, this function will use get_by_name if it exists, and be deselected if it doesn't.
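On the question of spelling out the type: one alternative, assuming a C++11 compiler, is to put the call itself into a trailing return type, so that substitution fails whenever the call is ill-formed. This is only a sketch of that idea, not the code used in this post; `lookup`, `registry` and the `int` result are invented for the illustration.

```cpp
#include <cassert>
#include <string>

// Eligible only if `c->get_by_name(name)` is a valid expression for C.
// If it isn't, substitution fails and the overload is silently dropped.
template<class C>
auto lookup(const C *c, const std::string &name)
    -> decltype(c->get_by_name(name))
{
    return c->get_by_name(name);
}

// Hypothetical collection, for illustration only: "finds" the length
// of the name.
struct registry
{
    int get_by_name(const std::string &name) const
    {
        return static_cast<int>(name.size());
    }
};

inline void lookup_demo()
{
    registry r;
    assert(lookup(&r, "fan") == 3);   // forwards to registry::get_by_name
}
```

Nothing here names the member's return type or parameter list; the compiler works both out from the expression. On its own this has the same limitation as the version above: it only says when the function is eligible, not which of two eligible functions wins.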

Now we just need the fallback function:

template<class C>
typename C::value_type __get_by_name(const C *c, const string &name)
{
    typedef typename C::value_type V;
    return *std::find_if(c->begin(), c->end(), [=](V v)->bool
        { return v->get_name()==name; });
}

This uses find_if with a simple lambda function to extract the right object. (For clarity I've omitted dealing with what happens if it isn't present. I've also assumed that the container deals in pointers, not references - which isn't the case for the Boost intrusive classes. That led to quite a lot of complication in my actual code).

Unfortunately though, it doesn't work. It's fine if C doesn't have get_by_name - the first function is deselected, leaving only the fallback version. But if it does have get_by_name, both functions are eligible and the compiler (gcc at least) declares the call ambiguous.

The rules for deciding how to order functions when more than one is available are complex, and probably only fully understood by people who write C++ compilers. Basically, a function is preferred if it is "more specific", i.e. if there is a set of parameters that will satisfy the other function but not this one. On that basis, it looks as though the first function should have preference over the second. But it doesn't.

At this point in my explorations I gave up. It wasn't that hard to write a get_by_name function for the other handful of collection classes. But it annoyed me and I kept thinking about the problem.

The very least specific function is one with completely unspecified arguments: "foo(...)". However such a function can't do anything, since it knows nothing about its arguments. Also, such a function only accepts POD arguments, not including references. Something a bit less drastic is required.

The solution was obvious when I saw it: add another, completely unused, parameter, which has explicit type for the preferred function, and is a template for the other one:

template<class C,
         typename C::value_type (C::*FN)(const string&) const = &C::get_by_name>
typename C::value_type __get_by_name(const C *c, int, const string &name)
{
    return c->get_by_name(name);
}

template<class C, class I>
typename C::value_type __get_by_name(const C *c, I, const string &name)
{
    typedef typename C::value_type V;
    return *std::find_if(c->begin(), c->end(), [=](V v)->bool
        { return v->get_name()==name; });
}

A wrapper function just inserts a suitable argument:

template<class C>
typename C::value_type get_by_name(const C *c, const string &name)
{
    return __get_by_name(c, 0, name);
}

Et voila! This selects the function using the class's get_by_name if it exists, and the fallback function only if it doesn't exist. Problem solved.
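Putting the pieces together, here is a minimal self-contained sketch of the technique. The `item` and `indexed` types and the `lookups` counter are invented for the demonstration. The helper is called `get_by_name_impl` because names containing a double underscore are reserved for the implementation, and the member pointer type carries a `const` so that it matches a lookup which can be called through a `const C *`. As before, the fallback makes no attempt to handle a missing name.

```cpp
#include <algorithm>
#include <cassert>
#include <list>
#include <string>

using std::string;

// Hypothetical element type: anything with get_name() will do.
struct item
{
    string name;
    const string &get_name() const { return name; }
};

// Hypothetical collection that has its own lookup. It counts calls so
// that the check below can tell which overload ran.
struct indexed
{
    typedef const item *value_type;
    std::list<item> items;
    mutable int lookups = 0;

    value_type get_by_name(const string &name) const
    {
        ++lookups;
        for (const item &i : items)
            if (i.get_name() == name)
                return &i;
        return nullptr;
    }
};

// Preferred: eligible only if C::get_by_name exists with this exact type.
template<class C,
         typename C::value_type (C::*FN)(const string &) const = &C::get_by_name>
typename C::value_type get_by_name_impl(const C *c, int, const string &name)
{
    return c->get_by_name(name);
}

// Fallback: always eligible, but less specialized because of `I`.
template<class C, class I>
typename C::value_type get_by_name_impl(const C *c, I, const string &name)
{
    typedef typename C::value_type V;
    return *std::find_if(c->begin(), c->end(),
                         [&](V v) { return v->get_name() == name; });
}

// Wrapper: supplies the unused argument.
template<class C>
typename C::value_type get_by_name(const C *c, const string &name)
{
    return get_by_name_impl(c, 0, name);
}

inline void get_by_name_demo()
{
    indexed a;
    a.items.push_back(item{"fan"});
    assert(get_by_name(&a, "fan")->get_name() == "fan");
    assert(a.lookups == 1);           // the member lookup was used

    item psu{"psu"};
    std::list<const item *> plain;    // no get_by_name member
    plain.push_back(&psu);
    assert(get_by_name(&plain, "psu") == &psu);
}
```

The same call, `get_by_name(&collection, name)`, works for both kinds of collection, and the choice between the two implementations is made entirely at compile time.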

Thursday, 1 December 2011

IRP rediscovered - first steps in Template Metaprogramming

One of the nice things about the PDP-11 assembler was its powerful macro features. Not only could you do basic text substitution, you could create loops using the REPT directive, for a fixed number of iterations, or IRP, which iterated over a list of arguments. It was especially good for setting up data structures, which nowadays would be viewed as a rather crude application specific language (ASL). (Before I start getting hate-mail, yes, I know this was originally from the PDP-10).

For whatever reason, the designers of C eschewed all this and just went for simple text substitution. Every now and then I have a bout of nostalgia for the PDP-11 assembler, especially when trying to build elaborate descriptive data structures. Of course there's always M4, but the learning curve is huge. Actually I'm a long way down the forgetting curve for M4: a long while back I built a very elaborate set of macros for tracking register usage and many other things for some MIPS assembler that I wrote. But it was a long time ago.

Then just the other day I really needed the old REPT directive. I've been working on a very interesting algorithm design problem, for reducing low-density parity check codes (LDPC) to a form where they can be encoded by practical hardware. The innermost loops of this algorithm are extremely performance critical - by nature this is an O(n^3) problem (i.e. the complexity increases with the cube of the size of the code). For a realistic sized code of say 32K data bits, the innermost part of the algorithm gets executed several billion times. Normally I'm content to let the compiler worry about the details of code optimization - today's compilers (gcc and MSVC) do a wonderful job. But in this case, saving a single instruction could cut seconds off the execution time, so it was worth digging a bit deeper.

Of course the first part of optimization is to use the right algorithms and data structures. I'd already done all that, cutting the execution time by a factor of thousands compared to our initial, simple implementation. Now I was looking to shave off another factor of two by paying attention to the details.

One such detail was to unfold the critical inner loops, replacing them by linear sequences of instructions with no tests or jumps. After some careful crafting of data structures, the loops were extremely tight, less than ten instructions. One of the loops has a large repeat count, so it was easy just to do it in gulps of 16 at a time. At that level the loop overhead is negligible, and when the remaining number is less than 16, the last few can be done one at a time.
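The "gulps of 16" arrangement can be sketched as below. This is an illustration of the loop shape only: `step` and `sum_in_gulps` are made-up names, and a simple accumulation stands in for the real inner-loop body.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for the real loop body (hypothetical): accumulate one value.
inline void step(long &acc, int x) { acc += x; }

long sum_in_gulps(const std::vector<int> &v)
{
    long acc = 0;
    const int *p = v.data();
    const int *end = p + v.size();
    std::size_t n = v.size();

    // Main part: 16 iterations per pass, with no test or jump between them.
    for (; n >= 16; n -= 16, p += 16) {
        step(acc, p[0]);  step(acc, p[1]);  step(acc, p[2]);  step(acc, p[3]);
        step(acc, p[4]);  step(acc, p[5]);  step(acc, p[6]);  step(acc, p[7]);
        step(acc, p[8]);  step(acc, p[9]);  step(acc, p[10]); step(acc, p[11]);
        step(acc, p[12]); step(acc, p[13]); step(acc, p[14]); step(acc, p[15]);
    }

    // Remainder: fewer than 16 left, done one at a time.
    for (; n > 0; --n, ++p)
        step(acc, *p);

    assert(p == end);   // every element was consumed exactly once
    return acc;
}
```

With 37 elements, for example, the first loop runs twice and the second handles the last five.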

The other loop was trickier though. The number of iterations is small, in the range 6-20, so the whole loop has to be done at once. A quick experiment showed that gcc implements a switch statement using a jump table, so it would be quick to dispatch to the right unrolled loop. But how to generate the code without tediously repeating the same statements over and over?

That was when I thought of using metaprogramming, i.e. programs that run at compile time rather than at execution. The idea is to declare a template class, parameterized by an integer that tells it how many instances you want. The resulting code looks like this:

template<int I> struct repeat
{
    void apply(vector<operation> &ops, vector<operand> &v)
    {
        ops[I-1].run(v);
        repeat<I-1>().apply(ops, v);
    }
};

template<> void repeat<0>::apply(vector<operation> &ops, vector<operand> &v) { }

The details of what's being done aren't too important here. "ops" is a vector of operations, each of which says what to do and which operand vector element to apply it to. We want to make sure that each operation in the vector is applied.

The "apply" function first does the operation corresponding to its parameter, then recursively invokes the class with a parameter of one less. But how to get the recursion to stop? This is where the specialized function declaration comes in. The compiler will always choose an explicit specialization over the generic definition, so when the parameter reaches zero, this empty function is selected and the recursion stops.

The code that uses the class looks like this:

switch (ops.size()) {
case 6:
    repeat<6>().apply(ops,v);
    break;
.
.
.
case 20:
    repeat<20>().apply(ops,v);
    break;
default:
    for (auto opi=ops.begin(); opi!=ops.end(); ++opi) {
         opi->run(v);
    }
    break;
}

I happen to know that the vector size will normally be in the range 6-20. The default is there so the code will work, albeit less efficiently, if it isn't. If you really had no idea of the limits, you would first deal with chunks of say 16 at a time, then finish off the remainder using the above technique.
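For anyone who wants to try this, here is a small self-contained sketch of the same arrangement. The `operation` type, its `run` member and the `apply_all` wrapper are hypothetical stand-ins, and the switch covers only sizes 1 to 3 to keep it short.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

typedef std::vector<int> operands;

// Hypothetical operation: add `amount` to one element of the operands.
struct operation
{
    std::size_t index;
    int amount;
    void run(operands &v) const
    {
        assert(index < v.size());
        v[index] += amount;
    }
};

// Applies ops[I-1], then ops[I-2], ... down to ops[0].
template<int I> struct repeat
{
    void apply(const std::vector<operation> &ops, operands &v)
    {
        ops[I - 1].run(v);
        repeat<I - 1>().apply(ops, v);
    }
};

// Explicit specialization for zero: does nothing, so the recursion stops.
template<> inline void repeat<0>::apply(const std::vector<operation> &, operands &) {}

void apply_all(const std::vector<operation> &ops, operands &v)
{
    switch (ops.size()) {
    case 1: repeat<1>().apply(ops, v); break;
    case 2: repeat<2>().apply(ops, v); break;
    case 3: repeat<3>().apply(ops, v); break;
    default:
        // Outside the expected range: plain loop, slower but still correct.
        for (auto opi = ops.begin(); opi != ops.end(); ++opi)
            opi->run(v);
        break;
    }
}
```

Note that the unrolled version applies the operations last to first, while the fallback loop goes first to last. That makes no difference for the additions used here, but it would matter if the operations depended on each other's results.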

It looks as though this will produce horrific code, with the recursion and everything else. If you compile without optimization, for debugging, indeed it does, with a deep nest of function calls, each with its own call, entry and exit sequences. But if you turn on full optimization, gcc produces exactly what you would write by hand - just the exact set of instructions required to implement each iteration of the loop. (I imagine MSVC would too, though I haven't tried it.) You'll notice that the "repeat" object is instantiated, but since it has no content, this doesn't actually do anything.

To the real experts in metaprogramming (all dozen of them), this is child's play. But for the casual visitor to the topic, like myself, it's a neat technique that can save a lot of tedious and error-prone repetition. As I expected, unrolling this innermost of inner loops saved about 5% of the execution time, which is a useful contribution to my overall target of 50%.

Friday, 26 August 2011

Boost: a retrospective (part 3) - the Bad and the Ugly

In part 2 I talked about my favorite elements of the Boost libraries. Boost is wonderful, but even so there are things that are not so good. These, the ones which (in my opinion) are best avoided, form the subject of this post.

Serialization

I wrote a while ago about my frustration with this library. It seemed the perfect solution to a data pickling need I had, until I discovered that it can't cope with polymorphism. It claims to, but it randomly crashes deeply nested in incomprehensible function calls if you try. There may have been a solution, but life is just too short to figure it out. The reason for all this is that its authors decided to invent their very own subclassing scheme, completely orthogonal to the one that C++ uses. They may have had their reasons, but it's a complex subject and clearly they missed something.

Asio

If you've ever needed to do low-level socket I/O, you've probably been tempted to write an object wrapper around the C function calls and data structures. You may even have taken a look at Boost to see if they have already done this. In which case, you'll find that they have. I've certainly been down this path, and discovered Boost Asio at the end of it.

You will next discover that Asio is extremely complex, with all kinds of interacting classes that you have to be aware of and create. I spent a day or so trying to get my head around it, finally getting to the point where I felt safe putting fingers to keyboard. Then I discovered that despite all that complexity, it couldn't do what I needed. This was nothing fancy, just listen on a port, and create a thread to handle each TCP session as it arrives. Turns out Asio has a race condition - by design - which can result in missed connections. Some searching showed that there's a workaround for this, but it's complex and requires even more delving into its complexities - and isn't without its own problems anyway.

I had a long meeting to attend, so I figured I'd print the documentation and peruse it during the meeting. Over 800 pages later, my meeting had finished anyway, but the printer still hadn't. At this point, I decided that anything which takes 800 pages to describe it - for such a relatively simple function, this isn't Mathematica after all (1465 pages) - just can't be worth the learning curve.

I wrote my own family of socket classes. It actually took me less time to write and debug than it did to print the Asio documentation, never mind read it! I've been very happily using them ever since. Probably, you will do the same, but if you'd like to use mine, you're welcome. You can find them here.

The Build System

Everyone knows Make. It's convoluted, nearly incomprehensible, and a syntactic nightmare, but everyone has used it and can bodge their way out of a tight corner if they need to.

But why use something everyone knows, when you can invent something unique of your own? Sadly, this is the path that Boost took. They have their own unique build system called Bjam. I'm sure it's very elegant compared to Make - it would take a huge effort not to be - but it's still very complex, and poorly documented too. In fairness, it does (mostly) "just work" if you need to build Boost from sources. But if for whatever reason you do need to get under the covers, woe betide you.

I discovered this when I needed to cross-build Boost for our embedded processor. This is always tricky because of the config stage, where the build system looks to see what capabilities the system has, where things are located and so on. For a cross-build, of course, you can't auto-discover this just by poking around at the system you're running on. That part went OK, though. However, editing the build files to pick up the right cross-compiler, cross-linker and so on was just impossible. I found quite a bit about it on the web, but never quite enough to make it work.

Fortunately, our hardware ran a complete Linux system and with a little fiddling we could just build it native on our box. But if you can't do this - and most embedded systems can't - then you can forget using Boost. Which is a shame.

Wednesday, 24 August 2011

Boost: a retrospective (part 1)

My love affair with Boost started with my first, self-appointed programming task at Anagran, the fan controller for our box. I wanted a table of functions, corresponding to each of the temperature sensors. Some of these were parameterless, corresponding to unique items, while others were indexed by interface card number. I wanted to be able to put a "partly cooked" function object in the table, with the interface number frozen but other parameters to be supplied through the ultimate call. This is called a "partial function application" or "partial closure" in computer science.

STL provides C++ with some glimmerings of functional programming, with "mem_fun", "bind1st" and so on. It seemed like it ought to be possible to write something appropriate, but making it usefully generalized also seemed like a lot of work. Surely someone must have done this already!

Searching for it led me to Boost, "one of the most highly regarded and expertly designed C++ library projects in the world" as they modestly say at the top of the front page. It is however true. It's a huge collection of highly-generalized classes and functions for doing an amazingly large number of extremely useful things. It's an open-source project whose authors, while not anonymous, keep a very low profile. I can only assume they love a challenge (and have a lot of spare time), because they do some extremely tricky things, under the covers. But for the user, they're mostly very straightforward to use.

So over the last five years, I've discovered more and more that can be done with Boost. Although I've called this a "retrospective", I'm not planning to stop using it.

Boost makes extensive use of "template metaprogramming", which is a kind of compile-time computing. When C++ templates were invented, the idea was to allow simple compile-time parameterization of classes and functions, for example so you could write a "minimum" function to return the lowest of its arguments regardless of whether they were int, float, double or some user-defined class. As the concept evolved, it became possible to make very complex choices at compile time. In fact, you can write just about any program to produce its output directly from the compiler, without ever even running it, if you try hard enough. It's hard to get your head around, but fortunately you don't need to.
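To make the two ends of that spectrum concrete, here is a small sketch: a "minimum" function parameterized by type, and a value computed entirely by the compiler. The `factorial` template is just the customary illustration, not anything taken from Boost.

```cpp
#include <cassert>

// Simple compile-time parameterization: one definition, any type that
// supports operator<.
template<class T>
const T &minimum(const T &a, const T &b)
{
    return b < a ? b : a;
}

// Compile-time computation: the compiler evaluates the recursion while
// instantiating the templates.
template<unsigned N> struct factorial
{
    static const unsigned long value = N * factorial<N - 1>::value;
};

// Specialization for zero stops the recursion.
template<> struct factorial<0>
{
    static const unsigned long value = 1;
};

// Checked during compilation; nothing needs to run.
static_assert(factorial<5>::value == 120, "evaluated by the compiler");

inline void minimum_demo()
{
    assert(minimum(3, 7) == 3);
    assert(minimum(2.5, 1.5) == 1.5);
}
```

If the `static_assert` condition were false the program would not compile at all, which is the sense in which the output comes "directly from the compiler".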

Function and Bind

These were the first Boost packages I discovered. Function defines a general, templatized function class. So you can define a variable as "function<int(foo*)>" and assign to it any suitable function. In particular, assign a member function of the foo class and all the right things will happen.

The Function class is useful, but it is the Bind class that really transforms things. You can take any function, bind some or all of the parameters to specific values, and leave the others (if any) to be supplied by a subsequent call to the bound object. This is exactly what I was looking for in my fan controller. For example, suppose you have a function "int foo::get_temperature(double)". Then you can write:

  function<int(double)> fn =
    bind(&foo::get_temperature, my_foo, _1);

to store a function which will apply its argument to the "my_foo" instance of foo, which you use for example as:

  printf("temperature at %f is %d\n", v, fn(v));

(Of course you shouldn't be using printf, you should be using boost::format, but that comes later). The "_1" is a placeholder, whose meaning is "take the first parameter of the final call, and put it here". Bind takes care of types, making sure that the actual parameter is (in this case) a double, or something that can be converted to it. If you want to, you can even apply bind to previously bound functions - though you might want to ask yourself why you're doing it.
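Here is a self-contained version of that example to experiment with. To avoid any dependency it uses `std::function` and `std::bind` from the C++11 `<functional>` header, which were modelled on the Boost ones, rather than Boost itself. The body of `foo` and the `make_reader` helper are invented for the illustration, and the object is bound by pointer.

```cpp
#include <cassert>
#include <functional>

// Hypothetical sensor class, for illustration only.
struct foo
{
    int offset;
    int get_temperature(double v) { return offset + static_cast<int>(v); }
};

// Returns a "partly cooked" call: the object is frozen, the double is
// left to be supplied by whoever finally calls the result.
inline std::function<int(double)> make_reader(foo *my_foo)
{
    using namespace std::placeholders;   // for _1
    assert(my_foo != nullptr);
    return std::bind(&foo::get_temperature, my_foo, _1);
}
```

A caller holding the result, say `fn`, needs to know nothing about `foo`: `fn(1.5)` calls `get_temperature(1.5)` on the bound object. Because the object is bound by pointer in this sketch, it has to outlive `fn`.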

This is absolutely perfect, for example, for callback functions that need to keep hold of some context. In C you do it using void* arguments, which is unsafe and generally wretched. This can be avoided in C++ by defining a special-purpose class, but that requires the caller to know about it, which ties everybody's shoelaces together more than is healthy.

The only problem with function/bind - which is true of any code that makes heavy use of templates - is that compiler errors become incredibly verbose and just about useless. A single mistake, such as getting a parameter type wrong, results in pages of messages, none of which gives you the slightest clue as to what you actually did wrong. The first time you compile a new chunk of code that makes extensive use of bind, you will typically get thousands of lines of errors, corresponding to just a handful of typos and the like. The trick is, to find the message line that gives you the actual source line - which is buried in there somewhere - then just go stare at the line until you figure out for yourself what you did wrong. The rest of the messages can be summarized as "you did something wrong on this line".

Part 2: The Good (things I just wouldn't live without)
Part 3: The Bad and the Ugly

Friday, 5 November 2010

boost serialize - not such a good idea!

I'm a big fan of the boost libraries for C++. Mostly, they are a huge productivity gain - things like regex, function and bind can reduce the work involved in a complex program by half or more. So when I needed to pickle a very large (gigabytes) and complex data structure between restarts, boost was the obvious place to look.

Boost serialize certainly looked like the answer. Just add a few lines to each class saying what you want to save and restore, then with a single function call, you can pickle the whole structure and later reload it. It takes care of loops, diamonds and all the other things that happen in real-life data structures. And if you don't like the default way of saving something, it's easy to write your own. How good can it get!?

Well, that's the theory, and it could have been the practice too. But it isn't. For some reason known only to themselves, serialize's authors decided that the C++ inheritance mechanism wasn't for them. They invented their own completely parallel mechanism for dealing with polymorphism and subclasses. The effect is that trying to save my structure results in an exception thrown from somewhere in an enormous depth of function calls. I persevered, and tracked down what was happening. For some reason - and it's just impossible to plough through all the code and figure out the details - it silently ignores some of the calls to the subclass registration process. After several days of trying to figure out the details - on a live application because that's the only place the data can be collected - I have finally given up. I'll live without this capability in my program.

Unfortunately this is a common problem with boost. It seems to be de rigueur to invent new ways of doing things even though they are more complex and don't work especially well. The build system is another case in point - instead of using make, known and hated by generations of programmers, they invented their own, bjam, which is unknown and incomprehensible. If it works for you, great. If not, forget it.

I'll carry on using boost, but you do have to be selective. Unfortunately.

More thoughts on Boost here.