Monday 14 March 2011

boost intrusive - wonderful!

I wrote a while ago about my disappointment with boost serialize, so it's only fair to redress the balance and say how delighted I am with boost intrusive. The STL list, map and set classes are extremely handy, but to those of us old enough to have cut our teeth on assembler, there's something rather unsatisfactory about all that behind-the-scenes memory allocation going on.

Back in the good old days (like in VMS or RSX), pretty much every data structure began with a pair of forward/backward pointers which could be used to link it into a list. All the key lists in the system (active processes, paged memory allocations, devices, ...) were held like this, with a listhead having the same structure. Insertion and removal are dead easy, and the relatively small size of the systems meant that linear searches weren't really a performance problem.

When I first started writing C++, I wrote a generic linked-list class like this, and an AVL tree class as well. I reused them for a while but then started using STL. But there was always this nagging feeling about those little memory blocks being manipulated behind the curtain.

Recently I've been working on something where memory usage is a problem. I thought those days were behind us, but this one (correlating information about network flows gathered through Netflow) was using up all the memory (8 gbytes) on the machine. Something had to be done. I put all the data structures on a diet, figured out how to get rid of stale information, and so on. But those 32 bytes being wasted for every list entry suddenly seemed important. I'd seen boost intrusive in passing but it had always looked a bit complicated. Time to read it seriously... and discover just how simple it all is.

Those old doubly-linked list structures from RSX... easy:

class my_structure : boost::intrusive::list_base_hook<>
{...

and then:

typedef boost::intrusive::list<my_structure> my_structure_list;

And that's it! You can have sets, too, with lots of nice facilities for lookups, and space/performance tradeoffs, and many other things. For example, I like to keep sets of things indexed by something in the structure. I have a much-reused class for doing this, built on std::set. But lookups require the construction of a whole data structure just to carry the key. I wrote a hack to make this reasonably efficient, but I don't like it.

They thought of that. The advanced lookup and insert functions just let you lookup using the key, without constructing an instance of the structure. The necessary C++ in the structure definition looks a little odd first time, but once you're used to it, you'd never do it any other way.

I'm gradually migrating all my recent code to use boost intrusive. It is just so much better than the STL classes.

Yeah for boost! More thoughts about it here.