Sunday, 11 December 2011

Algorithm Design: Efficient LDPC Encoding (Part 4: Optimization)

In Part 3, I described the implementation of the algorithms for reducing an LDPC code to an encodable form. At that point, all the algorithms were the most efficient possible. The only remaining performance gains would come from improving the code. This is rarely worth spending too much time on, but in this case the overall performance is completely dominated by two inner loops. One iterates through a sparse representation of a row, adding it to a dense row. The other iterates along the elements of the dense representation, adding them. Halving the time spent in both of these loops - just a few instructions each - will halve the execution time of the whole algorithm. So it's worth taking a close look.

Let's start with the add-sparse-to-dense loop. The original code used conventional STL iterators to scan through the elements of the sparse row, then for each element, converted it to the offset-and-mask combination for the particular bit number, and applied it using an xor operation. It's the obvious way. But each sparse row is added to a dense row tens of thousands of times, so it's worth considering whether any part of this operation can be amortized.

The final solution was to pre-calculate a vector containing the offset-and-mask for each entry. The latter was actually represented as a class, called "bitref". In the source code, this results in a vector<bitref>, which is iterated through in the usual way. The compiler is nevertheless clever enough to inline all this and reduce the inner loop to just four machine instructions: two to extract the offset and mask, one to perform the xor operation, and one to advance to the next entry. Not bad. Performance was improved substantially, reducing the time for phase 2 of the algorithm by a factor of about three.
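As a rough sketch of the idea (not the actual project code), the pre-calculation and the resulting inner loop might look like the following. The layout of `bitref`, the 32-bit word size and the function names `precompute` and `add_sparse_to_dense` are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the "bitref" class: a word offset plus a bit mask.
struct bitref {
    std::uint32_t offset;  // index of the 32-bit word within the dense row
    std::uint32_t mask;    // single-bit mask within that word
};

// Done once per sparse row: turn bit numbers into offset-and-mask pairs.
std::vector<bitref> precompute(const std::vector<std::uint32_t> &bit_numbers)
{
    std::vector<bitref> refs;
    refs.reserve(bit_numbers.size());
    for (std::uint32_t bit : bit_numbers)
        refs.push_back(bitref{bit / 32, std::uint32_t(1) << (bit % 32)});
    return refs;
}

// Done many times per sparse row: xor it into a dense row.
void add_sparse_to_dense(const std::vector<bitref> &refs,
                         std::vector<std::uint32_t> &dense)
{
    for (const bitref &r : refs) {
        assert(r.offset < dense.size());
        dense[r.offset] ^= r.mask;  // the division and shift were paid for up front
    }
}
```

The point of the split is that the divide, remainder and shift happen once in `precompute`, while the loop that runs tens of thousands of times only loads two values and performs an xor.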

There remains the overhead of the loop. Given the tiny size of the content, the two instructions of loop overhead, and their impact on pipelining, are worth worrying about. In assembler, the obvious way to unroll the loop would be to lay out sequential instructions corresponding to its maximum size, then jump into these at the appropriate point corresponding to the loop size (i.e. the number of entries in the vector). This is difficult to do in C++ though.

In the end I came up with something similar, but with a distinct unrolled loop for each possible size. This was done as my first tiny adventure in C++ template metaprogramming, and is described in my earlier post, "IRP rediscovered - first steps in Template Metaprogramming". The compiler is smart enough to translate the switch statement based on vector size into an indexed jump, which then executes the entire loop in a straight line. That gave me about another 5% improvement.

Having got the inner loop as tight as possible, it was time to think about the next layer of the loop. Gcc does a good job of inlining functions when it's the right thing to do, but examination of the assembler output (-S option) showed that it was not inlining a couple of critical functions here. I played around with the compiler parameters that control inlining for a while and things got a little better, but I could just not convince it to inline one critical function. Of course the "nuclear option" of making it a macro always exists, but I really wanted to avoid that. I tried the "flatten" function attribute for the outer loop, which tells the compiler to inline absolutely everything, but after the compiler had run for half an hour or so I stopped it. I think it got put off by all the calls to boost::format that I use in my debug log macros.

Eventually, I found a minor rearrangement of functions that got everything inlined. That gave me another 5% or so performance improvement.

That dealt with the inner loop of phase 2, adding sparse rows to dense rows. In phase 3, the inner loop is adding dense rows to dense rows. Unrolling this loop was easier since it is always over the whole length of a dense row - over 64K bits, or 2K operations. There's nothing to be gained by completely unrolling such a loop. Instead I changed the code to do it in "gulps" of 16 entries at a time, then used a normal loop to deal with the remainder at the end. I also rearranged things here so that the call to the inner loop was fully inlined.
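A minimal sketch of the "gulps of 16" approach, assuming dense rows are held as vectors of 32-bit words. The function name `add_dense_to_dense` is made up for this example, and the real code may well be organized differently.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Xor one dense row into another, 16 words per pass through the main loop.
void add_dense_to_dense(const std::vector<std::uint32_t> &src,
                        std::vector<std::uint32_t> &dst)
{
    assert(src.size() == dst.size());
    const std::size_t n = src.size();
    const std::uint32_t *s = src.data();
    std::uint32_t *d = dst.data();
    std::size_t i = 0;

    // Main loop: one gulp of 16 words, written out in a straight line.
    for (; i + 16 <= n; i += 16) {
        d[i+0]  ^= s[i+0];  d[i+1]  ^= s[i+1];  d[i+2]  ^= s[i+2];  d[i+3]  ^= s[i+3];
        d[i+4]  ^= s[i+4];  d[i+5]  ^= s[i+5];  d[i+6]  ^= s[i+6];  d[i+7]  ^= s[i+7];
        d[i+8]  ^= s[i+8];  d[i+9]  ^= s[i+9];  d[i+10] ^= s[i+10]; d[i+11] ^= s[i+11];
        d[i+12] ^= s[i+12]; d[i+13] ^= s[i+13]; d[i+14] ^= s[i+14]; d[i+15] ^= s[i+15];
    }

    // Normal loop for whatever is left over (fewer than 16 words).
    for (; i < n; ++i)
        d[i] ^= s[i];
}
```

With 16 xors per test-and-branch, the loop overhead becomes a small fraction of the work, which is all that unrolling can offer for a loop this long.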

And that is about as far as things can be taken. The original C++ code took about 400 seconds for a column-weight 3, 32K data bit code. The final code takes under 7 seconds. I never ran a column-weight 5 code to completion with the original code - it would certainly have taken thousands of seconds, maybe much more. But now, it runs in about 45 seconds.

Of course there's a price to pay for all this. One of the first principles of writing maintainable systems is never to keep the same information in more than one way. This code violates that all over the place - for example, the sparse and dense representations of rows. But without this kind of approach, the code would be unusable anyway, so its maintainability wouldn't matter much. It has certainly been one of the most interesting bits of programming I've undertaken in a long time.

Thursday, 8 December 2011

Algorithm Design: Efficient LDPC Encoding (Part 3: Implementation)

In Part 2 I described the algorithms that need to be implemented in order to transform the base matrix of an LDPC code into a reduced form that can be used by a practical encoder. As I mentioned in Part 1, we originally built a very straightforward Python implementation, where the matrix was represented literally as a bunch of rows of 0s (mostly) and 1s (rarely). Extrapolating from its performance with toy-sized codes, it would have taken months or years to reduce a life-sized (>32K bits) code. We needed something a bit faster, like a few seconds, so I set out on a C++ implementation. C++ is naturally 20-50 times faster than Python, but it would take a lot more than that.

The first, obvious, step was to change the representation to a sparse array, where only the 1 values are held explicitly. The Python code spent most of its time searching arrays of 0s trying to find the occasional 1, adding a further O(n) to its execution time.

During the first phase, all the work consists of swapping rows and columns. To support this efficiently, the sparse array consists of "bitnodes" representing a 1. They are linked into lists both for the row and for the column, and contain pointers back to each of these. This means that when rows are swapped, the columns get to find out about it with no further work, and vice versa. The implementation makes extensive use of the Boost intrusive library, about which I've already eulogized. In the original implementation, the row and column lists were held in order, though I ended up rethinking this later. Here is the structure of a bitnode:

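   class bitnode
   {
        typedef bi::set_member_hook<bi::link_mode<bi::auto_unlink> > row_hook_t;
        typedef bi::list_member_hook<bi::link_mode<bi::auto_unlink> > col_hook_t;
        row_hook_t row_hook;    // links the node into its row (a set)
        col_hook_t col_hook;    // links the node into its column (a list)
        row_t *my_row;          // backpointer to the owning row
        col_t *my_col;          // backpointer to the owning column

        // member functions follow
   };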

Note the use of the intrusive member hooks, which allow the same structure to be linked into several lists (or sets). The backpointers to the row and column allow the row and column numbers to be tracked as they are swapped, which would not be the case if they were held explicitly.

This basic implementation worked well for codes with a column weight of 3, taking about 300 seconds to transform a 32K bit code. For a column weight of 5, though, which results in a much larger gap, it was unusable.

A little instrumentation showed that all the time was spent adding rows together. In the set-based implementation of sparse rows, every addition involved either the creation or the deletion of a node in a tree, a relatively expensive operation. The solution was to switch to a dense representation for the gap rows only. So, just before starting phase 2 (elimination of the ones in the F region of the matrix), the gap rows are converted to a dense representation, with one bit per possible position. This is simple enough in theory but took a lot of reworking of other structures, such as the columns. It was worth it, though: the time dropped to around 60 seconds for the column weight 3 codes, and to around 300 seconds for the column weight 5 ones.

Adding a sparse row to a dense row means walking the bitnodes in the sparse row and xor'ing the corresponding bit. Adding a dense row is just a tight loop xor'ing the 32-bit words together, an O(n) operation. These two inner loops are the key to performance - we'll come back to them later.

As always, when you speed up one part, you find another bottleneck. In this case it was phase 1 again. The best column to swap when making the diagonal was selected by simply scanning them linearly, which is obviously expensive. The solution was to keep a constantly-sorted list of the best one - actually a priority queue, implemented yet again as a boost intrusive set. However this changes constantly - when a row has been incorporated into the lower triangle, the columns it contains now have one less 1 in the region of interest. Increasing the gap also affects it. Fortunately, the row structure makes it easy to update just the columns that are directly affected, which is O(b), and then to correct their position in the list. Hence the total operation each time is O(b log(n)), which is much better than O(n) as previously.
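To illustrate the bookkeeping, here is a small sketch that uses `std::set` as a stand-in for the Boost intrusive set. The class `column_queue` and its member names are hypothetical, and the real code also has to cope with changes to the gap, which this sketch ignores.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <utility>
#include <vector>

// Keeps columns ordered by the number of 1s they have in the region of interest.
class column_queue {
public:
    explicit column_queue(const std::vector<int> &counts) : counts_(counts)
    {
        for (std::size_t c = 0; c < counts_.size(); ++c)
            order_.insert(std::make_pair(counts_[c], static_cast<int>(c)));
    }

    // Column with the fewest 1s (ties go to the lowest column number).
    int best() const
    {
        assert(!order_.empty());
        return order_.begin()->second;
    }

    // Correct one column's position after its count changes: O(log n).
    void update(int col, int new_count)
    {
        order_.erase(std::make_pair(counts_[col], col));
        counts_[col] = new_count;
        order_.insert(std::make_pair(new_count, col));
    }

    // A row has left the region: only the b columns it touches need updating.
    void row_removed(const std::vector<int> &cols_in_row)
    {
        for (int col : cols_in_row)
            update(col, counts_[col] - 1);
    }

private:
    std::vector<int> counts_;
    std::set<std::pair<int, int> > order_;
};
```

Each call to `row_removed` does b updates of O(log n) each, which is where the O(b log(n)) figure comes from.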

For a column weight of 3, this made phase 1 practically disappear as a performance concern, as I expected. But for a column weight of 5, it was still taking the majority of the time - which I didn't expect. Further analysis showed that keeping the columns in order was very expensive. Every time a row was moved to the gap, every column had to be re-sorted. On further thought, there is only one time when it helps for a column to be sorted, which is when it is being processed as the diagonal element. So just sorting it there, once per column, would work just as well and remove an O(n) element from the algorithm. With this change, phase 1 moved down into the noise - for a column weight 3 code, it is about 4% of the total time.

At this point there are no further fundamental improvements to be made - the order of work to be done for each phase cannot be reduced. Further improvement can only come by coding optimizations, which will be discussed in Part 4.

Tuesday, 6 December 2011

Algorithm Design: Efficient LDPC Encoding (Part 2: Algorithms)

In Part 1, I described the problem we are trying to solve, taking a sparse matrix and solving the corresponding system of simultaneous equations (around 33000 of them) so that we can build an efficient hardware encoder for Low Density Parity Check (LDPC) codes.

Efficient encoding requires that the original sparse matrix be transformed such that all the encoder has to do is calculate a number of parity checks. Most of these are very sparse, so they can use shared hardware. A small proportion (about 3% in a typical code) are dense, i.e. they have about the same number of 1s as 0s, and so cannot share hardware.

The resulting transformed matrix is called the "reduced" matrix, and when it is complete it has the following form:

|            D           |    E   |           F            |
|            A           |    B   |           C            |

Rows in the D/E/F part are called the "gap" in the literature. Initially the reduced matrix is set to be identical to the base matrix, and the gap is empty. In a matrix representing a system of simultaneous equations, such as this, rows can be swapped without changing the meaning, as can columns. Also, rows can be added together (although columns cannot be). In binary addition, 1+1=0. We use these facts to rearrange the matrix into reduced form, by the following steps.

1. Transform part C into "lower triangular" form (LTF), in which everything above the main diagonal is zero. This can be done by swapping rows and columns. At each step, we look for a column that has just a single entry above the current diagonal row, then swap it with the current diagonal column. Finding a suitable column is the key to performance at this step.

2. Sometimes, we can't find such a column. This is how the gap gets created. We choose a column with the smallest number of such entries and swap that. Then we exchange rows so that the populated rows move into the gap area.

3. When this part is finished, C is in lower triangular form, but the gap is not. The next task is to complete the task for the gap, by emptying F altogether and getting E into lower triangular form. So far, all rows are still sparse, since no row or column has been changed apart from ordering.

4. For each row in F, we eliminate all bits using Gaussian elimination. Starting with the rightmost one bit, we add the corresponding row from C which has this as its rightmost bit (i.e. on the diagonal). We repeat this, moving leftward, until the F part of each row has been emptied. In the process, the rest of the row becomes dense, with on average as many 1s as 0s.

5. We now have F empty, and we need to transform E into lower triangular form. We do this by Gaussian elimination again, this time using rows from the gap. We start from the bottom and work up, creating the diagonal as we go, so that we don't put back bits that we have already eliminated.

6. Now we're done. E and C between them have a neat diagonal line with nothing above it. F is empty. A and B are still sparse, but D and the lower triangle of E are dense. All the bits in columns in A and D are data bits. The check bits, in the remainder of the matrix, are generated from these.
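Step 4 can be sketched on a toy matrix as follows. This is only an illustration under simplifying assumptions: rows are plain `std::vector<bool>`, the C rows are already in lower triangular form, and `c_rows[k]` has its rightmost 1 at column `first_c_col + k`. The names `add_row` and `eliminate_f` are invented for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

typedef std::vector<bool> row_t;

// Add one row to another in binary arithmetic, where 1+1=0 (i.e. xor).
void add_row(row_t &dst, const row_t &src)
{
    assert(dst.size() == src.size());
    for (std::size_t i = 0; i < dst.size(); ++i)
        dst[i] = dst[i] != src[i];
}

// Empty the F part of one gap row, working from the rightmost column leftward.
// Adding c_rows[k] clears column first_c_col + k and cannot set any bit to the
// right of it, because that row has nothing beyond its diagonal element.
void eliminate_f(row_t &gap_row, const std::vector<row_t> &c_rows,
                 std::size_t first_c_col)
{
    for (std::size_t k = c_rows.size(); k-- > 0; ) {
        if (gap_row[first_c_col + k])
            add_row(gap_row, c_rows[k]);
    }
}
```

For example, with C rows 100|100, 010|110 and 001|011 (the bar marks the start of the C columns) and a gap row 110|101, two additions are needed and the gap row ends up as 101|000: F is empty, and the left-hand part has changed as a side effect, which is how the gap rows become dense.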

Let's take a look at the performance of each of these steps. First we need to define some terms:

b: the number of 1 bits in a single row in the base matrix. This is small, and independent of the size of the code. It's also referred to as the row weight. We refer to the number of 1s in a single column as the column weight.

g: the number of rows in the gap region (D/E/F). Although this is much smaller than the number of data bits, it is directly linked to it. For a column weight of 3, it is about 3.3% of it. Hence anything which is O(g) is also O(n), though with a much smaller actual value.

n: the total number of rows.

The task falls into three phases:

-- Phase 1: rearrangement of rows and columns to create the C region. This has to be done once for each row (less the gap rows), and each time, we have to select the best available row. If we simply scan the rows looking for the best one, this will be O(n), making the overall task O(n^2). We'll explain later how this can be made O(n log(n)). In addition we have to create the gap. Rippling a row up into the gap is O(n), and has to be done for each gap row, so the total task is O(n*g). In principle this is O(n^2), but because g is so much smaller than n, with suitable design it can be kept small, comparable with the O(n log(n)) time of the row rearrangement.

-- Phase 2: eliminating all the 1s in the F region. There are g rows to deal with, and once the process starts they quickly become dense. Hence O(n*g) row additions are required, where one (the gap row) is dense, and the other, coming from the C region, is sparse. The amount of work per addition is O(b), making the whole task O(n*g*b).

-- Phase 3: eliminating the upper half of the triangle in the E region. There are g rows, and O(g) bits to be eliminated in each row, so there are O(g^2) additions. Since these involve adding dense rows to each other, the amount of work per addition is O(n), making the whole task O(n*g^2) - or in other words, O(n^3). For small codes, this phase is dominated by phase 2, but as the code size increases it starts to dominate the total time. This is especially true if larger row or column weights are used, since the gap becomes proportionately larger (about 10% of the total rows for a column weight of 5).

These are the fundamental limits of the algorithm - no matter how clever the design, the three phases will have complexity O(n log(n)), O(n*g*b) and O(n*g^2) respectively. The trick of a good implementation is to achieve these limits, and to minimize the actual values in each case. Part 3 discusses how this was done.
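To get a feel for the sizes involved, the following sketch plugs in the figures quoted in these posts: 32768 data bits, a half-rate code, a row weight of 6 and a gap of about 3.3% of the data bits. Treating each dense row as 64K bits held in 32-bit words is an assumption, and the results are order-of-magnitude upper bounds that ignore constant factors.

```cpp
#include <cstdint>

// Figures assumed from the text; none of them are measured values.
const std::uint64_t data_bits = 32768;
const std::uint64_t n = 32768;                           // rows, one per check bit
const std::uint64_t b = 6;                               // row weight
const std::uint64_t g = data_bits * 33 / 1000;           // gap rows, about 3.3%
const std::uint64_t words_per_row = 2 * data_bits / 32;  // 64K-bit row in 32-bit words

// Phase 2: up to n*g sparse-to-dense additions, each touching b bits.
std::uint64_t phase2_bit_ops() { return n * g * b; }

// Phase 3: up to g*g dense-to-dense additions, each over a whole row of words.
std::uint64_t phase3_word_ops() { return g * g * words_per_row; }
```

With these numbers g comes out at 1081 rows, phase 2 at roughly 2 x 10^8 single-bit operations and phase 3 at roughly 2.4 x 10^9 word operations, which is why the two inner loops matter so much.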

Algorithm Design: Efficient LDPC Encoding (Part 1: Background)

I've been working lately on a system design which requires, among other things, highly effective and efficient error-correcting codes (ECC). We've decided to use a Low Density Parity Check (LDPC) code. These are currently considered to be the best "soft" ECCs, i.e. where there is information about the reliability of each received bit as well as its putative value. The story behind LDPCs is interesting: they were invented by Robert Gallager in his PhD thesis in 1960, but they were way beyond contemporary computing power. It didn't help that when he wrote the definitive textbook on ECCs in 1966, he didn't mention them! So they languished, forgotten, until a decade ago. By then TurboCodes had been independently invented. They also provided a means for "near Shannon limit coding", i.e. extracting as much data from a noisy signal as theoretically possible.

LDPCs have two properties which led to the problem I needed to solve. First, there is no formula that provides the best code for a given set of constraints (block size and code rate). You can use the same general scheme to build ten different codes, the details being decided by a random number generator, and some will be significantly better than others. That means that to find the code you want to use in practice, you need to generate a whole bunch of them and try them out over a large number of messages and error densities.

That leads to the second problem. An LDPC starts out as a very sparse matrix, describing a large number of parity checks each of which covers a small number of bits - hence the name. We want to have 32768 bits of user data, and a reasonable configuration is to have each bit covered by three checks. If we use a half-rate code (same number of data bits and check bits) then each check covers six bits. So we have a matrix where each row is 64K bits long and has just six 1 bits.

The matrix doesn't say anything about which bits are data and which are check bits, only that a valid codeword has to satisfy all the checks. So given 32K data bits, the way to generate the corresponding 32K check bits is to solve the 32K simultaneous equations that the sparse matrix implicitly describes. Easy!
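What "satisfying all the checks" means can be shown with a small sketch, in which each row of the sparse matrix is stored as the list of bit positions it covers. The type `check_t` and the function `is_valid_codeword` are names invented for this example.

```cpp
#include <cstddef>
#include <vector>

// One parity check: the positions of the codeword bits that it covers.
typedef std::vector<std::size_t> check_t;

// A codeword is valid if the bits covered by every check add up to 0,
// using binary addition (xor).
bool is_valid_codeword(const std::vector<bool> &codeword,
                       const std::vector<check_t> &checks)
{
    for (const check_t &check : checks) {
        bool parity = false;
        for (std::size_t pos : check)
            parity = parity != codeword[pos];
        if (parity)
            return false;  // this check is not satisfied
    }
    return true;
}
```

Verifying a codeword is cheap, as this shows. The hard part is the reverse direction: given only the data bits, finding check bits that make every check come out to zero.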

Well, no, not easy at all. The practical use of LDPCs requires a transformation of the matrix into something that normal hardware or software can encode in a linear and reasonable amount of time. Solving the equations directly is an O(n^3) problem, i.e. the time required increases with the cube of the number of unknowns. So we have to do some preprocessing on the matrix to get it into a form that the hardware can work with. There's an excellent paper by Qi and Goertz describing how to go about this. The algorithm it describes is, not surprisingly, also O(n^3). This needs to be run for every trial code, and we would like to try hundreds of them.

Our first attempt at coding the algorithm was written in Python, using the obvious data representation, i.e. a big matrix containing mostly 0s and a few 1s. It was written so we could understand the algorithms and piece together a complete system, rather than for performance. On a "toy" code of a few hundred bits, it took a couple of minutes to run. On slightly larger codes - nowhere near the size we need for our system - it took most of the day. By extrapolation, to generate a life-size code would have taken months or years.

Clearly, we needed an implementation more focused on performance - not just code optimization, but selecting algorithms to minimize the time at each step of the algorithm. And that is where it begins to get interesting. More on that in Part 2.

Thursday, 1 December 2011

IRP rediscovered - first steps in Template Metaprogramming

One of the nice things about the PDP-11 assembler was its powerful macro features. Not only could you do basic text substitution, you could create loops using the REPT directive, for a fixed number of iterations, or IRP, which iterated over a list of arguments. It was especially good for setting up data structures, which nowadays would be viewed as a rather crude application specific language (ASL). (Before I start getting hate-mail, yes, I know this was originally from the PDP-10).

For whatever reason, the designers of C eschewed all this and just went for simple text substitution. Every now and then I have a bout of nostalgia for the PDP-11 assembler, especially when trying to build elaborate descriptive data structures. Of course there's always M4 but the learning curve is huge. Actually I'm a long way down the forgetting curve for M4. A long while back I built a very elaborate set of macros for tracking register usage and many other things for some MIPS assembler that I wrote, but it was a long time ago.

Then just the other day I really needed the old REPT directive. I've been working on a very interesting algorithm design problem, for reducing low-density parity check codes (LDPC) to a form where they can be encoded by practical hardware. The innermost loops of this algorithm are extremely performance critical - by nature this is an O(n^3) problem (i.e. the complexity increases with the cube of the size of the code). For a realistic sized code of say 32K data bits, the innermost part of the algorithm gets executed several billion times. Normally I'm content to let the compiler worry about the details of code optimization - today's compilers (gcc and MSVC) do a wonderful job. But in this case, saving a single instruction could cut seconds off the execution time, so it was worth digging a bit deeper.

Of course the first part of optimization is to use the right algorithms and data structures. I'd already done all that, cutting the execution time by a factor of thousands compared to our initial, simple implementation. Now I was looking to shave off another factor of two by paying attention to the details.

One such detail was to unfold the critical inner loops, replacing them by linear sequences of instructions with no tests or jumps. After some careful crafting of data structures, the loops were extremely tight, less than ten instructions. One of the loops has a large repeat count, so it was easy just to do it in gulps of 16 at a time. At that level the loop overhead is negligible, and when the remaining number is less than 16, the last few can be done one at a time.

The other loop was trickier though. The number of iterations is small, in the range 6-20, so the whole loop has to be done at once. A quick experiment showed that gcc implements a switch statement using a jump table, so it would be quick to dispatch to the right unrolled loop. But how to generate the code without tediously repeating the same statements over and over?

That was when I thought of using metaprogramming, i.e. programs that run at compile time rather than at execution. The idea is to declare a template class, parameterized by an integer that tells it how many instances you want. The resulting code looks like this:

template<int I> struct repeat
{
    void apply(vector<operation> &ops, vector<operand> &v)
    {
        // perform the operation for this value of I here, then recurse
        repeat<I-1>().apply(ops, v);
    }
};

template<> void repeat<0>::apply(vector<operation> &ops, vector<operand> &v)  { }

The details of what's being done aren't too important here. "ops" is a vector of operations, which says what to do and which operand vector element to apply it to. We want to make sure that each operation in the vector is applied.

The "apply" function first does the operation corresponding to its parameter, then recursively invokes the class with a parameter of one less. But how to get the recursion to stop? This is where the specialized function declaration comes in. The compiler will always choose an explicit specialization over the generic definition, so when the parameter reaches zero, this empty function is selected and the recursion stops.

The code that uses the class looks like this:

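switch (ops.size()) {
case 6: repeat<6>().apply(ops, v); break;
// ... one case like this for each size in between ...
case 20: repeat<20>().apply(ops, v); break;
default:
    for (auto opi=ops.begin(); opi!=ops.end(); ++opi) {
        // apply *opi to v, one operation at a time
    }
}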

I happen to know that the vector size will normally be in the range 6-20. The default is there so the code will work, albeit less efficiently, if it isn't. If you really had no idea of the limits, you would first deal with chunks of say 16 at a time, then finish off the remainder using the above technique.
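For anyone who wants to try the technique, here is a small self-contained version. The `operation` type (an index plus a mask to xor in) and the `apply_all` wrapper are invented for the example, and only sizes 1 to 4 are unrolled to keep it short, so treat it as a sketch rather than the code from the project.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical operation: xor a mask into one element of the operand vector.
struct operation {
    std::size_t index;
    unsigned mask;
};
typedef unsigned operand;

template<int I> struct repeat
{
    void apply(std::vector<operation> &ops, std::vector<operand> &v)
    {
        v[ops[I-1].index] ^= ops[I-1].mask;  // the work for entry I-1
        repeat<I-1>().apply(ops, v);         // then the remaining I-1 entries
    }
};

// Explicit specialization: an empty function that stops the recursion at zero.
template<> inline void repeat<0>::apply(std::vector<operation> &,
                                        std::vector<operand> &) { }

// Dispatch on the vector size to the matching unrolled sequence.
void apply_all(std::vector<operation> &ops, std::vector<operand> &v)
{
    switch (ops.size()) {
    case 1: repeat<1>().apply(ops, v); break;
    case 2: repeat<2>().apply(ops, v); break;
    case 3: repeat<3>().apply(ops, v); break;
    case 4: repeat<4>().apply(ops, v); break;
    default:
        // Any other size: fall back to an ordinary loop.
        for (std::vector<operation>::iterator opi = ops.begin();
             opi != ops.end(); ++opi)
            v[opi->index] ^= opi->mask;
    }
}
```

The specialization for zero has to be declared before anything instantiates `repeat<1>`, which is why it sits directly after the template. Comparing the assembler output with and without optimization (the -S option in gcc) is a good way to see what the compiler makes of it.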

It looks as though this will produce horrific code, with the recursion and everything else. If you compile without optimization, for debugging, indeed it does, with a deep nest of function calls, each with its own call, entry and exit sequences. But if you turn on full optimization, gcc produces exactly what you would if you hand coded it - just the exact set of instructions required to implement each iteration of the loop. (I imagine MSVC would too, though I haven't tried it). You'll notice that the "repeat" object is instantiated, but since it has no content, this doesn't actually do anything.

To the real experts in metaprogramming (all dozen of them), this is child's play. But for the casual visitor to the topic, like myself, it's a neat technique that can save a lot of tedious and error-prone repetition. As I expected, unrolling this innermost of inner loops saved about 5% of the execution time, which is a useful contribution to my overall target of 50%.

Favourite Restaurants #4: Kaiten Sushi, Shinbashi, Japan

When I first started travelling to Japan, I would generally stay at the Shiba Park Hotel. My business there was at the Japanese national standards body, whose offices were just across the street from the Tokyo Tower and a short and pleasant walk through the Shiba Park itself from the hotel.

In the evening a longer walk - fifteen minutes or so, one subway stop - led to the Shinbashi area. This is a maze of tiny side streets, packed with minuscule restaurants that fill with salarymen (the Japanese word for middle-class office workers) at lunchtime. After work they're back, for a beer or two with their colleagues, and a plate of noodles or sushi before setting out on their long commute to the distant suburbs. It was in one of these, many years ago, that a friend who was learning Japanese managed to order a plate of chicken sashimi (yes, just raw chicken) with a bowl of what tasted like rotten strawberry jam.

Close to the main square at Shinbashi Station, the one with the steam locomotive in it, is a kaiten sushi restaurant. Kaiten - written 回転 in Japanese - just means "turning round". You've probably been to one - instead of ordering from a waiter, you have a conveyor belt in front of you covered in little dishes of sushi. You take whatever you want, and at the end they figure out your bill by counting the plates. This system depends absolutely on having a very high turnover. It takes only a short while, maybe 15 minutes, for the fish to start to dry out and look distinctly unappetising. Kaiten sushi tends to be a lunchtime thing, when there are big crowds in a short time.

An additional benefit of course is that you don't need to be able to speak the language. Assuming you can recognise the things you like - or don't mind taking a risk - you just pick things out as they pass.

A further sophistication of the same idea is to replace the conveyor belt by a canal with little boats carrying the plates of sushi. This was an American invention - Isobune Sushi in San Francisco's Japantown claims to have invented it, though for all I know so does every other boat sushi restaurant in the country. To my great frustration, I have never been able to work out what makes the boats move round the canal.

But back to Shinbashi. We first went to the Kaiten sushi on our first trip together to Japan (though we'd both travelled to Japan before). It's a very unassuming place, full of salarymen during the week and shoppers at the weekend. It's important to go when it's busiest, before about 1.30 - as I explained before. Sometimes that means a bit of a wait, then you squeeze onto two tiny stools (if there are two of you of course - though it's very common for people to eat there on their own), squashed between the other diners. Service is minimal, though courteous and attentive anyway since this is Japan. Every three places or so there's a hot water tap, a pile of cups and a box of teabags (o-cha - green tea - of course), along with a chopstick dispenser, napkins, soy sauce and pickled ginger. You just take what you need and wait for your favourite sushi to roll by. If you want beer or sake, you have to order that.

In the middle of the island, three or four sushi chefs toil continuously, replenishing the dishes. If you watch them carefully you can see what they are making, usually in batches of half a dozen or so dishes, and if it's something you're waiting for, you can prepare to grab it quick. The normal protocol is just to take things from the belt, but if you want something that isn't there or is a bit special, you can ask one of the chefs and they'll make it for you.

When you've had enough, you just stand up and walk to the door. The cashier shouts to the other staff, who count your dishes and shout back the price, you pay - usually in cash - and that's it. There's no formality to it and of course, in Japan, no tipping.

For some reason we really took to this place. Every time we go to Japan we manage to squeeze in a visit. It hasn't changed in the 20+ years we've been going there, although I guess the staff must have moved on. Each time we dread that it will have closed - so many of our favourite spots in Tokyo have closed and been replaced by office buildings, like the "Rubbery Pancakes" breakfast spot next to the Shiba Park. But, so far, it has still been there every time.

Thursday, 8 September 2011

Dedicated Wallpaper Screen - everyone should have one!

There are all sorts of reasons why a person might need cheering up. Luckily, there are also all sorts of ways to cheer yourself up. For example, there are books which make me laugh out loud no matter how down I'm feeling - which really does make the black cloud go away, at least for a while.

One which I can really recommend is to collect all of your favorite photos, and have a second screen on your desk which shows you a randomly changing selection of them. They don't have to be pictures that you've taken yourself, of course, but there's something especially cheering about seeing places you've been, cool things you've done, happy times you've had...

It occurred to me a while ago that you could easily hook up a screen like this, to a superannuated computer. Gee, you could even build a little desktop gadget with a screen and a microprocessor... then of course before I could do anything about it, Philips brought one out. They were rapidly followed by a bunch of no-name Chinese products with an extra twist - they contained a virus that infected your computer as soon as you connected them. Neat trick. I had an actual Philips one, which was malware-free but not explosion-free. One day I noticed that it was a funny shape, and when I took it apart the internal lithium-ion battery had exploded, with enough force to bend the solid metal frame. It has never been quite the same since, and anyway there's no longer room for it on my desk.

At the same time I bought the picture frame, I started assembling my favourite pictures - the same ones as in my Flickr account, with the addition of some family pictures that I don't particularly want to share with the world at large. I found a wallpaper changer that worked well for me, and whenever my computer was idle, they'd scroll by in front of me. It was great, but there's one problem: if I'm not actually using the machine, I'm probably not looking at it. And if I am using it, of course, the wallpaper is invisible under the clutter of a dozen windows.

Fast forward to my new Linux system. At the same time I bought it, I also bought a new monitor. There was nothing wrong with the old one, except a broken stand elegantly patched up with cable ties, but the new one has more pixels, and bigger is always better (n'est-ce pas?). So, I suddenly had a spare large format (1600 x 1200) monitor, and thanks to a superhuman feat of tidying, a space for it on my desk.

It took me a while to get round to making this all work, though. There are people who have had multi-monitor systems for years, but I've always preferred to have a single large one - hence the present 1920 x 1200 display.

It seemed obvious to me that I would need a second graphics card, so off I went to Fry's and bought a low-end one, a Radeon HD5450. I've got used to how well Linux deals with new hardware, so I just plugged it in and expected it to work. My optimism was misplaced, however. I'm running a recent version of Ubuntu (11.04, Natty) and not all the bits were in the right place, though a bit of googling fixed that. No matter what I did, though, the system only used the built-in graphics on the motherboard.

A bit more googling showed me how to change the BIOS setup so it would use the new card - but then it would only use the new card. There was no way to get it to use both of them at once.

There was a seriously heart-stopping moment in the middle of all this. There was one BIOS setting that resulted in a psychedelic display as the system booted - a patchwork of constantly changing colors. Eventually the system came up normally - but how could I change the BIOS settings again? It seemed for a short while as if the only thing to do would be to buy a new motherboard! Luckily, unplugging the new card magically made everything work again.

In the end the solution was simple. You can run two monitors off the same graphics card, just plugging them into two different sockets. And I didn't even need the new card - thank you Fry's for an extremely liberal return policy.

It still wasn't completely over, though. Convincing Linux to run both displays wasn't at all obvious. Not only did I have to tell both Gnome (the desktop system) and the monitor drivers, but I had to do it in the right order - otherwise one undid the changes to the other. The X-windows configuration had to be changed manually, using the magic command "sudo dpkg-reconfigure xserver-xorg". Somehow this reads your mind, figures out what you're trying to do, then generates the corresponding xorg.conf file.

Finally I had it all working. Still not there, though. The "wallpaper" monitor is to the left on my desk. But Gnome wants to put everything on the left screen, unless you've explicitly moved it to the other one. There's a box you can click to make a different screen primary, but it only has a limited effect. Luckily, most X apps remember where you last put them, so it's just a question of "training" the apps you use most. Every now and then, though, I click on something, and can't understand why nothing has happened. Then I look at the wallpaper screen, and it's over there. So I call it, "here Fido", over it comes, and another app has got it figured. (Well, actually I drag it with the mouse).

And what a constant pleasure it is, to see all these pictures of things I've done and places I've been. While I've been typing this, I've had...
  • the tiny landing strip in the remotest part of Baja California, where we went whale spotting
  • a beautiful, moss-clad waterfall in the Oirase Gorge in northern Japan
  • a still-life composition of freshly caught fish in the market in Cap Breton, France
  • several shots of the Golden Gate Bridge, taken from different aircraft at different times, including the ones taken while flying under it in the heli
  • a visit to Potsdam, Germany in 1986, when it was still East Germany, with just a single smelly Trabant visible in the whole of a huge plaza
...and of course lots more. The screen in the picture at the top is showing a public footbath at Sakurajima in the very southern tip of Japan. What a visual feast!

Monday, 5 September 2011

Memorable Meals #1: The Governor's Lunch, K, Japan

A few years ago my then-employer decided to open a development centre in Japan, and asked me to take care of making it happen. As a confirmed lover of Japan, I was delighted to do it. It was initially going to be quite small - and as it turned out, it has stayed that way - so the initial office was in central Tokyo, in one of the sales offices. But at the time, there was talk of expanding to something much bigger, maybe hundreds of engineers, and of opening a second office later on, outside Tokyo. This led to an invitation to one of the Japanese provincial towns, where the prefecture had established an advanced research centre for computer science. I'll be discreet about the actual place, just like in Japanese (and Victorian) novels, and call it K.

As a consequence, I was invited with our Japanese country manager to visit the town and its research centre. Also on the agenda was lunch with the Governor of the prefecture (roughly the equivalent of a US state).

There was a lot involved in setting up our operation, and I was in Japan for three weeks. Luckily we were able to work things so my wife came over with me, and we rented a very nice apartment in the Aoyama district of Tokyo. That's a story for another time, but meant that we both went along to K. We took the train, starting with the Shinkansen (bullet train) line from Tokyo out towards Niigata on the Japan Sea coast. It was February - Tokyo was cold, around freezing. The train trundles through the Kanto plain for about an hour then suddenly plunges into an enormous tunnel, over 20km long.

When it came out, we were quite unexpectedly in a true winter wonderland, with huge banks of snow beside the tracks and enormous snowflakes falling gently to the ground. We changed to another train, which followed the valley for a while, then plunged into another giant tunnel. At the other end it was still snowing and we thought we must be high in the mountains still - until we saw the sea. Something I know now - but didn't then - is that the Japan Sea coast gets huge amounts of snow - tens of metres are common, even at sea level. The journey continued along the coast, past small fishing towns and villages, still in falling snow. I love travelling by train in Japan, and this was perfect. By the time we arrived at K it was dark.

The next morning, the day of the lunch with the Governor, was fine, though cold. Isabelle went out shopping and sightseeing - the town has a famous park dating back to the samurai era. It was really bitterly cold and there was snow everywhere.

The lunch was very nearly a disaster before it even started. The country manager was horrified to see me on my own. "Where is your wife?" he asked, shocked. It turned out that she was expected at the lunch too - which was a surprise to me, since in Japan business is entirely conducted between men. Women in the professional workplace are treated as honorary men, but families remain unknown even to colleagues who have worked together for decades. Luckily I managed to track her down - thank goodness we both had rented Japanese cellphones. We snatched her up in the main shopping street, in a kidnap scene from a bad movie.

The restaurant could have been anything from the outside, but once inside we realised that it was an extraordinary place. It had been there literally for centuries, since the days of samurai warlords. It's the kind of place that foreigners just never see, that you see on Japanese soaps when the political bosses get together to fix something behind the scenes. Being Japan, there is absolutely nothing ostentatious or showy about it, everything is in the details.

We took our seats around the table - or rather, non-seats. The Japanese tradition is to sit cross-legged on woven grass mats, or tatami. However even the Japanese find this uncomfortable, and increasingly you find an invisible hole under the table where you can put your feet, as you sit conventionally on the edge of the tatami. In this case, it was even heated to keep our feet warm. Each place had a menu card, and ours had been translated into English. The polite conversation began. It was difficult - we had a translator, but it's difficult to be spontaneous when every remark has to be translated. In addition to the Governor, there was also the head of the research institute that we would visit in the afternoon.

I came close to making a big mistake. There'd been a program on the television the previous night about the railway that used to run to a nearby rural town, very nostalgic with shots of old people coming home from the market, interviews with schoolchildren trying to make a museum out of the station. By way of trying to make relevant conversation, I mentioned it. What I couldn't know was that another nearby long rural line was about to close, no doubt the reason the program had been shown. Rural railways are a very emotional topic in Japan - they were being built until relatively recently, in fact this one only opened in 1964, and the Governor had heard more than enough about the topic lately. The language barrier came to our aid as he defended the decision to close the line.

Every dish was exquisite, served with charm and elegance. They were all delicious. At a refined meal like this, there are numerous dishes, all served separately and cleared before the next one arrives. That's unusual in Japan, where it's more common to bring most dishes at the same time, and the notion of a western-style course is much more fluid.

Well, there was one dish that caused us some difficulty. The Japanese name is "konowata", pickled entrails of sea slug. Remarkably, you can reuse the sea slug afterwards - its entrails grow back again, a useful evolutionary trait as it turns out. Luckily for us, it is astoundingly expensive, about $50/kg, which means we only got a tiny amount of it. It was the centrepiece of its course, but could be readily swallowed without touching the sides.

There were 15 dishes in total, each served on a special plate or dish which no doubt has some traditional significance. Beer and sake were served throughout, though we drank very little considering what was in store for the afternoon. Finally the meal came to an end and, after the usual polite formalities and much bowing, we went out into the snow.

It's unlikely I'll ever experience another meal quite like that one. The evening meal, in a hotel with the heads of some local computer companies, was utterly unremarkable. The next day we returned to chilly Tokyo, by plane this time, but the memories of the Governor's Lunch will be with us forever.

Wednesday, 31 August 2011

Boost: a retrospective (part 4) - the Curious

In part 2 and part 3 I talked about the best and the worst (in my opinion, naturally) of Boost. Here are some interesting things which don't fall into either of those categories.

Boost Units

Does it make you uneasy to use the same type - say float or double - to represent a whole bunch of things which are fundamentally different, like length, time, volume? Or related but measured differently, like millimeters, feet and miles? It has always made me vaguely uncomfortable, and of course it has led to some spectacular disasters (not mine!). But doing something about it would be a lot of work. Defining, say, a millimeter class would be easy, but handling all the legitimate operations involving more than one unit would just bury you.

Enter Boost Units, which has a completely generic understanding of all these things. All of the meta-arithmetic, like knowing that distance divided by time gives speed, is done at compile time using some very heavyweight template metaprogramming. But you don't need to know about that. You just declare d, t and v as furlongs, fortnights and furlongs_per_fortnight respectively, and dividing d by t gives you v. Simple. Define t2 as seconds and assign it to t, and seconds will automagically be converted to fortnights (slightly more than a million to one - so one microfortnight is conveniently close to a second, a fact used in one obscure corner of DEC's VMS operating system).
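
To give a flavour of how compile-time unit checking can work, here is a toy sketch in plain C++. It is not Boost Units and shares none of its API: Quantity, Distance, Duration, Speed, fortnights and average_speed are all names made up for this illustration, and it tracks only length and time.

```cpp
#include <cassert>

// Toy dimensional analysis, NOT Boost Units: a quantity carries its
// length and time exponents as template parameters.
template <int Length, int Time>
struct Quantity {
    double value;
    explicit Quantity(double v) : value(v) {}
};

// Dividing two quantities subtracts the exponents, at compile time.
template <int L1, int T1, int L2, int T2>
Quantity<L1 - L2, T1 - T2> operator/(Quantity<L1, T1> a, Quantity<L2, T2> b) {
    return Quantity<L1 - L2, T1 - T2>(a.value / b.value);
}

typedef Quantity<1, 0> Distance;   // stored in metres
typedef Quantity<0, 1> Duration;   // stored in seconds
typedef Quantity<1, -1> Speed;     // metres per second

// Hypothetical conversion helper: a fortnight is 14 days.
Duration fortnights(double n) { return Duration(n * 14 * 24 * 60 * 60); }

Speed average_speed(Distance d, Duration t) {
    return d / t;   // fine; "Distance x = d / t;" would not compile
}
```

The point is that a mismatch such as assigning a speed to a distance is rejected by the compiler, while the generated code is just arithmetic on doubles.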

I put this in the "curious" category, rather than the "good", only because I've never had a chance to use it myself, being a systems kind of a person rather than say a mechanical engineer. But if I ever get round to rewriting my robotics code in C++, I will certainly use it.

Shared Pointer

Memory leaks are the bane of C programming, along with buffer overflow. They can be largely avoided in C++ by using auto_ptr to represent ownership of a structure. But this breaks down if there is not a single owner, for example if an object needs to be passed on to another function and then forgotten. It's just about guaranteed that a program that works this way will have leaks, even if they only occur in obscure error conditions.

Reference counts are a partial solution, but they just replace one problem with another since now everyone has to be disciplined about adjusting them. And of course they're intrusive - the object has to have a reference count, and know to delete itself when the count drops to zero.

boost::shared_ptr tries to provide a solution to this, by keeping a behind-the-scenes reference count object. On the face of it, it looks perfect. If you are dealing with all-new code, and you keep solid discipline about never using a regular C-style pointer to the objects, maybe it even is perfect. I've used it for managing buffer pools.

I put this in the "curious" category because of what happens if you have to deal with a less structured environment. You can extract the raw pointer easily enough, to pass to a function that expects it. As long as that function never expects to take ownership, that's fine. Above all it must never delete the object, obviously. But there's a more subtle problem. If you have code which uses a mixture of raw pointers and shared_ptr's, there's a risk of creating a second shared_ptr from a raw pointer. And that is catastrophic, because now there are two reference counts, and whichever one goes to zero first will delete the object, leaving the other with a dangling reference and, microseconds or days later, a mysterious segfault. Guess how I know.
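
The double-ownership trap can be shown safely with a small sketch. It uses std::shared_ptr as a stand-in for boost::shared_ptr (an assumption: the two behave the same way here), and a made-up CountingDeleter that counts deletions instead of freeing anything, so the bug can be observed without actually crashing.

```cpp
#include <cassert>
#include <memory>

// Hypothetical deleter: counts "deletions" instead of freeing memory.
struct CountingDeleter {
    int* deletions;
    void operator()(int*) const { ++*deletions; }
};

// Returns how many times the same object would have been deleted.
int double_ownership_demo() {
    int object = 42;      // stands in for a heap-allocated object
    int deletions = 0;
    {
        CountingDeleter d = { &deletions };
        std::shared_ptr<int> first(&object, d);
        std::shared_ptr<int> copy = first;            // shares first's count
        std::shared_ptr<int> second(first.get(), d);  // BUG: a second, separate count
        assert(first.use_count() == 2);
        assert(second.use_count() == 1);
    }
    return deletions;     // 2: each count reached zero independently
}
```

With a real deleter, the second "deletion" is the one that produces the mysterious segfault.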

Proponents of the class would obviously argue that this is something you should simply never do, that you should have the discipline to avoid. But if you had perfect discipline, you wouldn't need the class in the first place - you could just remember at all times who controls the object, and be sure they delete it if they need to. So really all it has done is replace one way to shoot yourself in the foot with another.

Really the only solution to this is to keep the reference count in the object. Boost provides a class called intrusive_ptr which supports this, but I find the approach kind of backwards. I preferred to write my own base class for the referenced object. More on that in another post.

Sentry

The "sentry" is a programming paradigm for making sure that you undo everything you do, extending the "resource acquisition is initialisation" paradigm. The "do" part is done in the constructor of a sentry object, the "undo" part in its destructor. This ensures that the "undo" will always happen, even in the face of exceptions, return or break statements and so on. The classic example is locking, and indeed boost::thread provides a mutex::scoped_lock class which does exactly this.

But there are many other use cases, and the details of the do/undo operation vary quite a bit. For example, it's common in C to have a function that sets an attribute value, returning the previous value. The undo operation is to call the same function, with the saved value.

It's easy to write a sentry class for some particular case, like the mutex lock. It's not hard to write a generic sentry for a particular kind of do/undo - and indeed I have written a bunch of these.
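
As an illustration of the "set, returning the previous value" case, here is a minimal sketch. It is not one of the classes mentioned above: SetterSentry, set_verbosity and g_verbosity are hypothetical names invented for the example.

```cpp
#include <cassert>

// Hypothetical C-style API: sets a value and returns the previous one.
static int g_verbosity = 0;
int set_verbosity(int v) {
    int old = g_verbosity;
    g_verbosity = v;
    return old;
}

// Sentry for any function of the form "T set(T)" that returns the old value.
template <typename T>
class SetterSentry {
public:
    typedef T (*Setter)(T);
    SetterSentry(Setter setter, T value)
        : setter_(setter), saved_(setter(value)) {}   // the "do" part
    ~SetterSentry() { setter_(saved_); }              // the "undo" part
private:
    SetterSentry(const SetterSentry&);                // non-copyable
    SetterSentry& operator=(const SetterSentry&);
    Setter setter_;
    T saved_;
};

int verbosity_inside_scope() {
    SetterSentry<int> guard(set_verbosity, 3);
    return g_verbosity;   // old value is restored when guard is destroyed
}
```

Because the restore happens in the destructor, it runs on every exit path from the scope, including exceptions and early returns.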

But it seems to me that what would be ideal would be a generic sentry template class, that would figure out from the template arguments what kind of do/undo it is dealing with. This is beyond my own template metaprogramming skills, or at least beyond the learning investment I'm willing to make. But it does seem odd that it isn't part of Boost.

Lambda

There are often times where it would be convenient to have a small, anonymous function - for example, the ordering function passed to a sort operator. Java and Python both provide ways to do this, which in computer science is called a "lambda function". The new version of C++, C++0x, also supports this.

But until that's available, C++ requires you to explicitly define a function, generally nowhere near the place where it's used. This just makes code harder to read and maintain.

boost::lambda is an ingenious attempt at solving the problem, pushing template metaprogramming to its utmost limits. The basic idea is to define a placeholder for a parameter. Then, simply using the placeholder implicitly declares a lambda function. Conventionally, the placeholders are "_1", "_2", etc. Simply writing "_1*2" generates a function that returns twice its argument - regardless of the type of the argument you supply later, as long as it supports multiplication of course. For trivial functions like this, Lambda works very nicely. (Although boost::bind also uses this syntax, and inexplicably, the two trip over each other. There's a workaround, by #defining an alternative syntax for lambda. But it's odd that Boost let this slip by).
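
The placeholder trick can be illustrated with a toy version in plain C++. This is only a sketch of the idea, not how boost::lambda is implemented: the toy namespace, Placeholder1 and TimesConstant are invented for the example, and it supports nothing beyond "_1 * constant".

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

namespace toy {
// Placeholder type: using it in an expression builds a function object.
struct Placeholder1 {};
const Placeholder1 _1 = Placeholder1();

// The function object produced by "_1 * k".
template <typename K>
struct TimesConstant {
    K k;
    template <typename T>
    T operator()(const T& x) const { return x * k; }
};

// Overloaded operator: no multiplication happens here, it only
// records the constant for later.
template <typename K>
TimesConstant<K> operator*(Placeholder1, K k) {
    TimesConstant<K> f = { k };
    return f;
}
}  // namespace toy

std::vector<int> doubled(std::vector<int> v) {
    using toy::_1;
    std::transform(v.begin(), v.end(), v.begin(), _1 * 2);
    return v;
}
```

The argument type is only fixed when the function object is finally called, which is why the same "_1 * 2" works for any type that supports multiplication.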

Unfortunately, C++ doesn't provide a clean syntactic way to do a lot of things that ought to be very natural, like calling overloaded functions. So, although the authors have put a huge effort into trying to make language features work, in the end Lambda is more of a curiosity than a general purpose facility. I've used it to construct arbitrary combinations of filter functions based on user-supplied criteria, for which it did the job nicely and much more simply than any alternative I could think of. But you need to find the right application.

Tuesday, 30 August 2011

Worst Ever Dining Experiences #4: South Kensington, London

Before I bought my flat in London, we often used to stay at a boutique hotel in South Kensington called Number 16. It was a converted row of houses, all terribly English. Eventually it priced itself out of what we thought was reasonable, considering the small if pretty rooms, and we switched our allegiance to the Royal Garden. There are leafy squares to the south of Old Brompton Road, and plenty of restaurants and useful shops within walking distance. South Kensington station is close by, and the South Ken Museums are a 10 minute walk away. All in all, a nice area.

We'd just arrived from somewhere. We were tired, and didn't want to hike across London. Somewhere, we saw something favourable about an Italian place, in one of the side streets by the tube station.

It seemed fine, typical of London neighbourhood Italian restaurants. I can't remember what we ate, probably something involving veal or pasta or both. What I do remember is the wine we ordered, a bottle of Chianti Classico - a reliable standby with Italian food. When the bottle came, it was a Chianti but not a Classico. This is more than just a matter of a name - the "Classico" suffix represents a 50% or more increase in value and in quality. But it tasted fine, and we weren't in a mood to make a fuss, so we drank it with our meal.

When the bill came, I noticed a line that said "Chianti Classico". I mentioned to the waiter that this wasn't what we'd had. The bottle was still there on the table for him to see. His reaction was a surprise, to say the least. He started screaming at us, accusing us of goodness knows what. I suppose he thought we were trying to get a cheap meal. We weren't of course, but we don't like paying for things we didn't get.

This went on for a while, and no doubt I yelled back at him, until eventually the owner came by. By this time I was certainly in no mood to pay for the "Classico" we hadn't had, and I told him so. He came straight out and accused me of trying not to pay. Eventually, getting fed up with the whole scene, I suggested that he call the police.

Suddenly everything changed. He became as nice as anything. "I give you dinner for nothing," he said, "Next time you in London, you come here, I give you wonderful meal, best wine." Clearly, a visit from the police was not at all his idea of how the evening should end. He no doubt had a kitchen full of illegal immigrants, and probably quite a few health code violations. Restaurants are an ideal way to launder illegal money, too (as are nail parlours, but don't ask how I know that). Who knows what else he was afraid of.

So everything was amicable, and to profuse apologies we left. As you can imagine, we were quite bemused on our short walk back to the hotel.

The place was still there for quite a while afterwards, though it has gone now. Evidently the police took a while to catch up with it. Needless to say, we never did claim our free meal.

Thursday, 25 August 2011

Boost: a retrospective (part 3) - the Bad and the Ugly

In part 2 I talked about my favorite elements of the Boost libraries. Boost is wonderful, but even so there are things that are not so good. These, the ones which (in my opinion) are best avoided, form the subject of this post.

Serialization

I wrote a while ago about my frustration with this library. It seemed the perfect solution to a data pickling need I had, until I discovered that it can't cope with polymorphism. It claims to, but it randomly crashes deeply nested in incomprehensible function calls if you try. There may have been a solution, but life is just too short to figure it out. The reason for all this is that its authors decided to invent their very own subclassing scheme, completely orthogonal to the one that C++ uses. They may have had their reasons, but it's a complex subject and clearly they missed something.

Asio

If you've ever needed to do low-level socket I/O, you've probably been tempted to write an object wrapper around the C function calls and data structures. You may even have taken a look at Boost to see if they have already done this. In which case, you'll find that they have. I've certainly been down this path, and discovered Boost Asio at the end of it.

You will next discover that Asio is extremely complex, with all kinds of interacting classes that you have to be aware of and create. I spent a day or so trying to get my head around it, finally getting to the point where I felt safe putting fingers to keyboard. Then I discovered that despite all that complexity, it couldn't do what I needed. This was nothing fancy, just listen on a port, and create a thread to handle each TCP session as it arrives. Turns out Asio has a race condition - by design - which can result in missed connections. Some searching showed that there's a workaround for this, but it's complex and requires even more delving into its complexities - and isn't without its own problems anyway.

I had a long meeting to attend, so I figured I'd print the documentation and peruse it during the meeting. Over 800 pages later, my meeting had finished anyway, but the printer still hadn't. At this point, I decided that anything which takes 800 pages to describe it - for such a relatively simple function, this isn't Mathematica after all (1465 pages) - just can't be worth the learning curve.

I wrote my own family of socket classes. It actually took me less time to write and debug than it did to print the Asio documentation, never mind read it! I've been very happily using them ever since. Probably, you will do the same, but if you'd like to use mine, you're welcome. You can find them here.

The Build System

Everyone knows Make. It's convoluted, nearly incomprehensible, and a syntactic nightmare, but everyone has used it and can bodge their way out of a tight corner if they need to.

But why use something everyone knows, when you can invent something unique of your own? Sadly, this is the path that Boost took. They have their own unique build system called Bjam. I'm sure it's very elegant compared to Make - it would take a huge effort not to be - but it's still very complex, and poorly documented too. In fairness, it does (mostly) "just work" if you need to build Boost from sources. But if for whatever reason you do need to get under the covers, woe betide you.

I discovered this when I needed to cross-build Boost for our embedded processor. This is always tricky because of the config stage, where the build system looks to see what capabilities the system has, where things are located and so on. For a cross-build, of course, you can't auto-discover this just by poking around at the system you're running on. That part went OK, though. However editing the build files to pick up the right cross-compiler, cross-linker and so on, was just impossible. I found quite a bit about it on the web, but never quite enough to make it work.

Fortunately, our hardware ran a complete Linux system and with a little fiddling we could just build it native on our box. But if you can't do this - and most embedded systems can't - then you can forget using Boost. Which is a shame.

Wednesday, 24 August 2011

Boost: a retrospective (part 2) - the Good

In part 1, I explained how I came to regard Boost as an essential part of C++ programming. There are some parts of Boost that I can be pretty sure I'll use in any decent sized program, to the point where I have a generic header file that pulls them all in and makes them accessible without even needing to ask.

Regex

Before Boost came along, you couldn't really use regular expressions in C/C++. Which is a great pity, because they are just incredibly useful especially if you have to deal with any kind of human input. There is a Gnu regex package, but it is GPL, so unusable in anything you plan to sell or keep to yourself, and it has a determinedly C-flavor interface which means you'd have to write a C++ wrapper round it anyhow. My Winlife32 program had to parse human-style text input without regex, and what an incredible pain in the neck that was!
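
For a sense of what parsing human input looks like with a regex class, here is a small sketch. It uses C++11's std::regex rather than Boost Regex (an assumption made so the example needs no extra library; the interfaces are very similar), and parse_size is a made-up helper.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: parse "WIDTH x HEIGHT", e.g. "1600 x 1200",
// tolerating extra spaces and either case of the 'x'.
bool parse_size(const std::string& text, int& width, int& height) {
    static const std::regex pattern("\\s*(\\d+)\\s*[xX]\\s*(\\d+)\\s*");
    std::smatch m;
    if (!std::regex_match(text, m, pattern)) {
        return false;                    // input did not have the expected shape
    }
    width = std::stoi(m[1].str());       // first captured group
    height = std::stoi(m[2].str());      // second captured group
    return true;
}
```

Doing the same by hand means writing the whitespace skipping, digit scanning and error handling yourself, which is where the pain comes from.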

Lexical Cast

Can you remember how to use atoi, itoa, atof, and all the other zillion variants along the same lines? No, neither can I - and actually quite a lot that you'd expect to find doesn't even exist. lexical_cast to the rescue! To convert a string to any type - including your own types as long as they define a stream >> operator - just write lexical_cast<type>(str). If it can't be converted, you get an exception which you can use to trigger an appropriate error message.

In the other direction, lexical_cast<string>(value) will convert anything that has a stream << operator to a string. Simple, but indispensable.
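
The underlying idea is simple enough to sketch with a stringstream. This is a simplified stand-in, not Boost's implementation: simple_lexical_cast and is_int are hypothetical names, it throws std::runtime_error rather than Boost's own exception type, and unlike the real thing it cannot convert a string containing spaces to a string.

```cpp
#include <cassert>
#include <sstream>
#include <stdexcept>
#include <string>

// Simplified stand-in for lexical_cast: write the source to a stream,
// then read the target back out of it.
template <typename Target, typename Source>
Target simple_lexical_cast(const Source& src) {
    std::stringstream ss;
    Target result = Target();
    char extra;
    // Fail if writing fails, reading fails, or any non-space input is left over.
    if (!(ss << src) || !(ss >> result) || (ss >> extra)) {
        throw std::runtime_error("bad lexical cast");
    }
    return result;
}

// Example of turning the exception into a simple yes/no answer.
bool is_int(const std::string& s) {
    try {
        simple_lexical_cast<int>(s);
        return true;
    } catch (const std::runtime_error&) {
        return false;
    }
}
```

Any type with stream << and >> operators works in either direction, which is what makes the real lexical_cast so convenient.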

Function / Bind

I already talked about these in Part 1. I can't imagine trying to write code without them now. Although they are proscribed by Google's internal coding standard, because they "would encourage functional style programming". I have no idea why that is supposed to be bad!

Format

Printf is incredibly useful, but fraught with issues viewed from a 2011 C++ perspective. It's not type-safe, it's very fragile and can cause your program to just roll over and die. And of course there's no question of dealing with user-defined types. Boost Format is used pretty much exactly like printf, except it fixes all of these problems and more. For example:

cout << format("unit %d temp = %.2f deg C") % index % get_temp(index);

will do exactly what you'd expect (note the neat reuse of the % operator, similar to Python by the way). But actually, you don't need the "d" of "%d", because it knows it's dealing with an int, and will format it accordingly. Replace the int with a type of your own, having a stream << operator, and it will work too.

There's a lot more to it, if you want to use it - numerous extra formatting options, positional and named arguments.

Foreach

The STL containers, and their Boost extensions, are incredibly useful. But iterating through their contents is so painful, syntactically. After the hundredth time you've typed something like:

for (vector<int>::const_iterator i=vec.begin(); i!=vec.end(); ++i)

you are really just about ready to scream. Boost Foreach to the rescue! You can replace all this with:

foreach (int i, vec)

(OK, I've cheated a little, my generic header file #defines "foreach" as "BOOST_FOREACH" just to make the code prettier). Notice that i is just an int, not an iterator, so you don't need to use '*' or '->' with it. This is especially neat for containers of pointers, which get very awkward. (Boost has another solution for those, too, the Pointer Container library - though I've always found it doesn't quite do what I need).

The new C++0x standard makes the problem go away, since it has a built-in container iteration syntax, as well as the auto type declarator. But Foreach will have saved a decade or so of tedious and ugly typing.
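
For comparison, here is the same loop written with the C++0x container iteration syntax and with auto. This is a small sketch; sum and sum_with_iterators are names made up for the example.

```cpp
#include <cassert>
#include <vector>

// Range-based for: i is an int, not an iterator, just as with Foreach.
int sum(const std::vector<int>& vec) {
    int total = 0;
    for (int i : vec) {
        total += i;
    }
    return total;
}

// The iterator version, where auto at least removes the long type name.
int sum_with_iterators(const std::vector<int>& vec) {
    int total = 0;
    for (auto it = vec.begin(); it != vec.end(); ++it) {
        total += *it;
    }
    return total;
}
```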

Intrusive Containers

I've eulogised elsewhere about this. Suffice it here to say that you get all the convenience of the STL container types, without the behind the scenes manipulation of little extra memory blocks, and their associated run-time cost. For anyone whose system-programming teeth were cut in C or assembler (or Bliss!), this is just so much nicer.

Date_Time

Who hasn't struggled with all the complexities of date and time? Input, output, arithmetic - they're all a nightmare. Nearly all of these problems go away with Date_Time. Date and time arithmetic is simple, comparisons are simple. Unfortunately input and output are heavily tied into the C++ locale system, which is basically incomprehensible. It's much easier to write your own parsing and output code than it is to figure out how to make locales work. But that's a nit because for everything else, these classes are indispensable.


Just because I haven't mentioned them, doesn't mean I don't like the other bits of Boost (though see the forthcoming part on the Bad and the Ugly). Special mention should go to Thread and Python, which are both indispensable if you want to use threads or interact with Python, respectively. Each of them takes a fairly ugly C interface and wraps an elegant C++ interface around it. Python makes it trivial to combine Python and C++ code, or support Python scripting within a C++ app.

Part 3: the Bad and the Ugly

Tuesday, 23 August 2011

Boost: a retrospective (part 1)

My love affair with Boost started with my first, self-appointed programming task at Anagran, the fan controller for our box. I wanted a table of functions, corresponding to each of the temperature sensors. Some of these were parameterless, corresponding to unique items, while others were indexed by interface card number. I wanted to be able to put a "partly cooked" function object in the table, with the interface number frozen but other parameters to be supplied through the ultimate call. This is called a "partial function application" or "partial closure" in computer science.

STL provides C++ with some glimmerings of functional programming, with "mem_fun", "bind1st" and so on. It seemed like it ought to be possible to write something appropriate, but making it usefully generalized also seemed like a lot of work. Surely someone must have done this already!

Searching for it led me to Boost, "one of the most highly regarded and expertly designed C++ library projects in the world" as they modestly say at the top of the front page. It is however true. It's a huge collection of highly-generalized classes and functions for doing an amazingly large number of extremely useful things. It's an open-source project whose authors, while not anonymous, keep a very low profile. I can only assume they love a challenge (and have a lot of spare time), because they do some extremely tricky things, under the covers. But for the user, they're mostly very straightforward to use.

So over the last five years, I've discovered more and more that can be done with Boost. Although I've called this a "retrospective", I'm not planning to stop using it.

Boost makes extensive use of "template metaprogramming", which is a kind of compile-time computing. When C++ templates were invented, the idea was to allow simple compile-time parameterization of classes and functions, for example so you could write a "minimum" function to return the lowest of its arguments regardless of whether they were int, float, double or some user-defined class. As the concept evolved, it became possible to make very complex choices at compile time. In fact, you can write just about any program to produce its output directly from the compiler, without ever even running it, if you try hard enough. It's hard to get your head around, but fortunately you don't need to.
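
To make the two ends of that spectrum concrete, here is a minimal sketch. The "minimum" template is the simple case described above; "factorial" is the textbook example of getting the compiler to do the computing. The names and the demo function are mine, purely for illustration.

```cpp
#include <cassert>
#include <string>

// Simple compile-time parameterization: one definition serves any type
// that supports operator<.
template <typename T>
const T& minimum(const T& a, const T& b) {
    return (b < a) ? b : a;
}

// Compile-time computing: the compiler evaluates the recursion while
// instantiating the templates, so factorial<5>::value is a constant.
template <unsigned N>
struct factorial {
    static const unsigned long value = N * factorial<N - 1>::value;
};

// The specialization for 0 ends the recursion.
template <>
struct factorial<0> {
    static const unsigned long value = 1;
};

// Hypothetical demo function, not from the original post.
void demo_templates() {
    assert(minimum(3, 7) == 3);
    assert(minimum(2.5, 1.5) == 1.5);
    assert(minimum(std::string("pear"), std::string("apple")) == "apple");

    // Used as an array bound, which must be a compile-time constant.
    int slots[factorial<5>::value];
    assert(sizeof(slots) / sizeof(slots[0]) == 120);
}
```

Nothing in factorial runs at execution time: by the time the program starts, the value is already baked into the binary.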

Function and Bind

These were the first Boost packages I discovered. Function defines a general, templatized function class. So you can define a variable as "function<int(foo*)>" and assign to it any suitable function. In particular, assign a member function of the foo class and all the right things will happen.
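
As a rough sketch of what that looks like, the example below uses std::function, the standard-library descendant of Boost's Function, so that it compiles without Boost installed. The class body and the function names are assumptions of mine, not code from the fan controller.

```cpp
#include <cassert>
#include <functional>

// Hypothetical class standing in for "foo".
struct foo {
    int reading;
    int get_reading() { return reading; }
};

// A free function with the signature int(foo*).
int doubled_reading(foo* f) { return 2 * f->reading; }

// Hypothetical demo function, not from the original post.
void demo_function() {
    foo my_foo = {21};

    // A free function is assigned directly.
    std::function<int(foo*)> fn = doubled_reading;
    assert(fn(&my_foo) == 42);

    // A pointer to member function is accepted too; the foo* argument
    // becomes the object the member is called on.
    fn = &foo::get_reading;
    assert(fn(&my_foo) == 21);
}
```

The caller of fn neither knows nor cares which kind of function it is holding.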

The Function class is useful, but it is the Bind class that really transforms things. You can take any function, bind some or all of the parameters to specific values, and leave the others (if any) to be supplied by a subsequent call to the bound object. This is exactly what I was looking for in my fan controller. For example, suppose you have a function "int foo::get_temperature(double)". Then you can write:

  function<int(double)> fn =
    bind(&foo::get_temperature, my_foo, _1);

to store a function which will apply its argument to the "my_foo" instance of foo, which you use for example as:

  printf("temperature at %f is %d\n", v, fn(v));

(Of course you shouldn't be using printf, you should be using boost::format, but that comes later). The "_1" is a placeholder, whose meaning is "take the first parameter of the final call, and put it here". Bind takes care of types, making sure that the actual parameter is (in this case) a double, or something that can be converted to it. If you want to, you can even apply bind to previously bound functions - though you might want to ask yourself why you're doing it.
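
Put together as something that compiles, it might look like the sketch below. It uses std::bind and std::function, which are the standardized versions of the Boost ones, so the placeholders live in std::placeholders. The body of get_temperature is invented just to have something to call.

```cpp
#include <cassert>
#include <functional>

// Hypothetical sensor class; the scaling is invented for illustration.
struct foo {
    int offset;
    int get_temperature(double v) const {
        return offset + static_cast<int>(v * 10);
    }
};

// Hypothetical demo function, not from the original post.
void demo_bind() {
    using namespace std::placeholders;  // _1, _2, ...

    foo my_foo = {5};

    // Freeze the object, leave the double to be supplied by the call.
    std::function<int(double)> fn =
        std::bind(&foo::get_temperature, my_foo, _1);
    assert(fn(2.0) == 25);

    // An int argument is converted to double on the way through.
    assert(fn(3) == 35);

    // Binding an already-bound function: nothing is left to supply.
    std::function<int()> fixed = std::bind(fn, 1.5);
    assert(fixed() == 20);
}
```

Note that bind stores a copy of my_foo here. Passing &my_foo instead makes the bound function refer to the original object.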

This is absolutely perfect, for example, for callback functions that need to keep hold of some context. In C you do it using void* arguments, which is unsafe and generally wretched. This can be avoided in C++ by defining a special-purpose class, but that requires the caller to know about it, which ties everybody's shoelaces together more than is healthy.
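
Here is a minimal sketch of the callback case, again with std::function and std::bind standing in for the Boost versions. The "button" and "counter" classes are hypothetical. The point is that button knows only the callback's signature, while the context travels inside the bound object.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical event source. It knows only the callback signature,
// nothing about whoever is listening.
class button {
public:
    void on_click(std::function<void(int)> cb) { callbacks_.push_back(cb); }
    void click(int x) {
        for (std::size_t i = 0; i < callbacks_.size(); ++i) callbacks_[i](x);
    }
private:
    std::vector<std::function<void(int)> > callbacks_;
};

// Hypothetical listener carrying its own context.
struct counter {
    int total;
    void add(int scale, int x) { total += scale * x; }
};

// Hypothetical demo function, not from the original post.
void demo_callback() {
    using namespace std::placeholders;

    button b;
    counter c = {0};

    // The context (which counter, which scale) is bound in. Passing &c
    // makes the callback act on c itself rather than on a copy.
    b.on_click(std::bind(&counter::add, &c, 10, _1));
    b.click(3);
    b.click(4);
    assert(c.total == 70);
}
```

No void* in sight, and button never needs to see the definition of counter.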

The only problem with function/bind - which is true of any code that makes heavy use of templates - is that compiler errors become incredibly verbose and just about useless. A single mistake, such as getting a parameter type wrong, results in pages of messages, none of which gives you the slightest clue as to what you actually did wrong. The first time you compile a new chunk of code that makes extensive use of bind, you will typically get thousands of lines of errors, corresponding to just a handful of typos and the like. The trick is to find the message line that gives you the actual source line - which is buried in there somewhere - then just go stare at the line until you figure out for yourself what you did wrong. The rest of the messages can be summarized as "you did something wrong on this line".

Part 2: The Good (things I just wouldn't live without)
Part 3: The Bad and the Ugly

VirtualBox - virtually complete: part 1

My new Linux machine is almost complete now. It has been running for a couple of months. Since I lost my laptop along with the company I worked for, and haven't seen a reason to buy another one yet, the Linux machine has become my main computing platform for just about everything.

However there are some things which can only be done on Windows. One is iTunes - I only use it as a backup for my iPhone, but for that it is indispensable. Another is the package that updates the navigation data for the plane. Then there is expensive software I bought for Windows and don't plan to buy again - Mathematica, CorelDraw and so on. Not to mention my HP printer/scanner which has no Linux driver. So I've had two machines sitting next to each other, with a KVM switch to go back and forth when needed.

Of course the solution to this is obvious - virtual machines. I've been idly looking at VirtualBox, one of the open-source VM solutions, for a while. It took something odd to make me dig a bit deeper - the discovery that our electricity supplier (PG&E) has a web page where you can see your hourly consumption. It's fascinating to study, and one thing it showed is that my two desktop computers were accounting for about 60% of the house's background electricity usage. So, turn one of them off, instant 30% energy saving. (If only I could do something equally miraculous for the pool, which accounts for over half of our electricity).

I started by creating a second Linux system. That seemed less scary, though as it turned out it was harder than Windows. It's simple enough - tell VBox to create a virtual machine, then boot it with an Ubuntu CD in the drive and it pretty much goes all by itself. My plan for this VM was to use it as a web server for stuff I want to host at home, rather than through my web provider. It all went well, until I tried to set up a "shared folder".

In its pure form, the VM concept means that the VM runs in complete isolation in a bubble inside the real operating system. This is fine until you do actually want to share stuff. You can do it over the net, using FTP or whatever, but even that isn't "out of the box". By default, VBox creates VMs using internal NAT addresses, so there is no way to access the VM from anywhere else, including the host. That can be fixed with a couple of clicks, selecting "Bridged Adapter" in the "Attached to" drop-down, instead of "NAT". But still, it's a clunky way to do things.

So you can also create a "shared folder". This is just a regular directory on the host system, but it looks like a remote filesystem to the guest. It's easy enough to set up, but I just could not get it to work. I successfully mounted it just once. After that, attempts to mount it always failed with "no such device" (or some such). Even deleting it and creating a new one didn't work.

Finally I discovered that, because I'd ticked the "auto-mount" box, it was being automatically mounted in /media. Well, duh, I suppose.

With that done, my Linux VM was usable. Fortunately, because it turns out that on my main (host) system, Firefox had autoupdated itself to V6 - and the Flash player doesn't work any more. No more stock charts from Google, no more news video clips from the BBC. This seems to be a problem that a handful of people have run into, with no solution. So for now, I just use an older version of Firefox running inside the Linux VM whenever I need Flash.

The next step was to install a Windows VM. But this is already too long, so that will be for another time.

Saturday, 13 August 2011

IAS - the PDP-11 Interactive Applications System

(This originally appeared on my web page).

DEC's approach to operating systems for the PDP-11 was anything but disciplined. New ones got invented every time some engineer or marketing person blinked. In the early days, there was a real-time kernel called RSX-11A, designed for memory-resident applications in what we now call embedded processors. Features got added to this rapidly - code bloat is nothing new. By the time it got to RSX-11D it had a complete disk-based file system, a program development environment, and support for every peripheral in the Small Computer Handbook (and there were plenty of them - peripherals on the PDP-11 obeyed the same strategic imperatives as operating systems - see above). At this time, a bright young engineer called Dave Cutler decided that enough was enough, and set out to create a small system that would do the same, which he called RSX-11M.

Meanwhile, the PDP-11 also had a timesharing system very loosely based on TOPS-10, called RSTS/E. A senior engineering manager, newly installed in Geneva, decided that it would be a smart move to develop a system that could do both real-time and timesharing, based on RSX-11D. It was specifically targeted at the planned PDP-11/70, which was a kind of super-11. Since he had newly moved to Europe (his name was David Stone, by the way) he gave the project to the European Software Engineering group that he had just invented in Reading, England. This was about the time that I joined DEC, and after a few misadventures I found myself assigned to the project.

The system was to be called IAS, which if I remember rightly stood for "Interactive Applications System". It added to the RSX-11D kernel a clever timesharing scheduler, a bunch of security features, and a new command language. These were the days of MCR, a command language which makes even the Unix shell look lucid. (To delete a file you typed "PIP file/D" for example). The then-boss of software decided we needed a Digital Command Language, which of course later became a feature of VMS, but IAS was the guinea-pig. In fact, all DCL commands were translated into the corresponding MCR and then fired off to the appropriate utility. The command interpreter that did this was thrown together in great haste, and remains to this day the nastiest piece of software I have ever encountered.

I had tremendous fun on IAS. Like V1 of any system, it lacked just about every feature anyone wanted, and they all had to be added for V2. It says something for the team that in fact they mostly got there in V3, and they mostly worked. The team by the way consisted of about six people - that's probably about the same number that Microsoft has doing quality control on the stupid paperclip in Office 97. My first job was to write the driver for the latest new disk, the RK06. It had about the same capacity as a couple of floppies; four of them would fit in a six-foot cabinet. I was duly sent on the course for writing device drivers, but on the first morning I finished reading the manual and by the end I had coded the driver. It did various nifty things that nobody had done before on the 11, like overlapped seeks, and ended up becoming the basis for all future IAS and RSX-11D drivers although I remained unhappy with a lot of it.

My next job was to write a new terminal driver. Despite what I said about the command interpreter, the old terminal driver was pretty special too. Support for new features and new hardware had been thrown in over several years, and it was impossible to figure out how it worked. One story that sticks in my mind: I changed it to suppress nulls on formatted output (because of a misfeature in the command interpreter). Thereafter, if you typed rapidly, it would drop the first character of every other line. I never figured out why, I just removed the fix.

The terminal driver was one of the most enjoyable bits of work I ever did. It was all table driven, and in fact was object-oriented 15 years before it became fashionable. Thus adding a new device just meant writing some standard routines and plugging them into the tables. It seems obvious now, but it was pretty revolutionary at the time! I cooperated with the guy who was writing the driver for the new VMS system, so when I invented "read with prompt" (mainly to make the output on hardcopy terminals look prettier) this found its way into VMS, and ten years later was used in a way that I certainly never thought of to double the performance of All-in-1.

All of this found its way into V2. But by then, Cutler had decided that RSX-11M was going to take over the world. Since he was at the heart of things in Maynard, and we were 3000 miles away in Reading, it was pretty easy for him to get the sales and support people to listen to him. IAS did get some very loyal customers, including Boeing and the US Navy, who stuck with it long after Digital had tried to kill it.

In fact, as the cost of developing and maintaining software soared, Digital did try to rein in the PDP-11 situation. As a result, we had to combine IAS and RSX-11D into a "unified product strategy". (I seem to have spent a lot of time over the years taking products that were never meant to be the same thing, and making it look as though they were).

IAS had many features that RSX-11M didn't, such as a proper timesharing scheduler. This led to RSX-11M+, which was RSX-11M with a bunch of features intended to match IAS. This was really a stretch for 11M, in complete conflict with the "size is the goal" philosophy which Cutler had made into a rubber stamp that appeared on all of the early 11M design documents. Nevertheless, M+ had the visibility in Maynard and IAS didn't, and got the development funding. This meant that several new features first appeared in M+ and then had to be retrofitted into IAS.

One of these was "PLAS" (I forget what it was supposed to stand for), which gave programs access to the memory management features and was a kind of do-it-yourself virtual memory. It fell to me to implement the linker support for this. Now the linker was Cutler's first ever piece of software at DEC. It was very clever; doing memory layout for a machine as constrained as this could never be easy. In addition, it supported PSECTS which were modeled on what the IBM/360 did for memory management, and of course found their way into VMS as well. Thus a memory section could be shared between overlays, or not, and could be marked for code or for data, and could be overlaid to support Fortran COMMON or not, and so on. There were about seven different PSECT attributes, and so over a hundred different ways that memory allocation could be handled. And overlays - who remembers those? The linker had a write-only language called ODL (Overlay Description Language), which allowed you to set up hugely complicated overlay structures. As the PDP-11 address space (64 kbytes!) became more and more of a constraint, ever-fancier overlay techniques were invented, and since the linker had to handle them all its own overlay structure was the most complex of the lot.

But by this time the writing was on the wall for the PDP-11 as a general-purpose machine. The VAX and VMS had been a huge success and all serious investment went on them, rightly so. Personally I moved on after we released V3, in about 1979, but IAS retained an engineering group for a few more years. I think DEC continued to support it, for the benefit of the handful of big customers (like the US Navy) who were still using it, up until 1988 or so.

The PDP-11 spawned several great operating systems, the most famous nowadays being Unix. But IAS had something the others never did, a unique ability to support both timesharing and real-time applications at the same time. A big 11/70 - which is to say a megabyte of memory and a few tens of megabytes of disk - could give decent timesharing support to 20 or 30 users, and there were people who ran it at the limit of 64 users and seemed happy with it. Try telling that to the youth of today!

Tuesday, 9 August 2011

Python and Tkinter: wonderful

Having time on my hands at the moment, and no work commitments - that's another story though - I decided to start taking a new look at the robotics stuff I was playing with a year or so ago.

I'd written nearly all of the code to make a six-legged robot - a hexapod - walk with various different gaits and postures - the so-called inverse kinematics. It was in straight C, since I intended it to run on a little embedded CPU which had no support for C++, nor floating point for that matter. And I'd built a development environment, including visualisation for the leg movements, using Visual Studio.

Things have moved on since then, though. For one thing I've pretty much switched to Linux for my computing environment. For another, the Roboard has really become the obvious onboard computer - it is now available with Linux, and it offers a full-function x86 including floating point, in a tiny size that will fit in my fairly small hexapod. And since it supports full GCC, I can write the code in C++. The C code is just so cluttered - I can't for the life of me imagine why anyone would prefer to code in C. It's full of irrelevant details that make it hard to read and even harder to get it to work. So, a rewrite is called for.

The only problem is the GUI that I'd painfully created using the Visual Studio tools. Painful because there are 6 legs, and each has numerous parameters and state variables. I'd created the dialog box from hell. Every tiny change meant nudging numerous components around to get it to look right. What a pain. But it was done, for now anyway.

That was when I thought about Tkinter, which I've never used before. I've become a huge fan of Python in the last couple of years, using it for anything where performance is not a big deal. I also wrote a very powerful Python-based scripting system for my now-defunct employer, using Boost Python. So using Python and Tkinter for the GUI was kind of an obvious thing to do.

Somewhere in the mists of history I acquired Python and Tkinter Programming, which I think is the definitive book on the topic. I skimmed that, and with frequent help from Google - especially this site - started putting my new GUI together.

What a pleasure! Tkinter automatically takes care of making a reasonable layout, given some general guidance through the pack and grid functions. You no longer have to think about the minutiae of positioning, or spend ages getting boxes to line up with each other. I just couldn't help putting together a bit of infrastructure for collections of config variables, so they are now super-easy - just a list of names and default values and Python and Tkinter take care of everything.

In total it has probably taken me about 6 hours to get everything together - but that included learning Tkinter from scratch and writing quite a bit of infrastructure. And now I have everything I need to control my inverse kinematics, and have an animated visualisation of what it's doing.

I'll never do GUIs any other way now. Tkinter is wonderful!

Saturday, 6 August 2011

Favourite restaurants #3: Pizza Cresci, Cannes

My sister used to buy the Daily Sketch, a now long-forgotten English newspaper, on her way to work every day. This was a long time ago - she married and moved out when I was 11. When she came home I would seize it and read the cartoon on the back page, Peanuts. Among the many incomprehensible cultural references, to a child growing up in England in the 1950s, was the occasional mention of "pizza pie". Pizza was pretty much unknown in England back then - probably there were Italian restaurants in London that served it, but those were hardly the kind of places we could afford to go to. It would be a good few years before I'd find out what it meant.

Now, you can probably get pizza in every country in the world. Really it's amazing how quickly it has spread. Of course it was already commonplace in the US back then, which was why Charlie Brown took it for granted. I've eaten pizza in just about every country I've visited. There are times when you just need a break from the local food no matter how much you like it - as in Japan - and certainly if you don't, as in Korea.

Pizza's introduction to England was courtesy of Pizza Express, a London chain (originally) that made them before your very eyes, and made a very tasty pizza too. They even published a pizza cookbook, which worked surprisingly well considering that a domestic oven doesn't get anywhere near hot enough. Though my own introduction to pizza was at a local restaurant when I worked in Reading, Mama Mia - long since closed I'm afraid.

When we lived in France, we would make the pilgrimage every summer right across the south to the beach town of Hossegor - site of another favourite restaurant. It was a long drive - 8 or 9 hours, especially before the autoroute was finished and you had to dice with death on the three-lane stretch between Salon and Arles. By the time we got home we were exhausted and hungry. We would pile out of the car, leaving it packed to the gills with bags and often cases of wine that we'd stopped off for at Buzet, and cram into Isabelle's tiny Abarth to drive down to Cannes to eat.

Tradition had it that we always went to the same place, Pizza Cresci on the waterfront. Just the location is the stuff of dreams - right across the street from the harbour, packed with millionaires' yachts. Oh, and right next to the Municipal Police, hence easily recognised by the illegally-parked police cars, as you can see in the picture at the top. You might expect that in such a touristy location, the food would be mediocre. You couldn't be more wrong!

Pizza Cresci has, quite simply, the very best pizza I've ever tasted, anywhere in the world. I've been to the original pizza restaurant in Naples, and to some of the most famous ones in the US. They've all been good, but none has been quite as good as Cresci. My special favourite is their Pepperoni. They use a thin crust, crisp around the edges but deliciously soaked in melted cheese and oil in the centre. With a sprinkling of hot oil... just sinfully moist and delicious. Isabelle's favourite is something quite unique, an aubergine (eggplant) pizza, very thin slices of aubergine, a little cheese, and the same yummy thin base.

Of course we went there at other times too - if we were tired and just couldn't be bothered with eating at home, it was so easy. And it's huge (by French standards anyway), so even when it's packed at the height of the tourist season, you never have to wait long. But since moving to California, it's a wee bit less convenient and we hadn't been there for a long time. Then this spring, we visited Sorrento and Naples, then spent the weekend in Nice. Fresh from Napoli, the self-appointed capital of pizza, we decided to have lunch there. It was as wonderful as in our memories! The pepperoni pizza was delicious, the aubergine too (so I'm told), and as always with a view of the Cannes waterfront.

Forget all the famous many-starred restaurants in Cannes, head straight for Cresci. It's the place to eat!