I've been working lately on a system design which requires, among other things, highly effective and efficient error-correcting codes (ECC). We've decided to use a Low Density Parity Check (LDPC) code. These are currently considered to be the best "soft" ECCs, i.e. where there is information about the reliability of each received bit as well as its putative value. The story behind LDPCs is interesting: they were invented by Robert Gallager in his PhD thesis in 1960, but they were way beyond contemporary computing power. It didn't help that when he wrote the definitive textbook on ECCs in 1966, he didn't mention them! So they languished, forgotten, until a decade ago. By then TurboCodes had been independently invented. They also provided a means for "near Shannon limit coding", i.e. extracting as much data from a noisy signal as theoretically possible.
LDPCs have two properties which led to the problem I needed to solve. First, there is no formula that provides the best code for a given set of constraints (block size and code rate). You can use the same general scheme to build ten different codes, the details being decided by a random number generator, and some will be significantly better than others. That means that to find the code you want to use in practice, you need to generate a whole bunch of them and try them out over a large number of messages and error densities.
That leads to the second problem. An LDPC starts out as a very sparse matrix, describing a large number of parity checks each of which covers a small number of bits - hence the name. We want to have 32768 bits of user data, and a reasonable configuration is to have each bit covered by three checks. If we use a half-rate code (same number of data bits and check bits) then each check covers six bits. So we have a matrix where each row is 64K bits long and has just six 1 bits.
The matrix doesn't say anything about which bits are data and which are check bits, only that a valid codeword has to satisfy all the checks. So given 32K data bits, the way to generate the corresponding 32K check bits is to solve the 32K simultaneous equations that the sparse matrix implicitly describes. Easy!
Well, no, not easy at all. The practical use of LDPCs requires a transformation of the matrix into something that normal hardware or software can encode in a linear and reasonable amount of time. Solving the equations directly is an O(n3) problem, i.e. the time required increases with the cube of the number of unknowns. So we have to some preprocessing on the matrix to get it into a form that the hardware can work with. There's an excellent paper by Qi and Goertz describing how to go about this. The algorithm it describes is, not surprisingly, also O(n3). This needs to be run for every trial code, and we would like to try hundreds of them.
Our first attempt at coding the algorithm was written in Python, using the obvious data representation, i.e. a big matrix containing mostly 0s and a few 1s. It was written so we could understand the algorithms and piece together a complete system, rather than for performance. On a "toy" code of a few hundred bits, it took a couple of minutes to run. On slightly larger codes - nowhere near the size we need for our system - it took most of the day. By extrapolation, to generate a life-size code would have taken months or years.
Clearly, we needed an implementation more focused on performance - not just code optimization, but selecting algorithms to minimize the time at step of the algorithm. And that is where it begins to get interesting. More on that in Part 2.