Wednesday, 23 June 2021

Kotlin for a Python and C++ Programmer

A while ago I got interested in Kotlin as a possible type-safe alternative to Python for our system. A lot of the non-performance-critical things are done in Python. As a lot of people have discovered, Python works well for small programs. But small programs have a tendency to get bigger, and to take on a life of their own. Maintaining large Python programs is hard, and refactoring them, for example to change the way objects relate to one another, is just about impossible. You're sure to miss some corner case which will show up as a runtime error much later.

Our CLI was the obvious candidate for experimenting with Kotlin. This is several thousand lines of Python and predictably, it has become hard to maintain. My first efforts with Kotlin were not very successful. It is based on Java and inherits some mis-features and design baggage from there, which stopped me from doing what I wanted. Also, the build environment is a nightmare.

Then recently I took another look at this project, and saw a way around my previous problem. As a result, I have since written a complete Kotlin implementation of the CLI. It's a nice piece of code and much easier to maintain and work with than its Python equivalent.

Here's a summary of the good and bad points of Kotlin, based on my experience.

  • Really, really good: the Kotlin language. It's a delight to use with lots of features that lead to compact, uncluttered code, yet totally type-safe. More on that later.
  • Very good: the IDE (called Idea). Intuitive and easy to get used to, and makes writing code just so easy. Only problem is that it occasionally crashes, taking the system with it.
  • OK but not great: libraries. Like Python, Java supposedly has libraries for everything. But finding them is hard, and figuring out how to use them from Kotlin is even harder.
  • Awful beyond belief: the build system, a dog's breakfast of several different tools (Gradle, Maven, Ant, who knows what else). As long as you stay within the IDE, life is mostly good. But at some point you generally need to build a stand-alone app. There is no documentation for how to do this, and what you can find online is confusing, contradictory and rarely works. Probably if you come from a Java background all this seems normal.

Idea: the IDE


Kotlin is joined at the hip to its IDE, which comes from the company that invented the language (JetBrains). The same company also produces PyCharm for Python, and the two are very similar. It's everything you could ask from an IDE. The instant typing-time error checking has spoiled me completely, and now I expect Emacs to do the same when writing C++ - except of course that it doesn't.

The biggest problem, running it on Ubuntu, is that every so often it freezes and takes the entire GUI with it. If you can log in from another system, "killall -9 java" kills it and allows the system to keep going. Otherwise, you just have to reboot.

It does a good job of hiding the complexities of the build system, as long as you want to stay entirely in the IDE. But the problem with anything "automagic" is what happens when it goes wrong, or doesn't do what you need. The build system is a nightmare (see later) and the IDE offers no help at all in dealing with it if you want to create a stand-alone app.

It sometimes gets confused about which library symbols come from, and flags errors that aren't there. It still lets you run the compiler, so it's only a nuisance. It also isn't very helpful when you add a library that comes from a new place. You have to hand-edit an obscure Gradle file, and then restart Idea before it understands what you have done.

The Language


Once I got my head around it, Kotlin is the nicest language I have ever used. It particularly lends itself to functional-style programming, but there's nothing to stop you using it like C or Fortran. The few irritating things are a result of its Java legacy - fundamentally, Kotlin is just syntactic sugar over the top of Java. Some of the really nice things:
  • unobtrusive strong typing: everything's type is known at compile time, yet you rarely have to be explicit about types. The compiler does an excellent job of figuring out types from context, like auto in C++ but much better.
  • ?. and ?: operators: between them these make dealing with nullable values very clean and simple. ?. lets you write in one line what would take a string of nested if statements in C++ or Python. 
  • lambda functions: all languages now support lambda (anonymous) functions, but in both C++ and Python they're an afterthought, and it shows. In Kotlin they are an integral part of the way the language is meant to be used, making them a very clean and natural way to express things.
  • the "scope functions": a collection of highly generic functions that make it easy to do functional programming. For example, the 'let()' function allows you to execute some procedural style code using the result of a functional call chain. 'also()' makes it very easy to write chainable functions.
  • generic sequence handling functions: 'map()' does the obvious job of applying a function to every element of a sequence or collection. There are plenty more that simplify all kinds of common requirements, for example to trim null elements from a list after some processing.
  • string interpolation: the string "foo = $foo" replaces the last part with the value of foo, converted as appropriate to a string. That started with Perl and is now available in Python 3. Kotlin takes it further though, allowing complex expressions and figuring out the string syntax, e.g. "foo = ${x.getFoo("bah")+1}".
  • extension functions: it's easy to define new functions as if they were member functions of a class. They can only access public members of the class, but to the user they work exactly as if they were part of the base class definition. For example I wrote a function String.makePlural() which figures out the plural of a noun. This has always struck me as an obvious improvement but it has never even been considered for C++ (nor Python as far as I know).
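
Several of the features listed above fit into one short program. This is only a sketch: the User class, findUser() and the sample data are hypothetical, and this makePlural() is a deliberately naive stand-in, not the function described above.

```kotlin
// Hypothetical type for illustration only; nothing here comes from the real CLI.
data class User(val name: String, val email: String?)

// Extension function: called as if it were a member of String.
// Deliberately naive: it only handles a few regular English nouns.
fun String.makePlural(): String = when {
    endsWith("s") || endsWith("x") || endsWith("ch") -> this + "es"
    endsWith("y") && length > 1 && this[length - 2] !in "aeiou" -> dropLast(1) + "ies"
    else -> this + "s"
}

// Returns null when nobody matches, so callers must deal with the nullable result.
fun findUser(name: String, users: List<User>): User? =
    users.firstOrNull { it.name == name }

fun main() {
    // Types are inferred: users is a List<User> without any annotation.
    val users = listOf(User("ann", "ann@example.com"), User("bob", null))

    // ?. short-circuits to null at the first missing value, ?: supplies a default.
    val domain = findUser("bob", users)?.email?.substringAfter("@") ?: "unknown"
    println("domain = $domain")

    // mapNotNull applies the lambda and drops the null results in one step.
    val emails = users.mapNotNull { it.email }
    println("emails = $emails")

    // let: run a block on the result of a call chain, only if it is non-null.
    findUser("ann", users)?.email?.let { println("mail to $it") }

    // also: perform a side effect and pass the original value along the chain.
    val names = users.map { it.name }
        .also { println("found ${it.size} ${"user".makePlural()}") }
    println(names)

    // String templates accept arbitrary expressions, including nested quotes.
    println("${"box".makePlural()} ${"city".makePlural()} ${"day".makePlural()}")
}
```
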
There's nothing really bad about the language. The "generic" support is fairly feeble compared to C++ templates. In C++ a template parameter type behaves exactly as any other type, for example you can instantiate it. And no validity checking is applied until you instantiate the template, which is very flexible.

Kotlin's generics are built on the Java equivalent. You have to specify exactly what the template can and can't do in the function definition, meaning you can't for example write a numeric function using the normal arithmetic operators. (It doesn't help that there is no common supertype for integers and floats that supports arithmetic). Once you accept this, it's not too hard to work around it. But it's a shame.
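
To make the limitation concrete, here is a minimal sketch of the problem and of one possible workaround, which is to pass the arithmetic in as a function. The name sumWith is invented for this example.

```kotlin
// This does not compile: Number declares no plus() operator, so the
// compiler cannot resolve a + b for an arbitrary T : Number.
//
// fun <T : Number> sum(xs: List<T>): T = xs.reduce { a, b -> a + b }

// Workaround (hypothetical helper): the caller supplies the zero value
// and the addition, so the generic code never needs to do arithmetic on T.
fun <T> sumWith(xs: List<T>, zero: T, add: (T, T) -> T): T = xs.fold(zero, add)

fun main() {
    println(sumWith(listOf(1, 2, 3), 0) { a, b -> a + b })      // Int
    println(sumWith(listOf(1.5, 2.5), 0.0) { a, b -> a + b })   // Double
}
```

The price is that every call site has to spell out what "add" means for its type, which a C++ template would have worked out for itself.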

Libraries


The great thing about Python is that you can find libraries to do just about anything. The same is supposedly true of Java, too. Since Kotlin can easily call Java, that should mean you can find libraries to do just about anything in Kotlin too. But you can't.

I first ran into this when trying to use a Rest API from Kotlin. Python has an excellent library for this, called Requests. After a bit of googling, I found that someone had ported it to Kotlin, calling it khttp. Then I spent a couple of hours trying to figure out how to get Kotlin to actually use it. That ought not to be hard, but you have to tell the build system where to find a library, i.e. a URL. And none of the documentation, for any library, tells you this. Or sometimes it does, but it's wrong.

I did finally get khttp working, and it was good. But when I returned to my project a few months later it had simply disappeared. It was a single-person project, and the maintainer had got bored with it and moved on to something else. There were bits and pieces about it on the web, and maybe you could get it from here and patch it from there, but it didn't look like a good path.

So I googled around some more, and found another library. It's called Fuel, and it does allow you to make Rest requests. But it is obscure and barely documented. For example, when Rest returns an error, there is no straightforward way to access the details. You can do it in a very clumsy way. But even then, it uses the type system in such an obscure way that there is no way to write a common function that will work across multiple request types (Get, Put, Post and so on). You have to repeat the same ten lines of ugly impenetrable code.

One of the much-vaunted features of Kotlin is coroutine support, allowing you to run lightweight threads that maintain their own stack and state. This looked useful to handle parallel Rest requests, which I needed. Even though it is documented as part of the language, it isn't really. It's part of a library, and has to be explicitly imported. But from where? Everything you can find on the web says you need to "import kotlinx.coroutines". But that doesn't work. Eventually I did figure it out, but I never did convince the IDE. That showed an error right up until I decided I didn't need coroutines anyway.
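
A likely explanation for the failing import is that the coroutine functions live in a separate artifact, org.jetbrains.kotlinx:kotlinx-coroutines-core, which has to be declared as a dependency in the Gradle build file before any kotlinx.coroutines import will resolve. Assuming that dependency is in place, parallel requests might look something like the sketch below. fakeRequest() and fetchAll() are invented names, and the delay stands in for a real Rest call.

```kotlin
import kotlinx.coroutines.async
import kotlinx.coroutines.awaitAll
import kotlinx.coroutines.coroutineScope
import kotlinx.coroutines.delay
import kotlinx.coroutines.runBlocking

// Hypothetical stand-in for a Rest call: suspends without blocking a thread.
suspend fun fakeRequest(id: Int): String {
    delay(100)
    return "response $id"
}

// Starts one coroutine per id, then waits for all of them.
// The results come back in the same order as the ids.
suspend fun fetchAll(ids: List<Int>): List<String> = coroutineScope {
    ids.map { id -> async { fakeRequest(id) } }.awaitAll()
}

fun main() = runBlocking {
    // The three "requests" overlap, so this takes about 100 ms rather than 300 ms.
    println(fetchAll(listOf(1, 2, 3)))
}
```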

Another example: a CLI needs the equivalent of GNU readline, so commands can be recalled and edited. The good news is that someone has ported the functionality to Java, in a library called JLine. In fact, they've done it three times - there is JLine, JLine2, and JLine3. They're all different in undocumented ways. But anyway there's hardly any documentation. To find out how to show history (equivalent of the shell 'history' command) I ended up reading the code.

The experience with other libraries has been the same:
  • the documentation is between non-existent and very poor
  • figuring out where to get the library from is near impossible
  • even though there is probably a library for what you want, finding it is a challenge

The Build System


When you start writing Kotlin, you work entirely in the IDE and you don't even have to think about the build system. When you want to run or debug your program, you click on the menu, it takes a second or two to build, and it runs. Life is good.

But if you're writing a program it's probably because you want it to do something. And for that you most likely need to be able to run it without the IDE - from the command line, file explorer or similar. In the Java world, that means creating a .jar file. And that is where the fun begins.

You might reasonably suppose that the IDE would have a button somewhere, "turn this into a Jar file". But it doesn't, nothing like it. So you google it, thinking the menu item you need must just be buried somewhere. But no. What you find is incredibly complicated suggestions about editing files that don't even exist in your environment.

When you do finally manage to persuade something to create a Jar file, and you try to run it, you get a message about not having a manifest for something or other. If you're an experienced Java hand, this may mean something to you. All I know is that the IDE has agreed to build something, but missed out something vital.

Eventually, somehow, I managed to create a menu item that successfully built a runnable Jar file. Problem is, I have no idea how. When I created a trivial "hello world" program just for this exercise, I could never get it to work again. And then somewhere along the way I did something wrong, and the menu item disappeared, never to return.

Ironically, I did once find an article by someone from Jetbrains saying "of course when you write a program, you want to be able to run it without the IDE. Here's what you need to do." The instructions were simple, and they worked. The trouble is, no matter what search I do, I have never managed to find the article again.

Java programs were originally built using something called Ant. That was too complicated, so it was overlaid with another tool called Maven. Then that was too complicated too, so it was overlaid with something called Gradle. That came with its own language, but Kotlin invented a variant where the build requirements are described in a Kotlin mini-program.

So far so good, but all of these tools are mind-numbingly complicated and poorly documented for the casual user. Such documentation as you can find, assumes complete familiarity with the world of Java. Just because Gradle sits on top of Maven, doesn't mean you can ignore Maven. You still sometimes have to go and edit Maven files, which use XML. I've always viewed XML as an object language that only computers should deal with, just like Postscript or PDF. But the Java world is in love with it.

This all really starts to matter if you want to build your Kotlin program as part of some larger project - for example, our entire application. That is built with Make, and heaven only knows Make is a nightmare. But it's a familiar nightmare.

The IDE creates a "magic" file called gradlew. It isn't mentioned in any instructions, nor what to do with it. But a friend told me that './gradlew build' will build a stand-alone jar file from the command line - and sometimes it does. Luckily that worked for my "real" program, though when I tried it with a toy hello-world program it didn't.
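
For what it's worth, the usual cause of a missing-manifest complaint is a jar with no Main-Class entry, and one common remedy is to declare it in the Gradle Kotlin-DSL build file. The fragment below is a minimal sketch, not the configuration this project used. The plugin version, the MainKt class name (which is what Kotlin generates for a top-level main() in a file called Main.kt) and the decision to bundle all dependencies into a single jar are assumptions.

```kotlin
// build.gradle.kts - illustrative fragment; version and class name are assumptions
plugins {
    kotlin("jvm") version "1.5.10"
}

repositories {
    mavenCentral()
}

tasks.jar {
    // Without this entry "java -jar" cannot tell which class to start.
    manifest {
        attributes["Main-Class"] = "MainKt"
    }
    // Bundle the runtime dependencies (including the Kotlin standard library)
    // into the jar, so that it runs on its own.
    duplicatesStrategy = DuplicatesStrategy.EXCLUDE
    from(configurations.runtimeClasspath.get().map { if (it.isDirectory) it else zipTree(it) })
}
```

With something like this in place, './gradlew build' should leave the jar under build/libs, though the details may vary with the Gradle version.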

Summary


Kotlin is a great language, and a pleasure to use. Sadly, the nightmare build system, and the lack of any help from the IDE in dealing with it, means it is not really ready for prime time as part of a serious application or project.

Monday, 9 November 2020

My Little List of Useful Principles

This first appeared on my web page, but I thought it deserved to be repeated here on my blog. 

The David Stone Principle

“Never ask a question that has an answer you may not like.” Also expressed as “It is easier to obtain forgiveness than permission”. In other words, don’t ask if it is OK to do something, because the chances are there will be someone who will have some reason why you shouldn’t, and having asked the question (and got an answer you don’t like), you have placed yourself under an obligation to do something about the answer. Whereas if you just got on and did it, you could deal with any objections afterwards. This has two advantages. First, you’ve already done what you intended, and it is pretty unlikely that you will be made to undo it. Second, people are less likely to object after the fact anyway.

Harper’s Theory of Socks

Everybody who has ever packed a suitcase knows that no matter how full the suitcase, no matter how difficult it is to close, there is always some crevice where you can squeeze in one more pair of socks. Those familiar with the Principle of Mathematical Induction will immediately see that it follows that you can put an infinite number of pairs of socks in a single suitcase.

If this is obviously fallacious, it is less obvious why. But in any case it is a useful riposte to the executive or marketing person who wants to add just this one tiny extra piece of work to a project.

Law of Ambushes

I heard this one from Tony Lauck, but he claims to have got it from someone else. Think of an old-fashioned Western, with the good guys riding up towards the pass. They know the bad guys are up there somewhere, and they’re looking every step of the way, scanning the hilltops, watching for any movement, peering around twists and turns in the trail. Suddenly there’s a dramatic chord and the bad guys appear from nowhere, guns blazing. Of course the good guys triumph, except the one you already figured was only there to get shot, but the point is, ambushes happen and take you by surprise even though you expect them, even though you’re waiting for them every second. And they always come from where you weren’t expecting and weren’t watching.

The Lauck Principle of Protocol Design

This one is a little technical, but it is so fundamentally important to the small number of people who can benefit from it, that I include it anyway. Communication protocols (such as TCP) work by exchanging information that allows the two, or more, involved parties to influence each others’ operation. When designing a protocol, you have to decide what information to put in the messages. It is tempting to design messages of the form “Please do such and such” or “I just did so and so”. The problem here is that the interpretation of such messages generally ends up depending on the receiver having an internal model of its partner’s state. And it is very, very easy for this internal model to end up being subtly wrong or mis-synchronised (see the Law of Ambushes). The only way to build even moderately complex protocols that work is for the messages to contain only information about the internal state of the protocol machine. For example, not “please send me another message”, but “I have received all messages up to and including number 11, and I have space for one more message”. There are legitimate exceptions to this rule, for example where one protocol machine has to be kept very simple and the other is necessarily very complex, but they are rare and exceptional. As soon as both machines are even moderately complex, this principle must be followed slavishly.
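
The "number 11" example can be sketched in a few lines. This is an illustration of the idea only, not a real protocol, and every name in it is invented. The message carries nothing but the receiver's own state, and the sender works out what it may do from that.

```kotlin
// The only message the receiver ever sends: a statement of its own state.
// "I have received everything up to receivedUpTo, and have room for freeSlots more."
data class ReceiverState(val receivedUpTo: Int, val freeSlots: Int)

class Sender {
    private var latest = ReceiverState(receivedUpTo = 0, freeSlots = 0)

    // A duplicated message changes nothing, and one that reports less
    // progress than is already known is ignored, so the sender's view
    // cannot drift the way a count of "please send one more" requests can.
    fun onState(state: ReceiverState) {
        if (state.receivedUpTo >= latest.receivedUpTo) latest = state
    }

    // The message numbers the sender is currently allowed to transmit.
    fun permitted(): IntRange =
        (latest.receivedUpTo + 1)..(latest.receivedUpTo + latest.freeSlots)
}

fun main() {
    val sender = Sender()
    sender.onState(ReceiverState(receivedUpTo = 11, freeSlots = 1))
    println(sender.permitted())   // only message 12 may be sent

    sender.onState(ReceiverState(receivedUpTo = 11, freeSlots = 1))  // duplicate: harmless
    sender.onState(ReceiverState(receivedUpTo = 9, freeSlots = 3))   // stale: ignored
    println(sender.permitted())   // still only message 12
}
```

Had the message been "please send me another message", the duplicate would have caused two messages to be sent into a buffer with room for one.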

The Lauck Principle of Building Things That Work

If you don’t understand what happens in every last corner case, every last combination of improbable states and improbable events, then it doesn’t work. Period. Yes, you may say, but it is too complex to understand all of these things right now. We will figure them out later as we build it. In this case, you are doomed. Not only does it not work, but it will never work.

The Jac Simensen Principle of Successful Management

Get the right people doing the things they’re good at, and then let them get on with it. It sounds simple, but it is rarely done thoroughly in practice. It's applicable to all levels of management but especially at more senior levels where there’s a lot of diversity in the tasks to be undertaken.

The Principle of Running Successful Meetings

Write the minutes beforehand. If you don’t know what outcome you’re trying to achieve, you stand little chance of getting there.

Harper’s Principle of Multiprocessor Systems

Building multiprocessor systems that scale while correctly synchronising the use of shared resources is very tricky. Whence the principle: with careful design and attention to detail, an n-processor system can be made to perform nearly as well as a single-processor system. (Not nearly n times better, nearly as good in total performance as you were getting from a single processor). You have to be very good – and have the right problem with the right decomposability – to do better than this.

Harper’s Principle of Scaling

As CPU performance increases by a factor of n, user-perceived software performance increases by about the square root of n. (The rest is used up by software bloat, fancier user interface and graphics, etc).

The Delmasso Exclamation Mark Principle

The higher you go in the structure of an organisation, the more exclamation marks are implicitly attached to everything you say or write. So when a junior person says something, people evaluate the statement on its merits. When the VP says it (even in organisations and cultures that aren’t great respecters of hierarchy and status, like software engineering), everyone takes it much more seriously. It means that as you move up the organisation, you have to be increasingly careful about what you say, and especially you have to be increasingly moderate (which doesn’t always come naturally!).

The Dog-House Principle

A dog-house is only big enough for one dog. So if you don’t want to be in the dog-house, make sure somebody else is. I first heard this applied to family situations (specifically to someone’s relationship with his mother-in-law) but it seems more generally applicable.

Mick's Principle of Centrally Managed Economies

There are three reasons why centrally managed economies don’t work. The first is obvious, the second less so, and the third not obvious at all. This principle was formulated by a friend of mine during the dying days of the Soviet Union. Its applicability to centrally managed economies is obvious, but it should be borne in mind whenever an organization’s success model involves the slightest degree of central planning.

The first problem is that they assume a wise central authority that, given the correct facts, can figure out the right course of action for the next Five Year Plan. It is fairly obvious that such wisdom is unlikely to be found in practice.

The second problem is that even if such a collection of wisdom did exist, it would only succeed if given the correct input. In the case of the Soviet Union, this means the state of production in thousands of factories, mines and so on, as well as the needs in thousands of towns and villages. But all of this input will be distorted at every point.

The lowliest shopfloor supervisor will want to make things look better than they are, while the village mayor will make things look worse so as to get more for his village. And at every step up the chain of management, the information will be distorted to suit someone’s personal or organizational agenda. By the time the Central Planning Committee gets the information about what is supposedly going on, it has been distorted to the point where it is valueless.

The third problem is the least obvious. Suppose that by some miracle an infinitely wise central committee could be found, and that by another miracle it could obtain accurate information. Its carefully formulated Five Year Plan must now be translated into reality through the same organizational chain that amassed the information, down to the same shopfloor supervisor and collective farm manager. At every step the instructions are subject to creative interpretation and being just plain ignored. The Central Tractor Committee, knowing the impossibility of getting parts to make 20,000 tractors, adds an “in principle” to the plan. The farm manager, knowing that his people will never get enough food supplies to live well through the winter, grows an extra hundred tons of corn and stocks it. And so on.

Acknowledgements

Tony Lauck led the Distributed Systems Architecture group at DEC, and was my manager for several years. As a manager he was pretty challenging at times, but as a mentor he was extraordinary. He had (still has, I guess) the most incredible grasp of what you have to do to get complicated systems to work, or perhaps more accurately what you have to avoid doing. At first encounter, spending a whole day arguing over some fraction of the design of a protocol seemed like pedantry in the extreme. It was only later that you came to realise that this is the only way to build complex systems that work, and work under all conditions. With the dissolution of DEC, the “Lauck School of Protocol Design” has become distributed throughout the industry, to the great benefit of all. A whole book could be written about it, citing examples both positive and negative – were it not for the fact that Tony is still very much alive, BGP for example would have him spinning in his grave.

Jac Simensen was my boss (or thereabouts) at DEC for several years. It would be an exaggeration to say he taught me everything I know about management, but he was the first senior manager I saw in action from close-up, and one of the very best managers I’ve ever worked for. He certainly gave me an excellent grounding when I quite unexpectedly found myself managing a group of nearly 100 people, by a long way the biggest group I’d ever led at the time.

Friday, 11 September 2020

Bread



Today's Loaf

At the start of the shelter-in-place order for the Bay Area I decided to try my hand at making bread. Me, and tens of millions of others. I got started thanks to a friend who gave me a bag of Italian Doppio Zero flour, and thanks also to a small pack of yeast I happened to have. Both ingredients had completely disappeared from supermarket shelves. I found a recipe on the web - which turned out to be seriously flawed. Still, my first effort was pleasant to eat, and encouraged me to keep trying.

Six months have now passed. I've made bread twice every week since then, on Friday and Sunday mornings, which amounts to about 50 loaves. I think that now I've got the hang of it. There are really only two ingredients in bread, flour and water, plus of course yeast. Yet there are amazing variations in what you get with only small changes in the ingredients.

But I'm getting ahead of myself. My two-pound bag of Doppio Zero was quickly exhausted. We had some all-purpose flour, but bread should be made with proper bread flour, which has a higher protein content than normal flour. The protein is what turns into gluten, which is what gives bread its structure and texture. Normally you can buy it in the supermarket, but not in March 2020.

Looking online, I discovered a high-end flour producer (Azure) who claimed to have ten-pound bags of bread flour available. I ordered one, and hoped it would arrive quickly. But it didn't. When I chased them, they assured me it was on its way, but delayed due to the problems arising from the pandemic. That seemed fair enough, but it didn't help me.

I looked some more, and discovered that I could get a fifty-pound sack of flour from King Arthur, the top name in flour in the US. It seemed crazy to buy that much, but it didn't cost all that much and it would solve my problem. I placed the order, intending to cancel the Azure order when the new order shipped.

You can guess what happened. Literally within minutes of the King Arthur confirmation, Azure sent me a shipping notice. The two showed up within a day of each other.

The First Attempt - just wheat flour, and horribly over-hydrated

I had a few packets of supermarket yeast, but given we couldn't know when bread ingredients would reappear on the shelves, I needed more. Through a similar sequence of events as the flour, I ended up with two packs of yeast as well, a total of three pounds - enough for about 160 loaves. On the bright side, it keeps for a long time. Incidentally the Fleischmann stuff doesn't make very good bread.

Tricks


A bunch of tricks I've learned along the way...

One thing you quickly discover with bread is the importance of the "hydration", which is to say the amount of water. Too little gives you a very dense bread, while too much delivers decent bread but the dough is a sticky mess that won't hold any kind of shape. I've found 71% works very well, for example 340 ml of water with 480g of flour. This may seem over-precise, but when on occasion I've got sloppy and used an extra 10ml (2%) of water, the dough is really different.
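
The arithmetic behind those percentages is simple enough to sketch. Hydration here is taken to mean the weight of water as a percentage of the weight of flour, assuming 1 ml of water weighs 1 g. The function names are made up for the example.

```kotlin
import kotlin.math.roundToInt

// Water as a percentage of flour weight (assumes 1 ml of water weighs 1 g).
fun hydrationPercent(waterMl: Double, flourG: Double): Double = 100.0 * waterMl / flourG

// The amount of water needed to hit a target hydration for a given weight of flour.
fun waterFor(flourG: Double, targetPercent: Double): Double = flourG * targetPercent / 100.0

fun main() {
    println(hydrationPercent(340.0, 480.0).roundToInt())   // the usual mix: 71
    println(hydrationPercent(350.0, 480.0).roundToInt())   // 10 ml too much: 73
    println(waterFor(600.0, 71.0).roundToInt())            // ml of water for a 600 g batch: 426
}
```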

Early on I tried adding hazelnut flour to the normal wheat flour. I add 30g of it to 450g of bread flour. That gives a delicious nuttiness to the taste, and also contributes to the crispness of the crust. I tried walnut flour too. That gives a different taste and less of a crust, but it's interesting too.

At first I tried to knead the bread by hand. It's very satisfying, but it takes a long time and makes your wrists ache. Now I put the flour (and a tiny amount of salt) in the mixer, add the yeast starter, then slowly trickle in the remaining water while the mixer runs. I leave it for ten minutes, occasionally stopping the mixer to scrape the dough off the mixing hook. After that a quick, one minute hand knead finishes everything off and gets the dough to the right texture.

Personally I like bread to have a crisp, crunchy crust. It's tricky to get that to come out right. It all has to do with the way the starches react in the early stages of baking. Industrial bread ovens have a mechanism for injecting copious amounts of steam at the right time. The idea is that in the early stages, the surface is kept moist by steam condensing on the relatively cool dough. This promotes the right reactions in the starch, leading eventually to the browning reactions (Maillard and caramelisation) which turn the sugars and proteins at the surface into a delicious light brown crust.

Since I don't have an industrial oven, I have to improvise. I put a shallow pie dish of water in the oven when I turn it on. By the time it is at its operating temperature of 500°F (260°C), this is boiling nicely, creating a very humid atmosphere in the oven. Then, when I put the bread in, I empty half the water onto the floor of the oven. This fills it with steam (and generally makes a bit of a mess on the floor too). I leave the pan in the oven for the first five minutes of baking time. When I open the oven to remove it, a hot blast of scalding steam emerges - showing that it has done its job.

I cook hazelnut bread for a total of 29 minutes, 5 with water and the rest without. This results in a perfect, crunchy crust, just beginning to turn deep brown in the darkest places along the top, yet moistly soft inside. Walnut bread does better with a couple of minutes less. Really the goal is to take it out just before it burns.

It has been a challenge to get bread to be the right shape, which for me means roughly circular and 3-4" (8-10 cm) across. If you stretch the dough to the shape you want, it has an annoying tendency to have "memory" and go back to its original shape in its first couple of minutes in the oven. Finally what I have found works is to flatten the dough, as part of the final "knocking back" which removes over-large bubbles. I work on the flattened, pizza-like dough to get it the right length, then fold it over and roll it like a giant sausage roll to get the circular shape.

Even so it happens sometimes that a loaf "explodes" - it develops a big split along one side. This doesn't affect the flavour but it's not very pretty. Cutting slits across the top, half an inch or so apart and quite deep, helps a lot. The other important thing is to make sure the dough joins together properly. Generally I sprinkle flour around when working with dough. That coats the surface and makes it stick less, but it also stops it sticking to itself when you roll it up. A sprinkling of water (not much!) helps, as does massaging the join together.

I generally split off some of the dough to make a couple of rolls. About 80g of dough gives a little roll, perfect for breakfast, with a disproportionate amount of deliciously crunchy crust.

At first I had problems with bread sticking to the baking tray. A piece of parchment paper covering the tray solves that problem. Surprisingly, considering that the ignition temperature of paper is famously "Fahrenheit 451", it chars a little at 500°F but doesn't burn.

Recipe


I use the following ingredients to make a "one pound" loaf:
  • 450g of King Arthur bread flour
  • 30g of ground hazelnut flour
  • a pinch of salt (about 3g - the amount is fairly critical and a matter of personal taste)
  • 8g of yeast
  • 5g of sugar
  • 340ml of water
The water and flour can be adjusted as long as they are in the same proportion.

I mix up the yeast and sugar along with 30ml of water and 20g of flour and leave them somewhere warm (around 40°C, 100°F) for 15-30 minutes. That gets the yeast going well. This is mixed in with the remaining solid ingredients and the remaining water prior to kneading.

Since getting up at 4am isn't really my thing, I make the dough the evening before. Once it is kneaded, I leave it to rise for a couple of hours, then put it in the fridge overnight. I generally get up briefly around 6am, and use that to get the dough back out and let it warm back up to room temperature by the time I do the final stages starting some time between 8 and 9. A couple of times I have started too late for that, and left it out overnight. It doesn't seem to make much difference to the final result.

Flour


I was very surprised, a couple of weeks ago, to realise that I was near the end of my fifty pound sack of King Arthur flour. When it ran out I switched to the ten pound bag of Azure. This turned out to give a completely different bread! The Azure flour is grey rather than white. The bread is denser, tastes different, and has a less crunchy crust. Obviously this is all a matter of personal taste, but both of us greatly prefer the King Arthur flour. Now that flour is easy to obtain again, I have bought another ten pounds of King Arthur. That seems to give even better results than the original sack, though I have no idea why.

Sourdough


My friendly English baking neighbour once gave me a sourdough starter. This is supposed to have all kinds of mystical, magical properties. You have to feed it - to the point that if you go away for a few days, you have to arrange with the cat sitter to feed the sourdough as well. There's something very primal about it all, which I think is its attraction.

It also totally failed to work. Luckily some conventional yeast added just before going to bed did work. I was feeling a bit bad about what I'd say to my neighbour. Then she reported the exact same experience.

So much for sourdough.

Sunday, 30 August 2020

Some Network History - Open Systems Interconnection (OSI)

The standards for Open Systems Interconnection (OSI) were a big part of my job from 1980 until 1991. This is a very personal view of what happened, and why it all went wrong.

Background

It's hard to remember now that computers were not always networked together. When you buy a $10 Raspberry Pi, or a $50K server, it's connected to the Internet as soon as you turn it on. Not only can you find cute kitten pictures, but it will load new software and all sorts of behind-the-scenes things you probably aren't even aware of.

It wasn't always so. In the 1970s, "computer" meant a giant mainframe, typically with a whole building or floor of one to itself. They cost a fortune, and they were self-contained - they didn't need to communicate with anything else. The nearest thing to networking was "Remote Job Entry" (RJE) - typically a card reader and a lineprinter, with a controller, connected over a high-speed data line. High speed as in 9600 bits/sec, or about a thousandth of typical WiFi bandwidth. It would take a long time to load even a single kitten picture at that speed. These were used in places that needed access to the computer, but couldn't justify the cost of one - branch offices, remote buildings on a campus and so on.
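To put a number on "a long time", here is a back-of-the-envelope sketch. The 9600 bits/sec figure is the one quoted above; the 2 MB picture size is purely an assumption, and protocol overhead is ignored.

```python
LINE_SPEED_BPS = 9600  # the RJE line speed quoted above


def transfer_seconds(size_bytes: int, bits_per_second: int = LINE_SPEED_BPS) -> float:
    """Time to move size_bytes over the line, ignoring protocol overhead."""
    return size_bytes * 8 / bits_per_second


if __name__ == "__main__":
    picture_bytes = 2_000_000  # assumed size of one picture, about 2 MB
    minutes = transfer_seconds(picture_bytes) / 60
    print(f"{minutes:.1f} minutes")
```

With those assumptions a single picture ties up the line for the better part of half an hour.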

Each of the mainframe companies - IBM and the "BUNCH" (Burroughs, Univac and others) - did RJE their own way. There were no standards or industry agreements, even though they were all doing exactly the same thing. Communication was over a "leased circuit" - a dedicated, and horribly expensive, telephone line directly between the two places. There was nothing that could be called a "network".

The company I worked for, DEC, was the pioneer for smaller computers - minicomputers. These were inexpensive enough that you could have several, which typically needed to share data - for example to run the machines in a factory. For this it had defined its own network architecture, called DECnet, which was the first peer-to-peer commercial network ever. It allowed DEC's VAXes and PDP-11s to communicate with each other, to share files, access applications and various other things.

They also needed to access data held on the mainframe. For this, we wrote software that pretended to be an RJE terminal. To get data, we would send a pretend card deck that ran a job to print the file, then intercept the "lineprinter" output. A similar ruse would send data in the other direction. At one point I was responsible for all these strange "emulation" products. There was one for the IBM 2780 terminal, and one for each of the other mainframe manufacturers. They were a nightmare to maintain, because none of these RJE protocols was documented. They had been worked out by reverse engineering the messages over the data link. So we were constantly running into special cases that the original code didn't know about.

X.25 - The First "Open" Networking

The first inkling of something better came along in the mid-70s. The world's phone companies - at that time still nationalised "PTT"s - had got together through CCITT, their standards body, and come up with something called X.25. This allowed computers to connect just like on the telephone or telex networks. No prior arrangement was needed, you just sent a message which was the equivalent of dialing a phone call, and then you could send and receive data.

My first networking job at DEC, in 1979, was to implement X.25 for the PDP-11 and the VAX. Just a few countries had networks - the UK, France, Germany, and the US, which had two incompatible ones. Although there was a "standard", it had so many options and variations that every network was different and needed its own variant of the software. It was also expensive to use, with a charge for every single byte of data. Getting a connection was a challenge, since the whole concept was such a novelty for the behemoth monopoly PTT organisations.

Apart from the technical difficulties of X.25, there was a much more fundamental problem. As one industry wit put it at the time, "Now I've taught my computers to talk to each other, I find they have nothing to say." There was no standard way to, say, exchange files, or log in to a remote computer. Manufacturers could write their own, but that defeated the object of the "open" network in the first place.

There were a couple of efforts to improve this situation. In the US the Arpanet had been funded by the government in 1969, to connect research and government laboratories. It was this that ultimately led to the Internet, but that was a long way off in 1980. There was a similar effort in the UK, led by the universities, to develop standard protocols for common tasks. Each one was published with a different colour cover, so they were called the "Colour Book Protocols".

OSI is Invented

Having a different standard in every country wasn't a great idea either. International standards for all kinds of things have been produced by the International Organization for Standardization (ISO) since its creation in 1947 - everything from railway equipment to film standards (the ISO film speed for example). Their work included computers. ISO 646, also known as ASCII, was the first standard for character codes. It was the obvious place to put together standards that would be accepted world wide.

The effort needed a name, and "Open Systems Interconnection" (OSI) was selected. 

By then, the concept of protocol "layers" was well established. X.25 had three layers: the physical layer that dealt with how bits were sent across the wire; layer 2 (data link) that got data reliably across a single connection; and layer 3 (network) that took it through the network via what are now called routers. The first task of the ISO effort was to come up with a formal model of protocol layering. This is probably the only piece of the effort that anyone has still heard of, the "seven layer model" published in 1979 as ISO 7498.

The first four layers of the model - as described above, plus the "transport" layer 4 - were already well accepted and not controversial, though the details of their implementation certainly were. The last three layers were however more or less invented out of nothing and weren't aligned at all with the way application protocols were built, then or now.

The "session layer" (layer 5) was conceptually imported from IBM's SNA architecture, though all the details were completely different. It was extremely complicated, reflecting things like the need to control half-duplex (one direction at a time) modems. There wasn't a single application protocol that used it to do anything except simple pass through.

The presentation layer's overall goals were never very clear. What it turned into was a universal notation for describing data and its encoding, called ASN.1. It was useful, in that it allowed message formats and such to be expressed in terms of datatypes rather than byte layouts. But it was vastly overcomplicated for what it did.
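For reference, the seven layers can be written down as a small sketch. The layer names and numbers are the standard ones from the model; the helper function and its name are just for illustration.

```python
from enum import IntEnum


class OsiLayer(IntEnum):
    """The seven layers of the OSI reference model, numbered bottom-up."""
    PHYSICAL = 1
    DATA_LINK = 2
    NETWORK = 3
    TRANSPORT = 4
    SESSION = 5
    PRESENTATION = 6
    APPLICATION = 7


def upper_layers() -> list:
    """Return the layers above transport, i.e. the three contentious ones."""
    return [layer for layer in OsiLayer if layer > OsiLayer.TRANSPORT]


if __name__ == "__main__":
    for layer in OsiLayer:
        print(layer.value, layer.name.replace("_", " ").lower())
```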

The OSI Transport Protocol

My own involvement with OSI started in 1980. Definition of the OSI transport protocol was taking place in an obscure Geneva-based group called ECMA. DEC wanted to be involved, and sent me along. My first meeting was at the Hotel La Pérouse in Nice. The work was already well advanced. To call it a dogs' breakfast would be a big disservice to both dogs and breakfasts. There were groups who thought the transport protocol should rely entirely on the network for reliability, and others who thought it should be able to recover from a limited class of errors. Other arcane distinctions, including the need for alignment with CCITT - the telcos' standards club - meant that it had no fewer than four separate "classes", which in reality were distinct protocols having no more in common than a few parts of the encoding.

My task was to add a fifth. All of the work so far was intended to work in conjunction with X.25, which provided a "reliable" network service. If you sent a packet it would be delivered or, exceptionally, the network could tell you that it had been unable to deliver something. It would never (in theory anyway) just drop a packet without telling you, nor misorder them. DECnet, as well as the emerging Arpanet, made a different assumption. They kept the network layer as simple as possible, and relied on the transport layer to detect anything that went wrong, and fix it. That meant a more complex transport protocol. This incidentally is how the Internet works, with TCP as the transport protocol.
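To make the idea concrete, here is a minimal sketch of how a transport-layer receiver can repair misordered and duplicated packets using sequence numbers. It illustrates the general technique only; it is not the actual TP4, NSP or TCP algorithm. The class and method names are invented for the example, and retransmission of lost packets (which needs timers on the sending side) is left out.

```python
class ReorderingReceiver:
    """Delivers payloads in sequence-number order, dropping duplicates."""

    def __init__(self) -> None:
        self.next_expected = 0  # sequence number we are waiting for
        self.pending = {}       # out-of-order packets held back: seq -> payload

    def receive(self, seq: int, payload: str) -> list:
        """Accept one packet; return the payloads that can now be delivered in order."""
        if seq < self.next_expected or seq in self.pending:
            return []  # duplicate: already delivered or already held
        self.pending[seq] = payload
        delivered = []
        # Release every packet that is now contiguous with what was delivered.
        while self.next_expected in self.pending:
            delivered.append(self.pending.pop(self.next_expected))
            self.next_expected += 1
        return delivered

    def ack(self) -> int:
        """Cumulative acknowledgement: everything below this number has arrived."""
        return self.next_expected
```

If packet 1 arrives before packet 0, nothing is delivered until packet 0 shows up, at which point both are handed over in order. A sender that keeps seeing the same acknowledgement number can conclude that a packet has gone missing and send it again.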

I spent the next 18 months designing the "Class 4 Transport Protocol" (the others were numbered from 0 to 3, don't ask), TP4 for short. It worked exactly the same as DECnet's equivalent protocol, NSP, and TCP, but the encoding had to be compatible, as far as possible, with the other classes, even though its operation was completely different from theirs. Practically speaking, a complete implementation of the OSI transport protocol required five completely separate protocol implementations.

I got a lot of guidance and help within DEC, but at ECMA and later ISO I was on my own. Nobody else cared about TP4, nor understood it. That suited me perfectly. It was published in 1981 as ECMA-72.

Maybe because I was really the only one doing any technical work in the group, when the current chair was moved on to another project by his company, I was asked to take that on. It was quite an honour - I was only 28, in the world of standards which (as in politics) tends to be dominated by people towards the end of their careers. That also meant that I got to attend ISO meetings, representing ECMA, the beginning of a long involvement. 

ISO adopted the ECMA proposal for the transport protocol, all five incompatible classes of it, without any technical changes. It was later published as ISO 8073.

Around this time I took up DEC's offer to move to the US for a while, to lead a team building software to connect to IBM systems using their SNA architecture. At least, that was what I was told. In reality, they already had someone for the job, and I was just backup. That gave me plenty of time to work with the network architecture team there, the people responsible for the design of DECnet. The team was really smart and had a big influence on my career, at DEC and subsequently.

ISO meetings were held all around the world, hosted by the various national standards bodies (like BSI, ANSI and AFNOR) and their industry members like IBM and DEC. In those early days I went to meetings in Paris, London, California, Washington DC, Tokyo and others. 

The day before the California meeting, in Newport Beach, we had a very hush-hush meeting at DEC. It was the only time I was in the same room as the CEO and founder, Ken Olsen, along with our genius CTO, Gordon Bell, and our head of standards. The occasion was a meeting with the CEO of ICL, the British computer company which was still important then, and a high powered team on his side. ICL was convinced that IBM was trying to take over computer networking and impose SNA on the world. That would be a disaster for us, since SNA was very firmly oriented to the mainframe world and not designed for peer-to-peer computing at all. Ken was readily convinced that salvation lay in the creation of international standards that IBM would be obliged to follow, which is to say OSI.

This completely transformed my role in things. Until then, my standards work had been an interesting diversion, the kind of thing that large companies do pro bono for the good of the industry. I thoroughly enjoyed it but nobody at DEC really cared much. Suddenly, it was a key element of the company's strategy, with me and a handful of others at its heart.

In 1983 something extraordinary happened. We were invited by China to have our meeting there, the first international technical meeting that China ever hosted. That meeting, in Tianjin, deserves its own article.

The OSI Network Layer

Shortly after the Tianjin meeting there was a shake-up in the way the various working committees were structured, which left the chair of the network layer group (SC6/WG2) open. This was by far the most complex area of OSI. The meetings were routinely attended by nearly 100 people. It was also extremely controversial, and from DEC's point of view the most important area. I was astounded when I was asked if I'd be willing to chair it. I later learned some of the negotiations behind this from Gary Robinson, for many years DEC's head of standards and an extremely wily political operator. (He was responsible for the tricky compromises that allowed Ethernet and other LAN standards to go ahead despite enormous fundamental disagreement - Token Ring and Token Bus were still very much alive). In essence, the other possible candidates, all much more qualified and experienced than me, had too many enemies. I hadn't yet made any, so I became chair of what was officially ISO/IEC JTC1/SC6/WG2, the OSI network layer group, and went on to acquire plenty of my own enemies.

The problem with the network layer was a complete schism between the circuit view of things and the packet view. The telcos had built X.25, at great expense, and saw that as the model for the network. The user of the network established a "connection", and packets were delivered tidily and in order across the connection. The packet view, which included DEC, was that the network could only be trusted to deliver packets, and then not reliably, and should make no effort to do any more. It could safely be left to the transport layer to fix up the resulting errors.

In OSI-speak, these were respectively the "connection-oriented network service", or CONS, and the "connectionless network service", or CLNS. By the time I arrived there had already been years of debate and architectural hypothesis about how to somehow combine these two views. This had generated one of the most incomprehensible "standard" documents of all time, the "Internal Organisation of the Network Layer" (IONL, ISO 8648). The dust was just about beginning to settle on the only way forward, which was to allow the two to progress in parallel. There was no compromise possible.

The telcos hated this, because it pushed their precious X.25 networks down into a subsidiary role underneath a universal packet protocol, making all of their expensively engineered reliability features unnecessary. From our (DEC) view, this was far better than the complex engineering required to somehow stitch together an "internet" from a sequence of connections. Building a network router is hard enough. There's no need, or point, to make it even harder.

So by the time I was in charge of things, we had two parallel efforts. The CLNS side was led almost entirely by DEC, with excellent support from others in the US. As a result we were able to make rapid progress. We came up with a relatively simple protocol with none of the options, variants and other horrors that bedevilled OSI. It was standardized as ISO 8473, the Connectionless Network Protocol (CLNP).

As chair, I had a duty to be non-partisan. On the other hand, I had no duty to actively help the CONS camp. Between the complexity of X.25, the additional complexity of trying to use it as an internet protocol, and internal divisions within the camp, they had little chance of success. After years of work they never did come up with anything that could be built.

That said, this schism did enormous damage to OSI, and was a major factor in its ultimate demise. To us at DEC it was obvious that CONS was a doomed sideshow, but to an observer it just showed a complete inability to make decisions or come up with something that could be built.

DECnet-OSI

That really highlights the basic flaw of the OSI process. Creating complex technology in a committee just doesn't work. It's hard enough to get a network architecture right, without having to embody delicate political compromises in every aspect of the design. Successful standards like TCP, IP and HTTP/HTML were designed by a single person or a small group under strong leadership. Where possible, we did the same thing at DEC. For example the routing protocol for OSI, universally called "IS-IS", was developed by a small team at DEC, and it still works. With modifications to support IP as well as OSI, it is still used by many of the world's large telcos. We managed to get that through the OSI process with hardly any changes.

At DEC we had whole-heartedly adopted OSI as the future of networking. DECnet, our very successful networking system, was rebranded DECnet-OSI and was to be completely restructured to use the OSI protocols. We even persuaded James Martin, a well-known author of IBM-oriented textbooks, to write a book about it. That probably deserves its own article too. As it turned out, DECnet-OSI never really happened. That was more to do with internal engineering execution problems than with OSI itself, since we carefully picked only the bits that could be made to work.

The OSI Transaction Processing Protocol (or not)

In 1987 I got involved in another part of OSI. IBM had never really tried to influence the OSI lower layers or to make them like SNA. But suddenly they came up with the idea of imposing it on the upper layers. SNA had a very complex upper layer structure, mostly oriented around traditional mainframe networking like remote job entry. But they had finally woken up to peer-to-peer networking and added something called LU6.2 to support it. Their idea was to make LU6.2 an integral part of OSI, so that all applications of OSI would in effect be SNA applications. It was a good idea from their point of view, and was very strongly supported by senior management there.

We knew this was coming because of the way ISO works. It started as a "club" of the national standards bodies, and to a large degree still is. This means that proposals can't be submitted directly to ISO, they have to pass through a national standards body - or at least, they did at the time, things have changed a bit since then.

The question was, what to do about it? IBM were heavily constrained by the existing standards and projects. If they had come along with this five years earlier, it would have been much harder to stop, but now they had to find an empty spot they could introduce it to. This they did, under the guise of "transaction processing". So at the 1987 meeting in Tokyo, there was a "New Work Item" for transaction processing, as another application layer standard. To this was attached all of the IBM contributions, which is to say LU6.2 warmed over.

I got a call about a month before the meeting from DEC's CTO, saying, "John, we need you to go and stop this." In the standards process it is almost impossible to stop anything. Once a piece of work is under way, it will continue. Actually terminating a project or committee is virtually impossible. Typically committees continue to meet for years after they no longer serve any useful purpose. So if you want to stop something, you have to either divert it into something harmless, or ensure that it makes no progress.

An experienced chair knows that there are some people who, while working with the very best of intentions, will just about guarantee that nothing ever emerges. It's just the way they're made. I have had the good fortune to know several. You may ask, why "good" fortune? The answer is that if you don't want something to work out, you arrange for them to be put in charge of it. I couldn't possibly say whether something like this may have influenced the failure of the CONS work to deliver.

For IBM's LU6.2 proposal, though, this would not work. They had put some technically strong people from their network engineering centre in La Gaude, France in charge of it. In truth I had little idea what I would do until I got to the meeting. It turned out that there were three camps:

  • IBM and others who liked the idea of LU6.2 being part of OSI
  • Those who thought that making it part of the standard would act against IBM's interests, by making it easier to compete with them. While these people were "enemies of IBM" and in some sense on the same side as me, as far as this meeting was concerned, they were my opponents. For example, France's Bull was in this camp.
  • Those who didn't want it. This turned out to be just me, and ICL.
So I was hardly in a position of strength. In addition, I hadn't been able to make any official contribution to the meeting ahead of time. On the other hand, the people IBM had sent knew little about OSI and the way the upper layers had evolved. They seemed to believe they could do as they had, for example, with Token Ring (and as DEC and Xerox had with Ethernet as well) - just show up with a spec and get it approved as a standard. But things had already gone way too far for that. There were already too many bits and pieces of protocols and services defined.

This was their Achilles' Heel. In the end it was remarkably easy to divert the activity to a study of the requirements for transaction processing (and it turned out there weren't any), and how they could best be met with existing OSI work. Only then would extensions be studied. This was instant death to the idea of just sticking an OSI rubber stamp on LU6.2.

That all makes it sound very easy, though. I was on my own against a large group of people who all wanted me to fail. It was one of the toughest things I've ever done. Luckily there were a lot of DEC people and other friends at other parts of the meeting, so the evenings and weekend were very enjoyable as usual.

There was one person at the meeting who genuinely frightened me. He was incredibly rude and aggressive during the formal meeting, to the point where it became very personal. It was a ten minute walk from the meeting place, just opposite the Tokyo Tower, to our usual hotel, the Shiba Park. I spent those ten minutes looking over my shoulder to be sure he wasn't following me.

That had an interesting consequence. The head of the US delegation was from IBM, and very much of the old school. He was close to retirement and, like most standards people of that era, very much a gentleman. A few weeks later, I was invited, along with DEC's head of standards, to a meeting at IBM's office in New York City. There the IBM guy apologised profusely, and very professionally, on behalf of both IBM and the United States - even though the person in question didn't work for IBM.

I don't exactly remember what happened after that meeting, but I think IBM just quietly dropped the idea and it faded away.

OSI Management

DECnet had powerful remote management capabilities, essential in a networked environment. We knew that if OSI was to be useful, it had to have the same. There was a management activity but for years it had been very academic and gone nowhere. There were some smart people in the UK who wanted management to work too, and between us we came up with everything required: a protocol, and a formal way to specify the metadata. In the end it never got implemented, because OSI was already struggling by the time it was ready. But it was a nice piece of work. It also got me to several interesting places I otherwise would have no reason to go to.

Why Did OSI Fail?


My final OSI meeting was in 1991, in San Diego. By then I had moved to a new job in the company and was no longer involved with the DECnet architecture. In any case the writing was on the wall: the OSI concept would happen, but it would happen through the Internet protocol suite under development in the IETF. DEC officially made the change shortly afterwards.

Why was OSI such a total failure? It was the work of hundreds of network experts, many of whom really were the top people in their fields. Yet hardly a single trace of it remains. On the other hand the concept of universal computer interconnection has been a huge success, way beyond the dreams of the OSI founders. All they hoped for was the possibility of open communication, they didn't expect it to be a constant feature of the way we use computers. The only thing is, this is all done using the protocols developed by the IETF and loosely called TCP/IP.

OSI was way too complex, with too many options and choices. It was a nightmare to implement, made worse because this was before open source caught on. Some companies tried to make a living selling complete OSI protocol stacks, but that was never really a success. At DEC we had a full OSI implementation several years before DECnet-OSI, but hardly anyone bought it - only a few academic and research users.

I think the main reason was that there was no compelling use case. That seems hard to believe now, but in 1990 it was a chicken and egg situation - until the connectivity was available, there was no use for it. My old boss at DEC said the main reason TCP/IP took over was that Sun was shipping it as part of their BSD-based software, and it was just there, free and available. Because of that, people started to find uses for it. That also happened to coincide with the invention of the World Wide Web in 1990. It was only a minuscule shadow of what it has become, but was a reason to be connected.

By 1995 it was obvious that the future of networking lay with the IETF and TCP/IP. In Europe there were still efforts to keep OSI alive, but without manufacturer support they went nowhere. Around 1997 I was paid to write a study of why the IETF had been so much more successful than ISO. The simple answer is that while the IETF is a committee, or actually a collection of numerous committees, each individual standard is produced by at most two or three people. It is then discussed and may get modified, but it is not "design by committee". That is less true now than it was in 1995 - all organisations tend to become sclerotic with age. But back then its motto was "rough consensus and working code". It got stuff done.

Conclusion


From a personal point of view, OSI was one of the most interesting things I've ever done. It taught me a great deal about how to lead in situations where you have absolutely no official authority. It took me on many, many journeys to fascinating places around the world. It also provided my introduction to the woman who would later be my life partner, though that isn't part of this story.

It can be endlessly debated whether OSI was a complete waste of time and effort, or whether it postponed open networking long enough for IBM's SNA to lose its predominant role, making room for TCP/IP. We will never know.

Thursday, 13 August 2020

The Doing Nothing Contract, or How Not to Run Large Projects

Soon after I left DEC, in 1995, I got involved in what would have been the biggest Systems Integration (SI) project they had ever done. This is the story of the project.

I joined DEC when I left university, 20 years earlier. It was a fantastic place for an engineer to work, and I enjoyed nearly every day I worked there. But in 1995 it was obviously going downhill - it was acquired first by Compaq in 1998, and then by HP - and I found a way to make a decent exit. While I was looking for another job, I started a consultancy business which turned out to keep me busy for the next four years.

About a year later I ran into a former colleague on a plane. He told me about a project that they were working on for a major European telco. It was going to be huge. Did I know anyone who might be able to help? Well, yes, there was me. I think he knew that and was just being polite. I did point out that my daily rate was over double what DEC would normally pay. This wasn't a problem, he said, because they wanted to assemble an elite team of top-level architects to get the overall design right. Within a week I had a purchase order for three months of my time, 40 hours per week, at my usual high daily rate.

The following Monday I showed up at the DEC office in Reading, England. There were two other people in the "elite" team. One was Dave, who I knew quite well - like me, he had already spent 20 years as an employee.

The project was very interesting, on an oft-repeated theme. Telephone networks have always been built using proprietary systems built by specialized suppliers like (then) GEC and Alcatel, costing at least ten times more than normal contemporary computers. The client had figured out that this was just a big distributed computing application, and wanted to run their national telephone and data network using off the shelf computer hardware and, as far as possible, software.

It seems like a wonderful idea, but it has been tried several times and so far has never really worked (which I guess is a spoiler for this article). The problem is that, even now and certainly 25 years ago, these telcos were used to being almost their suppliers' only customer. They could make incoherent or outrageous demands, confident that their suppliers would have to follow. The price tag reflected this, but the people making the demands - the engineers - weren't the people paying the bills, so those dots never got joined up inside these huge bureaucracies. A typical interaction would go:

Supplier: the project will use database X and transaction software Y
Telco:    that's no good, we need features P, Q and R that X and Y don't have
Supplier: we could add those, but it's bespoke engineering and will add (lots) to the cost
Telco:    no, we want off-the-shelf software, we don't want to pay for custom development
Supplier: X and Y is what's on the shelf, you want something else, you have to pay for it
Telco:    (utter incomprehension)

In its latest iteration, this has led to the ETSI NFV (Network Functions Virtualisation) project, which in its 8 years of existence, so far, has yet to deliver an actual functioning network.

Anyway... our mission was to use off the shelf DEC computers and software to build the switching control system at the heart of the network. It isn't really a hard problem. The basis of it is extremely simple: take a number that someone has dialled (this was a while ago) and translate it into a series of simple instructions to the physical switches, like "connect channel 92 of trunk 147 to channel 128 of trunk 256".
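As a toy sketch of that translation step (not the design we actually worked on), the snippet below looks up a dialled number by its longest matching prefix and produces one switch instruction. The routing table, its prefixes and the function name are all hypothetical, and a real system would allocate free channels dynamically rather than read them from a fixed table.

```python
# Hypothetical routing table: dialled prefix -> the two (trunk, channel) ends to join.
ROUTES = {
    "0118": ((147, 92), (256, 128)),
    "01": ((147, 10), (300, 5)),
}


def route_call(dialled: str) -> str:
    """Translate a dialled number into one switch instruction (longest prefix wins)."""
    for length in range(len(dialled), 0, -1):
        route = ROUTES.get(dialled[:length])
        if route is not None:
            (in_trunk, in_chan), (out_trunk, out_chan) = route
            return (f"connect channel {in_chan} of trunk {in_trunk} "
                    f"to channel {out_chan} of trunk {out_trunk}")
    raise LookupError(f"no route for {dialled}")


if __name__ == "__main__":
    print(route_call("01189123456"))
```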

The only thing that makes it hard is the scale - this has to work for millions of users and concurrent calls. But even then, none of these actions have to be closely synchronised. It isn't like, say, Facebook, where something you upload needs to become instantly visible to a billion users around the world.

Within a week, Dave and I had figured out how to put the available software components together and have a working, scalable prototype within a couple of months. Turning that into a production system would be a much bigger job, needing integration to the telco's dozens of management systems, but that was all low risk, routine stuff. We started writing code.

What we had completely failed to take into account, working in our cosy little office, was the DEC bureaucracy. In a much larger open-plan office nearby was an already-large team of project managers, program managers, project documentation specialists, and for all we knew telephone sanitizers as well. They operated in blissful ignorance of any actual technical details, as they came up with the cost estimates that would be at the heart of the formal bid for the project.

DEC has always been thought of as a computer manufacturer, but they had a large and thriving SI business as well. Over the years they had built up a series of procedures and processes for managing these projects. They were mostly pretty small - integrate a driver for a new piece of hardware into an operating system, or build a user interface around a database application. But some were big - tens of person-years - and a few were really big, like our telco project.

So part of the process was to know when the project was too big for the current level of project management. When that happened, the project would get escalated to the next tier of project management. They would bring in their own team to look at the design, the business aspects, the risk, and everything else.

The first thing the new team would do is to multiply all the existing work estimates by two or more, just as a matter of principle because that is nearly always right. (A very successful SI company CEO who I knew years ago always multiplied all engineering estimates by pi. He claimed it worked every time). Then they would add a whole new layer of program managers, project documentation specialists, telephone sanitizers and all the rest. Then they would start looking at the details, invariably resulting in another factor of two or so.

Our project had already been through two such escalations. Realistically this was probably a 50 person-year project, but the estimates were already in the hundreds, maybe 20 times the original figure. As a result it triggered yet another escalation, to the ultimate level, the corporate Large Projects Office (LPO) in Geneva. 

The LPO was to SI what J K Rowling's Dementors were to Hogwarts. Their job was to suck all joy, and possibility of success, out of a project. I have no idea whether they actually delivered any Large Projects, but I doubt it. Within a week they had doubled all the existing estimates and added yet another layer of program management and the rest. The project had now reached a size - approaching 1000 person-years, 10 times any realistic estimate - that just flat-out terrified the country management. A project on this scale, if it went wrong - which was just about guaranteed - could take the whole company down, and certainly result in some very senior people needing to seek new career opportunities.

The whole team was called together in a large meeting room. DEC had decided to no-bid the project. Permanent employees would be reassigned as soon as possible, while all contractors were terminated immediately.

This is where things got surreal for Dave and myself. We went to some project management type and pointed out that there was no provision for termination in the purchase orders we had received. DEC had bought 90 days of my time, and 180 days of Dave's, just as if they had bought a thousand cases of beer.

"That," said the project management type, "is covered by our standard terms and conditions. It is implicit in the purchase order."

"Maybe," we replied, "but what isn't in the contract, isn't in the contract. The only contract we have is the PO. And the PO has no mention of any standard terms and conditions."

They quickly accepted that we had a point. "OK, but in that case you will have to agree to work on any other project for the duration of the contract."

That was fine by us. A week or so later, such a project cropped up. We started reading documents and figuring out what was needed. But a day later we were called into our original project manager's office.

"The new project is refusing to pay your rates. They are much higher than the normal DEC contract rate, and they won't pay. They say we have to pay because we agreed to the abnormal rate. But we're not willing to subsidize other projects. So you must stop work on the project immediately. And you are not to work on anything else either. You are forbidden to work on any project except the one you were originally hired for." And that had been cancelled, so there was absolutely nothing to be done for it. So we were forbidden to do any work at all.

It was Dave who came up with the name "The Doing Nothing Contract". I was commuting from Nice at the time, flying to England on Monday and back on Friday. DEC insisted on me being at the office to do nothing - I was not permitted to do nothing from home. That was the beginning of the weirdest few weeks of my professional life. I'd found a hotel just outside town, an idiosyncratic converted farmhouse with low rates and enormous rooms furnished entirely from estate sales of dubious quality.

Dave (who did live fairly locally) and I would roll up to the office around 10, read the news and chat for a while, then around 11.30 go off to a pub for lunch. We'd be back by 2.30, spend another hour or two nattering (no web to surf back then), and go home. I still had plenty of friends from when I lived in Reading, so I never spent an evening on my own. I put on about five pounds during the brief period this lasted.

After a couple of weeks, I got another call from the project manager.

"Look, this is silly." I agreed. "How about if we pay off half the remaining contract, and call it quits?"

That was fine by me, I already had other work lined up and this was just free money. I left that afternoon and didn't return. Dave made the same suggestion, but his contract had a lot longer to run, so they said no.

We stayed in touch. A few days later, they called him in and told him they would simply terminate the contract "in breach", which is to say they would just stop paying him. Dave had been at DEC for years and knew exactly how things worked on the inside.

"But if you do that, I'll sue."

"Sure, yes, off the record, that's what we'd advise."

"And DEC never contests things like that, so you'll just settle, and end up paying the full amount, plus costs."

"That's true. But that will come out of a different budget, not ours."

Even they could see the silliness of all this, though. Soon afterwards they settled with him on the same basis as myself. He walked away with tens of thousands in unexpected cash, and went back to his day job doing IT projects for insurance companies. And that was the end of the Doing Nothing Contract.

Friday, 31 July 2020

Building Cisco's Japan Development Center

Definitely the most interesting thing I did in my time at Cisco was to start an engineering team in Japan. How that came about is a story in itself.

My job at Cisco had nothing to do with Japan. I ran the router software group, IOS, which was a seriously full time job. It comprised about 500 people, mostly at the corporate HQ in San Jose, California, but plenty spread around the world including the UK, where I was initially hired, India, France, and several locations in the US. My move to the US coincided with a total, absolute hiring freeze after the crash of 2000. Senior management understood that we needed to grow the group, though, so every time some remote acquisition turned out to be surplus to requirements, we would acquire bits of the team. I had people in Colorado, North Carolina, up north in Sonoma County, and sundry individual contributors working from wherever they happened to have been hired.

Senior management was expected to take on various odd tasks that had nothing to do with the day job. One such assignment that came my way was giving the opening keynote speech at the company's customer conference (Cisco Live) in Japan, in 2003.

Back in the 1980s, international standards work had taken me to Japan several times. During the 1990s my wife went there often for the same reason, and I would sometimes tag along. But this was my first visit since before I'd joined Cisco, in 1999. There were things I'd forgotten since my previous visit, like when the airport bus arrived at a hotel and the staff ran to meet it, then stood and bowed as it pulled away. You get used to this, but after a five year interval it surprised me again.

I was greeted very courteously by the head of marketing in Japan, who has since become a very good friend. It's assumed in Japan that foreigners will need their hands held at all times. It takes a lot to convince them that you can safely use the metro and railway system without getting lost. I think it would be a serious loss of face to mislay a visiting Vice President, so even if they believe you they are reluctant to let you try it. Consequently, I had been met at my hotel - the New Otani - and accompanied to the conference location.

I had a carefully prepared presentation - it had never occurred to me to ask for help or any kind of corporate guidance. But it was only in the hour before I gave it that I learned it was "the" keynote for the conference. I made a few hasty changes and it seemed to work OK. I spoke in English but there was simultaneous translation to Japanese. I asked the head translator, a very distinguished Japanese guy in his 50s, how fast I should speak. "About a quarter as fast as your CEO" was his answer. In fact it was very easy to pace myself. Every single person in the audience of hundreds was listening to the translation on headphones. There was enough sound leakage that I could tell when the translator had stopped, and start the next sentence.

That trip led to a couple more to meet Japanese customers, and that in turn led to a fairly surreal activity. We were trying to convince one of the big Japanese operators to switch to Cisco for their core network. Part of this was a technical collaboration around mobile networking for which I was the corporate sponsor. Every three months we would have a meeting, mostly in or around Tokyo though sometimes in the US, with half a dozen people from each side. Their technical team would present what they wanted to do, and how, and our technical team would respond. The two teams totally disagreed, but it didn't matter. At the end the customer's VP and I would both give little speeches saying how impressed we were with the spirit of cooperation and the progress that had been made. And then three months later we would have exactly the same meeting, with exactly the same presentations, and exactly the same speeches. We got some truly excellent Japanese meals out of it.

It was all worth it though. After several years of this, and long after I had left Cisco, we won the business, worth hundreds of millions of dollars.

You'll gather from this that I loved Japan - and still do - and was very happy to have good reasons to go there as often as possible. I got to know the country manager - now sadly no longer with us - quite well. Our only disagreement was over how I should address him. He wanted me to do it the American way, using his first name. In Japanese culture first names are used only by childhood friends and immediate family, and even then not always. I just couldn't bring myself to do it, and always addressed him in the Japanese way as Kurosawa-san. If we had met in the US, it would have been different.

Over dinner on one trip he told me that it was his dream to have a corporate engineering activity in Japan. It was a constant ding against American suppliers that they did no R&D in the country, and he wanted to counter that. I thought it was a great idea, but when I presented it to my management back in California they practically laughed in my face. It would be expensive, we wouldn't be able to find or hire the right people, it made no sense, and so on. So that was that.

But Kurosawa-san was resourceful, and at some point, knowing that he had the practical support he needed from me, he managed to convince the CEO, John Chambers. That changed everything. Suddenly I was told to make it happen.

The biggest challenge was to find someone to run it. The key to any remote team like this is to find someone who understands the corporate thinking, and who also understands the local culture. Generally this is impossible, which is why so many remote teams fail miserably. I had a very lucky inspiration. One of the UK team which I'd inherited when I joined Cisco was bored and looking for something new. A Norwegian called Ole, he still had some Viking blood, one of life's adventurers looking for the next Big Thing. It helped a lot that he already knew some of the Cisco Japan team. He was signed up almost before I'd finished asking him.

The other big challenge was to assemble the nucleus of the team. But that turned out to be much easier than I'd expected. In 2004 Cisco's prestige was high and people were keen to be part of its product development team. There were people already working for Cisco Japan, who had taken jobs in support for want of anything better, who were happy to move into engineering. Through personal contacts we found engineers at Japanese companies who were happy to make the change. One of the new team was Japanese but working for Cisco in California. Quickly I had a nucleus who could be trusted to grow the team - although, in the end, it never did grow.

We had to put the team somewhere. Initially we borrowed some space from the country sales operation, in Shinjuku, but the hope was that eventually it would reach a hundred or more people. For that it made sense to think about a location outside Tokyo, which led to our "fact finding" trip to Kanazawa in Ishikawa prefecture, and our amazing lunch with the prefectural governor that I've written about before.

Everything came together in spring 2005. We had a team, we had an office, and we had someone to run it all. I went to Tokyo for three weeks to get it all started, and luckily my wife was able to come with me. Rather than stay in a hotel, we rented a very pleasant apartment in the Aoyama district of Tokyo. It's the closest I have ever come to living in Japan. We shopped for food in the local supermarkets, an interesting experience for my wife who neither speaks Japanese nor can read any of the characters. Most things can be identified from pictures on the labels, but she needed my help to distinguish salt from sugar and flour. We had a wonderful time there, one of the most memorable trips of my life.

For the next year or so, I visited Japan every three months. It led to some complicated itineraries, since I generally combined these trips with a visit to the team I still had in the UK. Cisco had rented an apartment in Tameike for Ole and his wife, absolutely vast by Japanese standards, a ten minute walk from the New Otani where I stayed on every trip. Each room was bigger than a typical Tokyo apartment. I spent many memorable evenings there, though the next mornings were sometimes a bit hazy. Apart from the team itself, Ole had built a "support network" in Japan who helped him and all of us with every aspect of things.

Since I was in Japan so often, I got to know the country sales team well too. I visited several important Japanese customers as "the man from HQ". I would sit there in total incomprehension as the "fireside chat" meeting ran its course, but apparently just my presence made a big difference.

It was important for the team to know their colleagues in California, and we arranged for them all to visit at the same time. The trip happened to coincide with Halloween, and we arranged a fancy-dress party at home. One team member's hobby was traditional Japanese kimono, and she had brought with her a complete outfit. She looked amazing, delicate and beautiful in the Japanese tradition, and definitely took first prize by popular acclaim.

I left Cisco about a year later, and Ole decided to return to Europe at the end of his two year contract. A local manager was hired. But by then Cisco had lurched into much more aggressive expense control, and the planned expansion never happened. The country manager retired, a victim of corporate politics - the destiny of all who reach the senior ranks of Cisco. With no sponsors left, the group lingered on for a surprisingly long time, but in the end a bean-counter somewhere spotted it and its destiny was sealed. Some of the engineers returned to non-engineering roles, some moved to the US, and some left the company.

The country manager, who I got to know well, once said to me "When you make a friend in Japan, you make a friend for life." And it's true. Even now, fifteen years later, I still have good friends there who I see whenever I get a chance to visit.

Saturday, 4 July 2020

The Garden Railway: Trouble with Turnouts

The design and construction quality of LGB equipment is astoundingly good. You can leave the trains outside in all weathers with no damage or deterioration, whether from rain, snow, intense heat, or UV radiation. The locomotives will pull heavy, friction infested trains all day long without complaint. If anything does break, even the tiniest moulding is available as a spare part, albeit at a high cost. The track too is tough as anything - you can step on it without damage, and electrically it works far better than you could reasonably expect.

But nothing is perfect. The one place where their attention to quality seems to have lapsed is the pointwork, those necessary but fiddly places where trains get a choice of direction. They're eye-wateringly expensive - roughly $100 for a new, electrically operated turnout. But with LGB you just have to get used to that. The problem is, they just aren't that well designed. There's a sort of pervasive optimism, a feeling of "it'll be alright on the night", that applies to every aspect of the design: electrical, mechanical and trackholding.

My garden railway currently has a total of 17 LGB turnouts, all electrically operated via my NCE DCC controller. All the ones on the main running lines are 16xxx medium radius. There is a yard with the small, 600mm radius 12xxx turnouts, mostly bought new 20 years ago. The others are a mix of new, at various times over the last 10 years, and some eBay bargains, of which the oldest was probably 40 years old.

Keeping them all in good working order, so the trains run over them smoothly without derailing, jerking, or just coming to a halt, requires constant attention.

Electrical Problems


All of the electrical side of LGB stock suffers from a degree of design optimism. There are simple rubbing contacts everywhere, for example between the pickups, motors and other electrical connections. The wires are made of brass, which slowly forms an insulating oxide layer on the surface, so intermittent electrical problems slowly arise as the trains get older, especially when they live outdoors.

The slider pickups on the locomotives are a case in point. The idea is excellent, but the connection from the slider to the rest of the electrics depends on a fragile spring, wound with wire barely thicker than a human hair. If ever there is a short circuit in the engine, the spring heats up to the point where it loses its temper - which is to say it stops being a spring, so the pickup stops working. It's possible, but very fiddly, to replace the springs, and to make it more complicated a different part is needed depending on the particular locomotive.

This pervasive electrical optimism really strikes hard on the points. The outer running rails are solid brass, connected to the adjacent track by heavy, springy fishplates. No problem there. But the connection to the switch rails - the ones that move - is made by very primitive sliding contacts under the rail. This works way better than it deserves to when the track is new, but as it ages the contacts and the rail itself oxidize, and the force holding it all together weakens. The net result is that trains hesitate or flat-out stop as they are going over the points.

It doesn't help that there is a lot of dead track. The place where the two rails cross - the "crossing" or "frog" depending on your train-speak dialect - would ideally be connected to one rail or the other depending on the point setting. This is difficult to arrange, and LGB didn't try. These sections are made of insulating plastic, meaning that one wheel, at least, stands no chance of picking up power. Four wheel locomotives, like "Shiny", our Wismar railbus, are especially vulnerable - the more wheels the better.

The diverging rails are connected invisibly, under the sleepers, by metal strips that are spot-welded to the running rails. They also are a bit optimistic. On several of my older points these welds have failed, leaving a lengthy piece of rail with no connection.

Underside of turnout showing soldered connecting wires

The solution to all these problems is to make soldered connections between the various pieces of rail. This is a bit daunting since large-section brass rails conduct heat away from the joint area very effectively, and there is an obvious risk of melting the plastic base. To deal with the second problem first - the plastic used for the bases is pretty resilient to soldering. It softens, but doesn't melt, when you heat the rails up. It will melt and burn instantly if you touch it with the hot iron, though.

I've been pretty successful soldering fine wires to the underside of the rails. My technique is:
  • start by cleaning the metal very thoroughly, with a fibreglass "scratch brush", until it is gleaming
  • then cover the joint area in non-corrosive resin flux
  • I use a 50W temperature controlled iron, set to its highest temperature of 425°C, with a substantial chisel-shaped bit about 7mm across - providing plenty of reserve heat
  • hold the iron flat against the rail, as far as possible from the plastic, and touch the solder to the iron - when melted it acts as a heat transfer fluid
  • now hold the iron in place until the rail is hot enough to form a proper joint with the solder. It's easy to see this because the blob of liquid solder suddenly spreads out on the metal
  • now add the wire, then hold it in place with a screwdriver or similar until the solder solidifies again. This will take a while - up to 30 seconds - because of the heat retained by the rail
  • Don't touch anything! - the rail stays painfully burning hot for a long time afterwards.

Mechanical Problems


In real life, track is held on the sleepers by some kind of spike driven into the wood, which either directly holds the rail, in US practice, or holds a metal plate which in turn presses on the base of the rail, in Europe. (It's different for serious railways, with high speeds and heavy trains, but light and narrow gauge railways work like this). The LGB track provides a good visual impression of this, but it really isn't very strong. The rails are held in place by tiny flaps of soft plastic, less than a millimetre thick. It takes very little to twist and break them.

On normal track this isn't really a problem. The sleepers all support each other, so they aren't subject to high stresses. And even if one does break, the rail is still supported by those around it. Points are a different story though. For example, the very first sleeper, closest to the moving switch rails, is the only one supporting the point motor and the first few inches of rail. It can easily get broken, and when it does, the vertical relation between the fixed rail and the moving one is lost. Trains fall off the track as a result.

It's impossible to repair the track base. What I have found effective is to glue the rail in place on the damaged sleepers, using the remains of the simulated spike. The plastic is something soft and difficult to stick to, but I have found a two-part epoxy that works well, Loctite EA9340. I originally bought it to make some repairs in the kitchen, where prolonged exposure to steam softened regular hardware-store epoxy, but it seems perfect for this too. Another advantage is that it dries to a murky dark green, making it pretty much invisible on the track.

The technique is simple. First get everything as clean as possible. Clean the rail with a fibreglass brush, and swab everything with alcohol. Then mix up some epoxy and make it into a blob around the base of the rail, so it looks like part of the sleeper. If several sleepers are damaged on the same point, do it for all of them.

Sometimes you can't blame LGB. One of my points was hit by a heavy steel ball, from playing French bowls (petanque) in the garden. The rail was badly twisted both horizontally and vertically, and many rail fastenings were broken. After I dismantled it and straightened the rail out, the epoxy worked perfectly to hold the rails in place. The repaired point is back on the layout, and trains pass it without problems.

Trackholding Problems


In Victorian times facing points - ones where the train has a choice of which way it goes - were regarded with horror. Railway designers went to great lengths to avoid them on main lines wherever possible. Where they were unavoidable, they always had facing point locks, which held the switch rails firmly in place while a train passed over them. They were interlocked with the signals, so it was impossible to clear a train to pass over the points unless the locks were in place.

Sadly our LGB points don't have these devices. They are held in place rather feebly by the magnets in the point motors. It's quite common to have a tiny gap between the fixed and moving rails - a fraction of a millimetre, but enough to cause problems. If a flange rides over the sharp end of the rail it can move the rail under it, opening the point and dropping into the gap on the wrong side. The rest of the train inevitably derails when this happens.

I haven't found a really good solution to this. Some point motors work better than others. I had one point that would consistently cause derailments. It was an old one, from eBay, with an older design of point motor. Replacing the latter with a newer motor held the rail in place much more firmly, and solved the problem.

The ideal, in the absence of an actual lock, would be a really firm over-centre spring mechanism, but I can't see an easy way to do this. In any case the force produced by the point motor probably wouldn't be enough to overcome it.

LGB four-wheel carriages and trucks have pivoting axles, to simplify going round the tight 600mm radius curves. Normally these are held at the correct angle by the traction on the coupling, but that doesn't work if the train is being pushed. And sometimes they get stiff. So they will occasionally end up trying to go through a point when the wheels aren't aligned correctly with the track. This makes the above problem a lot worse. It causes another problem, too.

In real life, points have check rails, or guard rails, which ensure the wheels go the right way through the "crossing" or "frog", where the two rails cross. The check rail presses against the back of the wheel and stops it slipping into the wrong, diverging flangeway.

Unfortunately the check rails on LGB points are mostly decorative. They are way too far from the rails to be really effective. Mostly this doesn't seem to matter, but on the three-way point they are not only too far away, but not in the places they need to be. There are so many problems with this item that it deserves an article to itself.