Inspired by the observed differences between actual Covid-19 data and the predictions of the classic "SIR" model, I built a detailed pandemic model. It's a person-by-person simulation of up to 50 million people, rather than a mathematical model, and it uses tricks I've accumulated writing high-performance network software to run at reasonable speed. The results closely match actual data, and allow control over variables like vaccination, social distancing and self-isolation. Skip to the last section to see the results. The source code is available on GitHub.
Soon after the Covid-19 pandemic started, Mark Handley at UCL in London set up a website tracking the number of cases and deaths in various countries and places around the world, which I followed with great interest.
Out of curiosity, I put together a simple simulation based on the SIR (Susceptible, Infected, Recovered) model of infectious transmission. This model is the basis of much of epidemiology, and yet it quickly struck me that the curves it generates didn't correspond at all to Mark's. On a log/linear scale (i.e. with a logarithmic Y-axis), SIR gives a straight line until very close to saturation, when nearly all of the population has been infected. The slope of that line is set by the now much-discussed "R0" number, i.e. the number of people a single sick person will infect in the next generation.
Yet Mark's graphs weren't like that. No matter which country or area, or which policies were being followed there, they all showed a gradual reduction in the slope. This was true for Lombardy and for much less afflicted places. It was true for Spain, which quickly enforced a very strict lockdown, and for Belarus, which never decreed anything at all. Even knowing exactly when lockdowns were put in place, it was difficult or impossible to see an inflection in the curve.
It was easy enough to come up with a mathematical model that closely described these curves: it is sufficient for the R0 number to decrease slowly over time. The resulting curves matched the actual data from Lombardy very closely.
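As a rough illustration of a slowly decreasing R0, here is a minimal Python sketch; it is not the fitted model described above. It assumes that the effective reproduction number shrinks by a constant factor `decay` each day and that cases multiply by R once per generation of `generation_days` days. The function names and parameter values are hypothetical. On a logarithmic Y-axis, the day-to-day slope then falls steadily instead of staying constant.

```python
import math


def decaying_r_cases(r0, decay, generation_days, days, seed_cases=1.0):
    """Daily new cases when the effective R shrinks by a constant factor each day.

    Hypothetical assumptions: R on day t is r0 * decay**t, and cases grow by a
    factor of R once per generation of `generation_days` days.
    """
    daily = [seed_cases]
    for t in range(1, days):
        r_t = r0 * decay ** t
        daily.append(daily[-1] * r_t ** (1.0 / generation_days))
    return daily


def log_slopes(values):
    """Day-to-day slope of a series plotted on a logarithmic Y axis."""
    return [math.log(b) - math.log(a) for a, b in zip(values, values[1:])]


cases = decaying_r_cases(r0=3.0, decay=0.98, generation_days=5, days=60)
slopes = log_slopes(cases)
print(f"log slope on day 1: {slopes[0]:.3f}, on day 59: {slopes[-1]:.3f}")
```

With `decay=1.0` the slope is constant, which gives the straight line that SIR produces before saturation.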
So why is reality different from this nearly century-old model? It doesn't take long to see an obvious weakness. The math behind the model is simple:
newly infected on day n+1 = constant * infected on day n * susceptible on day n
where the constant is closely related to our old friend R0. This is the discrete form of a simple differential equation, and while nearly the whole population is still susceptible, it predicts the exponential growth everyone talks about.
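For reference, the rule above can be written as a minimal, fully mixed discrete-time SIR model in Python. This is a textbook sketch, not code from the simulation described below. `beta` (transmitting contacts per infected person per day) and `gamma` (daily recovery rate) are the conventional parameter names; the constant is written as beta / N, so that R0 = beta / gamma.

```python
import math


def sir(population, initial_infected, beta, gamma, days):
    """Fully mixed discrete-time SIR model.

    Each day, new infections = beta * I * S / N (the "constant * infected *
    susceptible" rule with constant = beta / N), and a fraction gamma of the
    currently infected recover. Returns one (S, I, R) tuple per simulated day.
    """
    s = float(population - initial_infected)
    i = float(initial_infected)
    r = 0.0
    history = []
    for _ in range(days):
        new_infections = min(s, beta * i * s / population)
        recoveries = gamma * i
        s -= new_infections
        i += new_infections - recoveries
        r += recoveries
        history.append((s, i, r))
    return history


history = sir(population=1_000_000, initial_infected=10, beta=0.3, gamma=0.1, days=120)
for day in (10, 20, 30, 40):
    s, i, r = history[day]
    # log10 of everyone ever infected rises by a near-constant step until close to saturation
    print(day, round(math.log10(i + r), 2))
```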
But wait a moment. This supposes that every infected person is equally likely to infect every susceptible person. Is this realistic? Suppose Bob, living in New York, gets sick. Alice lives in Los Angeles with her vulnerable elderly mother, and she gets sick too. Which of them is more likely to infect Alice's mother? Or Bob's drinking buddy? In fact, the SIR equation only makes sense when applied to relatively intimate groups of people, such as families or close friends. For larger communities, it also needs to take into account the probability of contact between individuals.
Everyone is now familiar with the concept of "herd immunity". If enough people are immune, a single infected person can no longer infect enough others for the outbreak to grow: R (no longer R0) has fallen below 1. It's widely stated that around 60% of the population needs to be immune for herd immunity to take effect, with the exact figure depending on the value of R0.
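The 60% figure comes from the fully mixed model: with a fraction p of the population immune, the effective reproduction number is R = R0 * (1 - p), which falls below 1 once p exceeds 1 - 1/R0. A short Python helper makes the arithmetic explicit; the function name is illustrative.

```python
def herd_immunity_threshold(r0):
    """Fraction of a fully mixed population that must be immune for
    R = r0 * (1 - immune_fraction) to drop below 1."""
    if r0 <= 1.0:
        return 0.0  # the outbreak cannot grow even with no immunity
    return 1.0 - 1.0 / r0


for r0 in (1.5, 2.5, 4.0):
    print(f"R0 = {r0}: threshold = {herd_immunity_threshold(r0):.0%}")
```

For R0 values of 1.5, 2.5 and 4 this gives 33%, 60% and 75% respectively.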
But again, this assumes that everyone is equally likely to infect everyone else - that herd immunity will not protect New Yorkers from each other if it has not yet reached the necessary level in Idaho. This obviously makes no sense. That situation may be bad news when an infected New Yorker visits Idaho, but New Yorkers will be protected from each other.
People tend to operate in clusters: families, groups of friends, close colleagues. Within such a cluster, transmission is high: if one person in a nuclear family, or one person in a small office, gets sick, the others will all be heavily exposed and most likely will either get sick or develop an immune response. Hence the admonition, common even before Covid, for sick people to stay home from work.
Once all the people in such a cluster have been exposed, the cluster has a localised form of herd immunity, regardless of what is going on in the wider world. An infected stranger visiting an exposed family, for example, poses no risk.
Thinking along these lines led me to the idea of "fractal herd immunity": herd immunity can exist at a local level, or in a larger community, without applying globally. There is leakage between clusters - the nuclear family goes to visit the parents and cousins, close colleagues are part of a larger company. Friendships are especially "leaky": it's common for my friends to have friends whom I barely know, or don't know at all.
Building a Model
I wanted to develop a model to test this idea and see if the results look like reality. I didn't see any mathematical way to do this, so I built a simulation. Each person in the population is simulated, with day-by-day exposure to the infected people around them.
The model creates cities and populates them according to a realistic distribution: a few big cities and lots of small ones. Each person is allocated to four different clusters: family, friends, work and the local community. The last of these corresponds to things like shopping. Clusters are randomly sized; for example, a family cluster can have anywhere from 1 (people living alone) to 8 members, with a bias towards smaller sizes.
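A minimal Python sketch of this population-building step follows. Its assumptions are illustrative rather than taken from the model: city sizes follow a Zipf-like rank-size rule, cluster sizes are drawn with probability proportional to 1/size to bias them towards small values, and every size range in `CLUSTER_SIZES` except the 1 to 8 family range is a placeholder. All names are hypothetical.

```python
import random

# Hypothetical (min, max) cluster sizes; only the family range comes from the text.
CLUSTER_SIZES = {
    "family": (1, 8),
    "friends": (3, 15),
    "work": (2, 50),
    "community": (50, 500),
}


def city_sizes(total_population, n_cities, exponent=1.0):
    """Split the population across cities with a Zipf-like rank-size rule:
    the city ranked k gets a share proportional to 1 / k**exponent."""
    weights = [1.0 / rank ** exponent for rank in range(1, n_cities + 1)]
    total_weight = sum(weights)
    sizes = [int(total_population * w / total_weight) for w in weights]
    sizes[0] += total_population - sum(sizes)  # rounding remainder goes to the largest city
    return sizes


def random_partition(n_people, min_size, max_size, rng):
    """Assign every person a cluster id. Sizes are drawn with weight 1/size,
    so small clusters are more common; the last cluster may be truncated."""
    sizes = list(range(min_size, max_size + 1))
    weights = [1.0 / size for size in sizes]
    order = list(range(n_people))
    rng.shuffle(order)  # random membership within the city
    membership = [0] * n_people
    cluster_id = pos = 0
    while pos < n_people:
        size = rng.choices(sizes, weights=weights)[0]
        for person in order[pos:pos + size]:
            membership[person] = cluster_id
        pos += size
        cluster_id += 1
    return membership


def build_city(n_people, rng):
    """Give each resident one cluster of each of the four types."""
    return {kind: random_partition(n_people, lo, hi, rng)
            for kind, (lo, hi) in CLUSTER_SIZES.items()}


rng = random.Random(42)
sizes = city_sizes(total_population=3_000_000, n_cities=100)
city = build_city(sizes[-1], rng)
print(sizes[:3], sizes[-1], max(city["family"]) + 1, "families in the smallest city")
```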
Influence is a parameter for each cluster. In a family, people are in close proximity, so this is taken as an influence of 1. The community cluster has a much smaller influence, but its clusters are a lot bigger. Cluster size is extremely important, because transmission increases as the square of cluster size: more people are exposed to more infected people. This is why large gatherings have been such an effective way to spread Covid, as with the religious gatherings in north-east France and South Korea.
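One simple way to see the square law is sketched below in Python, under an assumed per-contact model: each infected cluster-mate independently infects each susceptible member with probability `base_rate * influence`, where both names are hypothetical. When that probability is small, expected new infections are roughly p * infected * susceptible, and at a fixed infected fraction both counts grow in proportion to cluster size.

```python
def infection_probability(n_infected, influence, base_rate):
    """Chance that one susceptible member is infected today, assuming each
    of n_infected cluster-mates transmits independently with probability
    base_rate * influence."""
    p = base_rate * influence
    return 1.0 - (1.0 - p) ** n_infected


def expected_new_infections(size, infected_fraction, influence, base_rate):
    """Expected new infections per day in a cluster of the given size."""
    n_infected = round(size * infected_fraction)
    n_susceptible = size - n_infected
    return n_susceptible * infection_probability(n_infected, influence, base_rate)


for size in (10, 100, 1000):
    e = expected_new_infections(size, infected_fraction=0.1, influence=0.01, base_rate=0.01)
    print(f"cluster of {size}: {e:.4f} expected new infections per day")
```

Each tenfold increase in cluster size gives roughly a hundredfold increase in expected transmissions; the approximation weakens as p * infected approaches 1.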
Infection has to get between clusters. Partly this is done by grouping them into larger clusters, with reduced influence between the members. So a sick person in one family cluster has a small chance of infecting someone in an adjacent cluster, just as if they visited another part of the family.
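This grouping can be sketched in Python as a weighted count of infected contacts, assuming a single extra level of nesting: cluster-mates count at the cluster's influence, while members of sibling clusters under the same parent count at that influence multiplied by a hypothetical `leak` factor. The cluster labels and the exponential conversion to a daily probability are illustrative assumptions.

```python
import math


def weighted_exposure(cluster, infected_by_cluster, parent_of, influence, leak):
    """Weighted number of infected contacts for a member of `cluster`.

    infected_by_cluster maps cluster id -> infected members today;
    parent_of maps cluster id -> enclosing cluster id. Sibling clusters
    (same parent) contribute at influence * leak.
    """
    own = infected_by_cluster.get(cluster, 0) * influence
    siblings = sum(
        n for other, n in infected_by_cluster.items()
        if other != cluster and parent_of[other] == parent_of[cluster]
    )
    return own + siblings * influence * leak


def daily_infection_chance(exposure, base_rate):
    """Convert weighted exposure to a probability (assumed exponential form)."""
    return 1.0 - math.exp(-base_rate * exposure)


parent_of = {"smith": "smith-extended", "smith-cousins": "smith-extended", "jones": "jones-extended"}
infected = {"smith": 1, "smith-cousins": 2, "jones": 5}
exposure = weighted_exposure("smith", infected, parent_of, influence=1.0, leak=0.1)
print(exposure, daily_infection_chance(exposure, base_rate=0.2))
```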
Most cluster memberships and relationships are within the same city, but there are also ways for infection to travel between cities. This can be via cluster membership, for example when a family's relations are in another city, or when an office is part of a larger company based elsewhere. It can also be through explicit travel. Each person is randomly assigned a mobility, which is the probability that they will visit another city on any given day.
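The explicit-travel mechanism can be sketched in Python as follows. Each person gets a mobility value, here drawn from an exponential distribution capped at 1 (an assumption), and each day a traveller picks a destination city other than home in proportion to a hypothetical `city_weights` list, such as city populations.

```python
import random


def assign_mobility(n_people, mean_mobility, rng):
    """Daily probability of visiting another city, one value per person.
    The exponential distribution is an illustrative assumption."""
    return [min(1.0, rng.expovariate(1.0 / mean_mobility)) for _ in range(n_people)]


def todays_trips(mobility, home_city, city_weights, rng):
    """Return (person, destination) pairs for everyone travelling today."""
    trips = []
    for person, chance in enumerate(mobility):
        if rng.random() < chance:
            home = home_city[person]
            destinations = [c for c in range(len(city_weights)) if c != home]
            weights = [city_weights[c] for c in destinations]
            trips.append((person, rng.choices(destinations, weights=weights)[0]))
    return trips


rng = random.Random(7)
mobility = assign_mobility(10_000, mean_mobility=0.02, rng=rng)
home_city = [rng.randrange(3) for _ in range(10_000)]
trips = todays_trips(mobility, home_city, city_weights=[500_000, 80_000, 20_000], rng=rng)
print(len(trips), "people travel today")
```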
The model has a few higher-level parameters that control it, and a lot of detailed ones that are adjusted by the higher-level ones. For example:
- population: the total number of people - the model can handle 50 million, and gives rapid results for 3 million
- infectiousness: the R0 value for the infection
- auto-immunity: the number of people who, when exposed, will develop immunity without becoming sick or infectious
- distancing: the extent to which people's behavior is modified by social distancing
- vaccination: the number of people who are immune at the start due to prior vaccination
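The relationship between the high-level and detailed parameters might look like this in Python. This is a hypothetical sketch, not the project's actual configuration: it treats auto-immunity and vaccination as fractions of the population, and assumes that distancing scales down the influence of every cluster except the family, using made-up base influences.

```python
from dataclasses import dataclass

# Made-up base influences; only family = 1 is stated in the text.
BASE_INFLUENCE = {"family": 1.0, "friends": 0.3, "work": 0.2, "community": 0.02}


@dataclass(frozen=True)
class ModelParams:
    population: int = 3_000_000
    infectiousness: float = 2.5   # R0
    auto_immunity: float = 0.0    # fraction who become immune on exposure without being infectious
    distancing: float = 0.0       # 0 = normal behaviour, 1 = no contact outside the family
    vaccination: float = 0.0      # fraction immune on day 0

    def __post_init__(self):
        for name in ("auto_immunity", "distancing", "vaccination"):
            if not 0.0 <= getattr(self, name) <= 1.0:
                raise ValueError(f"{name} must be between 0 and 1")

    def cluster_influences(self):
        """Detailed per-cluster influences derived from the distancing level."""
        return {kind: base if kind == "family" else base * (1.0 - self.distancing)
                for kind, base in BASE_INFLUENCE.items()}

    def initially_vaccinated(self):
        """Number of people immune at the start."""
        return round(self.population * self.vaccination)


params = ModelParams(population=1_000_000, distancing=0.8, vaccination=0.1)
print(params.cluster_influences(), params.initially_vaccinated())
```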
The results can be presented either as a simple graph, showing the number currently infected and the total ever infected as some parameter is varied, or as a rather fetching animation in which each city is shown as a "bubble" that gradually changes color as people are exposed and become immune, or are infected and recover (or die). Here are some examples.
A distancing level of around 0.8 or 0.9 is what was achieved in the UK or California, where travel and contact were reduced but not eliminated, and some people disregarded the restrictions altogether. This level very substantially "flattens the peak", as well as reducing the total number infected. The chart also shows that a distancing level below 0.5 has no impact.
R0 - the Infection Ratio
Watching the Pandemic
The simulation can also show graphically how the pandemic spreads. The picture above is at day 100 with some typical parameters. Each blob is a city (some large cities are drawn on top of their smaller neighbors). The red ring corresponds to current infections, the blue outer circle to those who are still susceptible. The inner green circle is those who are no longer susceptible - recovered, asymptomatically immune, or vaccinated. The small black dot in the middle corresponds to deaths. Clicking here will show the complete evolution of the pandemic (select 1080p for best results).
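The geometry of each bubble can be sketched in Python, assuming (hypothetically) that each band's area is proportional to its head count, so radii grow with the square root of the cumulative count from the centre outwards: deaths, then those no longer susceptible, then current infections, then the susceptible. Only the computation is shown; drawing the circles is left to any plotting library.

```python
import math


def bubble_radii(deaths, immune, infected, susceptible, scale=1.0):
    """Outer radius of each nested disc, innermost first.

    Assumption: area is proportional to people, so a disc containing c people
    has radius scale * sqrt(c). The bands between successive radii are the
    black dot, green circle, red ring and blue outer circle.
    """
    radii = {}
    cumulative = 0
    for band, count in (("deaths", deaths), ("immune", immune),
                        ("infected", infected), ("susceptible", susceptible)):
        cumulative += count
        radii[band] = scale * math.sqrt(cumulative)
    return radii


print(bubble_radii(deaths=50, immune=30_000, infected=5_000, susceptible=65_000, scale=0.01))
```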