Challenges of Modelling Coronavirus
Over the past couple of weeks, I have been trying my best to model the coronavirus epidemic. I’ve also been looking at a lot of the models that other people have developed. I thought I’d share a few insights.
The standard model people use for modelling epidemics is called SIR (susceptible, infected, recovered). These models parametrise the rate at which people move from susceptible to infected to recovered status, so they can estimate how the proportions of the population in each state will change over time.
In the case of coronavirus, we might say that the percentage of new infectees each day is equal to 0.3 x the percentage of people infected x the percentage of people that are susceptible. And we might say that 1% of the people infected will die each day, and that 5% of the people infected will recover each day.
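To make that concrete, here is a minimal Python sketch of the example above, stepping one day at a time. The 0.3, 1% and 5% figures come from the paragraph above; the function name, the starting infected fraction of 0.1% and the 200-day horizon are assumptions chosen purely for illustration, not fitted values.

```python
def simulate_sir(days, initial_infected=0.001, infection_factor=0.3,
                 death_rate=0.01, recovery_rate=0.05):
    """Step a daily SIR-style model; every value is a fraction of the population."""
    susceptible = 1.0 - initial_infected  # assumed starting point, not real data
    infected = initial_infected
    recovered = 0.0
    dead = 0.0
    history = [(susceptible, infected, recovered, dead)]
    for _ in range(days):
        # New infections: 0.3 x fraction infected x fraction susceptible
        new_infections = infection_factor * infected * susceptible
        # Fixed daily rates: 1% of the infected die, 5% recover
        new_deaths = death_rate * infected
        new_recoveries = recovery_rate * infected
        susceptible -= new_infections
        infected += new_infections - new_deaths - new_recoveries
        recovered += new_recoveries
        dead += new_deaths
        history.append((susceptible, infected, recovered, dead))
    return history


history = simulate_sir(days=200)
# Index 1 of each state tuple is the infected fraction
peak_day = max(range(len(history)), key=lambda day: history[day][1])
print(f"Infections peak on day {peak_day}")
```

Note that every rate here is a constant applied to whoever is infected today, regardless of how long they have been ill.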
The main problem with this kind of model is that it tends to assume fixed parameters. But if there are 100 people infected, the number that are likely to recover on a given day will depend on how long they’ve been infected. And if the rate of infection decreases as a result of lockdown, that will only gradually affect the proportions in different states.
As a result of this concern, I decided to build my own model of the epidemic. My model makes the following assumptions:
- The percentage of new infectees each day is equal to a factor times the percentage of people infected. This factor falls from 0.3 to, ultimately, 0.05 under full lockdown.
- 96% of people that are infected never need critical care, and recover after 2 weeks.
- 4% of people need critical care 4 days after infection. If hospitals have capacity, 3% recover after a further 3 weeks and 1% die after 2 weeks. If hospitals are full, 1% recover after a further 2 weeks, 2% die immediately, and 1% die after 3 weeks. (These percentages are of all infected people, so each case adds up to the 4%.)
This model clearly has a lot more parameters than the usual SIR models. But these degrees of freedom are essential if I want to capture our ability to reduce transmission, or the sensitivity to hospital capacity.
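The assumptions listed above can be sketched in Python by tracking each day's new infections as a cohort and scheduling its outcomes in advance. This is a rough illustration, not a copy of the spreadsheet. Several details are assumptions of the sketch: it works in head counts rather than percentages; the seed of 100 infections, the capacity of 4,000 beds and the lockdown schedule in `lockdown_factor` are hypothetical; "after 2 weeks" and "after 3 weeks" are counted from the day critical care is needed; the 3%, 1% and 2% figures are read as shares of all infected people; people in critical care still count as infected; and patients who arrive when hospitals are full do not occupy a bed.

```python
from collections import defaultdict


def lockdown_factor(day, lockdown_start=20, ramp_days=14):
    """Hypothetical schedule: 0.3 before lockdown, then a linear fall to 0.05."""
    if day < lockdown_start:
        return 0.3
    progress = min((day - lockdown_start) / ramp_days, 1.0)
    return 0.3 - progress * (0.3 - 0.05)


def simulate(days, seed_infections=100.0, capacity=4000.0, factor=lockdown_factor):
    infected = in_care = recovered = total_infections = 0.0
    needs_care = defaultdict(float)  # day -> people who start needing critical care
    recoveries = defaultdict(float)  # day -> people who recover
    deaths = defaultdict(float)      # day -> people who die
    discharges = defaultdict(float)  # day -> critical care beds freed
    daily_deaths = []
    new_infections = seed_infections

    for day in range(days):
        # Schedule outcomes for today's cohort: 96% recover after 14 days,
        # 4% need critical care after 4 days.
        infected += new_infections
        total_infections += new_infections
        recoveries[day + 14] += 0.96 * new_infections
        needs_care[day + 4] += 0.04 * new_infections

        # Free beds, then admit today's critical cases if they all fit
        # (hospitals are treated as simply having capacity or not).
        in_care -= discharges.pop(day, 0.0)
        need = needs_care.pop(day, 0.0)
        if in_care + need <= capacity:
            in_care += need
            recoveries[day + 21] += 0.75 * need   # 3% of the cohort
            discharges[day + 21] += 0.75 * need
            deaths[day + 14] += 0.25 * need       # 1% of the cohort
            discharges[day + 14] += 0.25 * need
        else:
            recoveries[day + 14] += 0.25 * need   # 1% of the cohort
            deaths[day] += 0.50 * need            # 2% die immediately
            deaths[day + 21] += 0.25 * need       # 1% of the cohort

        died_today = deaths.pop(day, 0.0)
        recovered_today = recoveries.pop(day, 0.0)
        infected -= died_today + recovered_today
        recovered += recovered_today
        daily_deaths.append(died_today)

        # Tomorrow's new infections: factor x people currently infected
        new_infections = factor(day) * infected

    return {
        "daily_deaths": daily_deaths,
        "infected": infected,
        "recovered": recovered,
        "total_infections": total_infections,
    }


result = simulate(days=120)
print(f"Peak daily deaths: {max(result['daily_deaths']):.0f}")
```

Re-running `simulate` with a different `capacity` or a different `factor` function is a quick way to see how sensitive the death curve is to those two inputs.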
I next want to raise a few concerns I have with my model (and no doubt most other models):
- I never actually know how many people are infected. I only know how many are confirmed to have the virus. In the UK, it is generally accepted that most people with coronavirus haven’t even been tested. I therefore have to treat this as a hidden variable.
- A lot of infected people never show any symptoms. Some experts say such people don’t pass it on, others say they do. This obviously makes a big difference to how much spreading there is (though in practice, if half the infected population don’t spread it, that probably means the other half are spreading it twice as fast).
- On that point, we don’t really know how fast people are spreading it (how many people test positive is a poor estimate of this). I’m also assuming that an infected person spreads it equally over the period that they are infected, but this may not be the case. Also, given that most of our estimates for spreading are implied from later fatality rates, we don’t know how much impact changes to restrictions on movement will have.
- Because we don’t know how many people are infected, we don’t know what proportion need critical care.
- The government doesn’t publish statistics on how many people with the virus are in critical care. I’m not even sure how accurate their estimate would be, given many people’s confirmation of status doesn’t arrive until several days after they’ve arrived in critical care. But this also means that we don’t really know what proportion of those in critical care recover. I also don’t know how much critical care capacity there is.
- My model treats hospitals as having capacity or not. But in practice, some hospitals in the country will be full while others have capacity.
- I am treating all patients as the same. But in reality, people of different ages have quite different rates of needing critical care, and quite different survival rates.
- I haven’t bothered including in my model the fact that as more patients recover, in most cases they are no longer susceptible. However, at this point I’m really modelling the first 2–3 months, when it is unlikely (with our suppression measures) that more than 15% will get the virus.
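On that last point, a quick back-of-envelope check shows how much leaving out the susceptible term could matter. In an SIR model, new infections are scaled by the fraction still susceptible, so a model that ignores it overstates new infections by a ratio of 1 / (fraction susceptible). The function name and the sample shares below are just illustrative choices.

```python
def overstatement(share_no_longer_susceptible):
    """Ratio of new infections predicted without the susceptible term to with it."""
    susceptible = 1.0 - share_no_longer_susceptible
    return 1.0 / susceptible


for share in (0.01, 0.05, 0.15):
    extra = overstatement(share) - 1.0
    print(f"{share:.0%} no longer susceptible: new infections overstated by {extra:.1%}")
```

At the 15% upper bound the daily overstatement is roughly 18%, and it is much smaller earlier on, when far fewer people have had the virus.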
These challenges have a huge impact on the model’s accuracy, to the extent that I wouldn’t want anyone to take its results too seriously. For example, changing the current level of infection from 0.05 to 0.08 raises the peak rate of deaths from 850 to 1500 per day. However, with my assumptions, I’m seeing the daily deaths rising for the next few days, then sitting between 650 and 850 from 1–8 April, before gradually reducing to just below 100 at the end of April.
My model suggests that 264k people have recovered from coronavirus, and 389k people are currently infected. I have no idea whether the true number of recovered/infected is 200k or 15m (though it is unlikely to be half the population, as one report this week suggested). We should start seeing antibody testing in the UK in the coming week, which I hope will give us a bit more indication of this, and which could help firm up some of this model’s assumptions.
If anyone wants to take a look at my spreadsheet model, I have saved it as ‘pandemic uk.xlsx’ on Google Drive: https://drive.google.com/drive/folders/1w2mF25OAyLmUa5v-URxjzjZfv_hARdDa?usp=sharing