What the COVID-19 modeling shows and associated challenges to data protection

I have just spent some time reading about the modeling results that are informing a lot of policy in European countries in relation to COVID-19.

I will comment on three papers:

The Imperial paper has been widely discussed and is modeling at population level (and in particular the UK) non-pharmaceutical interventions. Here are a few of the non-pharmaceutical interventions they consider, by now familiar to European countries:

They then show the modeling results of applying these interventions under two strategies: mitigation and suppression.

Mitigation “[aims at] flattening the curve, reducing peak incidence and overall deaths” (their words), while accepting that the whole population needs to run through the disease (or rather a significant proportion of it, 1-1/R_0, with R_0 the reproduction number thought to be about 2.5, so around 60% of the population). This is a strategy aiming to achieve “herd immunity”. They also say “When examining mitigation strategies, we assume policies are in force for 3 months, other than social distancing of those over the age of 70 which is assumed to remain in place for one month longer.”
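The herd-immunity arithmetic above can be checked in a couple of lines (a sketch using the paper's R_0 = 2.5; the function name is mine):

```python
# Herd-immunity threshold implied by the numbers above: 1 - 1/R_0.
# With R_0 = 2.5, this is the fraction of the population that must
# acquire immunity before the epidemic recedes on its own.

def herd_immunity_threshold(r0: float) -> float:
    """Fraction of the population that must be immune so that the
    effective reproduction number drops below 1."""
    return 1.0 - 1.0 / r0

print(herd_immunity_threshold(2.5))  # 0.6, i.e. ~60% of the population
```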

This leads to the following (modeling!) graph:

In other words, some measures are applied during the “light-blue” months, during which health system capacity (red line) is still completely overwhelmed, while in the month of July the elderly stay confined and face the tail end of the curve.

Suppression aims instead at rooting out all the cases, in a way reflecting the East Asian approach. The goal is not herd immunity, but eventually eliminating the disease completely. It is well known in epidemiology that this requires getting the reproduction number below 1. People will still die, but fewer and fewer over time, and eventually none. This requires more drastic measures, which do not seek to get 60% of the entire population through the disease. Some of those measures are evaluated in this paper from Imperial. In their first version of suppression, they assume the measures are in place for 5 months (Figure 3 in their paper, not reproduced here). In a more complex version they fine-tune the measures and apply them adaptively. The main value of this graph is in showing that the Imperial authors at least acknowledge some additional control over the course of the epidemic beyond just letting it run through the entire population.

The suppression strategy has been further evaluated by the Oxford group using a different intervention, digital contact tracing. First, some motivation. In the Oxford I paper, the authors tried, as best they could based on real data, to estimate the breakdown of R_0 (the reproduction number outlined above) over time into, respectively, pre-symptomatic transmission, symptomatic transmission, environmental transmission, and asymptomatic transmission. They call this parameter beta(t). This gives the following graph.

The area under the curve is 2, the full reproduction number, of which:

  • 0.9 is due to pre-symptomatic transmissions (light blue);
  • 0.8 is due to symptomatic transmissions (dark blue);
  • 0.2 is due to environmental transmissions (light green);
  • 0.1 is due to asymptomatic transmissions (dark green).

Remember the goal is to get this transmission rate down below 1. Where are we going to make gains? Even if we instantly quarantined people the moment they show symptoms, we would not make sufficient gains. We need to address pre-symptomatic transmission as well, which is obviously problematic and difficult. There are a lot of people right now at home in Europe who are either pre-symptomatic, asymptomatic or healthy. You can try to perfectly quarantine people for 14 days at home (the supposed incubation period, covering 99% of transmission cases), but this will not prevent me from contaminating a child who then contaminates my wife just before the end of the quarantine period. It also does not address people living in communal situations (shelters, jails, etc.), or the problem of not being able to quarantine the people who need to keep providing health care and food to the population during that quarantine.

The only way out, and this seems to be extremely robust to the numbers, is contact tracing. Note: I didn’t say “digital” contact tracing. Contact tracing is the idea that you can use contamination instances in the dark-blue region to prevent contamination events in the light-blue region. As soon as someone feels symptoms, all their past contacts are quarantined immediately, on top of the symptomatic person being quarantined as well. Done perfectly, this is way more than sufficient: we would be down to a reproduction number of only 0.3. It is also unrealistic. People will not follow recommendations, some people will self-diagnose late, some contacts will not be traced back. So in the Oxford I paper they also assess the effectiveness required for contact tracing to work. They arrive at the following graph:
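As a sanity check on these numbers, here is a back-of-the-envelope using the four contributions to R_0 listed above. This is not the Oxford model itself (which accounts for delays and recursive tracing); it only shows which contributions each intervention can remove in the best case:

```python
# Contributions to R_0 = 2.0, read off the Oxford I graph above.
R_COMPONENTS = {
    "pre_symptomatic": 0.9,
    "symptomatic": 0.8,
    "environmental": 0.2,
    "asymptomatic": 0.1,
}

def residual_r(removed: set) -> float:
    """Reproduction number left after perfectly removing some routes."""
    return sum(v for k, v in R_COMPONENTS.items() if k not in removed)

# Instant quarantine at symptom onset removes only the symptomatic route:
print(round(residual_r({"symptomatic"}), 2))  # 1.2 -- still above 1

# Perfect contact tracing also removes pre-symptomatic transmission:
print(round(residual_r({"symptomatic", "pre_symptomatic"}), 2))  # 0.3
```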

This means for instance that:

  • Zone A: if 20% of symptomatic people immediately self-isolate, you need between 55% and 85% of their past contacts (over some number of past days, between 5 and 14) to be immediately quarantined (the variability is due to the uncertainty around R_0).
  • Zone B: if 80% of symptomatic people immediately self-isolate, you need to trace and quarantine between 15% and 60% of their past contacts.

Crucially, this means that even if some percentage of symptomatic people just ran around coughing everywhere, the epidemic could still be curtailed.

So which zone should we aim for, and how could this be achieved?

Zone B calls for individual, manual and extensive contact tracing, which may no longer be realistic at the current stage of the epidemic in Europe.

Zone A requires extensive contact tracing even when few symptomatic people self-isolate. An app would be a solution for this.

The researchers conclude that the epidemic could be curtailed with a digital app, based on consent. This is how it would work:

This brings enormous data protection challenges around how consent is obtained, depending on the deployment models for such an app. Even if based on consent, some protocols could also create challenges around stigmatisation of particular groups.

What this study does not explore:

  • impact of population structure on the spread of the epidemic;
  • joint impact of population structure and of the likelihood of installing an app on the spread of the epidemic;
  • some effort at tracing contacts only in the household context;
  • reliance purely on social norms (incentivized by governments) to enforce a policy of going two steps down the contact chain;
  • complete and temporary rearchitectures of life: what if this science were explained, and there were a democratic debate on what is best to do in this situation?

I will present briefly on this (today) Thursday March 19th 2020, and Friday March 20th a bit more extensively (time TBD, but around 6pm CET).


Do we know the role played by surface contamination?

From the modeling it is quite marginal: the light green area. But of course it complicates contact tracing considerably, since surface contaminations are much harder to trace back to specific contacts.


But at what stage in an epidemic is this strategy useful? If we currently have (I haven’t checked the data):

  1. 2% of the population infected
  2. half of them developing symptoms
  3. the ability to trace 100% of their contacts
  4. 100 contacts each in the last 5 days before they became symptomatic

They were in contact with almost 100% of the population (“almost” because of the overlaps in contacts).
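The back-of-the-envelope above can be written out explicitly (a sketch using the figures stated in the list; the variable names are mine):

```python
# With 2% of the population infected, half of them symptomatic, and 100
# contacts each over the last 5 days, the expected number of traced
# "contact events" per inhabitant reaches 1 -- i.e. up to overlaps, the
# traced contacts cover the whole population.

population_share_infected = 0.02
symptomatic_share = 0.5       # of those infected
contacts_per_person = 100

events_per_capita = (population_share_infected
                     * symptomatic_share
                     * contacts_per_person)
print(round(events_per_capita, 6))  # 1.0
```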

If this is correct, the huge effort to trace people would:

1/ be useless at this stage for identifying the population that should be quarantined (it’s already everyone)
2/ potentially increase compliance with quarantine (at a huge cost)
3/ be useful for modelling the transmission (something interesting for future epidemics)
4/ be useful for understanding which mitigation strategies are most effective (with a long time lag).

Does this make sense?


Very good question.

The distribution is not quite uniform. There is a signal there away from the uniform, which will get stronger over time.

Allocating finite resources to move from one probability distribution to another in an optimal way is called “optimal transport” in mathematics.

This video should help in understanding why that problem is relevant.
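For minimal intuition (my own toy example, not from the video): in one dimension, with two equal-size, equal-weight samples, the optimal transport plan simply matches sorted values, and the Wasserstein-1 cost is the mean absolute difference of the matched pairs:

```python
# Toy 1-D optimal transport: for equal-size empirical distributions on
# the line, the optimal coupling pairs the i-th smallest value of one
# sample with the i-th smallest of the other.

def wasserstein_1d(xs, ys):
    """W1 distance between two equal-size empirical distributions."""
    assert len(xs) == len(ys)
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)

# Moving mass at {0, 1, 2} onto {1, 2, 3} costs 1 per unit of mass:
print(wasserstein_1d([0.0, 1.0, 2.0], [3.0, 2.0, 1.0]))  # 1.0
```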

It’s my assessment that the epidemiological models being used are extremely crude. Once scientists (start to behave like scientists and) publish open models, we can make them more complex and add other factors into the modeling, such as individual circumstances rather than population-level averages.

This would have to be done extremely carefully if it was to be prescriptive, of course.


(the optimal transport would mostly decide on: who to test, and what measures to apply at any time)

Informed criticism of the simplicity of the Imperial study


Hi @paulolivier - just a note to say you cite the figure giving a “breakdown” of R_0 as from Oxford 1 but pretty sure it’s from Oxford 2 (by your numbering).


Thanks @adantro! Actually I presented them in the first place in the wrong order. Corrected now! And welcome!