Efficacy of contact tracing apps

This post has been partly rewritten in a Medium post (and expanded for the Bluetooth considerations).

The Oxford team has recently released a paper comparing the epidemiological advantages of decentralized and centralized systems. This is very welcome, as it will enable better evidence-based comparison of protocols.

That paper includes the following paragraph:

This says essentially:

  1. Whatever the system, you will need to make algorithmic risk assessments based on [Bluetooth-measured distance and] data about infector and/or infectee.
  2. Decentralized systems need to make these assessments based on [Bluetooth-measured distance and] data about the potential infectee.
  3. Centralized systems need to make these assessments based on [Bluetooth-measured distance and] data about the potential infector.
  4. The three best epidemiological predictors of actual transmission [conditioned on a given distance measurement through Bluetooth signal strength proxy] are:
  • time before or since onset of symptoms of the index case (infector side)
  • severity of symptoms of the index case (infector side)
  • age of the contact (infectee side)

It ends with “This gives a small accuracy advantage to the centralized system, since two of the three predictors depend on the transmitter, and the effect sizes are larger”. It doesn’t offer in this paper any data to back up this quantitative comparison, but this breakdown aligns with all the public health discussion.

Within the context of the original article (narrow comparison between the two protocols), this argument could arguably make sense, and even be convincing.

In this post I want to highlight what is left unsaid in the article, and show that the resulting effects would trump every other consideration. It also suggests that focusing on protocols is a distraction. If you step back and look more broadly, you see that contact tracing apps are trying to infer transmission risk from Bluetooth signals. The Oxford paper highlights some sources of uncertainty, tied to the biological conditions of either side. But of course there is another source of uncertainty: inferring distance from Bluetooth signals. This is presumably very important, since the main public health intervention implemented so far, after hand washing, is social/physical distancing (with a sharp drop-off in risk at 2m; presumably one can dig up papers backing this claim).

So, how good is BLE at inferring distance?

This is a screenshot taken from here, showing the impact of a human body intersecting a Bluetooth channel.

This is a video showing a 20dB difference between carrying a phone in your front or back pocket.

I have confirmed this myself, playing around with iStumbler on my laptop and the “Pally BLE Scanner” app on my phone (with different experimental setups): the shadowing due to a body is at least 15dB (disclaimer: I should exercise more).

A delta of 20dB corresponds to a corrective factor of 10 in distance (rule of thumb: every 6dB corresponds to a doubling/halving of distance).
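The rule of thumb can be checked with a short calculation. This is a minimal sketch that assumes free-space propagation (received power falling off as the square of distance, i.e. a path loss exponent of 2); the function name `distance_factor` and the exponent parameter are illustrative and not taken from any of the sources above. Indoors the exponent is usually different, so the numbers are indicative only.

```python
def distance_factor(delta_db: float, path_loss_exponent: float = 2.0) -> float:
    """Multiplicative error in inferred distance caused by a delta_db error
    in received signal strength, under a log-distance path loss model.

    Assumption: path_loss_exponent = 2 corresponds to free space.
    """
    return 10 ** (delta_db / (10 * path_loss_exponent))


for delta in (6, 10, 15, 20):
    print(f"{delta} dB -> x{distance_factor(delta):.1f}")
# 6 dB -> x2.0
# 10 dB -> x3.2
# 15 dB -> x5.6
# 20 dB -> x10.0
```

Under that assumption, the 15dB to 20dB of body shadowing is enough to make a phone 1m away look like one 5m to 10m away.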

You have to take into account that this dampening of the signal could be present on the infector side, on the infectee side, or both! This seems like a much more significant factor in deciding which protocol to use (or at least, it would be much more critical to pay attention to how this noise is removed in either protocol, and who gets to decide what is noise in the first place!).

Now, Bluetooth is a very complex protocol. In fact there are very many additional sources of information that could be used to improve the distance measurement, such as the fact that it is not just one channel but at least three (see again here and other research we are currently analyzing).

This being said:

  • All these improvements to the distance measurement come from features of the Bluetooth channels (or side channels) that are also risks from a privacy point of view. For instance, which of the three subchannels are used, and in what order, can also help determine which device model is being used. In a sense, saying that a side channel is exploited only highlights gaps in the privacy analysis.
  • There is also huge variation (up to 20dB, again a factor of ten) between smartphone models, and on average 10dB between devices of the same model (a factor of 3; it is not actually clear that this variation is not already present with the same device just at different times). This is measured by the Singapore team in an anechoic chamber:

Signal attenuation does enter into the risk calculation of the Google/Apple APIs. Apps get offered access to an attenuation value (from API doc v 1.2):

This value is calculated as the difference between transmission power and the maximum strength of the signal received:
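As a minimal sketch of that calculation: the function below subtracts the strongest received reading from the advertised transmission power. The function name, argument names and example values are hypothetical and are not the Google/Apple API; only the "transmission power minus maximum received strength" rule comes from the description above.

```python
from typing import Sequence


def attenuation_db(tx_power_dbm: int, rssi_readings_dbm: Sequence[int]) -> int:
    """Attenuation as described above: advertised transmission power minus
    the maximum (strongest) received signal strength.

    Hypothetical helper for illustration, not part of any real API.
    """
    if not rssi_readings_dbm:
        raise ValueError("at least one RSSI reading is required")
    return tx_power_dbm - max(rssi_readings_dbm)


# Made-up example: a device advertising at -20 dBm, heard at three strengths.
print(attenuation_db(-20, [-75, -68, -90]))  # 48
```

Note that using the maximum reading means a single favourable moment (no body in the way) determines the value, while a body that shadows the signal for the whole encounter inflates it.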

You would be forgiven for wondering how the transmission power is known to the receiving device. Well, it turns out this is transmitted automatically, but encrypted so only Google and Apple devices can decode it once a person declares themselves infected (you have to read through both the Google/Apple Cryptography Specification and the Bluetooth Specification). Note that there are bytes reserved for future use (and that it is hopeless to measure distance to a device without knowing the power AND the orientation of the emitting device).

This encrypted metadata is encoded using the temporary keys (so can be decrypted on the at-risk person’s phone once the infected person declares their status through the server, by handing out the master keys).


This post has been written as the author understood the comparative significance of the hardware and epidemiological factors. Indeed, it is clear by looking at the engineering that technical sources of errors and the systematic biases they might introduce will dominate everything else, and should be addressed before even considering:

  • making a comparison between de/centralized protocols.
  • computing how those systematic biases translate into systemic outcomes (e.g. false positives systematically affecting users of Android phones sold under Chinese brands)
  • thinking about false positives/false negatives

To really get the implications of this, we need to think in terms of false positives and false negatives. The research that suggests contact tracing is possible (https://science.sciencemag.org/content/368/6491/eabb6936) indicates that false negatives are almost totally unacceptable. So you have to widen the circle so much that even when there is an error, you still catch everybody within 2 meters.

The readings across mobile phones vary between -50 and -80 dB, a spread of 30 dB. Add twice the 20 dB Paul-Olivier found and you have an error of 70 dB in signal strength. So the signal strength of somebody within 2 meters can be anything from -60 dB (or so) to -130 dB (or so), and any signal stronger than -130 dB has to be counted as ‘within 2 meters’. That -130 dB is below the lowest signal level a mobile phone is able to detect, so every phone within BLE range will have to be marked as ‘possibly infected’.

The ‘box’ used to mark people as infected therefore extends up to the theoretical BLE range of 100m. How big it is on average is hard to estimate, but something like 5-10 meters is quite conservative. When you go out, how many people come within 5 meters of you? And within 10 or more? Many of them will be marked as ‘possibly contaminated’. Depending on what you are doing, that is easily more than 100 people a day. Take that over 3-5 days, and we are talking about hundreds of people. On average (R0) about 3 of them are actually contaminated, if somebody tests positive. So the ratio between true positives and false positives is 1:100 or more.

So if somewhere there are 1,000 positive tests a day, then every day at least 100,000 people have to go into quarantine and stay there until they test negative after 3-5 days or have shown no symptoms after 2 weeks. If you also trace contacts of contacts, as some researchers suggest, this will result in millions and millions of people in quarantine every day. Conclusion: the errors in BLE render automated contact tracing totally useless.
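The arithmetic in this estimate can be laid out as a small script. It is only a sketch that reproduces the round numbers above; the variable and function names are made up for illustration, and the per-case contact counts (100 as the "at least" figure, 300 for "hundreds") are assumptions standing in for the rough ranges quoted, not measured values.

```python
# Error budget in signal strength (figures quoted in the comment above).
phone_spread_db = 30        # readings between -50 and -80 dB
body_shadowing_db = 20      # found in the original post, counted twice
error_budget_db = phone_spread_db + 2 * body_shadowing_db

strongest_within_2m_db = -60
weakest_within_2m_db = strongest_within_2m_db - error_budget_db


def false_positives_per_true_positive(flagged: int, truly_infected: int) -> float:
    """Number of wrongly flagged contacts for each correctly flagged one."""
    return (flagged - truly_infected) / truly_infected


def quarantined_per_day(positive_tests_per_day: int, flagged_per_case: int) -> int:
    """People sent into quarantine each day if every flagged contact isolates."""
    return positive_tests_per_day * flagged_per_case


print(error_budget_db)                             # 70
print(weakest_within_2m_db)                        # -130
print(false_positives_per_true_positive(300, 3))   # 99.0
print(quarantined_per_day(1000, 100))              # 100000
```

Changing `flagged_per_case` shows how sensitive the conclusion is to the assumed size of the ‘box’: the quarantine count scales linearly with it.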

Bonus question: how many tests would you need for such scenarios?

RSSI variability source: imPACT 2020 - YouTube

and how to improve

Let's also listen 24/7 so we can detect if somebody sneezes or coughs. I have the feeling the problems are only getting bigger…

There seems to have been an update in the API. Now, it looks like the exposures are batched together depending on the attenuation level, in three buckets.
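To make the bucketing concrete, here is a minimal sketch of grouping exposure durations into three attenuation buckets. The two threshold values (50 and 70 dB), the boundary convention and the function name are assumptions for illustration only; the actual thresholds and data structures are defined by the API and are not reproduced here.

```python
from bisect import bisect_right
from typing import Iterable, List, Sequence, Tuple


def bucket_durations(
    exposures: Iterable[Tuple[int, int]],
    thresholds_db: Sequence[int] = (50, 70),  # hypothetical thresholds
) -> List[int]:
    """Sum exposure minutes per attenuation bucket.

    exposures: (attenuation_db, minutes) pairs.
    Buckets (assumed convention): below the first threshold, between the
    two thresholds (lower bound inclusive), at or above the second.
    """
    totals = [0] * (len(thresholds_db) + 1)
    for attenuation, minutes in exposures:
        totals[bisect_right(thresholds_db, attenuation)] += minutes
    return totals


# Made-up exposures: (attenuation in dB, duration in minutes).
print(bucket_durations([(45, 5), (55, 10), (80, 5), (62, 5)]))  # [5, 15, 5]
```

With only three buckets, a 20dB shadowing error is enough to move an exposure from the lowest-attenuation bucket to the highest, whatever thresholds are chosen in this range.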

Bravo Paul! I shared this post to the Telegram channel DHID (Digital Health Disposable Identities):
UX User Stories:
https://t.me/DHID_UX_Group
https://t.me/joinchat/IBy5OR1LyMBlt5iFz0XdAg
https://t.me/joinchat/IBy5ORuEwOD24UnvOPLCsA
AI: https://t.me/DHID_AI_UX_Group
Coms: https://t.me/DHID_Coms_Group
The Project GitHub:
https://github.com/disposableidentities/healthcrisis/blob/master/README.md
https://www.theinternetofthings.eu/hygienegap-project-provides-citizen-centred-digital-trusted-health-status-solution