South Korea is publishing, fairly openly, a list of infected individuals (until recently with little concern for privacy beyond withholding names), together with the tracing of each infection that the government has managed to carry out (this site is in Korean; tip: in Chrome, right-clicking offers the option of translating the page).
Here is an example report, run through Google Translate:
This site is also discussed in this talk (starting at 17:35):
It would be interesting to analyze this dataset (using NLP if the dataset is large), particularly the contact reports for transmission events rather than the individual patient reports, in order to:
- assess which communities are mentioned in those reports (to check that the sociologists' thinking here covers all of them)
- model how often transmission hops from one community to another, versus staying within a community
Both analyses should allow the superspreader event around Patient 31 in South Korea (who attended church) to be excluded; those cases were plotted on the heat maps available in South Korea, which amounted to a suboptimal retelling of the impact of that single community.
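As a rough illustration of the second analysis, the sketch below counts within- versus between-community transmissions and lets a set of patient ids be excluded (for example, a superspreader cluster). It assumes the contact reports have already been reduced, by hand or by NLP, to hypothetical records of the form `(infector_id, infectee_id, infector_community, infectee_community)`; the function name, record format and toy data are all made up for illustration and are not taken from the Korean site.

```python
from collections import Counter

# Hypothetical record format (an assumption, not the site's schema):
# (infector_id, infectee_id, infector_community, infectee_community)
TOY_EDGES = [
    (31, 101, "church", "church"),
    (31, 102, "church", "church"),
    (31, 103, "church", "church"),
    (101, 201, "church", "household"),
    (5, 6, "workplace", "household"),
    (6, 7, "household", "household"),
]


def community_hop_stats(edges, exclude_ids=None):
    """Count within- vs between-community transmission events.

    exclude_ids: optional set of patient ids; any event involving one of
    them is dropped (pass the whole cluster to remove a superspreader event).
    Returns (counts, hops, share_between).
    """
    exclude = exclude_ids or set()
    counts = Counter()  # "within" / "between" totals
    hops = Counter()    # (from_community, to_community) for between-community events
    for src, dst, c_src, c_dst in edges:
        if src in exclude or dst in exclude:
            continue
        if c_src == c_dst:
            counts["within"] += 1
        else:
            counts["between"] += 1
            hops[(c_src, c_dst)] += 1
    total = counts["within"] + counts["between"]
    share_between = counts["between"] / total if total else 0.0
    return counts, hops, share_between


if __name__ == "__main__":
    print(community_hop_stats(TOY_EDGES))
    print(community_hop_stats(TOY_EDGES, exclude_ids={31}))
```

On the made-up data, one third of events cross communities overall, but two thirds do once the events involving patient 31 are removed, which is the kind of distortion a single large cluster can introduce.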
Interest in this dataset is motivated, for instance, by a scientific understanding of the limited utility of R0 compared with secondary transmissions, and by a scientific understanding of the social response needed to deal with epidemics while remaining aware of risks such as discrimination (detailed here).
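One way to see why a single average says little on its own: the sketch below computes, from traced infector/infectee pairs, the mean number of secondary cases per case (a rough empirical proxy for the quantity R0 summarises, not a proper R0 estimate) together with the share of transmissions caused by the top 20% of cases. The function name, the 20% cut-off and the two toy outbreaks are hypothetical choices for illustration only.

```python
from collections import Counter


def secondary_case_summary(case_ids, edges, top_fraction=0.2):
    """Summarise the offspring distribution of traced transmissions.

    case_ids: all confirmed case ids (cases that infected nobody count as 0)
    edges: iterable of (infector_id, infectee_id) pairs
    Returns (mean secondary cases per case, share of transmissions caused
    by the top `top_fraction` of cases).
    """
    offspring = Counter({cid: 0 for cid in case_ids})
    for src, _dst in edges:
        offspring[src] += 1
    counts = sorted(offspring.values(), reverse=True)
    mean_secondary = sum(counts) / len(counts)
    k = max(1, round(top_fraction * len(counts)))
    total = sum(counts)
    top_share = sum(counts[:k]) / total if total else 0.0
    return mean_secondary, top_share


if __name__ == "__main__":
    cases = list(range(10))
    chain = [(i, i + 1) for i in range(9)]   # made-up: each case infects the next
    star = [(0, i) for i in range(1, 10)]    # made-up: one case infects all others
    print(secondary_case_summary(cases, chain))
    print(secondary_case_summary(cases, star))
```

Both toy outbreaks have the same mean (0.9 secondary cases per case), yet in the chain the top 20% of cases cause about 22% of transmissions while in the star they cause all of them, which is the difference that contact-level data can reveal and a single average hides.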