Heterogeneous interventions reduce the spread of COVID-19 in simulations on real mobility data

Major interventions have been introduced worldwide to slow down the spread of the SARS-CoV-2 virus. Large-scale lockdowns of human movement are effective in reducing the spread, but they come at the cost of significantly limited societal functions. We show that natural human movements are statistically diverse, and the spread of the disease is significantly influenced by a small group of active individuals and gathering venues. We find that interventions focused on these most mobile individuals and popular venues reduce both the peak infection rate and the total infected population while retaining high social activity levels. These trends are seen consistently in simulations with real human mobility data of different scales, resolutions, and modalities from multiple cities across the world. The observation implies that, compared to broad sweeping interventions, more heterogeneous strategies that are targeted based on the network effects in human mobility provide a better balance between pandemic control and regular social activities.


Computation of social and health values of interventions
The social value of an intervention strategy is measured as the percentage of activities (check-ins) preserved under the intervention. When a venue is closed or an agent is protected, the corresponding activities are canceled. If x and y are the numbers of check-ins (for the Foursquare datasets) or meetings (for the University and Bike datasets) performed under the intervention and without intervention, respectively, then the social value of the strategy is the preserved fraction x/y, expressed as a percentage. The health value of an intervention is correspondingly measured as the percentage of agents who escape infection due to the intervention. If x and y agents are infected under the intervention and without intervention, respectively, then the strategy's health value is the proportion 1 − x/y, usually written as a percentage.
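As a minimal sketch of these two measures, the snippet below takes the social value to be the percentage of activities preserved (the definition in the first sentence above) and the health value to be the percentage of baseline infections avoided. The function names and the example counts are illustrative assumptions, not taken from the authors' code or data.

```python
def social_value(activities_with: int, activities_without: int) -> float:
    """Percentage of activities (check-ins or meetings) preserved under the intervention."""
    if activities_without <= 0:
        raise ValueError("baseline activity count must be positive")
    return 100.0 * activities_with / activities_without


def health_value(infected_with: int, infected_without: int) -> float:
    """Percentage of baseline infections avoided under the intervention."""
    if infected_without <= 0:
        raise ValueError("baseline infection count must be positive")
    return 100.0 * (1.0 - infected_with / infected_without)


# Hypothetical counts, for illustration only.
print(social_value(8000, 10000))   # 8000 of 10000 check-ins still take place
print(health_value(300, 1200))     # 300 infections instead of 1200
```

With these made-up counts, 80% of the activities are preserved and 75% of the baseline infections are avoided.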
The simulations are implemented in Python and executed on standard desktop machines with Intel i7 cores and 16GB of memory. The simulation is compute-efficient: all our simulations run in under 5 minutes. The efficiency results from in-memory processing, pre-computation of meetings prior to simulation, and storing infection states in the data structures for venues. All of these reduce the number of operations at each discrete event timestamp. The result of a simulation contains the timestamps of new infections and recoveries. As our simulation model is stochastic, each experiment is run 10 times to test its stability. In the figures, we report the temporal dynamics with the median of the 10 runs as the solid curve and the area between the 25th and 75th percentiles as the shaded region. The curves are smoothed with a 7-day rolling average. Moreover, to bring out the major trends, the growth rate and reproduction number plots are further smoothed using a standard Gaussian filter with the standard deviation of the Gaussian kernel set to 2 days.
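The post-processing described above can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the text does not say whether the rolling average is trailing or centered, how the Gaussian filter treats the ends of the series, or whether smoothing is applied before or after taking the median across runs. Here the rolling window is trailing, the Gaussian weights are truncated at four standard deviations and renormalized at the edges, and all function names are hypothetical.

```python
import math
from statistics import median, quantiles


def rolling_mean(series, window=7):
    """Trailing rolling average; early entries average over the days available so far."""
    out = []
    for i in range(len(series)):
        chunk = series[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def gaussian_smooth(series, sigma=2.0, truncate=4.0):
    """Gaussian filter with standard deviation `sigma` (in days), renormalized at the edges."""
    radius = int(truncate * sigma + 0.5)
    offsets = range(-radius, radius + 1)
    weights = [math.exp(-0.5 * (k / sigma) ** 2) for k in offsets]
    out = []
    for i in range(len(series)):
        num = den = 0.0
        for k, w in zip(offsets, weights):
            j = i + k
            if 0 <= j < len(series):
                num += w * series[j]
                den += w
        out.append(num / den)
    return out


def summarize_runs(runs):
    """Per-day median and 25th/75th percentiles across equally long runs."""
    med, p25, p75 = [], [], []
    for day_values in zip(*runs):
        q1, _, q3 = quantiles(day_values, n=4, method="inclusive")
        med.append(median(day_values))
        p25.append(q1)
        p75.append(q3)
    return med, p25, p75


# Ten hypothetical runs of a 30-day daily series (values are made up).
runs = [[float(run + day) for day in range(30)] for run in range(10)]
med, p25, p75 = summarize_runs(runs)
solid_curve = rolling_mean(med, window=7)                 # curve drawn in a figure
extra_smooth = gaussian_smooth(solid_curve, sigma=2.0)    # e.g. for growth-rate plots
```

A library routine such as a SciPy Gaussian filter would behave slightly differently at the boundaries; the hand-written version is used here only to keep the sketch dependency-free.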

Existing models for infection spread
There are various existing models for infection spread. We discuss below the ones most relevant to us.
• Epidemiology models: Classical models in epidemiology include the SIR (Susceptible-Infected-Recovered) and SEIR (Susceptible-Exposed-Infected-Recovered) models [5] and other variations [6,7]. These models make the simplifying assumption of a homogeneous population in which any two individuals have the same structure of interactions and dynamics. The model evolves using a small number of parameters, such as an infectious person's chance of infecting another person, from which ordinary differential equations are derived. To incorporate heterogeneity in a large population, meta-population models [8] include population structures that describe variations in age groups, behaviors, and neighborhoods, but in general these models are coarse-grained.
• Data-driven models: A few models, such as the one from the Institute for Health Metrics and Evaluation (IHME) [9], use a data-driven approach to predict the number of new infections based on data from other countries. This model assumes that the infection process is uniform across different countries, and thus it ignores important factors such as differences in culture, weather, population density, and lifestyle.
• Multi-agent models: The models from Imperial College London [10] are individual-based multi-agent models. These models are fairly complicated, with a large number of parameters describing the interactions between the agents. For example, individuals are assumed to reside in high-density residential areas derived from census data. Contacts with other individuals are assumed to happen within the household, at school, in the workplace, and in the wider community. The population density parameters in these scenarios are taken as averages from published data. It is a challenge to choose these parameters and to validate the choices against real data.
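To make the first bullet concrete, the following sketch integrates the textbook homogeneous SEIR equations with a forward-Euler step. It is not the model used in this work. The rate names beta (transmission), sigma (1/incubation period) and gamma (1/infectious period), and the numerical values in the usage lines, are illustrative assumptions rather than fitted COVID-19 parameters.

```python
def seir_ode(n, beta, sigma, gamma, e0, days, dt=0.1):
    """Forward-Euler integration of the homogeneous SEIR equations.

    dS/dt = -beta*S*I/N          dE/dt = beta*S*I/N - sigma*E
    dI/dt = sigma*E - gamma*I    dR/dt = gamma*I
    Returns one (S, E, I, R) tuple per day, starting with day 0.
    """
    s, e, i, r = float(n - e0), float(e0), 0.0, 0.0
    steps_per_day = round(1 / dt)
    history = [(s, e, i, r)]
    for _ in range(days):
        for _ in range(steps_per_day):
            new_exposed = beta * s * i / n * dt
            new_infectious = sigma * e * dt
            new_recovered = gamma * i * dt
            s -= new_exposed
            e += new_exposed - new_infectious
            i += new_infectious - new_recovered
            r += new_recovered
        history.append((s, e, i, r))
    return history


# Hypothetical parameter values, chosen only to produce a visible epidemic curve.
history = seir_ode(n=10_000, beta=0.5, sigma=1 / 5, gamma=1 / 7, e0=10, days=200)
peak_day = max(range(len(history)), key=lambda d: history[d][2])
print(peak_day, round(history[-1][3]))  # day of peak infections, final recovered count
```

Because every individual is described by the same three rates, the curve is fully determined by a handful of numbers, which is the homogeneity limitation noted above.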

Comparison with standard SEIR model
For each dataset, we compute the average number of daily contacts per agent, c. With a population of size N, there are Nc/2 contacts in total per day. The SEIR simulation progresses in synchronous daily rounds, and Nc/2 contacts are randomly sampled each day. The other parameters remain similar to those in Fig. 1.
We apply the standard SEIR model with COVID-19 parameters, i.e., we consider meetings between random pairs of agents. We simulate the person-to-person transmission model on the University and Bike datasets, keeping the model parameters the same as in Fig. 1B. Given the dataset, Nc/2 contacts between pairs of agents are randomly sampled. Here, we ignore the timestamp of each meeting in the dataset. As in our simulation model, the simulation starts with 10 initial seeds and proceeds with probabilistic disease transmission over the sampled contacts. Supplementary Fig. S7 compares the infection spread in the two models for a setting without intervention. For both the University and Bike datasets, a larger population gets infected under the SEIR model than under our mobility-based simulation. In both datasets, the peak of active infections in the SEIR model is at least 15% higher and delayed by more than 35 days compared with the mobility model. This is due to the heterogeneity of agents: more active agents get the virus early and infect other susceptible agents early, resulting in an early peak. Besides, a large fraction of agents have a low number of meetings and therefore have a lower risk of being infected, which leads to a lower total infection number compared with the SEIR model. Our observations match those in [11].
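The random-contact baseline described above can be sketched as follows. This is a simplified illustration, not the authors' implementation: the transmission probability, the fixed incubation and infectious durations, and the choice to start the seeds in the infectious state are all assumptions made here, and the function name is hypothetical. Infections acquired during a day take effect at the end of that day, which corresponds to the synchronous daily rounds.

```python
import random


def simulate_random_contacts(n, c, days, p_transmit, incubation_days,
                             infectious_days, n_seeds=10, seed=None):
    """Daily-round SEIR with round(n*c/2) uniformly random pairwise contacts per day.

    Returns (active, total_infected): the number of infectious agents at the end
    of each day, and the number of agents that ever left the susceptible state.
    """
    rng = random.Random(seed)
    S, E, I, R = 0, 1, 2, 3
    state = [S] * n
    days_left = [0] * n
    for a in rng.sample(range(n), n_seeds):       # assumed: seeds start infectious
        state[a], days_left[a] = I, infectious_days
    contacts_per_day = round(n * c / 2)
    active = []
    for _ in range(days):
        newly_exposed = set()
        for _ in range(contacts_per_day):
            a, b = rng.sample(range(n), 2)        # a random pair of distinct agents
            for src, dst in ((a, b), (b, a)):
                if state[src] == I and state[dst] == S and rng.random() < p_transmit:
                    newly_exposed.add(dst)
        for a in range(n):                        # advance exposure/infection timers
            if state[a] in (E, I):
                days_left[a] -= 1
                if days_left[a] <= 0:
                    if state[a] == E:
                        state[a], days_left[a] = I, infectious_days
                    else:
                        state[a] = R
        for a in newly_exposed:                   # today's infections start tomorrow
            state[a], days_left[a] = E, incubation_days
        active.append(sum(1 for x in state if x == I))
    total_infected = sum(1 for x in state if x != S)
    return active, total_infected


# Hypothetical values: 500 agents, 4 contacts per agent per day on average.
active, total = simulate_random_contacts(
    n=500, c=4, days=120, p_transmit=0.05,
    incubation_days=5, infectious_days=7, n_seeds=10, seed=0)
print(max(active), total)
```

Since every agent is equally likely to appear in a sampled contact, the number of secondary infections per agent has no heavy tail here, in contrast to replaying the time-stamped meetings of a mobility dataset.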
The difference in meeting distributions between the two simulation models results in different distributions for the number of agents infected by an individual. The mobility-based model has a long-tailed distribution, suggesting that the more active agents infect more people, whereas the SEIR model does not have a long tail.

Figure S1. Infection spreading under an intervention strategy in which each check-in is skipped at random with a varying probability. All interventions start when 10% of the population is infected and last for 15 days. With a higher probability (i.e., a stronger intervention), the peak of active infections is delayed and lowered, but the total number of infected people is independent of the probability (in the Istanbul, Tokyo, Jakarta, University, and Bike datasets).

Figure S3. Infection spreading with a varied start time of uniform intervention, as in Part I of Figure 2. The intervention strategy uniformly randomly skips 80% of the check-in or meeting events. The intervention starts when 5%, 10%, or 15% of the population is infected and lasts for 15 days. This intervention strategy can reduce the total number of infected agents in some datasets, but the fraction is independent of the starting time of the intervention. [Panels: Tokyo, Chicago, Los Angeles, Istanbul, Jakarta, London, Bike. Curves: no mitigation, 5%, 10%, 15%. Horizontal axis: days from start.]

Figure S15. Infection spreading with a varied number of seeds. With a small number of initial seeds, the variance of infection is very large (shaded area). In most cases, once the number of seeds is large enough, the total infection numbers and the peaks of active infections do not differ much.

Figure S16. Infection spreading with varying probabilities of being asymptomatic. With a high asymptomatic probability, more infected people have a longer period in which to infect other susceptible people. This leads to a higher percentage of people infected in total and a higher peak of active infections.

Figure S18. Closing some types of venues and constraining some activities are good intervention strategies in universities. In the university environment, there are many gathering events, especially taking courses and eating. In reality, many universities allow students to stay on campus but move courses online and require students to take meals back to their dorms. We therefore test the spread in the university when some types of venues are closed. The results show that closing one type of venue is not enough to control the spread. Only when all classrooms and cafeterias are closed can the total infection be kept under 20%, because students can still contact each other in other venues, given the limited number of venues. [Curves: close department; close cafeteria; close classroom and department; close classroom, department and cafeteria.]