On the use of aggregated human mobility data to estimate the reproduction number

The reproduction number of an infectious disease, such as CoViD-19, can be described through a modified version of the susceptible-infected-recovered (SIR) model with time-dependent contact rate, where mobility data are used as a proxy for average movement trends and interpersonal distances. We introduce a theoretical framework to explain and predict changes in the reproduction number of SARS-CoV-2 in terms of aggregated individual mobility and interpersonal proximity (alongside other epidemiological and environmental variables) during and after the lockdown period. We use an infection-age structured model described by a renewal equation. The model predicts the evolution of the reproduction number up to a week ahead of well-established estimates used in the literature. We show how lockdown policies, via reduction of proximity and mobility, reduce the impact of CoViD-19 and mitigate the risk of disease resurgence. We validate our theoretical framework using data from Google, Voxel51, Unacast, The CoViD-19 Mobility Data Network, and Analisi Distribuzione Aiuti.

Recall the social estimate of the reproduction number, Eq. (1). There are many definitions of mobility and a great variety of data repositories. Typically, mobility trends are subdivided into movement trends and trends in distance traveled. The first mobility proxy accounts for changes in movement fluxes, for which we have relied on Google's open-source mobility data [7]. Google reports movement trends as changes in the number of visits to various locations (parks, public transportation hubs, residences, etc.), relative to an average baseline for the same day of the week evaluated before the pandemic outbreak. We have used the average mobility across all locations (residential included).
The second type of mobility proxy is the percent change in average distance traveled, as reported by Unacast [9]. When kilometers traveled per day is used as a mobility proxy, one can apply the principle of maximum entropy, under the assumption that individuals are independent particles and that the population has a well-defined average daily trip length. The distribution of distances traveled, $L$, is then exponential,
$$p(L) = \frac{1}{L_0}\, e^{-L/L_0}.$$
Here $L_0$ is the characteristic daily path length reported in the mobility data. We checked that distance traveled per day and movement requests show almost exactly the same trend.
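The exponential distribution of daily travel distances invoked above can be illustrated with a minimal sketch. It assumes the density takes the standard maximum-entropy form exp(-L/L0)/L0 with mean L0; the function names and the numerical value of L0 are hypothetical and chosen only for illustration.

```python
import math
import random


def distance_pdf(L, L0):
    """Exponential density exp(-L / L0) / L0 for a daily distance L >= 0."""
    if L < 0:
        return 0.0
    return math.exp(-L / L0) / L0


def fraction_travelling_farther(L, L0):
    """Probability that a daily trip exceeds L, i.e. exp(-L / L0)."""
    return math.exp(-max(L, 0.0) / L0)


def sample_mean_distance(L0, n, seed=0):
    """Average of n simulated daily distances; approaches L0 for large n."""
    rng = random.Random(seed)
    # random.expovariate takes the rate, which is the inverse of the mean.
    return sum(rng.expovariate(1.0 / L0) for _ in range(n)) / n


if __name__ == "__main__":
    L0 = 10.0  # hypothetical characteristic daily path length, in km
    print(fraction_travelling_farther(L0, L0))
    print(sample_mean_distance(L0, 100_000))
```

Under this assumption, a percent change in the reported average distance traveled translates directly into the same percent change in L0.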
Regarding physical proximity, we use various proxies according to the availability of data in the aforementioned repositories, namely the Physical Distancing Index (PDI) of Voxel51 [8] and the Human Encounters Index [9]. Voxel51's estimates depend on deep learning models that detect and identify pedestrians, vehicles, and other human-centric objects in the frames of each live street cam video stream in real time. Using images sampled from each video stream every 15 minutes, they compute the PDI, an aggregate statistical measure that captures the average density of human activity within view of the camera over time. The PDI value for a particular day will not exactly correspond to the number of people in any single frame. Rather, the PDI value at time t is an average measure of peak activity during a window of time (a few days) around that time. So, a large PDI value means that there were a lot of people out and about at some point around that time.
Voxel51's PDI website presents measures of activity at a single location in the city. While these measures are likely correlated with overall trends in activity in the city, and are thus an interesting proxy for public behavior, this is not guaranteed. These data on the concentration of people in a certain area let us estimate the density of individuals in a two-dimensional plane. From that information, one can infer the mean distance between individuals, assuming random positions. Note that this proxy has been obtained at the level of cities.
Another proxy for physical proximity is provided by the Unacast dataset [9]. These data offer metrics for physical distancing based on GPS devices. They provide proxies for social mobility (interpreted as average distance traveled) and interpersonal proximity (human encounters), calculated with respect to the 4 weeks before the CoViD-19 outbreak. The former is the percent reduction in the total distance traveled per device, averaged across all devices located in a given US state. The latter is an estimate of close encounters between two devices per square kilometer, expressed as a fraction of the baseline. This estimate counts two users from the same province as having come in contact if they were within a circle of radius 50 meters of each other within a 1-hour period.
In order to turn the estimate of the number of close encounters into an estimate of proximity, we use the principle of maximum entropy. The maximum entropy distribution of particle positions would be a uniform distribution. This corresponds to an exponential distribution of inter-particle distances. The characteristic length scale (equal to the mean distance between particles) is the inverse of the square root of the particle number density (up to a factor of order π). A similar derivation was given by [15], where the proximity ρ to the first nearest neighbour in a randomly distributed population at density δ can be written as
$$\rho = \frac{1}{2\sqrt{\delta}}. \qquad (2)$$
At this point we show an example of how to use Facebook mobility data in the estimation of the social reproduction number R_t. Facebook provides two types of indicators: the percentage change in movement relative to baseline and the proportion of users staying put within a single location. The first variable is used as a proxy for movement change, after transforming it to a decimal number with a baseline value of 1 instead of 0. As for the second variable, we take its complement to 1 in order to estimate how many people are not at home. After this normalization, in order to obtain a proxy for proximity, we transform the normalized second variable (which is δ) according to Eq. (2). Then we plug those variables directly into Eq. (1). A very similar approach is taken for the other types of data from Google, Unacast and Voxel51.
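The normalization steps just described can be sketched as follows. This is a minimal illustration rather than the authors' code: it assumes that Eq. (2) takes the nearest-neighbour form ρ = 1/(2√δ), that the movement indicator is supplied as a percent change, and that the staying-put indicator is supplied as a fraction between 0 and 1. All function names are hypothetical.

```python
import math


def movement_index(percent_change):
    """Rescale a percent change from baseline so that the baseline equals 1.

    For example, a 40% reduction (percent_change = -40) maps to about 0.6.
    """
    return 1.0 + percent_change / 100.0


def active_fraction(staying_put_fraction):
    """Complement to 1 of the share of users staying put (the density delta)."""
    return 1.0 - staying_put_fraction


def proximity(delta):
    """Assumed form of Eq. (2): rho = 1 / (2 * sqrt(delta))."""
    if delta <= 0:
        raise ValueError("density must be positive")
    return 1.0 / (2.0 * math.sqrt(delta))


if __name__ == "__main__":
    nu = movement_index(-40.0)      # mobility proxy, baseline 1
    delta = active_fraction(0.75)   # share of people not at home
    rho = proximity(delta)          # proximity proxy
    print(nu, delta, rho)
```

The resulting mobility and proximity proxies are the quantities that would then enter Eq. (1).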
Moreover, for the case of Italy, we also use data on the number of face coverings and masks distributed among the population, which we interpret as inversely proportional to physical proximity in the model. This should be reasonable, at least during the period of lockdown. MacIntyre [16] and Chu [17] show that the risk of getting infected drops by half for every additional meter of distancing.
In Figs. 1-2, we plot the reproduction number, making use of Google [7] mobility and Voxel51 [8] proximity data for certain US states and EU locations. In the latter case, the camera was in a location devoted to leisure activities; this yields a biased estimate of the mean interpersonal proximity as compared with the rest of the state.
A selection of results for R(t) obtained via social distancing is shown in Figs. 3-7 for many US states, using Unacast [9] mobility and proximity data for the first period of the epidemic, from March to July 2020. These results are consistent with those obtained using other databases for social mobility and human proximity.
Figure: Comparison between the reproduction number ensemble average of [13] and [14] (gray dashed line) and the reproduction number computed according to our kinetic approach, using data from [7] for mobility, [8] for social proximity, and [2] for epidemic data. Ribbons are the 80% credible interval obtained via bootstrap. Epidemiological data and R(t) estimates refer to the date of lab diagnosis.
Figure 3: Reproduction number of CoViD-19 as estimated in the literature (red) and via social proxies (see Eq. 1) (blue), for New York, Florida, California, and Nevada. Effective reproduction number data courtesy of Rtlive [12] and social distancing data courtesy of Unacast [9].
To confirm our analysis for the second wave of CoViD-19 as well, we apply the approach presented above to the second wave of the epidemic in Italy, from September 2020 through December 2020. For this period we have used data from Facebook's database, inferring both mobility and proximity from movement range data as described in [10]. We have considered level changes in the rates of mobility and of staying put. The former measures the change in frequency of travel between the last two weeks of February and the present day. The latter measures the percentage of people who remained in one location for 24 hours, with respect to the same baseline period. One minus this value is a proxy for the density of people moving outside the vicinity of their home, who can therefore propagate the contagion. We have made the same estimate with the Voxel51 and Unacast databases, which we used for our analysis through June/July 2020. These two data providers exited the Data for Good program after summer 2020, so for the second-wave analysis we preferred another data provider that still participated in the program. The CoViD-19 Mobility Data Network has helped us confirm the results we found for the first part of the pandemic. We set the beginning of the second wave in Europe at September 22nd 2020, following Epiforecast models [13]. However, especially for the CoViD-19 second wave, other important considerations need to be taken into account. For example, virus mutations can undermine the ability to test people efficiently and can prevent the detection of infectious individuals. Furthermore, imported cases need to be taken into account during the periods in which European countries reopened internal and international borders. In summer 2020, the number of imported cases was greater than that of internal cases, so that hubs of infection became crucially important. Consequently, the assumptions of our model cannot entirely hold true. However, after the transient interval of the second outbreak, these clustering effects stabilized. In the specific case of Italy, social distancing interventions have been assessed at the regional level. Notice that the estimate relating detected to real cases for the second period is λ ≈ 0.12 for Italy.
In the USA, we observe a more continuous trend of R_t without a clear distinction between different waves, so the effects of imported cases and infection hubs play a minor role. Here we rely on the Rtlive [12] database, provided continuously since March 2020, as the reference for epidemiological estimates of the reproduction number. Thus, we can apply Eq. (1) using mobility and proximity from Facebook's [10] data program. However, we needed to estimate the test and trace variable λ multiple times over the entire period from the end of March 2020 to the beginning of December 2020. In addition to these evaluations, we also apply model calibration for each of the λ intervals. We have applied the overall estimation to some of the hardest hit states in Fig. 8, where we have used λ_1 ≈ 0.04 for the period March-May, λ_2 ≈ 0.11 for the period June-September and λ_3 ≈ 0.36 for the period October-December.
In addition, this overall analysis requires a more precise alignment between social distancing and epidemiological data over such an extended period, with calibration via regression repeated for each period.

B Description and interpretation of the model variables

Human mobility and interaction radius
As noted above, we used various social distancing data sources as proxies for human mobility and social proximity in our model. In general, one can use any data source for human mobility ν that describes the movement trends of the population of a region. We infer it in three ways: km traveled per day by individuals, change in frequency of travel, and number of visitors per place (mobility fluxes) with respect to a baseline period.
In general, these are metrics that measure the amount of movement occurring within a population. We consider such movements to be a proxy for average relative speed among particles of an ideal gas.
As for interpersonal proximity ρ, its effect on infectivity is the inverse of that of the interaction radius r. A large interaction radius corresponds to requiring a small distance between individuals for a possibly contagious contact to occur. Assuming people are positioned randomly in a region, we consider the distance between individuals to be exponentially distributed. Therefore, they have a characteristic interpersonal proximity which is inversely proportional to the number of active individuals in the region of collisions. In this framework, we have inferred social proximity from the number of active individuals, given either as the complement of the people staying at home or by direct observation of individuals in a certain area.

Test and Trace variable
Changing trends in the reproduction number may be due to several interrelated reasons apart from physical distancing policies. These reasons can be collected into two groups. The first has to do with the virus itself and its capacity to spread. For example, an increase in temperature or the development of less dangerous strains can decrease the effective infectiousness of the contagion. Another hypothesis is that a non-negligible fraction of the population has some form of cross-immunity against SARS-CoV-2 [18,19,20]. This could explain why serological tests performed in Italy and the USA gave a value of about λ ≈ 0.16 [21,22,23]. The other group of reasons is connected to the decrease in the number of susceptible individuals. Supposing that the latter is the actual reason for changes in R, we fitted it with a linear function of the total population infected, using a very simple ansatz:
$$R(t) = \bar{R}\left(1 - \frac{c(t)}{\lambda}\right),$$
where $\bar{R}$ is the reproduction number at the beginning of the period of analysis and $c(t)$ is the officially detected fraction of the population. In this way, when the number of susceptibles approaches zero, R(t) → 0. So, the value of the officially detected fraction of the population leading to R(t) = 0, c_null = λ, gives the ratio, λ, between the number of officially detected cases and the number of actual cases (supposing that this ratio is approximately constant in time). A more detailed and extended discussion of this prevalence estimate from primary data is presented in [24]. These results are partially confirmed a posteriori by some preliminary results from antibody testing performed in Italy. They are also in line with the estimated test fraction, λ, found through models used by officials to decide on policies like shelter-in-place orders, such as [25] and [26].
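The estimate of λ from this linear ansatz can be sketched numerically. The example below is a minimal illustration under stated assumptions, not the authors' calibration procedure: it assumes that R depends linearly on the officially detected fraction c and vanishes at c = λ, it fits the line by ordinary least squares, and it uses synthetic numbers in the usage example. The function names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def estimate_detection_ratio(detected_fraction, reproduction_number):
    """Return (R at c = 0, lambda), where lambda is the detected fraction
    at which the fitted line crosses R = 0."""
    r_start, slope = fit_line(detected_fraction, reproduction_number)
    if slope >= 0:
        raise ValueError("R must decrease with the detected fraction")
    return r_start, -r_start / slope


if __name__ == "__main__":
    # Synthetic series generated with lambda = 0.16 and an initial R of 2.5.
    c = [0.00, 0.01, 0.02, 0.03, 0.04]
    r = [2.5 * (1.0 - x / 0.16) for x in c]
    print(estimate_detection_ratio(c, r))
```

With real data the fit would be noisy, so the extrapolated zero crossing should be read as a rough estimate of λ rather than a precise value.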

Transmissibility variable
Let us briefly discuss the meaning of the transmissibility variable η, defined as the probability that a contact results in an infection. This variable can change with environmental conditions as well as viral infectivity. All viruses, including SARS-CoV-2, accumulate mutations as they evolve over time. Most mutations have no detectable effect on the biology of the virus. But a few have the potential to change the virus in minor ways and be spread to other hosts. Occasionally, a mutation will enable faster replication, transmissibility, or escape from the immune system, in which case the mutated virus may spread widely throughout the population. However, the D614G variant of SARS-CoV-2 became the dominant form of the virus by March 2020 and continues to be the main form of the virus seen in most infections [27]. Moreover, environmental changes can have an impact on the transmissibility of the virus, as in the case of air temperature, humidity, UV radiation, air flow, etc. All such information is embodied in the variable η, which we have taken to be constant over the periods under examination. Theoretically, it would be possible to embed virus-host changes as well as environmental influences in the transmissibility variable η, as studied in [28, 29]. In fact, we can link the ability of the virus to be transmitted with a probability of success depending on viral fitness and system "temperature" for crossing immune and environmental barriers, as described by kinetic studies on epidemiology as in [28,30,31]. However, the estimation of transmissibility η goes beyond the scope of the present work, and we assume it to be stable over the time span of the epidemic. Some variation in η can be accounted for in the calibration process of our analysis.