Travel distance, frequency of return, and the spread of disease

Human mobility is a key driver of infectious disease spread. Recent literature has uncovered a clear pattern underlying the complexity of human mobility in cities: \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$r \cdot f$$\end{document}r·f, the product of distance traveled r and frequency of return f per user to a given location, is invariant across space. This paper asks whether the invariant \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$r\cdot f$$\end{document}r·f also serves as a driver for epidemic spread, so that the risk associated with human movement can be modeled by a unifying variable \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$r\cdot f$$\end{document}r·f. We use two large-scale datasets of individual human mobility to show that there is in fact a simple relation between r and f and both speed and spatial dispersion of disease spread. This discovery could assist in modeling spread of disease and inform travel policies in future epidemics—based not only on travel distance r but also on frequency of return f.


INTRODUCTION
The global impact of the COVID-19 pandemic on both human health and socioeconomic activity has brought to light the importance of a nuanced understanding of the way that epidemics spread in cities. Urban spread of disease is inherently tied to human mobility: as we move through and between cities, we serve as vectors that allow disease to spread to new individuals and communities.The nature of the relationship between mobility and spread of disease has been extensively studied; it is widely understood that human travel is a driving force behind disease spread [1][2][3][4][5].For this reason, many public policy interventions implemented worldwide to contain the spread of COVID-19 focused on limiting mobility, restricting the radius that individuals could travel from their homes and limiting travel between cities and countries.
As we continue to learn more about the ways that humans move, it is important to apply new discoveries about human mobility to the study of disease spread, deepening our understanding of urban epidemics for use in future public health crises.Recent research uncovered a clear, universal pattern in urban mobility: the total distance that the average visitor to a given location travels to reach it is constant across a city, unrelated to the lo-cation's overall attractiveness [6].This value, equivalent to the distance of a location to an individual's home multiplied by the number of times that they visit it over the course of some time period, can be thought of as an "exploration velocity": it is the effective distance individuals travel towards any given location per unit of time.It is a simple but critical parameter to understanding human movement.
Given the simplicity and apparent universality of this parameter of human mobility, its relationship to disease spread is a natural next question.Many travel restrictions and recommendations during the 2020 and 2021 COVID-19 pandemic focused on confining movements to a certain geographical area-for example, restricting the radius that people are able to travel to one's neighborhood, city, state, or country.But what if speed of disease spread depends not just on radius of travel r, but also on the product v := r • f ?If v has additional impact on disease spread, beyond average travel radius r, COVID-19 policies and restrictions that focused solely on restricting radius may have left significant unmet potential for further disease mitigation.This paper investigates the relationship between exploration velocity v := r • f and spread of disease, opening the door to future research on the ways in which that relationship can be utilized to contain epidemics.

RESULTS
Simulating reduction of exploration velocity on real data.In order to understand how v = r • f interacts with speed of disease spread, we start with largescale datasets of individual human movements in New York City and Dakar, Senegal, which we call M real .The datasets each consist of a set of trips for N individuals over different time periods T , where each trip indicates a given individual moving between two locations.We then use agent based susceptible-infected (SI), susceptible-infected-recovered (SIR), and susceptibleexposed-infected-recovered (SEIR) models calibrated with estimates for COVID-19 [7] to simulate disease spread as agents follow the trajectories in our datasets.Aside from their unique trajectories, each individual is assumed identical.In each simulation, we vary two parameters: the maximum travel distance r, measured relative to each agent's home location (see Methods for details), and the travel frequency f , the maximum number of times each location was visited.By varying these two parameters together, we are able to manually adjust exploration velocity v in the system.In practice, this means discarding any trip of length greater than r and all but f randomly selected trips to a given location from our datasets.The details of the spatial partitionings used as well as other simulation details are given in the Methods.
Figures 1(a) and (b) plot the time it takes for the disease to reach some set proportion τ of the population against exploration velocity for each dataset.The value of τ is set to 29% for our Dakar data and 57% for our New York City data.Thresholds are chosen to be the total epidemic size at the end of the study period under the strictest restrictions for each study area, respectively.Simulations were run over the entire length of the datasets (14 days for the Dakar dataset; 28 days for New York City) with 10,000 agents and a 5% initial infected population.The trends are intuitive and not surprising.For a given f , τ decreases monotonically with r: the further people are allowed travel, the faster the epidemic spreads.Similarly, for a given r, τ decreases monotonically with f : more frequent return trips leads to faster-spreading epidemics.What is surprising, however, is that the τ (r) curves for each value of f have similar shape.This echoes a previous finding [6] and hints that τ, r and f might have a simple relationship.Figures 1(c) and (d) show that they do.Under the rescaling r → r •f all the data appear to merge, collapsing to a single curve that depends on the unifying factor v := r • f , or exploration velocity.This curve can be empirically characterized as where a and b vary across cities and model types.Figure 5 in the Supplementary Material shows the same scaling collapse is achieved when the SI and SIR models are used, and Figure 6 in the SM shows it persists for different values of the disease parameter R 0 .Taken together, these findings suggest the r → r•f collapse persists across multiple disease processes.
After observing the relationship between r • f and epidemic size, we check whether there is a relationship between r • f and spatial dispersion of the disease.We analyze the spatial concentration of infected persons in our New York City simulations using the M function developed by Marcon and Puech, which is calculated as a function of some radius k around each agent [8]-see Methods for a detailed description of this statistic.High M indicates tighter clusters of cases; low M indicates a more homogeneous distribution of cases across space.We find that for a given radius k, as r and f increase, M (k, r, f ) decreases: infections become consistently more dispersed across the city (see Figure 2a for this relationship when k = 700m).As with epidemic size, under the rescaling r → r • f , this relationship collapses to a single curve, as shown in Figure 2b.This indicates that restricting r • f is more effective than restricting r alone at containing geographic spread of disease, just as with slowing epidemic spread.This relationship is robust to the choice of k-as k increases, M (k, r, f ) decreases (as would be expected-the larger radius you look at around an agent, the more representative sample of the whole population you will capture) but stays significant and maintains its relationship to r • f , as illustrated in Figure 7 in the Supplementary Material.This relationship is demonstrated visually in Figures 2c and 2d.When r is fixed at 8, increasing f from 1 to 6 has a clear effect on the extent to which infections spread across the city (Figure 2c); while significant portions of the city remain untouched by infection when f = 1, many more grid cells contain at least one infection when f = 6.In Figure 2d, on the other hand, we see that when r ḟ is held fixed at 12, spatial distribution of infection rates is almost identical for different r and f values, demonstrating the relationship between spatial distribution and r •f .

Mechanisms.
In order to illustrate the potential mechanism by which r • f determines disease spread time, we assess the relationship between r•f and both number and variance of contacts between agents in our simulations.Think of our simulations as a network, where agents are nodes and edges are formed whenever two agents come within some distance of one another, enabling disease transmission.It is a known result of the SIR model that speed at which a disease spreads through a network of individuals depends not only on the average degree, or number of contacts per individual k , but also on the variance of contacts (via k 2 , or mean squared degree) according to the following relation: where τ is the characteristic time, or time it takes for the disease to reach 1/e = 36% of the population, γ is the daily recovery parameter and β is the infection parameter.This relation isn't perfectly applicable to our context-it assumes exposure to all contacts at all timesteps, whereas our simulations incorporate movement in space and thus non-constant degree k over time.However, it is valuable in illustrating the effects of restricting r and f in our dataset.Restricting radius of travel r and frequency of return f affects both k and k 2 .The closer you stay to home and the fewer trips you take, the fewer unique individuals you have the opportunity to encounter (reducing average number of contacts k ).Further, as stricter restrictions are enforced, high-r, high-f trips are removed and the mobility patterns of high-frequency, long-distance travellers start to look more like the mobility patterns of low-frequency travellers who stay close to home (reducing variance of contacts k 2 ). Figure 3 plots these relationships in our New York City data, showing exploration velocity r • f on the x-axis against average number of contacts k and average squared number of contacts k 2 on the y-axis in panels a and b, respectively.Again, we see a logarithmic relationship.Plugging both k and k 2 into the relation above, we see that characteristic time as predicted by the degree distribution (τ ) is proportional to the true characteristic time (τ ) that we see in our simulations -Figure 3, panel c.Taken together, these findings show that the shape of the relationship between r • f and τ can be predicted by decreases in number and variance of contacts between agents in combination with fundamental SIR modeling results.
These results can be replicated by a simple modification of the preferential exploration and preferential return (PEPR) mobility model presented in previous literature [6].Under the PEPR model as proposed in [6], at each timestep, agents either explore a new location with a fixed probability P new or return to a previously-visited location with complementary probability 1 − P new .In order to choose a new location, agents draw a radius of travel ∆r from a heavy-tailed distribution with exponent P (∆r) ∼ |∆r| −1−α , with α = 0.55 taken from [6], and they draw an angle of travel θ with preference towards θ that are heavily visited by other agents (see Methods for details).As demonstrated in [6], the PEPR model replicates the universality of r • f .However, when we run our SEIR model on mobility trajectories simulated with the PEPR model, we do not see the same relationship between r • f and τ that we see in our real data.This is because, in the original PEPR model, agents are always moving from place to place without necessarily returning home, whereas in our real mobility trajectories, time spent at home means less exposure to new, unique contacts (and more exposure to the same neighbors).This leads to the PEPR model being unable to replicate the changes in k that are associated with loosening or tightening radius restrictions, as seen in Figures 4a and  4b.In order to correct for this, we add a probability of travel P travel -at each timestep, with probability P travel , the agent travels to a new location according to the protocol above, whereas with complementary probability 1 − P travel the agent stays or returns home.Figures 4c  and 4d show the results of rerunning the same SEIR simulations that were run on our real mobility traces M real on a set of simulated trajectories M sim derived from the modified PEPR model.In order to implement   radius restrictions within the PEPR model we restrict the power law distribution from which ∆r is drawn and in order to implement frequency restrictions we delete all trips beyond frequency f to the same location.We set P travel = .25,the average proportion of time spent away from home in M real , and run the PEPR simulations on a unit square (see Methods for more details).We see the same logarithmic relationship between r • f and both τ and τ (expected τ derived from k and k 2 ) that we see in our simulations over M real , although the simulated results are less precise than the results we see in M real .We also replicate the simulations with P travel = .40and see similar results (see Figure 8 in the SM).This indicates that the modified PEPR model is able to replicate the relationship between exploration velocity and speed of epidemic spread.

DISCUSSION
The significance of r • f was revealed in recent work [6] on human mobility patterns which shows the number of people who visit a given location f times from a distance r during a certain reference period follow a universal, inverse square law N (r, f ) ∝ 1/(rf ) 2 .We suspect the τ (rf ) curve found in this paper derives from this inverse square law; while we have discussed the relationship between exploration velocity and network contacts, deriving the exact mechanism linking the universal inverse square law to τ (rf ) is an open question for future work.
An important implication of the τ (r • f ) curve is that restricting short, high frequency trips in a simulated environment does as much to slow disease spread as restricting long, low frequency trips.Agents in our simulations that take multiple trips around their neighbourhood spread disease as quickly as a those taking single, long trips to the city center.This is interesting academically, but more importantly, it has potential implications for policy: it means that restricting travel distance r but ignoring travel frequency f could be ineffective in containing the spreading of a disease.
That said, an important limitation of this study is that it restricts exploration velocity in a very specific way, removing trips that extend beyond a certain r or f .In reality, it is unclear how people would respond to policies that aim to restrict exploration velocity.True behavioral responses to different travel policies and what policies or mechanisms would be required to change individuals' behavior in a way that reduces exploration velocity are areas for future research.Further limitations of this study include the inherent limitations of simple epidemiological modeling; it is known that the SI, SIR, SEIR and other classic disease models make assumptions which bound their accuracy [9][10][11].The estimates for the model parameters likely carry error [12], and we also assumed that each agent was identical and that each exposure within a set radius carried equal likelihood of infection.Moreover, the scale of our analysis is restricted to the city level (because our datasets are collected at this level) and so our findings do not necessarily generalize to the perhaps more important case of country-level and international travel policies.If the r • f invariance, or some form of inverse relation between r and f , holds for international mobility patterns, then, as at the city scale, lax distance limits could be compensated by strict frequency restrictions.This is a bold hypothesis, which should be tested in future work.Metapopulation disease models [13][14][15], with their convenient trade off between realism and parsimony, seem like a good theoretical starting point for this effort.
Despite these limitations, we believe our results reveal novel insights and bring up an important component of human mobility that could prove useful for future policies on COVID-19 and other epidemic diseases.In the current COVID-19 pandemic and in future pandemics, evidence-based policies at the city scale (not to mention the national or international scale) are needed to mitigate speed and intensity of disease spread.Our results have the potential to help this effort by opening up investigation into the relationship between exploration velocity and epidemic spread.They indicate that the exploration velocity of a city's inhabitants r • f must be boundedto bound distance r but not visitation frequency f leaves potential for disease containment unmet.Furthermore, and more optimistically, if a bound on exploration velocity is possible and effective, it would mean that strict distance restrictions at the city or town level -as adopted by, e.g., Ireland and Italy at the beginning of the infection -are perhaps unnecessary.Given a desired bound on epidemic speed τ bound , a large r could be offset by a small f ; allowing citizens to travel infrequently to distant services (doctors, hospitals etc) may be safe.This inverse relation between travel distance and frequency (r ∝ τ bound /f ) could also vitally inform remote working policies supporting the hypothesis that working from home multiple days per week -and thereby limiting the visitation frequency to workplaces -helps prevent the spread of disease.

METHODS
New York City data.Individuals' movements in New York City are inferred from GPS traces collected from mobile phones by the company X-Mode over a span of one month (February 2020).The raw data contains about 479,163 anonymized users; our analysis uses 10,000 users randomly selected from those that appear in the dataset every day in the month of February.Dakar data.The Dakar dataset is based on anonymized Call Detailed Records (CDR) provided by the Data for Development (D4D) Challenge.The detailed information of this dataset is provided in [16].Here, we use the SET2, which includes individual trajectories for 300,000 FIG.4: PEPR simulation results.When we run SEIR models across a set of trajectories M sim which have been created using the modified PEPR model with P travel = .25,we see a similar relationship between r • f and τ to that in our real trajectories M real .When we run SEIR models across trajectories which have been created using the original PEPR model (with P travel = 1), we do not see this relationship.sampled users in Senegal, and after the preprocess, we have 173,000 users and 173 cells in Dakar region during two weeks of January, 2013.We subselect for users who appear at least 200 times in the dataset to ensure that we have adequate information about their trajectories over the two weeks.Data preprocessing.The X-Mode data from NYC is generated on an very fine spatial and temporal scale, with exact latitude and longitude coordinate updates as frequently as every second.The CDR data from Dakar, on the other hand, are generated only for voice calls, text messages or data exchanges and therefore have limited resolution in time.The geographic location of the cell towers and their density determines the accuracy of location measurements through triangularization techniques.Therefore, the trajectories extracted from CDRs constitute a discrete approximation of the moving population M (x; y; t).There are several steps in preprocessing of the data before it can be suitable for use in our analysis, which vary between the X-Mode data and the CDR data.
The main steps for the NYC data are: i) We use density-based spatial clustering of applications with noise (DBSCAN) to group tightly-clustered latitude/longitude pairs in each individual's trajectory into locations [17].If a cluster of at least five latitude/longitude points exists such that no point is more than .0004degrees (about 56 meters) from two other points in the cluster, those points are grouped together as a single location.ii) Each agent is assigned the DBSCAN cluster it visits most as its home location.iii) We drop all locations in the trajectory that have been visited for less than a minimum time τ min = 15min.iv) In order to restrict travel distance r, we calculate distance between locations by the haversine formula, which derives the great-circle distance between two points on a sphere.All locations that are more than r km from an agents home location are removed from their trajectory.v) In order to restrict travel frequency f , for each DBSCAN cluster that an agent visits more than f distinct times (where distinct visits are determined by an agent leaving a location and then coming back to it), we randomly select f visits to include in their trajectory and drop the rest.
The main steps for the Dakar data are: i) We view each cell tower as a different location in the city.ii) For each person, we determine the home location as the cell tower location which has been visited for the most cumulative time.By summing over all days in a given time window, one can find the home cell with high level of confidence for the majority of subjects.iii) We drop all locations in the trajectory that have been visited for less than a minimum time τ min = 10min.iv) In order to restrict travel distance r, we calculate distance between cell towers by the haversine formula, which derives the great-circle distance between two points on a sphere.All cell towers that are more than r km from an agent's home location are removed from their trajectory.v) In order to restrict travel frequency f , for each location that an agent visits more than f distinct times (where distinct visits are determined by an agent leaving a location and then coming back to it), we randomly select f visits to include in their trajectory and drop the rest (excepting visits to an agent's home location, which are not restricted).
The duration of stay, frequency, and distance criteria on defining cell visits yields a list of cells visited by that subject over the study period for a given frequency restriction f and distance restriction r.Simulation details.We run an agent-based SEIR, SIR, and SI models with N = 10, 000 agents, 5% of which are initialized to be infected (1% in the SI and SIR Dakar models).Each agent is assigned the trajectory of a real person from our dataset, with location updated every 900 seconds (15 minutes) for the NYC simulations or every 600 seconds (10 minutes) for the Dakar simulations.At each time step, each user's location is updated according to their assigned trajectory and infection status is updated according to the following parameters, drawn from Chen 2020 [7]'s estimates of R0 = 3.58, incubation period = 5.2 days, and infection period = 5.In addition, we reproduce our results with disease parameters estimated for the more highly-contagious Delta variant of COVID-19 (β = 1.42, all other parameters remain the same, [18]) and the 2009 H1N1 influenza strain (β = .913,γ = 1.6, σ = 1, [19,20]).Finally, let I local and N local be the number of infected agents and total agents within a 190 meter radius of the agent's current location for the NYC data or within the same cell tower location for the Dakar data.Then, transition probabilities are: In the SIR and SI models, if an agent becomes infected on a given day, they will become contagious at the start of the next day.Quantifying dispersion.We use the M function developed in [8] to quantify spatial dispersion of disease in our New York City simulations for a given r, f .The M function is calculated as follows: for each infected agent I and some radius k, we calculate the ratio between the proportion of agents within k of I which are infected to the proportion of agents in the total population which are infected.Summing this value over all infected agents I and dividing by N − 1, where N is the number of infected agents, gives M (k), the M function evaluated at k.While M is generally analyzed as a function over all reasonable k, we evaluate the M function at a specific k in order to compare spatial dispersion across r • f values at that k, and then show that the relationship is robust to choice of k.
Confidence intervals for M (k) are obtained by Monte Carlo simulation-for a given epidemic size ψ, we randomly assign ψ infections across the population 1,000 times and calculate M ψ each time.By taking the .025and .975quantiles of these simulated M , we form an upper and lower bound on M ψ .It is notable that our empirical M never reaches this confidence band, implying that spatial dispersion is significantly non-homogenous for every k and ψ.Robustness to model parameters.Here we demonstrate that the universal τ (r•f ) curve is robust to changes in the parameters R 0 by running our simulations with estimated transmission parameters for the 2009 H1N1 influenza strain (β = .913,γ = 1.6, σ = 1 [19,20]) and the Delta variant of COVID-19 (β = 1.42, [18]).Figure 6 shows that, with these parameters, the τ (r • f ) relationship still holds.Preferential return model.The preferential return model proposed in [6] is based off of that proposed in Song et al. [21].With some probability P new , agents return to a location they have already visited; with probability 1 − P new they visit a new location with distance drawn from the empirical distance distribution and direction drawn uniformly at random.Waiting times between trips are also drawn from the empirical distribution.The probability p is a function of the number of locations already visited: where S is the number of locations visited.The parameters ρ and γ are fit to the real data using least-squares regression (using the NYC dataset, we find ρ = .500and γ = .267.) We add the following modification: while in the original PEPR model agents travel at each timestep, we set some probability P travel with which agents travel at any given timestep.Conversely, with probability 1 − P travel , agents stay home.The simulation results reported here are from running our modified PEPR on a unit square for 500 timesteps.

FIG. 1 :
FIG. 1: Epidemics sizes for SEIR model.Top row: plots of epidemic spread speed in units of 10-minute increments against maximum allowed travel distance r for different max travel frequency f .Bottom row: top row plotted against r • f .Each data point is the average of 5 simulations.For other simulation details see Methods.R 2 values for best-fit lines are .981(NYC) and .902(Dakar).Best-fit line parameters are a = −0.08,b = 2292.58(NYC) and a = −0.01,b = 1853.57(Dakar).
(a) Relationship between frequency restriction and dispersion for radius = 700m.(b) Scaling collapse for radius = 700m.R 2 value of best-fit line is .97.(c) Holding r constant but restricting f , we can see that loosening restrictions on f increases spatial dispersion of infections-when f is at 1 (left), infections reach 59.0% of grid cells; when f is relaxed to 6 (right), infections reach 73.8% of grid cells.(d)In the simulations above, r and f are individually different but r • f is the same.The similar spatial distribution of infections in the two simulations reflects the scaling collapse in dispersion of infection: dispersion depends on r • f as opposed to just r or f .Here, color represents proportion of the grid cell population that is infected at the end of the simulation.

FIG. 2 :
FIG. 2:Scaling collapse of spatial dispersion of infections.Like τ , spatial dispersion M shows a universal scaling collapse with the product r • f , as demonstrated in panels (a) and (b).Here, M is calculated using a radius of k = 700 meters.This is visually represented in panel (c), where we see the final spatial distribution of infections for two simulations with the same r but different f , and in panel (d), where we see the final spatial distribution of infections for two simulations with the same r • f but different r and f .

FIG. 3 :
FIG.3: Relationship between spreading speed and number and variance of contacts.Both mean number of contacts k and mean squared number of contacts k 2 of the agents in our dataset show a clear logarithmic relationship with exploration velocity r • f .The relationship between r • f and k , k 2 can predict spreading speed τ using a simple relation that is a known result of the SIR model.
(a) Relationship between distance restriction and τ in PEPR simulations with P travel = .25.R 2 value of best fit line is .875.(b) Relationship between distance restriction and τ as predicted by k , k 2 in PEPR simulations with P travel = .25.R 2 value of best fit line is .924.(c) Relationship between distance restriction and τ in PEPR simulations with P travel = 1.R 2 value of best fit line is .640.(d) Relationship between distance restriction and τ as predicted by k , k 2 in PEPR simulations with P travel = 1.R 2 value of best fit line is .480.
8 days: β = daily transmission parameter = 3.58 5.8 = .617σ = daily rate at which an exposed person becomes infective = 1/5.2γ = daily recovery parameter = 1/5.8Let s be the number of time steps in a day.We then transform the above daily parameters into timestep parameters as follows: β * = time step transmission probability = β/s γ * = time step recovery probability = 1 − s √ 1 − γ σ * = time step probability that an exposed person becomes infective = 1 − s √ 1 − σ

( a )
Relationship between distance restriction and τ in PEPR simulations.(b) Relationship between distance restriction and τ as predicted by k , k 2 in PEPR simulations.

FIG. 8 :
FIG.8: PEPR simulation results, where P travel = .40.When we run SEIR models across a set of trajectories M sim which have been created using the PEPR model with P travel = .40,we see a similar relationship between r • f and τ to that in our real trajectories M real .