Impact of origin-destination information in epidemic spreading

The networked structure of contacts shapes the spreading of epidemic processes. Recent advances on network theory have improved our understanding of the epidemic processes at large scale. The relevance of several considerations still needs to be evaluated in the study of epidemic spreading. One of them is that of accounting for the influence of origin and destination patterns in the flow of the carriers of an epidemic. Here we compute origin-destination patterns compatible with empirical data of coarse grained flows in the air transportation network. We study the incidence of epidemic processes in a metapopulation approach considering different alternatives to the flows prior knowledge. The data-driven scenario where the estimation of origin and destination flows is considered turns out to be relevant to assess the impact of the epidemics at a microscopic level (in our scenario, which populations are infected). However, this information is irrelevant to assess its macroscopic incidence (fraction of infected populations). These results are of interest to implement even better computational platforms to forecast epidemic incidence.

used to estimate the data-driven origin-destination patterns, the metapopulation model, and the comparison of the different routing flow strategies compared to the data-driven scenario. To conclude we present a discussion about the convenience of using origin-destination data-driven flows in the assessment of epidemic spreading.

Estimation of an Origin-Destination matrix. For most kind of analyses in transportation networks,
there is a need for origin-destination (O-D) matrices, which specify the travel demands between the origin and destination nodes in the network. Here we wonder up to which point the estimation of O-D matrices are essential factors to determine the outcome of a certain epidemic process that uses the transportation network as the substrate for the carriers. Let us first determine what is the mathematical problem we are facing.
Given a transport network with n nodes and m links, an Origin-Destination (O-D) matrix T is a two-dimensional trip table whose entries t ij represent the number of trips going from origin node i to destination node j. Let v a be the observed trip volume on link a of the network, and let p ij a be the proportion of trips going from origin node i to destination node j that use link a. For the latter proportions we assume that travelers follow shortest-path routes and, in case of several alternatives, any of them is selected at random with equal probability.
The base constraints to be satisfied for an O-D matrix estimated from link counts state that the sum of all the trips crossing a given link must be equal to the link counts observed on that link, and we have to add the lower bounds for the number of trips, ij The system of equations formed by Eqs (1) and (2) has always more unknowns to estimate (n × n O-D cells) than base constraints (m links), the only exception being the complete networks (cliques); in fact, most real networks are sparse, thus the number of unknows is much larger than the number of equations, n m 2 . Given the possibility of existence of multiple solutions, additional considerations to select a preferred O-D matrix are needed. Several approaches have been traditionally used to solve this problem, from which we have selected a maximum entropy model that estimates the most likely trip matrix consistent with observed link counts. The application of the entropy maximization principles to the O-D estimation problem was initially proposed by Willumsen 19 and Van Zuylen 20 . In the maximum entropy approach, the most likely trip matrix is the one having the greatest number of microstates associated with it, what is equivalent to estimate an O-D matrix that adds as little information as possible to the knowledge contained in the link counts.
Let t be the total number of O-D trips traversing a network. Then the number of ways, defined as entropy, in which t trips can be divided into groups of t ij trips can be computed as To be able to compute the derivatives needed to optimize the objective function (3), we take first the natural logarithm of the function and make use of Stirling's approximation of the factorials, ln(t!) ≈ t ln(t) − t, thus obtaining a more computationally convenient objective function: Assuming that t is constant and changing the sign of the function, the first two terms can be dropped and the goal becomes to minimize the function Epidemic spreading model. The epidemic spreading model we use is based on the well-known reaction-diffusion 11 and metapopulation 12 models, but incorporating a routing scheme for the moving individuals similar to those in 18,21 . The main ingredients are the sites or nodes (e.g. cities or airports), where the individuals of the global population are located, and the links between these nodes, which represent the communication channels (e.g. roads or airways). Each link has an associated weight which accounts for the traffic, the number of individuals moving between nodes in a certain amount of time. The contagion model we have chosen is the standard susceptible-infected-removed (SIR) one, a good proxy for diseases in which the individuals acquire immunization after being infected. In this model individuals may be in three different states: infected (I), those who have got the disease; susceptible (S), healthy but which may become ill by contacts with infected individuals; and removed (R), once they have recovered from the disease and become immune to it. The dynamics of the epidemic spreading consists of two alternating phases, a reaction in which the individuals in the nodes' subpopulations merge, and a diffusion of some individuals through the communication channels. In the reaction step the subpopulation is considered as well-mixed, i.e., every individual is in contact with the rest in the same node, and here is where the contagions and recoveries take place. There are no contagions between individuals at different nodes, but the disease spreads due to the mobility in the diffusion step.
In the standard metapopulation model 12 the diffusion is just a Markovian process, where individuals move to neighboring nodes with a probability proportional to the weight of the link; we call this a Random Diffusion (RD) process. A more realistic mobility is obtained if every individual has a "home" node, it travels to a selected destination, and finally comes back home after spending some time in the destination. It would be possible to choose uniformly Random Destinations (RU), however it is more plausible a selection proportional to the strength of each possible target subpopulation as in 21 , what we call here the Strength Proportional (SP) scheme. Finally, the information in the O-D matrix defines a new diffusion model (OD), since it directly tells the number of trips from any origin to any destination.
In all the diffusion models where the destinations are not bound to be neighbors of the origin, it is necessary to establish the paths followed by the travelers, i.e., a routing algorithm. We have decided just to select the shortest path (in number of hops) and, if it is not unique, one of them is chosen at random with equal probability. It is possible to improve this routing algorithm taking into account geographic and economic constraints that restrict the feasible available paths, e.g. travel times and costs; however, this would only affect the availability and priority of paths, not the results and conclusions from this work.
Microscopic analysis. First, we analyze the results obtained from our metapopulation spreading models at the microscopic level of nodes and links. We make use of the World Air Transportation Network (ATN) 22 , considered as a directed and weighted network. For the calculation of its O-D matrix, we have only considered trips of length lower or equal to three, both for computational reasons and also because longer trips are rarely observed in real life (see Methods). All the presented results come from averages over 150 randomly chosen initial conditions in the Monte Carlo simulations. In all the cases, the mobility parameters have been chosen to ensure that at every time step the average number of circulating individuals is the same for all the mobility strategies. See Methods for a full description of the implementation details of our metapopulation epidemic spreading dynamics.
In Fig. 1 (1): the O-D matrix satisfies them explicitly as a result of a constrained optimization problem, and the Random Diffusion strategy satisfies them implicitly by the Markovian diffusion process that takes into account the weights of the links. On the contrary, the routing strategy used in the Strength Proportional approach sends travelers through shortest paths without taking into account the weights of the links. Therefore, according to the observed fluxes, we can conclude that travelers pass through the same airports independently of the diffusion strategy used, albeit the routes followed are completely different.
Although the Random Diffusion and the Strength Proportional strategies show different routing dynamics, it should be analyzed whether these differences are relevant for the epidemic spreading process. With this purpose, we compute the fraction of recovered individuals R at each node (remember that a node is a population) for both diffusion strategies, and we compare them to the ones obtained with the O-D flows. We can see in Fig. 2 the distribution of R for the three mobility schemes. We observe that, the larger the population size in the node (which is proportional to its strength, see Methods), the larger the incidence of the epidemics, with a non-linear dependency and an important dispersion between nodes of similar strength. The three schemes show a similar distribution profile, thus we need a more direct comparison of incidences node to node. Since the estimated errors on the mean incidence values at each node are mostly below 0.03 (see Fig. 2), differences in the incidence about or larger than 0.1 (in absolute value) can be considered as statistically significant. In the left plots of Fig. 3 we can observe an increasing difference as a function of the out strength of the node up to values about 10 4 , and then the differences start to decrease. The right plots of Fig. 3 show the fraction of recovered individuals R from the Random Diffusion and the Strength Proportional strategies comparing them with respect to the ones obtained with the O-D flows. Again, we do not observe any difference between the patterns followed by the Random Diffusion and the Strength Proportional approaches, but it has become evident that the routing strategies greatly affect the epidemic spreading at the level of nodes. Therefore, the availability of O-D information is completely necessary to be able to have good predictions of the microscopic incidence of the epidemics.

Macroscopic analysis.
According to the results presented in Figs 1 and 3, the different routes that travelers follow in the Random Diffusion and the Strength Proportional strategies show the same patterns of discrepancies when we compare them to the O-D flows in epidemic spreading terms at the microscopic level. Next, we analyze whether these microscopic differences have any significant effect at the macroscopic level.
In Fig. 4 we show both the fraction of recovered individuals R and the fraction of subpopulations that experienced and outbreak D/V as a function of the traffic generation rate λ. The results show that, especially for the fraction of recovered individuals, the deviations observed between the O-D matrices and the Random Diffusion and Strength Proportional strategies are minimal at the macroscopic level. However, the Strength Proportional strategy seems to predict a slightly lower value of D/V than the other two strategies, with a deviation which increases for high values of the mobility.
Analyzing the effect of the reproductive number at the macroscopic level gives similar results. In Fig. 5 we show the fraction of recovered individuals R and the fraction of subpopulations that experienced and outbreak D/V as a function of the reproductive number R 0 . Taken together the results of Figs 4 and 5, they confirm that only minimal differences are present between the three diffusion strategies in terms of epidemic spreading at the macroscopic level. In the same figure we also show that this behavior observed for the Random Diffusion and the Strength Proportional strategies cannot be generalized to the Random Destination strategy, which does not take into account the weights of the links in the network. The Random Destination strategy shows a slight reduction in the fraction of recovered individuals, but an enormous overestimation of the fraction of infected subpopulations, thus discarding it as a feasible mobility method.

Discussion
We have analyzed the influence of the three different mobility models in a metapopulation model of epidemic spreading. Two of them, the Random Diffusion and Strength Proportional routing strategies, have been traditionally used in the simulation of epidemic spreading processes. Here, we have compared them against the more realistic O-D matrix scheme, which is able to capture not only the observed fluxes through the links of the network, but also the fraction of individuals moving between each origin-destination pair. We have shown how the origin-destination matrix can be calculated, and the differences between using these three different models. Although the three methods show equivalent fluxes per node, the Strength Proportional clearly deviates at the level of fluxes per link, thus indicating the routes that travelers follow are completely different. These differences have an important impact on the epidemic spreading at the microscopic level of nodes, with significant departures in the incidence of the epidemics in the subpopulations at each node, and being especially important for nodes with intermediate values of the output strength. The observed deviations affect both the Strength Proportional and Random Diffusion models when compared with the more reasonable O-D matrix scheme, and with a very similar deviation pattern.
When we consider the epidemic spreading at the macroscopic level of the whole World ATN network, we realize the differences almost vanish, thus making the Random Diffusion, the Strength Proportional and the O-D matrix approaches basically equivalent. This means it is safe to use any of the three mobility models if the interest just lays on the global incidence of the epidemics, but care must be taken when the details are needed at the level of nodes, where only origin-destination information is able to provide the desired quality of the predictions.  With this final formulation of the objective function E c , the O-D matrix estimation problem becomes a separable convex optimization problem that we have solved using MOSEK 24 , a software especially designed for large-scale mathematical optimization problems.
Epidemic spreading model implementation. The epidemic model is implemented as follows. Each simulation starts with a small fraction of infected individuals, I 0 . Namely, we randomly choose a small fraction of sites (less than 1%) and within these subpopulations only the 1% of the individuals is infected, ensuring that the condition > μ β I 0 is fulfilled, where β and μ are the infection and recovery rates, respectively, and = β μ R 0 is the so-called reproductive number. In the simulations the diffusion and reaction dynamics have the same time scale so, at each time step, first a movement step is performed and then the SIR dynamics takes place.
In the diffusive phase, for each possible pair of origin-destination nodes (i, j), the number of individuals starting a trip from node i towards node j is given by a binomial distribution with parameters the subpopulation size N i (t) and the probability λ = ∑ p ij t t ij k ik , where 0 ≤ λ ≤ 1 is a mobility rate to distinguish between different traffic regimes, and t ij is the (i, j)-th entry of the O-D matrix. To simulate the Strength Proportional and Random Destinations schemes, the O-D matrix is substituted respectively by a Strength-Destination matrix S whose elements are equal to the input strength of the destination node, = =∑ s w w ij j in k kj ( ) , and by an uniform Random-Destination matrix U in which all the elements equal the unity, u ij = 1. Once the number of traveling individuals and their respective destinations have been chosen, they start a shortest path trip to their destination, making one hop to a neighboring node at each time step. When the individuals arrive to their destinations, they come back to their home nodes, also following a shortest path, with one hop per time step. This travel pattern yields and almost stationary size of the subpopulations for all the mobility schemes, provided the mobility rate is not too large.
After the diffusive step, the reaction dynamics is evaluated. As a proxy of the initial population of each node N i (0) we use node's strength w i calculated in the original ATN data 22 , which amounts a total of . ⋅ 5 82 10 7 individ- uals in the network. The distribution of subpopulation sizes (i.e., strengths) approximately follows a power-law distribution, with sizes ranging between 10 and . ⋅ 3 05 10 6 individuals. When time goes on, N i (t) changes according to the chosen destination selection scheme and routing strategy. Within the nodes, one step of a SIR process takes place, supposing a well-mixed population. The state of every individual inside a node i is modified according to the following probabilities: a susceptible individual becomes infected with probability , and population sizes S i (t) and I i (t) of susceptible and infected individuals, respectively. Note that in this scenario, R 0 only participates in the internal nodes' dynamics; individuals traveling through node i are involved in the epidemic dynamics and thus they can change their state while at node i. Finally, simulations end when the stationary state I(t) = 0 is reached.