Scaling of contact networks for epidemic spreading in urban transit systems

Improved mobility not only contributes to more intensive human activities but also facilitates the spread of communicable disease, thus constituting a major threat to billions of urban commuters. In this study, we present a multi-city investigation of communicable diseases percolating among metro travelers. We use smart card data from three megacities in China to construct individual-level contact networks, based on which the spread of disease is modeled and studied. We observe that, though differing in urban forms, network layouts, and mobility patterns, the metro systems of the three cities share similar contact network structures. This motivates us to develop a universal generation model that captures the distributions of the number of contacts as well as the contact duration among individual travelers. This model explains how the structural properties of the metro contact network are associated with the risk level of communicable diseases. Our results highlight the vulnerability of urban mass transit systems during disease outbreaks and suggest important planning and operation strategies for mitigating the risk of communicable diseases.


Introduction
The rapid growth of population and activity intensity in megacities have propelled an evolutional shift of urban mobility from individual-centric travels to sustainable urban mobility.This substantial shift concerns environment, energy consumption, equity, among others [1][2][3][4] .Lying at the heart for promoting sustainable travel is our understanding of the interplay among urban form, transportation system, and human mobility.Research across a diverse stream of studies and data sources have shown the scaling properties of individual mobility [5][6][7][8] : the vast majority of people travel between a few popular locations and their travel distances are bounded by the scale of the city and its transportation systems.
One indispensable component of urban transportation is the mass transit system, which is so far the only sustainable solution to serve urban mobility needs on a large scale.In 2018, public mass transit served over 53 billion passengers worldwide.The three busiest metro systems reached the daily ridership of 9.48 million (Tokyo), 6.49 million (Moscow) and 5.6 million (Shanghai) 9 , respectively.While these systems allow a large number of commuters to travel efficiently, they also result in high population density within close proximity for long durations.These features establish an environment conductive to the spread of communicable diseases [10][11][12] .In particular, pathogens of infectious travelers can migrate to adjacent travelers through droplets and airborne transmissions, resulting in secondary infections during travel.Public mass transit and the underlying travel patterns are becoming an essential catalyst for influenza pandemics and may greatly accelerate the spreading pace of communicable diseases and thus increase the intensity of disease outbreaks in megacities.Despite acknowledging the linkage between human mobility and the spread of infectious disease, current models do not understand the nexus on how physical contacts/encounters among individuals-which in turn enable the transmission of communicable diseases-are created from human mobility.
An attractive approach to address the challenge is to construct the contact networks during travel and then embed the disease percolation process among individual travelers into the contact networks.Recent advances in complex network theories and epidemic modeling have established a striking connection between network structure and disease dynamics 13,14,[14][15][16][17][18] and the epidemic spreading was modeled on various mobility scales including airline network 19,20 , ground transportation network 21 and river network 22,23 .While mobility networks represent mediate channels for epidemic outbreaks, other studies also focused on correlating disease dynamics with direct human interactions as contact networks at activity locations 12, 16, 24-26 .These studies connected the disease percolation with either mobility networks or contact networks, but the linkage on how contact networks are shaped as a function of human mobility is still missing.Meanwhile, such an interplay has significant implications on keeping infectious people from engaging in daily activities and consequently stopping the epidemic spreading from the source.
To close this gap, this study aims to characterize how human mobility shapes contact networks during travel and how it subsequently affects the threshold of disease percolation among individual travelers.In previous studies, contact networks ).For Guangzhou, we find that trips with T > 50min can be well fitted with λ = 13.78min.For Shanghai, trips with T > 50min can be well fitted with λ = 16.59min.For Shenzhen, trips with T > 50min can be well fitted with λ = 13.43min.
were either of high-resolution for small systems (e.g., at conference 24 and school 12,25 ) or of low-resolution for large systems by simulating from survey data 16,26 .Here we construct high-resolution contact networks for city-wide transit systems by leveraging smart card data from three major cities in China: Guangzhou, Shanghai, and Shenzhen (see SI Appendix ?? for a detailed description of the data).We focus on the metro system, the most used public transit mode, to rebuild the contact networks among metro travelers, but the approach is broadly applicable to other transit systems.The three cities are of drastically different scales and distinct metro network layouts (Fig. 1A).Specifically, Shanghai has the largest population, metro system, and the highest metro passenger volume (over 4 million daily records), followed by Shenzhen (2.1 million) and Guangzhou (1.6 million).The metro ridership of the three cities presents highly regular and recurrent patterns during weekdays with prominent travel peaks (Fig. 1B), which implies a large number of daily commuters and repeated metro visits.The large-scale trip data, as the result of intensive daily activities in metro systems, allow us to directly probe the representative mobility patterns of metro travelers in these cities (see Fig. 1C).Despite distinct network layouts, size, and trip demand, the mobility patterns of the three cities are observed to be strikingly similar.We find that there is a large number of travelers with travel time under 50 minutes, and the number of travelers decays exponentially with increasing trip length.This finding holds true across all three cities, with Shanghai having a lower decay rate (16.59min) and the decay rates for Guangzhou (13.78min) and Shenzhen (13.43min) being almost identical.This indicates that trip length in metro systems is bounded by the scale of the metro network, which reflects the scale of a city where trip duration may not grow infinitely as in the scale-free networks.The finding also provides strong evidence to support the universality of human mobility within public transit, where the travel time follows the exponential distribution with the decay rate being proportional to the scale of the city.As physical encounters are driven by human mobility, this motivates us to investigate the possible existence of scaling laws for the contact patterns in public mass transit networks, as the results of the universal mobility patterns.

Metro Contact Network
Smart card data only provide the entry and exit information of a trip.To gain insights on how travelers come in contact with others during travel, we develop a simulation model based on the observed metro network layout, demand profile, and mobility patterns.The simulation constructs high fidelity metro contact networks (MCNs) by first sampling passenger arrivals at each metro station and their trip destinations, then calculating if two individuals will come into contact based on their trip profiles, and finally assigning expected contact duration between each pair of individuals (The detailed description of the simulation is presented in the SI Appendix ??).The inputs to the simulation are the number of travelers (N), the time period of interest, the operational timetable and metro network layout.The simulation then produces a N × N matrix describing the physical contact pattern between each pair of individual travelers.In particular, each positive entry of the matrix denotes the expected contact duration between two travelers within effective transmission range, e.g., two individuals are of close proximity so that the airborne transmission of a communicable disease is likely.For typical droplet transmission, the effective range is less than 3 feet while certain disease such as SARS may reach 6 feet 27 .We then visualize the structure of the MCNs by simulating a sample realization for each of the cities during 8-8:30 AM, and we set the number of travelers to 500 for better visibility of the network structure (Fig. 2A-C).We observe that the MCNs are visually different among the cities, which is due to the differences in metro network layouts.But these MCNs also share several structural commonalities, including local clusters of nodes and the discrepancies of node degree.
To gain better understanding of their structural properties, for each city, we further generated MCNs from 500 nodes to 10 4 and the details are summarized in SI Appendix ??.We observe that there is fewer super nodes in the MCNs as opposed to the scale-free networks.Instead, there are a large number of nodes with low to medium degree.And we further confirm the structural similarities by different quantitative metrics, as shown in Tab. 1.All the MCNs are observed to have high node degree heterogeneity and high clustering coefficient, and have small network diameter and characteristic path length (CPL).These properties are statistically different from the metrics of their random counterparts (see SI Appendix ??), which corroborates the distinct structural properties associated with MCNs.These confirm that MCNs are a type of smallworld network 28 , and this observation leads to significant implications in the context of disease percolation.Specifically, the outbreak of an exceedingly infectious disease may quickly synchronize among all the travelers because of the small-world property, and therefore the metro system becomes highly vulnerable.But unlike many real-world networks, we report that these network metrics are invariant to the size of the MCNs (Tab.??-??), but instead being determined by the metro network layout and human mobility patterns.In this regard, MCNs, like many other real-world contact networks in school, conference sites and major activity locations, should be regarded and studied as the products of the interactions between human mobility and physical infrastructure.
By plotting the degree distributions of the number of contacts and contact duration for metro travelers (Fig. 2D-F), we find an even more remarkable similarity among the MCNs.Despite the differences in metro network layouts, visualizations and statistics of the simulated MCNs, the degree distributions are found to follow a similar distribution across the three cities, and such observation is also valid for different time periods of the day.In particular, the unweighted degree distributions of the MCNs show a large number of nodes of low to medium degrees (e.g., degree smaller than 50) and the node degrees within this range are found to be nearly uniformly-distributed.But with increasing number of contacts and length of contact duration, the tails of the node degree distributions are found to decay exponentially, similar to the observations for metro mobility patterns.In addition, the rates of decay are found to be time-dependent and also differ among the cities.These observations lead to the conjecture of a universal mechanism underlying the contact of metro travelers, and we explore the mechanism in more depth in the following sections.

Disease dynamics in contact network
With the reconstructed contact network, the risk of communicable diseases can be quantified by modeling the dynamics of disease percolation among individual travelers.We introduce an individual-based model (IBM) following 29 .To characterize disease dynamics within the contact network, the classical susceptible-infectious-susceptible (SIS) process is embedded in the IBM over MCNs.Unlike previous studies 13,16 , this framework does not require nodes and transmission between nodes to be homogeneous, which allows us to model heterogeneous infectious rates due to the varying contact duration.Denote the probability that node i is infected at time t as p i,t and the recovery rate as r, we have where q i,t represents the probability that node i is in S at time t, which depends on that all its neighbors j ∈ N (i) are either in S or in I but fail the transmission: In the equation, β i, j = β t i, j represents the transmission rate between node i and j, which takes the product of per unit time disease transmission strength β and the contact duration t i, j .With N such nodes, we arrive at a nonlinear dynamic system (see SI Appendix ??) with two equilibrium states: (1) the disease-free equilibrium (DFE) where all individuals are in S state (e.g., p i,t = 0) and (2) the endemic equilibrium where a positive proportion of individuals are in I state.The asymptotic stability of the DFE relies on the network-specific critical threshold δ that is associated with the largest eigenvalue of the adjacency matrix of MCNs.And we show that the critical threshold is upper bounded by the largest node degree in MCNs as: As a consequence, if δ < 1, the disease is guaranteed to go extinct while the disease may be endemic with δ > 1.Since β and r are endogenous parameters, the risk level that pertains to a specific disease primarily depends on the value of max i ∑ j β i, j .Such finding subsequently builds the essential connection between the vulnerability of public mass transit with the degree distribution of its contact networks and identifies the impact of the structural property of MCNs on disease threshold in transit systems.The observation provides two immediate implications.First, we can verify that the risk level of a MCN is driven by the riskiest individual who has the highest number of contacts or contact duration, which concerns the tail pattern of MCNs unweighted and weighted degree distributions.Second, by removing the riskiest individual, the next riskiest person may have a similar risk level according to the revealed degree distributions (Fig. 2D-J).This highlights the difficulties in improving MCN's vulnerability and stopping the disease during outbreaks.
In addition to the IBM model, we also build an equivalent OD-based mean-field (ODMF) approach that models the disease dynamics on the passenger flow level between each pair of metro stations (SI Appendix ??).Note IBM is computationally expensive due to the construction of MCNs, and the ODMF can be used to approximately probe the system-wide disease dynamics for the real number of metro travelers.

Disease control strategies
The best practice for controlling the disease is to immunize travelers through vaccination and quarantine 30 .We next explore the effectiveness of five immunization strategies with the percentage of individuals immunized as a control parameter.ODbased and station-based approaches represent population control that immunizes a portion of travelers commuting between a pair of stations or originating from a station.These two control strategies are motivated by the observation that fewer than 20% of the stations produce over 80% of travel demand (Fig. 3A), and more than 80% of the metro commuters are associated with fewer than 20% of station pairs(Fig.3B).These findings suggest that population control may yield satisfactory results in reducing the risk level by focusing on populated stations and trip pairs, as these travelers are likely to have more number of contacts.On the other hand, we also consider uniform, targeted and distance based approaches that are individual-centered methods.The uniform strategy immunizes randomly selected travelers, the targeted strategy iteratively removes travelers of the longest contact duration, and the distance-based strategy aims at immunizing travelers of the longest travel time.Specifically, the targeted method is reported to be most effective in the complex network literature 13,15 .The effectiveness of these control strategies is then examined based on the relative risk level (RRL), which measures the reduction in max i ∑ j β i, j with increasing number of immunized travelers.Our results suggest that the most effective method is the targeted immunization followed by the distance-based method 5/11 and the OD based method, and all the three are superior to the uniform immunization.This finding is consistent across the three cities (Fig. 3C-E).For the targeted immunization, we observe a 27% reduction in RRL by immunizing the top 1% riskiest individuals, and a 60% reduction in RRL can be achieved by removing the top 10% riskiest travelers.On the other hand, the station-based method is found to be less effective than the uniform policy in two of the three cities.The primary reason is that populated stations may not be origins of travelers with high number of contacts and contact duration.Unfortunately, it is usually impracticable to identify the risk level of a traveler before the trip, which is the major barrier for the implementation of targeted immunization.As effective alternatives, we may introduce the distance-based and OD-based strategies by tracking the historical trips of an individual from her smart card.The most practical control strategies are the station-based and the uniform strategies, but neither of them is shown to be effective enough.These results highlight the challenges for stopping the spread of the disease in transit systems.It is therefore important to devise better strategies for the operation of metro systems and for improving the structure of the metro networks.

A generation model for MCNs
Here we develop and validate a simple generation model for MCNs.The MCN to be generated is a scale-dependent network where the degree distribution is a function of the total number of travelers in the network.We observe that the travelers' mobility patterns follow the exponential distribution while the contact degree distributions also decay exponentially.Following our discussion on the MCN structure, we hypothesize that (1) contacts are driven by travelers' mobility pattern, and (2) the probability of two travelers getting into contact is bounded by their mobility in the metro network and also the layout of the metro network.We focus on investigating a universal generation model with individuals' mobility pattern as the input and we consider that the number of contacts is proportional to the total travel time t i of each traveler.Recall that the degree distribution is found to vary across time and city.We thereafter introduce two variables: α captures the impact from metro network layout and γ t models the temporal characteristics of travelers' mobility.We consider the expected total number of contacts experienced by N travelers as: Equation 4 accounts for the scale-dependent nature by including N on the right-hand side and αt γ t i determines the rate that a commuter of travel time t i will meet other N − 1 travelers in the system.And t γ t i refers to the rescaled travel time which depends on the temporal trip similarity among travelers.This is motivated by the fact that different time of day will result in riders heading to various destinations and γ t therefore measures how similar their destinations are.Consider M = 2C as the total number of stubs (half-edges) in MCNs, we can derive the probability that a node is of degree k as (see SI Appendix ?? for derivation details and Fig. ?? on empirical evidence that supports M): with w i = 2αt γ t i (N − 1)/M being the probability that a randomly selected stub is attached to node i.Given the pdf in equation 5, the MCNs can then be generated following the configuration model by first sampling the degree sequence K = {k 1 , k 2 , ..., k N } from p(k), then randomly selecting and connecting a pair of stubs until all stubs are exhausted.For each pair of matched stubs between node i and j, we further assign the weight d i j ∝ min(t γ t i ,t γ t j ) as the edge weight and obtain the weighted degree distribution.
To calibrate α and γ t , we perform cross-validation to determine the optimal α for each city and the corresponding γ t at each time interval, with the objective being minimizing the Kolmogorov-Smirnov (KS) statistics between the pdfs of the generated and simulated MCNs for both unweighted and weighted degree distributions (see SI Appendix ??).To validate the correctness of our generation model, we conduct two sample KS tests to compare the cdfs of unweighted and weighted degree distributions between generated and simulated MCNs, with the null hypothesis being that two data samples for comparison are drawn from the same continuous distribution.
The validation results are summarized in Tab 2, and we also visualize the fitting of the generated MCNs in Fig. 4. We observe that for all experiments, we fail to reject the null hypothesis for the two sample K-S test with the lowest p-value among these cases being 0.742.Even this lowest value is way above the significant threshold for rejecting the null hypothesis (0.05), and in most cases the p value is greater than 0.95 for both weighted and unweighted distributions.The statistics along with the goodness of fit in Figure 4 are indicative that the proposed generation function well captures the underlying mechanisms that govern the meetings of passengers and the duration of exposures during their travels in metro systems.More importantly, the validation of the generation model on three cities provides strong evidence for the existence of a universal rule that shapes the contacts among travelers in transit networks.

Discussion
By inspecting the structure of those simulated MCNs, we observe that there is a lack of high degree nodes.This observation is confirmed by the generation model, which can be decomposed as the weighted combination of Poisson pdfs as shown in equation 5. We see that the degree of a node may be drawn from a collection of Poisson distributions with mean Mw i .This explains the lack of high degree nodes in the contact network as compared to the scale-free network with same number of nodes and average degree, since the probability of having large k in a random network diminishes faster than exponential.On the other hand, the generation model also explains why the tails of the degree distributions of MCNs decay slower than a random network.Note that in MCNs, high degree nodes are generated from the Poisson distribution with a large Mw i value, which requires longer trip length t i .The decay of the tails is therefore the convolution of the tail of a random network and the tail of the mobility distribution which decays exponentially.In addition, we derive the expressions for the mean < k > and variance (σ 2 =< k 2 > − < k > 2 ) of the MCNs' degree distributions in SI Appendix ??.We find that < k >∝ N and the variance σ 2 ∝ N 2 , so that both measures diverge with N → ∞ and the variance is of higher magnitude than the average node degree.This indicates that the degree distributions of MCNs have similar characteristics as compared to the exponential distribution and the number of contacts and the contact duration are bounded by the human mobility in the transit networks.Moreover, these findings are aligned with the empirical observations of < k > and < k 2 > in simulated MCNs (Fig. ??), which further strengthens the validity of the developed generation mechanism.One important parameter in the generation model is γ t , where we define γ t as the similarity among the trips.To validate this argument, we also quantitatively measure the trip similarity (see SI Appendix ??) among travelers based on the trip OD matrix Q.The similarity measure is introduced to quantify the strength of overlapping of a particular trip pair on other trip pairs in terms of contact duration and demand level.We then compare the dominant eigenvalues of Q and we use the variance of the dominant eigenvalues to quantify the similarity of trip purposes.In particular, higher variance suggests that the majority of the trips take place on a few ODs and the trip purposes among these riders are more similar.We compare the computed similarity index with the calibrated γ t and the results are shown in Fig. 5.We see that the calculated similarity presents a strong linear relationship with γ t among all three cities, with R 2 value being above 0.77 if we fit a simple linear function to interpret this relationship.These suggest that similarity can be used as a proxy for γ t for prediction purposes.
While it is difficult to devise effective yet practical control strategies, the degree distribution provides valuable insights in improving the resilience of the transit system by controlling how its contact networks are shaped.To reduce the risk of MCNs, it is equivalent to minimize the probability of the MCNs having high degree nodes.Based on equation 5, we know that p(k) is linearly proportional to the number of passengers and the scale of the metro network.Reducing these values will lead to a linear reduction in the average number of contacts while the shape of the degree distribution will remain the same.By observing the metro network layouts, we observe that larger transit network, possibly with more number of lines and transfer stations, may result in lower α.But the data used in our study is not sufficient to explain how we may reduce α and this may require further investigation.Alternatively, efforts can be made to reduce γ t so as to sub-linearly decrease the probability of having high degree nodes and result in the degree distribution that decays faster.This can be achieved by segregating passengers through an optimally designed timetable or advising passengers to distribute their departure time.The ultimate solution, however, lies in the distribution of the human mobility distribution for t i .This will not only reshape how the Poisson pdfs are combined but also change the weight of each Poisson distribution.In particular, we would like to pursue the distributions of t i with faster decaying tails, so that both Mw i and p(w i ) for larger w i values will be minimized at the same time.And this can be realized by changing the layout of the metro network or, ultimately, the urban form itself.We can expect that w i will decay faster by reducing the number of transfers required for the pair of stations of long trip duration, which results in lower maximum trip length and may also contribute to lowering α.As for the urban form, a more decentralized urban structure is the most effective way for reducing the risk of communicable diseases, which implies that people can avoid long distance travels across the city as they can find work or entertainment places closer to their home locations.While both approaches are deemed to be effective, the design of the metro network and urban form is not a sole function of the risk of communicable disease.Thus oftentimes we have to compromise among the disease risk, construction cost, efficiency, and also equity of urban mobility.But the developed model in this study provides an important tool for improving the network resilience without undermining other aspects of the system.One final issue is to identify the group of travelers who experience and introduce high-risk exposure in the transit system.To gain insights on this issue, the correlations among the travel time distributions and distributions of contact durations are plotted and shown in Fig. 6.We see that the travel time and the contact duration are positively correlated and this observation is consistent across all three cities.We can also verify that there is a wide range of travel time for travelers who experience high contact duration in the metro system.In general, the positive correlation suggests that travelers who experience highest contact duration are likely to be those who have the longest travel time.And the travel time of urban commuters is closely related to their work and home locations, their income levels, and eventually their lifestyle and health conditions.One recent study reported that those commuters with the longest travel time in the metro are likely to be low-income migrates, and they may change their home and work locations more frequently than other urban commuters 31 .This finding implies another potential risk in transit networks.If commuters with long travel time overlap with the low-income population, then these people are likely to be more prone to infection during disease outbreaks.Compared to other population groups, low-income people usually have fewer options (such as time off and sick leaves) and may pay less attention to personal health and hygiene due to limited disposable income 32 .Consequently, the riskiest group of travelers in metro systems are likely to be the most susceptible and vulnerable group of people during the disease outbreak.And this may inevitably raise additional challenges associated with disease contagion and equity of travel in urban transportation networks.

Figure 1 .
Figure 1.Summary characteristics of the travel pattern within the metro networks.(A) Spatial layout of the metro networks of the three cities (from left to right: Guangzhou, Shanghai, Shenzhen).(B) Temporal distribution of metro ridership obtained from the smart card transaction data.(C) Probability density of function of metro trip duration T. We observe that the distributions metro trip duration of all three cities follow exponentially decaying tails (p(T ) ∝ e −T /λ).For Guangzhou, we find that trips with T > 50min can be well fitted with λ = 13.78min.For Shanghai, trips with T > 50min can be well fitted with λ = 16.59min.For Shenzhen, trips with T > 50min can be well fitted with λ = 13.43min.

Figure 2 .
Figure 2. Simulated MCNs with 500 nodes and the unweighted and weighted degree distributions of simulated MCNs with 1000 nodes.(A)-(C) visualizes the layouts of the simulated MCNs in Guangzhou, Shanghai and Shenzhen.In the visualization, larger node size reflects higher node degree and the transparency of the link is proportional to the duration of contact.(D),(F) and (H) present the probability density function of the unweighted degree distributions of Guangzhou, Shanghai and Shenzhen.(E), (G) and (J) present the probability density function of weighted degree distributions of Guangzhou, Shanghai and Shenzhen.

Figure 3 .
Figure 3. Distribution of travel demand and the effectiveness of different control strategies for the MCNs of the three cities.(A) presents the probability density function of the trip demand at the station level.(B) presents the probability density function of the trip demand of each pair of stations.(C), (D) and (E) visualize the effectiveness of OD based, targeted, uniform, distance based and station based control strategies for each city.The effectiveness of control strategies is compared with the proportion of trip demand affected by the corresponding strategy.

Figure 4 .
Figure 4. Visualization for the goodness of fit of the generated MCNs as compared to the simulated MCNs from the smart card data.Each row corresponds to the results of the same city and each column corresponds to the results from a particular time of the day.(A) Fitting results of the probability density function for the unweighted degree distributions.(B) Fitting results of the probability density function for the weighted degree distributions.All results are obtained from the average performance of 50 generated MCNs using the optimal γ t and from the average of 50 simulated MCNs.Each MCN has 1000 nodes.All scenarios fail to reject the null hypothesis of the KS test with very high p-value, where the summary statistics of the KS test and calibrated model parameters are shown in Tab.2.

Figure 5 .
Figure 5.Comparison between the calibrated parameter γ t and trip similarity index for (A) Guangzhou, (B) Shanghai and (C) Shenzhen.

Figure 6 .
Figure 6.The correlations between travel time and total contagion duration in MCNs of three cities.50 MCNs are generated for each plot using data from 8:00-8:30 AM, with each MCN having 1000 nodes.(A)-(C) represent results for Guangzhou, Shanghai, and Shenzhen respectively.

Table 1 .
Summary statistics of the generated MCNs with 1000 nodes.The MCNS correspond to the travel pattern during 8-8:30 AM on weekday and the statistics are measured using unweighted MCNs.Results are averaged from 10 random realizations.

Table 2 .
Summary of fitted results and model parameters from KS test for three cities.