Abstract
Commuting network flows are generally asymmetrical, with commuting behaviors bidirectionally balanced between home and work locations, and with weekday commutes providing many opportunities for the spread of infectious diseases via direct and indirect physical contact. The authors use a Markov chain model and PageRank-like algorithm to construct a novel algorithm called EpiRank to measure infection risk in a spatially confined commuting network on Taiwan island. Data from the country’s 2000 census were used to map epidemic risk distribution as a commuting network function. A daytime parameter was used to integrate forward and backward movement in order to analyze daily commuting patterns. EpiRank algorithm results were tested by comparing calculations with actual disease distributions for the 2009 H1N1 influenza outbreak and enterovirus cases between 2000 and 2008. Results suggest that the bidirectional movement model outperformed models that considered forward or backward direction only in terms of capturing spatial epidemic risk distribution. EpiRank also outperformed models based on network indexes such as PageRank and HITS. According to a sensitivity analysis of the daytime parameter, the backward movement effect is more important than the forward movement effect for understanding a commuting network’s disease diffusion structure. Our evidence supports the use of EpiRank as an alternative network measure for analyzing disease diffusion in a commuting network.
Introduction
In light of the presence of network structures in most transmission processes^{1}, topological network structures have utility for understanding the spread of messages^{2,3,4}, diseases^{5,6,7,8,9}, computer viruses^{10,11}, innovative ideas^{12,13}, human movement^{14,15} and rumors^{16,17}, among other ideas and objects. Researchers have used social networks to study the contagious nature of obesity and emotion cognition^{18,19,20}, as well as the spread of violence and wars via location network structures^{21,22}. Some of the networks underlying these transmission examples are formed by human interaction, others by computer or mobile device connections, and still others by spatial links. Despite differences in physical meaning and mechanisms, these networks share the features of nodes representing transmission endpoints and links representing intersections where transmission occurs. Accordingly, transmission examples can be analyzed by conceptualizing endpoints and connections within the topological structure of a network.
Social scientists have utilized network topologies to model human connectivity in the form of social relationships, disease networks, and many other types of systems and complexes, using combinations of real world data and topological structure observations to analyze interactions between individuals and transmission characteristics^{2,23}. Topological structure is now considered a key feature in the understanding of social network transmission events, especially differences in interaction behaviors based on different node and link types^{24}. However, node interactions are subject to oversimplification as a binary variable (i.e., two nodes either do or don’t interact) at the expense of inspecting differences in quantity, interaction strength, and node influence. Clearly, having detailed information on nodes and interactions is central to understanding transmission processes in networks.
Since direct and indirect forms of physical interaction (e.g., coughing, sneezing) can transmit influenza viruses and other diseases, they are considered key components in both novel disease and seasonal influenza outbreaks^{6,25}. Traditional diffusion models such as SIR are used to evaluate epidemic status factors such as infection threshold, infected and potentially infected individuals, and mortality rate, but they are not useful for addressing underlying human interaction in terms of differences in the ability to infect others. Accordingly, SIR-like mathematical models can help identify temporal development trends but not the effects of human movement; they therefore cannot be used by local or national health authorities to make disease control decisions based on spatial considerations. This is an important shortcoming because understanding human movement is key to controlling the spread of infectious diseases.
In spatial epidemiology studies, human movement is considered key to understanding disease spread and diffusion^{26,27}. Grais et al.^{28} and Hufnagel et al.^{29} have used commuting flow to construct network structures for analyzing the influences of human movement on infectious disease diffusion, based on the belief that human movement and interaction (especially direct or indirect physical contact) support the spread of disease pathogens. Another way of describing pathogen movement is as a side effect of human movement, which allows for a social network perspective^{30,31} in which a transportation or commuting network serves as the key movement-capturing factor^{28}. Among infectious diseases, flu viruses (spread by droplets and physical contact) and enteroviruses (spread by physical contact) have been shown to be strongly affected by transportation networks^{32}.
Past human movement and disease diffusion process studies have generally focused on movement in one direction, usually from homes to workplaces or schools^{33,34,35}. However, individuals in commuting networks move bidirectionally between their homes, offices, and schools on a daily basis. As shown in Fig. 1, disease transmission can also occur during movement to satellite locations during non-work hours. Accordingly, even though the number of individuals moving from core to satellite areas may be small, the number of individuals moving from satellite to core areas can be high. Such individuals may become infected while working in a core area and carry disease pathogens to satellite areas. In this study we will refer to these movements as “forward flow” (movement from homes to workplaces or schools) (Fig. 1a) and “backward flow” (movements toward residences) (Fig. 1b)^{36,37}.
The use of bidirectional movement for analytical purposes differs from the conventional use of two separate links moving in opposite directions, which indicate travel from places of origin to other locations for work or school—that is, both represent forward flow (Fig. 1a). In such scenarios, weights and flows in different directions can be considered as independent or asymmetric. Referring to the Fig. 1a example, large numbers of individuals move toward central areas (heavy arrows) and much smaller numbers move toward satellite areas (light arrows); the evening pattern is shown in Fig. 1b. The two figures are representative of a conventional daily commuting network cycle based on a sense of bidirectionality that is key to modelling disease diffusion patterns.
Our study goals are to model the network transitive effect of this bidirectional movement and to design an algorithm for measuring the associated epidemic risk, using a Markov chain model to capture and measure the transitive effect. The PageRank algorithm^{38} was modified to create a new algorithm called EpiRank to capture bidirectional movement. PageRank is the most popular algorithm using the Markov chain model in transitive effect studies^{38}. Spatial network researchers have tried to capture transitive patterns in human flow networks^{14,39,40}, but to our knowledge none of them have included the concept of bidirectional movement in their calculations.
For this study we used two infectious diseases spread by droplets and physical contact (the H1N1 flu virus and Type 71 enterovirus, hereafter referred to as “flu” and “EV”) to test the ability of EpiRank to capture the influence of a bidirectional commuting pattern on disease dissemination in a commuting network in Taiwan. Commuting flows across township boundaries were used to construct the EpiRank origin-destination (OD) matrix. Since disease control operations generally use administrative boundaries as spatial units, our EpiRank analysis is based on commuting movement between townships.
Methods
Commuting data
Taiwan census data for year 2000 contains information on the residential and workplace townships of all surveyed individuals^{41}. Assuming that commuting behaviors did not change dramatically during the ensuing decade, we used the 2000 census to extract commuting patterns on the main island of Taiwan. Spatial data covering 353 townships were used to construct a daily flow network and to create an asymmetrical square OD matrix consisting of 353 origin and 353 destination townships. Of the 21,335,199 people residing in these townships, 3,906,663 (18.31%) were identified as inter-township commuters. Since repetitive daily commuting movements reveal the basic flows of movement patterns, their influences on disease diffusion are relatively stable compared to the movement of individuals for other purposes.
The incoming and outgoing statistical-plus-spatial urbanization pattern data used in this study are shown in Fig. 2, with Fig. 2a specifically showing degree of township urbanization. Liu et al.^{42} describe seven township levels in Taiwan. For visualization purposes we collapsed these levels into three groups: urbanized (highly and moderately urbanized cities), regular (emerging and regular towns), and rural (agricultural and remote areas). Concentrated urban areas are Taipei (north Taiwan), Taichung-Changhua (middle west coast), and Tainan-Kaohsiung (far southwest). Figure 2b shows origin and destination township numbers, and Fig. 2c shows the numbers of commuters entering and leaving townships for work or school. Note that the Fig. 2c data have a skewed distribution with low commuter numbers (50,000 maximum). Figure 2d shows in-out ratio logarithm data in descending order. Log ratios >0 (106 townships) indicate a large number of incoming and small number of outgoing commuters, and log ratios <0 (245 townships) indicate larger numbers of individuals leaving than entering townships. As shown, most townships have negative log ratio values, meaning they “push” commuters to larger urban areas; townships with positive log ratio values are said to “pull” commuters. The longest commuting distance for the 3,906,663 individuals examined for this study was 30 km (Fig. 2e), with 90% travelling less than 18 km and 80% less than 14 km (Fig. 2f). Compared to many other countries, commuting distances in Taiwan are much shorter. Frequency distribution data for commuters living and working in the same township are shown in Fig. 2g. For all townships considered in this study, an average of 84% (SD 7%) of each population resided and worked in the same township.
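The push/pull classification described above can be illustrated with a minimal sketch. The OD matrix below is hypothetical, the function name `log_in_out_ratio` is ours, and two details are assumptions rather than statements from the study: that same-township (diagonal) commuters are excluded from the incoming and outgoing counts, and that the logarithm is base 10.

```python
import math

def log_in_out_ratio(od, i):
    """Log10 of incoming / outgoing inter-township commuters for township i.

    od[j][k] is the number of commuters who live in j and work in k.
    Assumption: diagonal entries (live and work in the same township)
    are excluded from both counts.
    """
    n = len(od)
    incoming = sum(od[j][i] for j in range(n) if j != i)
    outgoing = sum(od[i][k] for k in range(n) if k != i)
    # Note: a township with zero outgoing commuters would need special handling.
    return math.log10(incoming / outgoing)

# Hypothetical 3-township OD matrix (rows = home, columns = work).
od = [
    [500, 40, 10],
    [200, 300, 20],
    [100, 60, 400],
]

ratios = [log_in_out_ratio(od, i) for i in range(len(od))]
labels = ["pull" if r > 0 else "push" if r < 0 else "balanced" for r in ratios]
```

In this toy matrix the first township receives 300 commuters and sends out 50, so its log ratio is positive (a "pull" township), while the other two send out more commuters than they receive ("push" townships).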
Disease data
The two infectious diseases used for making comparisons with actual case distributions were the H1N1 influenza virus and the type 71 enterovirus (EV). Data were collected from the Taiwan National Infectious Disease Statistics System maintained by the Taiwan Center for Disease Control, a database with records for various types of notifiable diseases going back to 1999. We gathered data for 1,129 H1N1 cases reported in the 353 townships in 2009. Epidemic risk was defined as the number of flu cases in each township. For EV we used data for five years in which at least 100 severe cases were reported—2000 (291 cases), 2001 (393), 2002 (162), 2005 (142) and 2008 (373). Epidemic risk was defined as the average number of EV cases per year per township.
Figure 3a,b show frequency distributions for the two diseases, and Fig. 3c,d show corresponding epidemic levels according to flow patterns (in-out ratio logs). The figures show skewed distributions for both diseases (from left to right: non-core, core-III, core-II and core-I). The skewed patterns generally suggest lower numbers of flu and EV cases in most townships, and only a small number of townships with high epidemic risk. Head/tail breaks^{43} were used to differentiate among epidemic risk levels across townships (three breaks per disease). This method is useful for separating data into various groups when frequency distributions are exponential or skewed. Specifically, head/tail breaks were initially used to separate frequency distributions according to the means of whole data sets, with iterated means expressed on the right side of each division. The second row figures show the distributions of four epidemic levels. According to these distributions, the core-I to core-III townships were a combination of pull and push types; in other words, they were located on both sides of the dotted line demarcating a zero log ratio.
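The head/tail breaks procedure can be sketched as follows. This is a simplified illustration, not the code used in the study: it applies a fixed maximum of three breaks, as described above, rather than the head-proportion stopping rule often used with this method, and the case counts are invented.

```python
def head_tail_breaks(values, max_breaks=3):
    """Return break points; each is the mean of the current 'head' subset."""
    breaks = []
    head = list(values)
    for _ in range(max_breaks):
        if len(head) < 2:
            break
        mean = sum(head) / len(head)
        new_head = [v for v in head if v > mean]  # values right of the mean
        if not new_head:
            break
        breaks.append(mean)
        head = new_head
    return breaks

def classify(value, breaks):
    """Number of breaks exceeded: 0 = non-core, 1 = core-III, 2 = core-II, 3 = core-I."""
    return sum(value > b for b in breaks)

# Hypothetical, heavily skewed case counts for 19 townships.
cases = [1] * 10 + [2, 2, 3, 4, 6, 8, 12, 20, 40]
breaks = head_tail_breaks(cases)
levels = [classify(v, breaks) for v in cases]
```

For these invented counts the three breaks fall at the mean of all values, the mean of the values above that, and the mean of the values above the second break, leaving most townships in the non-core group and a single township at the highest level.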
Spatial distributions for the four epidemic levels for each disease are presented in Fig. 4. As shown, flu cases were more concentrated in sections of the Taipei metropolitan area, while EV cases were concentrated in Taiwan’s four largest cities: Taipei, Taichung, Tainan and Kaohsiung.
The EpiRank algorithm
The Markov chain model and PageRank were used to create an algorithm for considering both forward and backward movement in commuting networks to calculate epidemic risk in individual townships due to a transitive effect. An example of commuting flow between four regions is shown in Fig. 5. Using region A as an example, the potential risk of disease spread is associated with three infectious paths: (a) the 20 individuals from B and 50 from C who travel to A for work (incoming commuters), (b) the 20 who leave A and travel to B or C and return to A in the evening (returning commuters), and (c) the 50 who work or attend school and live in the same region, thus increasing the potential for intra-region infections (local commuters).
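To describe how EpiRank functions, assume a commuting network G(v,e) with v = n and e = m, indicating n townships and m commuting relationships between townships. For any township i, an EpiRank value ER(i) emerges after a large number of iterations t of the following equations:
ER(i)^{t+1} = (1 − d) × exFac(i) + d × [daytime × FW(i)^{t} + (1 − daytime) × BW(i)^{t}]   (1)
FW(i)^{t} = ∑_{j} (OD_{j,i} / ∑_{k} OD_{j,k}) × ER(j)^{t}   (2)
BW(i)^{t} = ∑_{j} (OD_{i,j} / ∑_{k} OD_{k,j}) × ER(j)^{t}   (3)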
where ER(i)^{t+1} and ER(i)^{t} are the ER values for region i in iterations t + 1 and t, respectively; d is the effect of the network topological structure on epidemic risk within a range of 0 and 1 (0.95 default value in this study); (1 − d) captures the disease spreading effect associated with local environmental factors rather than network topological structure; exFac(i) is an unspecified external effect influencing epidemic risk (1/n default value in this study); FW(i)^{t} and BW(i)^{t} are the effects of forward and backward movement, respectively; and daytime is a 0−1 parameter controlling differences between FW(i) and BW(i), with a higher daytime value indicating more forward than backward movement and vice versa. The OD matrix records connectivity from residential (rows) to work townships (columns). OD_{j,i} indicates flow from region j (origin) to region i, and ∑_{k} OD_{j,k} is the total number of commuters originating from region j.
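The above equations illustrate the EpiRank transitive effect and calculation processes for each township. An EpiRank value for the entire network is calculated using a matrix multiplication approach involving three equations:
ER^{t+1} = (1 − d) × exFac + d × [daytime × FW^{t} + (1 − daytime) × BW^{t}]   (4)
FW^{t} = W^{T} × ER^{t}   (5)
BW^{t} = W × ER^{t}   (6)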
where ER^{t+1} and ER^{t} are ER distribution values for iterations t + 1 and t, respectively, in an n × 1 matrix; ER^{0} is the initial status, set to 1/n; exFac denotes external factors that may affect ER value (factors taking the form of an n × 1 matrix with a default value of 1/n for each cell or standardized column, indicating differences in external factors between townships); FW^{t} and BW^{t} are the effects of forward and backward movement; W is the column-standardized OD matrix (each column divided by its sum); and W^{T} is the column-standardized matrix obtained after transposing the OD matrix.
Since the calculation concept follows the Markov chain mechanism and PageRank calculation procedures, the summation of all ER values (representing random commuter distributions within each analyzed network in each iteration) equals 1.
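The matrix procedure described above can be sketched in a few lines of plain Python. This is an illustrative reading of the text, not the authors' implementation: the function names, the convergence tolerance `tol`, and the 3 × 3 OD matrix are all assumptions introduced here, and the sketch assumes every row and column of the OD matrix has a nonzero sum.

```python
def column_standardize(m):
    """Divide each entry by its column sum so that every column sums to 1."""
    n = len(m)
    col_sums = [sum(m[r][c] for r in range(n)) for c in range(n)]
    return [[m[r][c] / col_sums[c] for c in range(n)] for r in range(n)]

def transpose(m):
    return [list(row) for row in zip(*m)]

def mat_vec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def epirank(od, daytime=0.5, d=0.95, ex_fac=None, tol=1e-8, max_iter=10000):
    """Iterate ER until the largest change between iterations is below tol.

    od[j][k]: commuters living in j (row) and working in k (column).
    Returns (ER values, number of iterations used).
    """
    n = len(od)
    ex_fac = ex_fac or [1.0 / n] * n
    w_fw = column_standardize(transpose(od))  # standardized OD^T (forward)
    w_bw = column_standardize(od)             # standardized OD (backward)
    er = [1.0 / n] * n                        # initial status ER^0 = 1/n
    for t in range(1, max_iter + 1):
        fw = mat_vec(w_fw, er)
        bw = mat_vec(w_bw, er)
        new = [(1 - d) * ex_fac[i] + d * (daytime * fw[i] + (1 - daytime) * bw[i])
               for i in range(n)]
        if max(abs(a - b) for a, b in zip(new, er)) < tol:
            return new, t
        er = new
    return er, max_iter

# Hypothetical OD matrix for three regions (not the Fig. 5 example).
od = [
    [50, 10, 10],
    [20, 80, 10],
    [50, 10, 60],
]
er, iterations = epirank(od, daytime=0.5, d=0.95)
total = sum(er)  # stays at 1 (up to rounding) when ex_fac sums to 1
```

Because both standardized matrices have columns summing to 1 and exFac sums to 1, each iteration preserves the total, which is the property stated in the paragraph above.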
The following matrix shows calculations based on the Fig. 5 example. ER values during iteration t are shown as a, b, c and d. The EpiRank value for the four areas is the summation of BW^{t} and FW^{t}. FW^{t+1} is calculated as
A similar procedure with a transposed standardized column matrix was used to calculate BW^{t+1}. BW^{t+1} and FW^{t+1} are both n × 1 matrixes.
In these calculations, FW^{t+1} uses an OD^{T} standardized column matrix (W^{T}) and BW^{t+1} uses an OD standardized column matrix (W), since the matrix multiplication process in the Markov chain model puts the flow matrix (n × n) at the front, followed by the previous ER value from iteration t (an n × 1 matrix). To ensure that the matrix multiplication produces the same results as the above-described EpiRank Eq. 1 for a single township, the matrix columns should represent origin townships and matrix rows destination townships (a transposed OD matrix). For directed links, bidirectional flows consist of the same groups of people. Differences in the forward and backward effects on links involve movement direction and a denominator. For the forward effect the denominator is the total number of individuals leaving from the location of origin; for the backward effect the denominator is the total number of individuals leaving from the original destination.
Using the example shown in Fig. 5, for the forward movement assume 50 commuters moving from C to A. The sum of the C column in the transposed OD matrix (OD^{T}) is 120 (meaning 120 commuters leave C), hence the influence from C to A is 50/120 of the current risk status in C. For the backward movement, the 10 commuters returning from C to A are the same 10 who moved from A to C in the morning; accordingly, the sum of the C column in the pre-transposed OD matrix is 190, meaning that 190 individuals travelled to C in the morning and left C in the evening, making the C-to-A influence 10/190 of the current C risk status. In summary, the total influence on A resulting from forward movement is daytime × (5/7(a) + 2/11(b) + 5/12(c) + 0(d)), and the total influence on A resulting from backward movement is (1−daytime) × (5/12(a) + 1/11(b) + 1/19(c) + 0(d)). These calculations can be performed using the above-described matrix multiplication process.
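The coefficients quoted above can be checked with exact fractions. The sketch below uses only flows and totals that appear in the text or follow directly from it: the totals of 70 leaving A and 120 arriving in A follow from the flows listed for region A, while the totals of 110 for B are inferred from the quoted fractions 2/11 and 1/11 and should be treated as assumptions. Region D is omitted because its flows to and from A are zero.

```python
from fractions import Fraction

# Forward effect on A: commuters from j who work in A, over all commuters living in j.
flow_to_a = {"A": 50, "B": 20, "C": 50}
living_in = {"A": 70, "B": 110, "C": 120}    # OD row sums (B inferred from 2/11)

# Backward effect on A: commuters from A who work in j, over all commuters working in j.
flow_from_a = {"A": 50, "B": 10, "C": 10}
working_in = {"A": 120, "B": 110, "C": 190}  # OD column sums (B inferred from 1/11)

forward = {j: Fraction(flow_to_a[j], living_in[j]) for j in flow_to_a}
backward = {j: Fraction(flow_from_a[j], working_in[j]) for j in flow_from_a}
```

`Fraction` reduces each ratio automatically, so the forward coefficients come out as 5/7, 2/11 and 5/12 and the backward coefficients as 5/12, 1/11 and 1/19, matching the expressions in the text.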
Continuing with the above example, EpiRank value calculations for all iterations are shown below, with daytime set to 0.5, d to 0.95, and exFac for each region set to (1/n). Results from left to right are for t = 0, 1, 2, 3, 4, 5, …, 47. If precision is set to 3 decimal places, the Markov chain model enters a steady state at t = 5; if precision is set to 8 decimal places, it enters a steady state at t = 47. In this example, EpiRank values have the highest concentration at c, followed by a, b and d.
In summary, we modified the PageRank algorithm to include bidirectional movement to capture pathogen infection risk within a topological network structure involving both forward and backward movement. We separated EpiRank values into two parts (FW and BW) for Markov chain model calculations, and integrated them using a daytime parameter to control their weighted effects. In addition to movement between regions, we also included movement within each region, as indicated by the diagonal lines in both of the Fig. 5 OD matrixes. The added exFac can be used to consider the collective effects of other factors that are not related to network topological structure. exFac is integrated into calculations via a damping factor (the (1−d) in Eqs 1 and 4), with a default value of 1/n for all regions, indicating an assumption of equal distribution across all townships.
Results
This study consisted of three parts. In the first we compared risk distribution according to three daytime settings (0, 0.5 and 1) respectively representing the transitive effects of backward-only, bidirectional, and forward-only movement. In the second part, EpiRank results were compared to actual data for two infectious diseases. In the third part we compared bidirectional EpiRank results with those produced by three other network indexes, and tested the sensitivity of daytime and damping factor parameters.
Transitive effects of forward, backward and bidirectional movement
As stated in an earlier section, a 0 daytime setting indicates that only backward movement is being considered (with the FW effect completely removed from calculations) and a 1 daytime setting indicates that the BW effect is completely ignored in favor of forward movement. A 0.5 setting indicates equal weights for forward and backward movement in EpiRank calculations. Results from those calculations are shown in Figs. 6 (frequency distributions) and 7 (spatial distributions). Since the Fig. 6 data are skewed, we used head/tail breaks to group townships as non-core, core-III, core-II and core-I, similar to the procedure for grouping the two diseases. The most skewed pattern among the three results is found in Fig. 6c, indicating more non-core townships and fewer core-I townships.
In Fig. 7, core-I townships are shown as the largest circles (red), core-II as the second largest (orange), and core-III the third largest (yellow); green dots indicate non-core townships. The spatial distribution for all of Taiwan is presented in the first row (a–c), and for the Taipei metropolitan area (including Taipei, New Taipei, Keelung and Taoyuan) in the second row. As shown, major forward commuting network flows are from residential to central business districts (CBDs), and major backward flows are from CBDs to residential districts in neighboring or satellite regions. When only backward movement is considered in the Taipei metropolitan area (Fig. 7d), high-ER-value (core-I) townships are mostly concentrated in the southwest section of New Taipei; in contrast, most Taipei townships are core-III. This pattern extends to Taoyuan (southwest of the core Taipei area), which has higher numbers of core-II and core-III townships. When only forward movement is considered (Fig. 7f), core-I townships are mostly concentrated in the southwest section of Taipei and extend further southwest to both New Taipei and Taoyuan, where there are more core-II and core-III townships. When bidirectional movement is considered (Fig. 7e), Taipei townships are predominantly core-I and core-II, and once again higher numbers of core-II and core-III townships are found in New Taipei and Taoyuan.
Similar observations were made for the Taichung-Changhua metropolitan area in west-central Taiwan and the Tainan-Kaohsiung cluster in the far south. Specifically, when only forward movement is considered, core townships (especially core-I townships) are concentrated in central urban locations (Fig. 7c), and when only backward movement is considered, core areas are scattered throughout satellite regions (Fig. 7a). When bidirectional movement is considered, first-level core townships are primarily located in central urban locations, while a larger number of second- and third-level core townships are found in neighboring regions.
The forward-only and backward-only cases represent the one-way transmission of pathogens toward or away from CBDs. Greater disease spread during morning hours indicates higher rates of transmission in or during movement toward CBDs; higher transmission rates during evening hours occur as commuters move toward home/satellite regions. These observations are also clear in Fig. 6d,f, with concentrated core townships exhibiting pushing properties when backward movement alone is considered, and pulling properties when only forward movement is considered. Townships with pushing properties are more likely to be in residential areas, and townships with pulling properties are more likely to be in commercial districts. During disease outbreaks, disease spread may influence both townships where individuals live and where they work. For this reason, the distribution of results based on a 0.5 daytime parameter exhibited similar patterns for the two diseases.
Comparison of bidirectional EpiRank and disease data
The actual flu and EV distribution data shown in Fig. 8 were compared with predicted conditions. Black circles indicating townships identified by EpiRank as core at all three levels plus actual data for both diseases were added to Fig. 8a,b. As shown, core flu case townships are clustered in the southwest sections of Taipei and New Taipei, with all Taipei townships identified as core. In contrast, the EV case distribution shown in Fig. 8b has an extended pattern in which core-I and core-II townships appear in the southwest section of New Taipei only, whereas Taipei was limited to core-II and core-III townships. A possible explanation is that most EV cases resulted from closer and longer interactions between infected and susceptible individuals.
Figure 9 presents a comparison of actual and EpiRank-predicted conditions. Green bars indicate predicted non-core percentages of the four actual conditions. According to the first bar in Fig. 9a, of the 12 townships in the core-I flu case group, none were identified by EpiRank as non-core and 8 were identified by EpiRank as either core-I or core-II (66.7%). The percentage of EpiRank-identified non-core townships increased for actual core-II townships and even more for actual core-III townships. The same data also indicate that recall (i.e., true positives divided by total numbers of core townships) decreased for township groups with lower epidemic risk values (100% for core-I, 76.5% for core-II, and 67.9% for core-III). Combined, EpiRank correctly identified 63 of 87 actual core-I, core-II and core-III flu case townships (72.4% recall). Among the 266 actual non-core townships, 215 were identified by EpiRank as non-core, giving a specificity (true negatives divided by total numbers of non-core townships) of 80.8%. Similar patterns were found for EV cases: 93.3%, 86.2% and 70.0% recall values for actual core-I, core-II and core-III townships, respectively. The combined results show that EpiRank correctly identified 81 of 134 (60.4%) actual core-I, core-II and core-III EV case townships. Together, these results indicate that EpiRank exhibited low likelihoods of under- and overestimation.
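The combined recall and specificity figures follow directly from the counts reported above, as the short check below shows. The helper names are ours; only the counts come from the text.

```python
def recall(true_positives, actual_positives):
    """Share of actual core townships that were identified as core."""
    return true_positives / actual_positives

def specificity(true_negatives, actual_negatives):
    """Share of actual non-core townships that were identified as non-core."""
    return true_negatives / actual_negatives

flu_recall = recall(63, 87)              # core flu townships found by EpiRank
flu_specificity = specificity(215, 266)  # non-core flu townships found by EpiRank
ev_recall = recall(81, 134)              # core EV townships found by EpiRank

flu_recall_pct = round(100 * flu_recall, 1)
flu_specificity_pct = round(100 * flu_specificity, 1)
ev_recall_pct = round(100 * ev_recall, 1)
```

Note that 87 core plus 266 non-core townships account for all 353 townships in the flu comparison.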
Comparisons with previous methods
Cumulative results for transitive effects in a commuting network can be determined using network measures such as PageRank and HITS (including both Hub and Authority, hereafter referred to as HITS-Hub and HITS-Authority). Head/tail breaks were used to organize results from the three non-EpiRank indexes according to the four core/non-core township categories (Table 1). In all, PageRank identified 107 townships as core-I, core-II or core-III, and 246 as non-core. HITS-Hub identified 26 core and 327 non-core townships, and HITS-Authority identified 29 core and 324 non-core townships.
We used Pearson and Spearman correlation coefficients and two binary classification tests (recall and precision) to compare distribution results generated by the three network measures and EpiRank for both diseases. As shown in Table 2, correlations for the EpiRank and actual disease data were higher than those for the other three network metrics, and EpiRank recall values were highest among the four. A possible explanation for EpiRank’s lower precision compared to HITS-Hub and HITS-Authority is the significantly smaller number of core townships in the latter two. Precision is defined as the proportion of true positives to the total number of detected cores (i.e., true predicted condition); the smaller numbers of core townships may therefore have produced higher precision values for the two HITS indexes.
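For readers unfamiliar with the two correlation measures, the sketch below computes both from scratch. It is a generic illustration with invented scores and case counts, not a reproduction of Table 2; Spearman's rho is computed here as Pearson's R applied to average ranks, which handles tied values.

```python
import math

def pearson(x, y):
    """Pearson's R: covariance divided by the product of standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's rho: Pearson's R on the ranked data."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical network scores and case counts for four townships.
scores = [0.10, 0.20, 0.30, 0.40]
cases = [1, 2, 4, 100]
r = pearson(scores, cases)
rho = spearman(scores, cases)
```

Here the ordering of townships is identical in both lists, so rho is 1, while R is lower because the case counts are highly skewed. This is one reason to report both measures for skewed epidemic data.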
Core and non-core township distributions according to log in-out ratios are presented in Fig. 10. As shown, PageRank results concentrated core townships on the right side of the graph, indicating that most had pull properties (Fig. 10b). In contrast, HITS-Hub (Fig. 10c) and HITS-Authority (Fig. 10d) produced higher numbers of core townships on the left side, indicating more push properties. Our EpiRank distribution results were close to actual data for both diseases (Fig. 3). EpiRank outperformed the other three network indexes for identifying push townships, as well as for identifying non-core townships.
Daytime and damping factor parameter sensitivity
The two primary EpiRank parameters are daytime (for controlling weights between forward and backward effects) and damping factor (for controlling weights between network transitive and external factor effects). The daytime parameter value indicates the strength of the forward movement effect, and the 1−daytime value the strength of the backward movement effect. Results from sensitivity tests for the two parameters are shown in Fig. 11. The daytime value ranged between 0 and 1 in 0.05 increments (21 individual values). The EpiRank damping parameter d refers to the network topological structure weight, with (1−d) indicating the influence of one or more external factors. A damping factor of 0 indicates no effect of network topological structure on distribution, and a damping factor of 1 indicates no effect from one or more external factors on disease diffusion. The external factor distribution in this study was equal for all townships, meaning that EpiRank values for all nodes were equal when the damping factor was set to 0. For our sensitivity analyses we set the damping factor from 0.05 to 1.00 in 0.05 increments (20 individual values).
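The parameter grid described above, and the boundary case of a zero damping factor, can be written out as follows. The single-step helper `epirank_step` and its input vectors are illustrative assumptions; only the ranges and increments come from the text.

```python
# Daytime: 0 to 1 in 0.05 increments; damping factor: 0.05 to 1.00 in 0.05 increments.
daytime_values = [round(0.05 * k, 2) for k in range(21)]
damping_values = [round(0.05 * k, 2) for k in range(1, 21)]
grid = [(daytime, d) for daytime in daytime_values for d in damping_values]

def epirank_step(fw, bw, ex_fac, daytime, d):
    """One update of ER from the forward (fw) and backward (bw) effects."""
    return [(1 - d) * e + d * (daytime * f + (1 - daytime) * b)
            for e, f, b in zip(ex_fac, fw, bw)]

# Hypothetical forward/backward effects for four townships.
ex_fac = [0.25, 0.25, 0.25, 0.25]
fw = [0.40, 0.30, 0.20, 0.10]
bw = [0.10, 0.20, 0.30, 0.40]

# With d = 0 the network terms drop out and ER equals the external factor.
no_network = epirank_step(fw, bw, ex_fac, daytime=0.5, d=0.0)
```

The grid contains 21 × 20 parameter combinations, and the d = 0 case reproduces the statement that all nodes receive equal values when the external factor is uniform.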
EpiRank results for the two diseases were compared using Pearson and Spearman correlation coefficients. As shown in Fig. 11a–d, correlations initially increased in step with daytime value increments, then started to fall at values between 0.3 and 0.5. Flu case correlations peaked at daytime values of 0.3 (Pearson’s R) and 0.5 (Spearman’s rho), and EV case correlations peaked at daytime values of 0.4 (Pearson’s R) and 0.5 (Spearman’s rho). Combined, these results indicate that EpiRank captured the structure of bidirectional movement. According to the Y-axis data in the four Fig. 11 graphs, all correlation peaks occurred at daytime values at or slightly below 0.5, indicating that the backward movement effect was more important than the forward movement effect; in other words, disease spread was more likely to occur during backward than forward commuting movement. EpiRank successfully captured this effect. Our finding that lower daytime values were better at capturing disease distribution may be due to interaction diversity and intensity. Individuals who interact with large numbers of colleagues, clients, classmates, or others during work hours have many low-intensity interactions, while individuals who stay at home with small numbers of family members or roommates have low diversity but high intensity interaction values, resulting in very different disease transmission effectiveness rates. When daytime values are lower than 0.5, the strength of the backward effect exceeds that of the forward effect, indicating greater importance for interaction intensity compared to diversity.
The correlation coefficients for the X-axis dimension indicate slow but steadily increasing values in step with increased damping, with the highest correlations for both flu and EV observed at a factor value of 1.0. In sum, correlation changes associated with the damping factor were much smaller than those associated with the daytime factor, possibly because differences in external factors between townships were not considered, thereby increasing the effect of network topological structure on disease spreading.
Discussion and Conclusion
Previous researchers have used networks to capture topological structures underlying disease diffusion^{44,45,46,47} or to model different disease control strategies and consider their potential outcomes^{30,31}. Network structures have also been used to analyze intercity movement and transportation networks^{14,40}, multilayer interactions between cities^{48,49}, surface street congestion problems^{15,50,51}, airline flight patterns^{52,53,54}, and maritime movement^{53,55,56}. Our proposed algorithm offers a novel perspective for these and other networks.
Network topological structure diffusion studies have generally emphasized the forward movement of directed links^{30,57}. Some researchers have investigated car movements in street networks to understand congestion patterns and to identify locations where cars and pedestrians gather as the transitive results of those movements^{14,15}. Commuting networks capture morning (forward) movement from where people live to where they work or attend school, and evening (backward) movement in the opposite direction. Commuting network studies have generally neglected this combination and its implications. Our motivation to create the EpiRank algorithm was to capture the forward and backward movement effects missed by previous methods.
In epidemiological studies, compartment models such as SIR or SEIR are frequently employed to analyze the temporal dimension of disease outbreaks. Although compartment models are useful for analyzing dynamic changes in disease events over time, the addition of spatial considerations generally makes them excessively complex. Epidemic models such as EpiSimS^{45,58}, GLEAMviz^{59,60}, and EpiFast^{61} were created to add spatial variation and human movement into analyses by inserting other factors and processes into disease diffusion models. The strength of these models is their ability to produce detailed results with high degrees of accuracy, but their weakness is the high cost of acquiring, processing and working with input data. For our proposed EpiRank algorithm, input data primarily consist of the OD flow matrices of commuting networks, and analyses consist of low-cost matrix multiplication series. As an example, for a scenario consisting of 353 nodes (townships) and 11,220 links (commuting connections), only 588 EpiRank iterations were required to achieve a stable outcome. While EpiRank cannot capture the complex and detailed epidemiological structures of disease diffusion, it is more than adequate for capturing geographic diffusion outcome patterns in broader contexts. Accordingly, EpiRank may have high utility for public health authorities working on resource distribution problems during disease outbreaks.
Although we focused on two infectious diseases that are spread via droplets and physical contact, the spread of mosquito-borne diseases such as the dengue and Zika viruses also entails human movement patterns^{8,9,62,63}. Mosquito-borne diseases differ from droplet and physical contact-transmission diseases because they require specific mosquito species and suitable environments for mosquito reproduction and virus development. Researchers must also consider intrinsic and extrinsic incubation periods between pairs of infected individuals. We believe that the EpiRank algorithm can be modified to make it useful for vector-borne disease studies.
There are other ways that EpiRank can be extended and modified for specific research requirements such as those involving trains, buses and rapid transit systems. To emphasize the effects of forward and backward movement, in the present study we simplified transportation mode differences by aggregating commuters between pairs of townships. In Taiwan, common commuting transportation modes include private vehicles (cars, motorcycles) and public transportation (buses, railways). In other countries they include ferries. Human interactions in different transportation types (e.g., crowded buses versus single-occupancy cars) result in different disease diffusion outcomes; in other words, different link types have various effects on infection processes^{64}. EpiRank can be modified to consider a broad range of FW and BW patterns, with other weighting parameters used to integrate various transportation modes into calculations.
To assist in studies involving external factors, EpiRank provides an exFac term for integrating local environmental influences that have potential for altering location (node) vulnerability. The strength of the exFac effect on the disease spreading process is controlled by the damping factor: the lower the factor, the stronger the effect. Accordingly, exFac can be treated as the collective results of location-based sociodemographics or physical environmental factors such as prior infection history, population density, income level, daily temperature and precipitation, or air quality. We designed exFac to serve as a positive multiplier in equations so that it exerts a diffusion or increased susceptibility effect. When adding other external factors, researchers need to convert variables to ensure such positive effects. For example, if a higher population density leads to a greater likelihood of disease spread into a specific area, that means population density exerts a positive effect. In contrast, if a lower income level makes a location more vulnerable, that indicates a need to use an inverse income value for the exFac variable. If exFac can be modelled as a function of multiple variables, then collective model outcomes should be used with the EpiRank algorithm. Further, the addition of exFac means that the effects of a damping factor may express different patterns, thus requiring a damping factor sensitivity analysis.
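To illustrate why the computation is inexpensive, the following is a minimal sketch of a PageRank-style iteration over an OD matrix. It is not the published implementation (see the repository listed under Data Availability for that). The update rule, the convention that `od[i][j]` counts commuters living in `i` and working in `j`, the uniform default for `ex_fac`, and all function names are assumptions made for this example; only the roles of the daytime, damping, and exFac parameters are taken from the text.

```python
def transpose(matrix):
    return [list(row) for row in zip(*matrix)]


def normalize_columns(matrix):
    """Scale each column to sum to 1; all-zero columns stay zero."""
    n = len(matrix)
    col_sums = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    return [[matrix[i][j] / col_sums[j] if col_sums[j] else 0.0
             for j in range(n)] for i in range(n)]


def mat_vec(matrix, vector):
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def epirank_sketch(od, daytime=0.5, damping=0.95, ex_fac=None,
                   tol=1e-10, max_iter=10000):
    """Return (scores, iterations) for an assumed EpiRank-like update."""
    n = len(od)
    if ex_fac is None:
        ex_fac = [1.0 / n] * n  # assumption: no external differences
    # Forward: risk carried from home i to workplace j.
    fw = normalize_columns(transpose(od))
    # Backward: risk carried from workplace j back to home i.
    bw = normalize_columns(od)
    rank = [1.0 / n] * n
    for iteration in range(1, max_iter + 1):
        fw_part = mat_vec(fw, rank)
        bw_part = mat_vec(bw, rank)
        new_rank = [(1 - damping) * e
                    + damping * (daytime * f + (1 - daytime) * b)
                    for e, f, b in zip(ex_fac, fw_part, bw_part)]
        change = max(abs(a - b) for a, b in zip(new_rank, rank))
        rank = new_rank
        if change < tol:
            return rank, iteration
    return rank, max_iter


# Toy 3-township OD matrix (made-up numbers, for illustration only).
toy_od = [[50, 30, 20],
          [10, 80, 10],
          [5, 15, 60]]
scores, iterations = epirank_sketch(toy_od, daytime=0.4)
```

Each iteration costs two matrix-vector products, so even a network with a few hundred nodes converges quickly on ordinary hardware. In this sketch, setting `daytime` to 1.0 or 0.0 reduces the update to a forward-only or backward-only model.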
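The variable conversion described above can be sketched as follows. This is one possible construction under stated assumptions, not the authors' procedure: the function names are hypothetical, the two inputs (population density and income) are just the examples mentioned in the text, and combining the converted variables by multiplication is an arbitrary choice made for illustration.

```python
def to_shares(values):
    """Rescale positive values so that they sum to 1."""
    total = sum(values)
    return [v / total for v in values]


def build_ex_fac(population_density, income):
    """Combine two per-township variables into one positive exFac vector.

    Density is assumed to raise vulnerability, so it is used directly.
    Income is assumed to lower vulnerability, so its inverse is used.
    """
    if any(v <= 0 for v in list(population_density) + list(income)):
        raise ValueError("inputs must be strictly positive")
    density_share = to_shares(population_density)
    inverse_income_share = to_shares([1.0 / v for v in income])
    # Assumption: multiplicative combination of the two effects.
    combined = [d * i for d, i in zip(density_share, inverse_income_share)]
    return to_shares(combined)


# Made-up values for three townships, for illustration only.
ex_fac = build_ex_fac([1000.0, 5000.0, 200.0], [40.0, 25.0, 60.0])
```

Because the result is strictly positive and sums to 1, it can stand in for a uniform external-factor vector without changing the overall scale of the scores. A fitted statistical model of vulnerability could replace the multiplicative rule.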
Another possible EpiRank extension involves the spatial features of nodes. Since commuting networks are embedded in geographic spaces, distances between locations can wield strong influences on movement^{14}. Our decision to not include the influences of distance in the version of EpiRank used in this study reflects the lack of certainty regarding the effects of distance on disease diffusion within a commuting network. More investigation is required to clarify the effects of distance on disease diffusion (e.g., the effect shapes of gravity, exponential, or radiation model functions) before adding this factor to EpiRank.
A third possible extension involves separating the influences of local flows. The term “local flow” refers to individuals who live and work in the same geographic location such as a township; it is expressed as a self-loop in the commuting network. In scenarios where diseases are spread among spatial network locations through inter-location links, one potential effect of local flow is strengthening nodes via larger populations. Since population size is another important factor influencing disease spreading mechanisms^{65,66}, it should be modeled separately from and integrated into the link structure-based diffusion model.
There are several study limitations to be considered, and we will address three. First, the study did not distinguish between infection and spreading mode development processes. Different diseases have different incubation and latency periods, as well as different spreading mechanisms (e.g., through physical contact or via air, water, or a vector). These differences in epidemiology and etiology are key considerations when creating models. Second, location or land-use factors such as business, residential, agricultural or forestry status can affect infection patterns, as can terrain and prevailing weather systems. These factors were not included in this study because they require more analysis and experimentation with the EpiRank algorithm to capture their effects. Third, commuting network and disease data collection occurred at different times, the former in 2000 and the latter between 2000 and 2009. Since commuting information could not be extracted from the 2010 census due to changes in methodology, we used 2000 data as a substitute, thus requiring an assumption that commuting behaviors did not change dramatically during the period in between.
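As a small illustration of what separating local flows could look like in practice, the sketch below splits an OD matrix into its self-loops and its inter-location links. The function name and the `od[i][j]` layout (row = home, column = workplace) are assumptions for this example; how the local component should then be integrated into the diffusion model is left open, as in the text.

```python
def split_local_flows(od):
    """Return (local, inter) without modifying the input matrix.

    local[i] is the self-loop count for location i (live and work in i).
    inter is a copy of od with the diagonal set to zero.
    """
    n = len(od)
    local = [od[i][i] for i in range(n)]
    inter = [[0 if i == j else od[i][j] for j in range(n)]
             for i in range(n)]
    return local, inter


# Made-up 3-township OD matrix, for illustration only.
toy_od = [[50, 30, 20],
          [10, 80, 10],
          [5, 15, 60]]
local, inter = split_local_flows(toy_od)
```

The `inter` matrix can be passed to a link-based diffusion calculation, while `local` can feed a separate node-level term such as a population weight.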
Data Availability
The processed data and the code for the algorithm are available in the public repository: https://github.com/canslab1/EpiRankAlgorithm/.
References
Newman, M. E. J., Barabási, A.-L. & Watts, D. J. (eds) The structure and dynamics of networks. Princeton studies in complexity (Princeton University Press, 2006).
Onnela, J.-P. et al. Structure and tie strengths in mobile communication networks. Proc. Natl. Acad. Sci. U.S.A. 104(18), 7332–7336 (2007).
Kwak, H., Lee, C., Park, H. & Moon, S. What is Twitter, a social network or a news media? In Proceedings of the 19th international conference on World wide web - WWW ’10 591, https://doi.org/10.1145/1772690.1772751 (ACM Press, 2010).
Bakshy, E., Rosenn, I., Marlow, C. & Adamic, L. The role of social networks in information diffusion. In Proceedings of the 21st international conference on World Wide Web - WWW ’12 519, https://doi.org/10.1145/2187836.2187907 (ACM Press, 2012).
Viboud, C. et al. Synchrony, waves, and spatial hierarchies in the spread of influenza. Science 312(5772), 447–451 (2006).
Stoddard, S. T. et al. The role of human movement in the transmission of vector-borne pathogens. PLoS Negl. Trop. Dis. 3(7), e481 (2009).
Sun, G.-Q. Pattern formation of an epidemic model with diffusion. Nonlinear Dyn. 69(3), 1097–1104 (2012).
Zhan, X.-X. et al. Coupling dynamics of epidemic spreading and information diffusion on complex networks. Appl. Math. Comput. 332, 437–448 (2018).
Li, L. et al. Analysis of transmission dynamics for Zika virus on networks. Appl. Math. Comput. 347, 566–577 (2019).
Balthrop, J., Forrest, S., Newman, M. E. J. & Williamson, M. M. Technological networks and the spread of computer viruses. Science 304(5670), 527–529 (2004).
Huang, C.-Y., Lee, C.-L., Wen, T.-H. & Sun, C.-T. A computer virus spreading model based on resource limitations and interaction costs. J. Syst. Softw. 86(3), 801–808 (2013).
Valente, T. Network models of the diffusion of innovations. Comput. Math. Organ. Theory 2(2), 163–164 (1996).
Delre, S. A., Jager, W., Bijmolt, T. H. A. & Janssen, M. A. Will it spread or not? The effects of social influences and network topology on innovation diffusion. J. Prod. Innov. Manag. 27(2), 267–282 (2010).
Chin, W. C. B. & Wen, T. H. Geographically modified PageRank algorithms: Identifying the spatial concentration of human movement in a geospatial network. PLoS One 10(10), e0139509 (2015).
Wen, T.-H., Chin, W.-C. B. & Lai, P.-C. Understanding the topological characteristics and flow complexity of urban traffic congestion. Physica A 473, 166–177 (2017).
Moreno, Y., Nekovee, M. & Pacheco, A. F. Dynamics of rumor spreading in complex networks. Phys. Rev. E 69(6), 066130 (2004).
Lind, P. G., da Silva, L. R., Andrade, J. S. & Herrmann, H. J. Spreading gossip in social networks. Phys. Rev. E 76(3), 036117 (2007).
Christakis, N. A. & Fowler, J. H. The spread of obesity in a large social network over 32 years. N. Engl. J. Med. 357(4), 370–379 (2007).
Rosenquist, J. N., Fowler, J. H. & Christakis, N. A. Social network determinants of depression. Mol. Psychiatry 16(3), 273–281 (2011).
Kramer, A. D. I., Guillory, J. E. & Hancock, J. T. Experimental evidence of massive-scale emotional contagion through social networks. Proc. Natl. Acad. Sci. U.S.A. 111, 8788–8790 (2014).
Tita, G. E. & Radil, S. M. Spatializing the social networks of gangs to explore patterns of violence. J. Quant. Criminol. 27(4), 521–545 (2011).
Kramer, C. R. Network Theory and Violent Conflicts: Studies in Afghanistan and Lebanon, https://doi.org/10.1007/978-3-319-41393-8 (Springer International Publishing, 2017).
González, M. C., Hidalgo, C. A. & Barabási, A.-L. Understanding individual human mobility patterns. Nature 453(7196), 779–782 (2008).
Wasserman, S. & Faust, K. Social network analysis: methods and applications. (Cambridge University Press, 1994).
Perez, L. & Dragicevic, S. An agent-based approach for modeling dynamics of contagious disease spread. Int. J. Health Geogr. 8(1), 50 (2009).
Haggett, P., Cliff, A. D. & Frey, A. E. Locational analysis in human geography. (Wiley, 1977).
Cliff, A. D., Haggett, P. & Smallman-Raynor, M. Island Epidemics. (Oxford University Press, 2000).
Grais, R. F., Hugh Ellis, J. & Glass, G. E. Assessing the impact of airline travel on the geographic spread of pandemic influenza. Eur. J. Epidemiol. 18(11), 1065–1072 (2003).
Hufnagel, L., Brockmann, D. & Geisel, T. Forecast and control of epidemics in a globalized world. Proc. Natl. Acad. Sci. U.S.A. 101(42), 15124–15129 (2004).
Meloni, S. et al. Modeling human mobility responses to the large-scale spreading of infectious diseases. Sci. Rep. 1, 62 (2011).
Tanaka, G., Urabe, C. & Aihara, K. Random and targeted interventions for epidemic control in metapopulation models. Sci. Rep. 4, 5522 (2015).
Tang, J. W., Li, Y., Eames, I., Chan, P. K. S. & Ridgway, G. L. Factors involved in the aerosol transmission of infection and control of ventilation in healthcare premises. J. Hosp. Infect. 64(2), 100–114 (2006).
Wang, L. & Li, X. Spatial epidemiology of networked metapopulation: An overview. Chin. Sci. Bull. 59(28), 3511–3522 (2014).
Belik, V., Geisel, T. & Brockmann, D. Natural human mobility patterns and spatial spread of infectious diseases. Phys. Rev. X 1(1), 011001 (2011).
Balcan, D. et al. Modeling the spatial spread of infectious diseases: The GLobal Epidemic and Mobility computational model. J. Comput. Sci. 1(3), 132–145 (2010).
Wallace, R. & Wallace, D. Inner-city disease and the public health of the suburbs: The sociogeographic dispersion of point-source infection. Environ. Plan. A 25(12), 1707–1723 (1993).
Wallace, R. & Wallace, D. The coming crisis of public health in the suburbs. Milbank Q. 71(4), 543 (1993).
Brin, S. & Page, L. The anatomy of a large-scale hypertextual web search engine. In Proceedings of the Seventh International Conference on World Wide Web 7 107–117 (Elsevier Science Publishers B. V., 1998).
Jiang, B. Ranking spaces for predicting human movement in an urban environment. Int. J. Geogr. Inf. Sci. 23(7), 823–837 (2009).
El-Geneidy, A. & Levinson, D. Place Rank: Valuing spatial interactions. Netw. Spat. Econ. 11(4), 643–659 (2011).
Executive Yuan, Directorate-General of Budget, Accounting and Statistics (Republic of China). The 2000 Census of Population and Housing (2002).
Liu, C. Y. et al. Incorporating development stratification of Taiwan townships into sampling design of large scale health interview survey. J. Health Manag. 4(1), 1–22 (2006).
Jiang, B. Head/Tail Breaks: A new classification scheme for data with a heavy-tailed distribution. Prof. Geogr. 65(3), 482–494 (2013).
Chin, W.-C. B., Wen, T.-H., Sabel, C. E. & Wang, I.-H. A geocomputational algorithm for exploring the structure of diffusion progression in time and space. Sci. Rep. 7, 12565 (2017).
Eubank, S. et al. Modelling disease outbreaks in realistic urban social networks. Nature 429(6988), 180–184 (2004).
Gómez, J. M. & Verdú, M. Network theory may explain the vulnerability of medieval human settlements to the Black Death pandemic. Sci. Rep. 7, 43467 (2017).
Wen, T.-H., Hsu, C.-S. & Hu, M.-C. Evaluating neighborhood structures for modeling intercity diffusion of large-scale dengue epidemics. Int. J. Health Geogr. 17, 9 (2018).
Gastner, M. T. & Newman, M. E. J. Optimal design of spatial distribution networks. Phys. Rev. E 74(1), 016117 (2006).
Hu, Y. et al. Percolation of interdependent networks with intersimilarity. Phys. Rev. E 88(5), 052805 (2013).
Gao, S., Wang, Y., Gao, Y. & Liu, Y. Understanding urban traffic-flow characteristics: A rethinking of betweenness centrality. Environ. Plan. B Plan. Des. 40(1), 135–153 (2013).
Buzna, Ľ. & Carvalho, R. Controlling congestion on complex networks: Fairness, efficiency and network structure. Sci. Rep. 7, 9152 (2017).
Guimera, R., Mossa, S., Turtschi, A. & Amaral, L. A. N. The worldwide air transportation network: Anomalous centrality, community structure, and cities’ global roles. Proc. Natl. Acad. Sci. U.S.A. 102(22), 7794–7799 (2005).
Ducruet, C., Ietri, D. & Rozenblat, C. Cities in worldwide air and sea flows: A multiple networks analysis. Cybergeo Eur. J. Geogr, 528, https://doi.org/10.4000/cybergeo.23603 (2011).
Verma, T., Araújo, N. A. M. & Herrmann, H. J. Revealing the structure of the world airline network. Sci. Rep. 4, 5638 (2015).
Ducruet, C., Lee, S.-W. & Ng, A. K. Y. Centrality and vulnerability in liner shipping networks: Revisiting the Northeast Asian port hierarchy. Marit. Policy Manag. 37(1), 17–36 (2010).
Ducruet, C. & Notteboom, T. The worldwide maritime network of container shipping: spatial structure and regional dynamics. Glob. Netw. 12(3), 395–423 (2012).
Apolloni, A., Poletto, C., Ramasco, J. J., Jensen, P. & Colizza, V. Metapopulation epidemic models with heterogeneous mixing and travel behaviour. Theor. Biol. Med. Model. 11, 3 (2014).
Halloran, M. E. et al. Modeling targeted layered containment of an influenza pandemic in the United States. Proc. Natl. Acad. Sci. U.S.A. 105(12), 4639–4644 (2008).
Balcan, D. et al. Multiscale mobility networks and the spatial spreading of infectious diseases. Proc. Natl. Acad. Sci. U.S.A. 106(51), 21484–9 (2009).
Bajardi, P. et al. Human mobility networks, travel restrictions, and the global spread of 2009 H1N1 pandemic. PLoS One 6(1), e16591 (2011).
Bisset, K. R., Chen, J., Feng, X., Kumar, V. S. A. & Marathe, M. V. EpiFast: A fast algorithm for large scale realistic epidemic simulations on distributed memory systems. In Proceedings of the 23rd international conference on Conference on Supercomputing - ICS ’09 430, https://doi.org/10.1145/1542275.1542336 (ACM Press, 2009).
Li, R., Wang, W. & Di, Z. Effects of human dynamics on epidemic spreading in Côte d’Ivoire. Physica A 467, 30–40 (2017).
Malik, H. A. M., Mahesar, A. W., Abid, F., Waqas, A. & Wahiddin, M. R. Two-mode network modeling and analysis of dengue epidemic behavior in Gombak, Malaysia. Appl. Math. Model. 43, 207–220 (2017).
Li, R., Tang, M. & Hui, P.-M. Epidemic spreading on multirelational networks. Acta Phys. Sin. 62(16), 168903 (2013).
Li, R. et al. Simple spatial scaling rules behind complex cities. Nat. Commun. 8, 1841 (2017).
Li, R., Richmond, P. & Roehner, B. M. Effect of population density on epidemics. Physica A 510, 713–724 (2018).
Acknowledgements
This work was supported by the Republic of China Ministry of Science and Technology (MOST 107-2221-E-182-069) and the High Speed Intelligent Communication (HSIC) Research Center of Chang Gung University, Taiwan.
Contributions
C.Y.H. and T.H.W. conceived the experiments. C.Y.H. and Y.H.F. collected and processed all data. C.Y.H., W.C.B.C. and Y.H.F. executed the experiments. C.Y.H., W.C.B.C. and Y.S.T. analyzed the results. All authors contributed to writing and reviewing the manuscript, and all gave approval to its content.
Ethics declarations
Competing Interests
The authors declare no competing interests.
Additional information
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Huang, CY., Chin, WCB., Wen, TH. et al. EpiRank: Modeling Bidirectional Disease Spread in Asymmetric Commuting Networks. Sci Rep 9, 5415 (2019). https://doi.org/10.1038/s41598-019-41719-8
DOI: https://doi.org/10.1038/s41598-019-41719-8