Spatial super-spreaders and super-susceptibles in human movement networks

As lockdowns and stay-at-home orders start to be lifted across the globe, governments are struggling to establish effective and practical guidelines to reopen their economies. In dense urban environments with people returning to work and public transportation resuming full capacity, enforcing strict social distancing measures will be extremely challenging, if not practically impossible. Governments are thus paying close attention to particular locations that may become the next cluster of disease spreading. Indeed, certain places, like some people, can be “super-spreaders”. Is a bustling train station in a central business district more or less susceptible and vulnerable as compared to teeming bus interchanges in the suburbs? Here, we propose a quantitative and systematic framework to identify spatial super-spreaders and the novel concept of super-susceptibles, i.e. respectively, places most likely to contribute to disease spread or to people contracting it. Our proposed data-analytic framework is based on the daily-aggregated ridership data of public transport in Singapore. By constructing the directed and weighted human movement networks and integrating human flow intensity with two neighborhood diversity metrics, we are able to pinpoint super-spreader and super-susceptible locations. Our results reveal that most super-spreaders are also super-susceptibles and that counterintuitively, busy peripheral bus interchanges are riskier places than crowded central train stations. Our analysis is based on data from Singapore, but can be readily adapted and extended for any other major urban center. It therefore serves as a useful framework for devising targeted and cost-effective preventive measures for urban planning and epidemiological preparedness.


Scientific Reports
| (2020) 10:18642 | https://doi.org/10.1038/s41598-020-75697-z www.nature.com/scientificreports/

is the low-high outliers, that is, any location with a low density of disease cases that is surrounded by high-density locations, making it more vulnerable because it has a higher probability of reporting a higher number of cases in the following time period 54. In the field of network science, the concepts of 'spreaders' and 'receivers' first appeared in the Hyperlink-Induced Topic Search (HITS) algorithm 22 as hubs and authorities, respectively. In the HITS algorithm, hubs describe highly influential nodes, while authorities represent highly popular destination nodes. To sum up, the spatial super-susceptibles correspond to the susceptible locations in a spatial diffusion network. These locations are identified as more vulnerable within the network because they are the destination of more people, and hence have a higher probability of being visited by infected agents. The spatial super-susceptibles are vulnerable locations because they are prone to disease infection and therefore have the potential to become hotbeds for disease spreading to the rest of a city or region. Note that if a place is both a spatial super-spreader and a spatial super-susceptible, it requires particular attention since it poses the risk of simultaneously being a hotbed of infection and a disease spreader. Identifying these places is therefore critical in the fight against infectious diseases such as Covid-19. In this article, we report a study aimed at systematically identifying the spatial super-spreaders and spatial super-susceptibles in the spatial human network of the city-state of the Republic of Singapore.
The particular choice of Singapore stems from it having: (1) been hit early in the first wave of infection directly from Wuhan, with a systematic tracking and mapping of infected people 55, (2) one of the highest population densities in Southeast Asia, (3) dense and highly interconnected human mobility and transportation networks 48,56, and (4) detailed and reliable data for the construction of spatial networks 57. As mentioned earlier, a spatial super-spreader is a locus with a high outflow of people, i.e. a place from which many people originate and then move on to a wide variety of places. In the same vein, a spatial super-susceptible is a destination for a large number of individuals originating from different places. Hence, this work proposes a systematic data-centric framework enabling the identification of spatial locations that should be targeted by public health agencies in the event of an epidemic such as Covid-19. With these critical places identified, policy makers would then be able to implement cost-effective targeted responses, with prevention and intervention measures directly connected to the level of vulnerability of a given location.

Materials and methods
This section is divided into three parts: descriptions of (a) the study area, (b) the flow data, and (c) the metrics and indexes.

Study area. This study focuses on the public transportation flow network in Singapore. The city-state primarily occupies an island located in Southeast Asia with a total surface area of about 724.2 km². As of 2019, the total population of Singapore is about 5.703 million people (a population density of about 7,875.68 per km²), of which 70.6% are residents (citizens and permanent residents) and 29.4% are non-residents (foreigners with long-term passes). According to the General Household Survey 2015 58, about 62.7% of students and 64.1% of working persons rely on bus or rail transport services to travel to school or work, making public transportation the primary mode of transportation in Singapore. As a result, the density of people using public transport during the morning and evening peaks is high, and the distance between people at stations and in vehicles is short. Hence, a direct consequence of the high population density combined with a high rate of public transportation use is that physical distancing is extremely challenging, if not impossible, during regular operations. This issue is a serious concern when facing the spread of a highly contagious disease such as Covid-19.

To analyze the data, we use the administrative subzone-level spatial boundaries (from the Singapore Master Plan 2014 59) as the unit of analysis. The residential population density (from the General Household Survey 2015 58) is shown in Fig. 1. There are five regions (Central, West, North, North East, and East), with 55 planning areas and 323 subzones 59. Some of the subzones contain no residential population (white areas); these include airports and airbases (e.g. Changi Airport in the East Region) and industrial parks or ports (e.g. Jurong Island and Bukom at the south of the West Region, and Simpang North and South in the North Region). Although these places lack residential population, they are the workplaces (destinations) of a large number of individuals. The darker areas are home to a large number of people; in other words, a large number of journeys start from and end at these locations.
Weekday and weekend flow networks. We used the origin-destination (OD) ridership data of bus and train to generate the public transport flow networks. The OD ridership data is systematically collected by the Singapore Land Transport Authority (LTA is a government statutory board under the Ministry of Transport) through API calls 57 . In this study, we used the ridership from November 2019 to January 2020. In terms of temporal resolution, the OD ridership data provides hourly passenger flows between each pair of bus stops or train stations (including mass rapid transit and light rail transit). The raw data are then aggregated into weekdays (a total of 21 days in November 2019, 22 days in December 2019 and 23 days in January 2020) or weekends (9 days in both November and December 2019 and 8 days in January). The total number of trips for trains and buses starting from November 2019 to June 2020 are shown in Fig. S1 (see Supplementary Material).
As the raw data records the flow between OD pairs of bus stops or train stations, we spatially aggregate the data into flows between subzones, according to the bus stop or train station locations. A total of 303 subzones (out of a total of 323) contained at least one bus stop or one train station. These subzones then form the nodes (303 nodes) of the weighted directed network, with flows between nodes corresponding to the weights of the directed edges. A total of 30,331 edges were found, with a vast majority (30,043 edges or 99%) being edges across subzones.

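The spatial aggregation described above can be illustrated with a minimal sketch. It is not the pipeline used in the study: the stop codes, subzone names, trip counts, and the `aggregate_to_subzones` helper are all hypothetical, and the sketch assumes that each stop or station has already been assigned to exactly one subzone.

```python
from collections import defaultdict

# Hypothetical lookup from stop/station code to subzone name.
stop_to_subzone = {
    "B01": "Subzone A",
    "B02": "Subzone A",
    "T01": "Subzone B",
    "B03": "Subzone C",
}

# Hypothetical stop-level OD records: (origin stop, destination stop, trips).
od_records = [
    ("B01", "T01", 120),
    ("B02", "T01", 80),
    ("T01", "B03", 50),
    ("B01", "B02", 10),  # both stops lie in Subzone A: a within-subzone edge
    ("B03", "B01", 40),
]


def aggregate_to_subzones(records, lookup):
    """Sum stop-level trips into directed subzone-to-subzone edge weights."""
    edges = defaultdict(int)
    for origin, destination, trips in records:
        edges[(lookup[origin], lookup[destination])] += trips
    return dict(edges)


subzone_edges = aggregate_to_subzones(od_records, stop_to_subzone)
# Nodes are the subzones that appear as an origin or a destination.
nodes = {zone for edge in subzone_edges for zone in edge}
```

In this toy example the two stops in Subzone A both feed the station in Subzone B, so their trips merge into a single directed edge, while the trip between the two Subzone A stops becomes a within-subzone edge.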
Metrics and indexes.
To carry out this study, we introduce two indexes, namely the spreader index (SPI) and the susceptible index (SUI), to search for the spatial super-spreaders (SSP) and spatial super-susceptibles (SSS). Both indexes are quantitatively determined and calculated using two key elements: (1) the local strength of human in- and outflows, and (2) the diversity of their respective neighborhoods 18. The local strength of in- and outflows for a given location is the number of people coming to or leaving from the location, i.e. respectively the weighted in-degree and weighted out-degree of the corresponding node. The neighborhood diversity is captured and quantified by two concepts: (1) the diversity of zones and (2) the diversity of coreness. The diversity of zones 48,60 refers to people coming from different parts of the city. As for the diversity of coreness 61-63, it refers to people coming either from the core or from the periphery of the country.
More details about what constitutes core and periphery is given in Step 3 below. We applied this analysis framework to the Singapore public transport flow network, and identified the SSP and SSS using the SUI and SPI indexes. The population flow patterns are expected to be different for weekdays and weekends. Thus, the flow data were separated into weekday and weekend ones. The calculation flow of the spatial spreader and spatial susceptible indexes is detailed in Fig. 2. The first part consists in aggregating the bus and train OD flow data to subzones as mentioned earlier. That top layer provides the main data for the calculation, i.e. two weighted and directed networks: weekday and weekend flow networks. These networks are subsequently used to compute three network characteristic measurements, including degree centrality (Step 1), community detection (Step 2), and k-shell decomposition (Step 3), which are described in full details in the following subsections. The degree centrality is used as a proxy for the intensity of the local out-and inflows, whereas the community detection and k-shell decomposition results enable the computation of neighborhood diversity, including zone-entropy and coreness-entropy as introduced below. Finally, in the last step (Step 4), the three network characteristics are used to calculate the SUI and SPI.
Step 1: Degree centrality. The degree centrality in this study includes both the non-weighted and weighted in- and out-degrees. The non-weighted and weighted versions of the degree centrality represent different concepts in terms of network characteristics. The non-weighted in-degree and out-degree are the number of links (or edges) that point to and from a subzone, respectively. This non-weighted degree centrality measures the number of relationships that a particular subzone has. As for the weighted in-degree and out-degree, they correspond to the sum of incoming and outgoing flows for a given subzone, respectively. This weighted version of degree centrality indicates the total strength of a node in terms of gathering or spreading flows, without accounting for the actual number of (incoming or outgoing) edges.
In this study, the weighted degree centrality is used to represent the local intensity of nodes for the calculation of the SUI and SPI. The weighted degree centrality is scaled to the unit interval (see Eq. (1) for the weighted out-degree and Eq. (2) for the weighted in-degree), where OutDegree(i) and InDegree(i) are respectively the weighted out-degree and weighted in-degree of node i, O is the set of all out-degree nodes, and I is the set of all in-degree nodes. On the other hand, both non-weighted and weighted degree centralities are used in the weighted k-shell decomposition analysis performed in Step 3.
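As an illustration of Step 1, the following minimal sketch computes the non-weighted and weighted degrees from a dictionary of directed edges and then rescales the weighted degrees to the unit interval. The edge data and function names are hypothetical. The rescaling uses min-max scaling, which is an assumption made for this sketch: the text states only that the weighted degree is scaled to the unit interval, so the exact form of Eqs. (1) and (2) may differ.

```python
from collections import defaultdict

# Hypothetical directed subzone flows: (origin, destination) -> trips.
edges = {
    ("A", "B"): 200,
    ("A", "C"): 100,
    ("B", "C"): 50,
    ("C", "A"): 150,
}


def degrees(edges):
    """Return non-weighted and weighted out-/in-degrees per node."""
    out_n, in_n = defaultdict(int), defaultdict(int)
    out_w, in_w = defaultdict(float), defaultdict(float)
    for (origin, destination), weight in edges.items():
        out_n[origin] += 1           # number of outgoing edges
        in_n[destination] += 1       # number of incoming edges
        out_w[origin] += weight      # total outgoing flow
        in_w[destination] += weight  # total incoming flow
    return out_n, in_n, out_w, in_w


def min_max_scale(values):
    """ASSUMED scaling to [0, 1]; the source equations are not reproduced here."""
    lo, hi = min(values.values()), max(values.values())
    if hi == lo:
        return {node: 0.0 for node in values}
    return {node: (v - lo) / (hi - lo) for node, v in values.items()}


out_n, in_n, out_w, in_w = degrees(edges)
nw_out = min_max_scale(out_w)  # normalized weighted out-degree
nw_in = min_max_scale(in_w)    # normalized weighted in-degree
```

Node A has two outgoing edges carrying 300 trips in total, so it has the largest weighted out-degree and a normalized value of 1 under the assumed scaling.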
Step 2: Zone-entropy. This study uses a community detection method (the MapEquation algorithm 60) to identify the zones from the flow network, instead of using the administrative spatial boundaries (i.e. the boundaries of planning areas and regions as defined by the Singapore Government in its Master Plan 2014 59), which were designed and selected for governance and political purposes. The communities from this flow network analysis capture both the strength and direction of flows, which reflect the spatial activity of people derived from their daily commuting and mobility behaviors 48. Communities are identified separately for the weekday and weekend networks, so the resulting zone distributions may differ between the two. MapEquation considers the direction and weight of edges to identify the strongly connected nodes in a directed and weighted network. It differs from modularity-based community detection methods in that it emphasizes the strength of flows within a community, i.e. higher flow intensities within a community than between communities (flows cycling within communities). MapEquation thus captures the effect of direction while ensuring that a large amount of flow is kept within each community. The communities obtained with MapEquation are used as the zones within which strong human flows cycle, and the diversity of these zones is quantified with the concept of zone-entropy. Note that to maintain the spatial continuity of the communities, we integrate a distance decay effect 64 in the flow intensity calculation (see Eq. (3)) before running MapEquation:

$$F'(o, d) = \frac{F(o, d)}{\mathrm{distance}(o, d)} \qquad (3)$$

where $F(o, d)$ is the number of people moving from the origin subzone $o$ to the destination subzone $d$, $\mathrm{distance}(o, d)$ is the distance between the two subzones, and $F'(o, d)$ is the actual flow intensity incorporating the distance decay effect.
First, we run the MapEquation algorithm on the two networks (weekdays and weekends), and identify the zone (set of communities $Z = \{Z_1, Z_2, \ldots, Z_{\max}\}$ with $Z_j = \{n \mid n \text{ belongs to community } j\}$) to which each subzone (node) belongs. Then, for each subzone, the incoming/outgoing neighbors' zones are retrieved from the results together with the weights of the incoming/outgoing edges ($w(j, i)$ or $w(i, j)$). The neighbors' zone information and flow weights are used to calculate the normalized entropy ($H^{\mathrm{zone}}_{\mathrm{neigh}}(i)$) using Eqs. (4)-(6). The entropy is normalized using the total number of zones in the network to enable a comparison between nodes. Note that the zone-entropy value ranges between 0 and 1 as a consequence of this normalization.

Step 3: Coreness-entropy. The k-shell decomposition is a method to label the coreness (k-shell levels) of nodes in a network based on the connectivity structure 17. Because the edges of the flow networks are weighted, we use the weighted k-shell decomposition 61, which is an extended version that considers both the number of links (degree) and the weights of links while labeling coreness. The coreness of a location indicates the position of the location in the range from periphery (low k-shell levels) to core (high k-shell levels). In a population flow network, the core locations indicate the common origins or destinations for a large number of passengers.
In this study, we first run the weighted k-shell decomposition using the non-weighted and weighted in-/out-degree (from Step 1) to calculate the in-/out-k-shell levels for each subzone. Then, the k-shell levels are grouped into core (in-/out-core) or periphery (in-/out-non-core) using the median value as a cutoff. Finally, for each node, its incoming/outgoing neighbors' core/non-core information is integrated with the flow weights to calculate the so-called coreness-entropy ($H^{\mathrm{core}}_{\mathrm{neigh}}(i)$) as defined in Eqs. (7)-(9). The entropy is normalized using the total number of coreness levels (binary levels here, i.e. $C = \{\text{core}, \text{periphery}\}$), to facilitate the comparison of the results between nodes. Note that the coreness-entropy value ranges between 0 and 1 after this normalization.
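The neighborhood entropies of Steps 2 and 3 can be sketched with a single helper, since both summarize how a node's flow is distributed over the labels of its neighbors (zones in Step 2, core or periphery in Step 3). The sketch below assumes the standard Shannon form over flow-weight shares, divided by the logarithm of the number of labels; it is meant to be consistent with the description above rather than a transcription of Eqs. (4)-(9). The k-shell levels are taken as given (the weighted k-shell decomposition itself is not implemented), and treating levels strictly above the median as core is also an assumption. All names and data are hypothetical.

```python
import math
import statistics
from collections import defaultdict


def neighbor_entropy(neighbor_weights, labels, n_labels):
    """Normalized Shannon entropy of flow shares across neighbor labels.

    neighbor_weights: neighbor -> flow weight on the edge to/from it
    labels: node -> label (a zone id, or "core"/"periphery")
    n_labels: total number of labels in the network (normalization)
    """
    share = defaultdict(float)
    for neighbor, weight in neighbor_weights.items():
        share[labels[neighbor]] += weight
    total = sum(share.values())
    if total == 0 or n_labels < 2:
        return 0.0
    entropy = sum((w / total) * math.log(total / w)
                  for w in share.values() if w > 0)
    return entropy / math.log(n_labels)


def core_labels(shell_levels):
    """Binary split at the median k-shell level (ASSUMED: above median is core)."""
    cutoff = statistics.median(shell_levels.values())
    return {node: "core" if level > cutoff else "periphery"
            for node, level in shell_levels.items()}


# Hypothetical outgoing neighbors of one subzone, with flow weights.
out_neighbors = {"B": 100, "C": 100, "D": 200}
zones = {"B": 1, "C": 1, "D": 2}          # hypothetical community labels
shell_levels = {"B": 1, "C": 3, "D": 5}   # hypothetical out-k-shell levels

zone_entropy = neighbor_entropy(out_neighbors, zones, n_labels=2)
coreness_entropy = neighbor_entropy(out_neighbors, core_labels(shell_levels), n_labels=2)
single_zone = neighbor_entropy(out_neighbors, {"B": 1, "C": 1, "D": 1}, n_labels=2)
```

Here half of the outgoing flow goes to each zone and to each coreness class, so both entropies reach their maximum of 1; if all neighbors shared one zone, the entropy would be 0.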
Step 4: Spatial spreader and susceptible indexes. The spatial spreader index (SPI) and spatial susceptible index (SUI) are based on the general concepts of the framework proposed by Fu et al. 18 and Zhang et al. 38. However, the exact indexes are largely modified to account for the specificities of our study. Specifically, the SPI and SUI calculations are based on a geometric average of three key network metrics. The SPI (see Eq. (10)) is the geometric average of the local normalized weighted out-degree ($\mathrm{NWOutDegree}(i)$), the zone-entropy of outgoing neighbors ($H^{\mathrm{zone}}_{\mathrm{OutNeigh}}(i)$), and the out-coreness-entropy of the outgoing neighbors ($H^{\mathrm{core}}_{\mathrm{OutNeigh}}(i)$). To understand this particular definition, one may for instance consider the case in which a node's SPI is high: this node has a high volume of outgoing flows (high local intensity), half of the flows are directed to the core area and the other half to the non-core area (periphery), and these flows are equally divided among different zones (high out-neighbors' zone-entropy). In other words, a high-SPI subzone has a large number of travelers originating from it, and these individuals are on their way to both core and periphery places located in various zones. Therefore, with such a high SPI value, disease spreading would be facilitated within a short period of time. The flow intensity and diversity measurements are all normalized to the unit interval, and consequently the geometric average also varies between zero and one.
The spatial susceptible index SUI (see Eq. (11)) is constructed in the same way as the SPI, except that it uses the incoming components instead of the outgoing ones: the local normalized weighted in-degree ($\mathrm{NWInDegree}(i)$), the zone-entropy of incoming neighbors ($H^{\mathrm{zone}}_{\mathrm{InNeigh}}(i)$), and the in-coreness-entropy of incoming neighbors ($H^{\mathrm{core}}_{\mathrm{InNeigh}}(i)$). Again, the concept associated with the SUI is better understood when considering a subzone with large incoming flows: half of the flows come from the core area and the other half from the non-core area, and these flows come equally from different zones. In other words, this subzone is a destination for a large number of travelers originating from various zones, and their origins include both core and periphery areas. Therefore, a high-SUI subzone is expected to be a place where travelers are more vulnerable to being infected. Like the SPI, the SUI varies in the unit interval.
Both SPI and SUI are calculated as the geometric average of the three components: normalized weighted degree, zone-entropy and coreness-entropy. We have also tested the arithmetic average of these three components (see Fig. S3 in Supplementary Material). We chose the geometric average because the two proposed indexes are meant to be used for identifying super-spreader and super-susceptible locales. Thus, the spreading effectiveness of a subzone shall be considered high only when all three components are high.
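The difference between the two averaging choices can be seen in a minimal sketch. The component values are hypothetical, and the function names are introduced only for this illustration; the geometric form follows the description of the SPI and SUI as the geometric average of three components that each lie in the unit interval.

```python
def geometric_index(strength, zone_entropy, coreness_entropy):
    """Geometric average of three components in [0, 1] (the form of SPI/SUI)."""
    return (strength * zone_entropy * coreness_entropy) ** (1.0 / 3.0)


def arithmetic_index(strength, zone_entropy, coreness_entropy):
    """Arithmetic average of the same three components, for comparison."""
    return (strength + zone_entropy + coreness_entropy) / 3.0


# Hypothetical subzone with uniformly high components.
balanced = (0.8, 0.8, 0.8)
# Hypothetical subzone with heavy, diverse-zone flow that all goes to the core.
lopsided = (1.0, 1.0, 0.0)

balanced_geo = geometric_index(*balanced)
lopsided_geo = geometric_index(*lopsided)
lopsided_ari = arithmetic_index(*lopsided)
```

The lopsided subzone scores about 0.67 under the arithmetic average but 0 under the geometric average, which is the behavior wanted here: a single weak component is enough to keep a subzone out of the super-spreader or super-susceptible group.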

Results
Local intensity of human movement flows. The spatial distribution of the non-weighted/weighted in-degree and out-degree for weekdays are shown in Fig 3. To observe the spatial distribution of the in-and out-degree, the townships are separated into four groups using the 25%, 50% and 75% percentile of the corresponding degree values as cutoffs, thereby giving the "low", "mid-low", "mid-high" and "high" intensities. It appears that the patterns for the non-weighted and weighted in-degrees (top row) are similar to those of their out-degree counterparts (bottom row). This points to the fact that inflows and outflows are fairly balanced, which is expected for daily aggregated data associated with steady human movements. For the non-weighted degree measurements (left column), the high in-and out-degree subzones appear to be mainly concentrated at the East, North East and Central regions, whereas the West and North have a higher number of lower degree subzones. These results are correlated with the distribution of human density in Singapore, namely high to very high in the East, North East and Central regions, and lower in the West and North of the island. For weighted degree measurements (right column), the East region has higher degree subzones; the number of high degree subzones drop in the Central region; North, North East, and West regions have relatively more higher degree subzones when compared with their non-weighted counterparts. The distribution of the non-weighted measurements for weekends are essentially the same as the results for weekdays. Figure 4 displays the differences in weighted in-and out-degree between weekdays and weekends. Most subzones are in the lightest green or purple colors, thereby indicating that their degree measurements are only very slightly larger than each other (the differences are less than 1.3 times). These subzones have a similar number of people using public transportation during weekdays and weekends. 
Only a few subzones appear in dark colors, indicating larger differences between weekdays and weekends. These subzones reveal a notably different usage of public transportation between weekdays and weekends: usage on weekdays is about twice that on weekends (dark purple), or the other way round (dark green).
Figure 3. Spatial distribution of the degree centralities for the weekday dataset. Left column (a,c) shows the distribution for non-weighted measurements and the right column (b,d) shows the distribution for weighted measurements of the degree. The top row (a,b) displays the in-degree, while the bottom row (c,d) refers to the out-degree. The townships are separated into four groups using the 25%, 50% and 75% percentile as breaks, thereby giving the "low", "mid-low", "mid-high" and "high" intensities. Generated with Python (3.7.5), Matplotlib (3.2.1) and GeoPandas (0.7.0).

Community detection. As discussed in the "Materials and Methods" Section, a critical component of our network analysis is based on community detection. Figure 5 shows the spatial distribution of communities for both weekdays and weekends. The MapEquation algorithm with the provided data reveals 17 different communities for both the weekday flow network and the weekend flow network. Most communities are spatially continuous as the flow data is integrated with the inverse of the distance. However, some exceptions exist in both weekday and weekend communities (e.g. weekday and weekend community #2). The spatially continuous patterns are expected given the spatial embedding of our networks, and they indicate that interactions between closer subzones are effectively stronger. On the other hand, the few spatially split communities appear to be the by-product of a strong flow of human movement between two spatially distant locations with sparser spaces between them.
Although weekday and weekend communities are different (some are split and others have different boundaries), overall they show some notable similarities (e.g. weekday community #11 and weekend community #10). This observation can be attributed to two particular features of Singapore: (1) given the limited available land, Singapore has a dense and compact urban landscape with a high level of mixed-use areas, be they residential, industrial and/or commercial; (2) a non-negligible fraction of the working population is active on Saturdays, which creates a high flow of travelers with the same commuting patterns as during weekdays. For instance, in the Western region, weekday communities #4 and #16 are extremely similar to weekend communities #3 and #17. These particular communities are fairly large with a heavy mixed use of residential and industrial areas, where people have similar daily activities throughout the week. The North East Region (NER) contains three similar communities during weekdays and weekends (communities #2 (upper part), #14, and part of #11 during weekdays, and the similar patterns of #2 (upper part), #13, and #10 during weekends). The North Region (NR) is split into multiple communities (communities #2 (lower part), #7, #8, #10, #11, #15 during weekdays, and #2 (lower part), #7, #9, #10, #15 during weekends). The identified communities #1, #2, #5, #6, #9, #12 during weekdays, and communities #1, #2, #5, #11, #16 during weekends are similar and fit well with the Central Region (CR), which is the central business district of Singapore. The community detection results show that the boundaries of human activity can change between weekdays and weekends. Community #4 on weekends appears to be an area resulting from the merger of communities #5, #17 and part of #8 during weekdays.
This indicates that the area has stronger human movement interactions during weekends than during weekdays, probably because the area is mostly residential with few shopping places providing daily necessities. In summary, the human movement boundaries do not follow a static pattern, and the resulting zones are usually smaller than the known regional/administrative boundaries.
Coreness. The spatial distribution of the core area is shown in Fig. 6. As detailed in the "Materials and Methods" Section, the calculation of coreness is separated into two parts for each network, one of which uses the (weighted or unweighted) in-degree, and the other the (weighted or unweighted) out-degree. Hence, two sets of coreness results (outgoing core area and incoming core area) are obtained for each network. Some areas are identified as core in both incoming and outgoing directions (red subzones in Fig. 6), some are core for either incoming (pink subzones in Fig. 6) or outgoing (purple subzones in Fig. 6) but not both. However, the vast majority of areas are core ones from both the incoming and outgoing flows perspective. These red areas happen to have a notable overlap with residential areas with a high population density, thereby indicating that places where people live would always have high incoming and outgoing flow: a core area of human movement and commuting.
Spreader and susceptible indexes. The calculation of the spreader and susceptible indexes requires the local normalized in-degree and out-degree centrality, as well as the incoming and outgoing neighborhood zone-entropy (Eqs. (4)-(6)) and coreness-entropy (Eqs. (7)-(9)). Note that these three key indicators (local weighted degree, zone-entropy and coreness-entropy) are in the unit interval, i.e. they vary between zero and one. Figure 7 shows the local out- and in-degree (left column), the outgoing and incoming neighborhood zone-entropy (central column) and coreness-entropy (right column) of the weekday (first two rows) and weekend (bottom two rows) flow networks. For observation purposes, the subzones were grouped into "low", "mid-low", "mid-high" and "high" categories using the 25%, 50% and 75% percentiles of each variable as cutoffs. The spatial distribution shows notable differences between centrality, zone-entropy and coreness-entropy. High levels of local weighted out- and in-degree are mostly concentrated in the East, North East, and Central Regions. High levels of zone-entropy are primarily located in the North and Central Regions, while high levels of coreness-entropy are mostly found in subzones in the North Region. Essentially, most of the subzones have high levels of one, two or even three of these key indicators. However, only subzones with high levels of all three indicators are SSP or SSS. The distributions of the spreader index (SPI) and susceptible index (SUI) of each subzone for weekdays and weekends are shown in Fig. 8. All four distributions suggest a similar Poisson-like type of distribution, with a mean value between 0.255 and 0.265 (solid vertical lines). A detailed comparison between the two indexes and the three components is given in Fig. S2 (see Supplementary Material).
The fact that these mean values are very close for both indexes on weekdays and weekends is in line with our previous comment on the expected balance between incoming and outgoing flows of human movement. However, for our analysis the locations of interest are the outliers corresponding to large SPI and/or SUI values. Using the interquartile range (IQR) method, the outliers are identified as the subzones located above $Q_3 + 1.5 \times \mathrm{IQR}$. The third quartile ($Q_3$) is also shown in Fig. 8 as a reference level (dotted vertical lines). The subzones that lie between $Q_3$ and $Q_3 + 1.5 \times \mathrm{IQR}$ are categorized as secondary-spreaders or secondary-susceptibles. For comparison purposes, we tested the arithmetic average of the three components and compared it with the geometric average results (Fig. S3 in the Supplementary Material); we also present Fig. S4 (see Supplementary Material) to highlight the differences between the SPI or SUI and the geometric average of any two of the three components (degree, zone-entropy, and coreness-entropy). This analysis reveals that a non-negligible number of locations exhibit large SPI and/or SUI values, thereby contributing to our identification process of spatial super-spreaders and spatial super-susceptibles.
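The IQR-based categorization can be sketched as follows. The index values and the labels "super", "secondary" and "regular" are hypothetical, and the quartile convention (the "inclusive" linear-interpolation method of Python's `statistics.quantiles`) is an assumption, since the text does not say which convention was used.

```python
import statistics


def classify_by_iqr(index_values):
    """Label each subzone from its index value using Q3 and Q3 + 1.5 * IQR."""
    # ASSUMED quartile convention: inclusive linear interpolation.
    q1, _, q3 = statistics.quantiles(index_values.values(), n=4, method="inclusive")
    upper_fence = q3 + 1.5 * (q3 - q1)
    labels = {}
    for subzone, value in index_values.items():
        if value > upper_fence:
            labels[subzone] = "super"      # outlier above Q3 + 1.5 * IQR
        elif value >= q3:
            labels[subzone] = "secondary"  # between Q3 and the upper fence
        else:
            labels[subzone] = "regular"
    return labels, q3, upper_fence


# Hypothetical SPI values for ten subzones.
spi = {
    "S1": 0.10, "S2": 0.20, "S3": 0.22, "S4": 0.25, "S5": 0.26,
    "S6": 0.27, "S7": 0.30, "S8": 0.33, "S9": 0.45, "S10": 0.70,
}

labels, q3, upper_fence = classify_by_iqr(spi)
```

With these toy values, only the subzone with the largest index exceeds the upper fence, and the two subzones just below it fall into the secondary group.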
Figure 6. Distribution of core/non-core areas from the weighted k-shell decomposition. The coreness in (a) refers to weekday flow data, while in (b) it is for weekend flow data. Red-colored areas are for subzones identified as both incoming and outgoing core areas, purple-colored areas refer to solely outgoing core subzones, and pink-colored subzones highlight solely incoming core subzones. Generated with Python (3.7.5), Matplotlib

Comparison with population density. The spatial distribution of population can naturally be considered key to understanding the spreadability and susceptibility of a given place. Indeed, places with higher population density could be suspected to have higher spreadability of and susceptibility to an infectious disease. To test this hypothesis, we compare the subzones' SPI and SUI with the residential population density (see Fig. 9). The low correlation coefficients (both the Pearson coefficient and the Kendall tau are below 0.4) indicate that the SPI and SUI are only weakly correlated with population density. Interestingly, some low population density subzones have a high SPI or SUI. These subzones may be categorized as business or commercial land use areas, hence the low or zero residential population. On the other hand, some of the high residential population density subzones (i.e. above 30,000) have a low SPI or SUI. This may be because their public transport flow structure is relatively simple, i.e. the outgoing or incoming flows connect with places with similar features (in the same zone or with the same coreness level). Since the residential population distribution represents the spatial pattern of where people live, it covers only one type of destination of population movements, and it therefore cannot capture places where people work and interact with each other, e.g. commercial areas, business areas and transport hubs.
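For readers who wish to reproduce this kind of comparison, the two coefficients can be sketched in plain Python. The density and SPI values below are hypothetical and are not taken from Fig. 9. The Kendall coefficient is implemented in its tau-a form, which ignores ties; this is an assumption, as the text does not specify which variant was used.

```python
import math


def pearson(x, y):
    """Pearson linear correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def kendall_tau_a(x, y):
    """Kendall tau-a: (concordant - discordant) / number of pairs (no tie correction)."""
    n = len(x)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            sign = (x[i] - x[j]) * (y[i] - y[j])
            if sign > 0:
                concordant += 1
            elif sign < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


# Hypothetical residential densities (persons per km2) and SPI values.
density = [0, 5000, 12000, 20000, 35000]
spi = [0.40, 0.20, 0.30, 0.25, 0.15]

r = pearson(density, spi)
tau = kendall_tau_a(density, spi)
```

In this toy example a subzone with zero residential population has the highest SPI, mirroring the observation that business or commercial subzones can rank high despite having few or no residents.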
Susceptibility vs. spreadability. While both SPI and SUI are calculated from the same spatial flow network, they capture fundamentally different aspects of disease spreading interactions, in contrast to the classical concept of influential nodes in network science 24. The key difference between the calculation of SPI and SUI is the flow direction, i.e. incoming flow to a node or outgoing flow from a node. In Fig. 10, we compare the outgoing and incoming measurements for the three components (weighted degree, zone entropy, and coreness entropy) and the two indexes (SPI and SUI). Strong correlations exist between the normalized weighted in- and out-degree (NW In-degree and NW Out-degree; Fig. 10a for weekdays and Fig. 10e for weekends, with both correlation coefficients above 0.9). This is expected since we consider daily average flows to compute the weighted in- and out-degree, i.e. the people who leave their home area in the morning will eventually go back to their home area at some point during the day. Differences between the zone entropy of incoming and outgoing neighbors can be observed in Fig. 10b,f (for weekdays and weekends, respectively, with Pearson coefficient [...]
Figure 7. Spatial distribution of the three key indicators: weighted degree, zone-entropy and coreness-entropy. Left column: local weighted in- and out-degrees; central column: outgoing or incoming zone-entropy; right column: outgoing or incoming coreness-entropy. First two rows: weekdays; bottom two rows: weekends. The subzones are separated into four groups using the 25th, 50th and 75th percentiles as cutoffs, thereby giving the "low", "mid-low", "mid-high" and "high" categories. Generated with Python (3.7.5), Matplotlib (3.2.1) and GeoPandas (0.7.0).
[...] (Fig. 10d for weekdays and (h) for weekends) show the relationship between SUI (horizontal axis) and SPI (vertical axis).
Since both indexes are geometric averages of the previous three directed components, they show correlated patterns overall because of the degree and zone-entropy, and the subzones with larger indexes (i.e. Q3 ≤ SPI ≤ Q3 + 1.5 × IQR and Q3 ≤ SUI ≤ Q3 + 1.5 × IQR) tend to be scattered within the box formed by the dashed (Q3) and dotted (Q3 + 1.5 × IQR) reference lines.

This study focuses on the disease spreading process, thus emphasizing the differences between the outgoing and incoming directions. A subzone with a high SPI has a stronger capability to affect other subzones because many people leave this subzone and move to a wide variety of places. On the other hand, a subzone with a high SUI is easily affected by other subzones owing to an intense flow of people coming from a large variety of places. Figure 10 shows that the number of people leaving a subzone is highly correlated with the number of people going to that subzone, but the diversity in terms of zone- and coreness-entropy differs between a subzone's incoming and outgoing neighbors. For instance, the coreness-entropy of a subzone's incoming neighbors can be high while the coreness-entropy of its outgoing neighbors is low. Although the correlation between SPI and SUI is high, we can still observe some deviations between them, especially beyond Q3 and below Q3 + 1.5 × IQR.

Spatial super-spreaders and super-susceptibles. The spatial distributions of super-spreaders (SSP) and super-susceptibles (SSS) are shown in Fig. 11 for weekdays and in Fig. 12 for weekends. For weekday flow movement (see Fig. 11), 9 subzones are identified as SSP (red-colored zones in Fig. 11a), corresponding to SPI ≥ Q3 + 1.5 × IQR; 11 subzones are identified as SSS (red-colored zones in Fig. 11b), corresponding to SUI ≥ Q3 + 1.5 × IQR. It is worth noting that 9 subzones appear in both figures and are thus both spatial super-spreaders and super-susceptibles (subzones a-i in both figures, shown as red-colored subzones with a purple border). This indicates that most of the subzones with the highest SPI values also have the highest SUI values, and vice versa. In Fig. 11a, all identified SSP are also identified as SSS. In Fig. 11b, two subzones (j, Khatib, and k, Tampines East) are identified as SSS only, with a lower SPI (Q3 ≤ SPI < Q3 + 1.5 × IQR). The weekend distributions exhibit slightly different patterns. There are 9 subzones identified as SSP on weekends, 8 of which are also identified as SSP on weekdays (subzones a to h in Fig. 12a); none of the weekend SSP has an SPI below Q3 in the previous figure. Similarly, all weekend SSS are either super- or secondary-susceptibles on weekdays, and vice versa. A total of 13 SSS are found with the weekend human movement network (Fig. 12b); 9 of them (subzones a to i) are also weekend SSP, and 11 overlap with the weekday SSS results, while the other two subzones (j, Boulevard, and k, Bukit Batok Central) are promoted from weekday secondary-susceptible subzones (pink subzones in Fig. 11b). This result further confirms that the SPI and SUI are not dramatically different between weekdays and weekends.
Eight subzones (a to i except h in Fig. 11, and a to h in Fig. 12) are identified as both SSP and SSS (in red) on both weekdays and weekends: three in the West Region (Choa Chu Kang Central, Jurong Gateway, and Jurong West Central), two in the Central Region (Maritime Square and Toa Payoh Central), and three in the North Region (Sembawang Central, Woodlands Regional Centre and Yishun West). During weekdays, most of the identified SSP or SSS areas belong to the regional core, which contains a higher density of human activity. The eight SSP and SSS can be separated into two types. The first type consists of five subzones (a, c, e, f, and i in Fig. 11) with a high population density; the second type consists of the other three subzones (b Jurong Gateway, d Maritime Square, and h Woodlands Regional Centre in Fig. 11), which are associated with a lower population density. The subzones of the first type are typical residential areas, where the intensity of human activity is high owing to the extensive need to travel out during the daytime and travel back in the evening. On the other hand, the subzones of the second type are regional hubs of public transportation, which naturally attract large population flows. For example, Maritime Square contains Harbourfront MRT, which is the terminal station of two [...]
One counter-intuitive observation can be made from Figs. 11 and 12: the CBD contains fewer SSP and SSS than one might expect. The CBD of Singapore is located in the southern central part of the Central Region. High intensity of human activity exists within the CBD area. As shown in Fig. 7, most of the subzones in the CBD have either a low weighted degree or a low neighborhood coreness-entropy. The low weighted degree probably originates from the small size of the area itself, which limits the catchment of incoming and outgoing flows.
As for the low coreness-entropy, we trace it to the fact that a majority of the people circulate within the CBD, which is mainly composed of core areas (Fig. 6). This result indicates that the CBD workplaces are less influential in terms of quickly spreading the disease to the rest of the city/island, but a contagious disease would quickly spread inside the CBD area as a consequence of its strong internal flows. In summary, the key influential areas are clearly identified as the regional transport hubs, which connect the residential areas with the rest of the country.

Discussion
The concept of super-spreader was originally introduced in the field of social network analysis to identify the most influential persons or nodes within a given social network. These persons could be opinion leaders, trend setters, or public figures within a group of people 17,65. This concept of a super-spreader individual has since been borrowed by epidemiologists to identify and study the abnormally high spreading activity of a small group of individuals 16,66 in large populations during an epidemic outbreak.
While previous studies focused on the identification of super-spreaders within a social network (nodes are individuals and edges represent the existence of interactions between two persons, i.e. binary edges), this study focused instead on spatial networks of population flow, with nodes representing physical locations and weighted, directed edges representing flows of human movement. This study sought to extend the concept of super-spreader to spatial interaction networks, with the objective of identifying possible spatial super-spreader locations, i.e. a set of locations that have the most influential effects in terms of disease spreading. The concept and calculation method were also reversed to uncover another group of critical locations: the most vulnerable places, defined as super-susceptibles.
Our results based on large-scale data analytics show that most of the SSP are also SSS. This is reasonable and somewhat expected given the nature of the daily population flow network. Specifically, since we are considering daily-aggregated data, the number of people leaving a place can be expected to be of the same order as the number of people going to this place, i.e. we are in the presence of balanced commuting flows: the larger the outgoing flow intensity, the larger the incoming flow intensity. Based on the results, the places with intense flows have a higher potential to be both SSP and SSS, and this is captured by the directed nature of the networks and the incorporation of the weighted in-degree or out-degree in our calculations. It is worth noting that our results are in good agreement with previous studies based on the k-shell decomposition method: the core nodes of a social group tend to be, in general, the most influential ones 17,29.
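The balance between outgoing and incoming flow intensity can be made concrete with a small sketch that accumulates the weighted out-degree and in-degree of each location from a list of daily-aggregated origin-destination records. The record layout, the function name and the flow values below are assumptions made for illustration and do not reproduce the dataset or the normalization used in the study.

```python
from collections import defaultdict


def weighted_degrees(od_flows):
    """Sum trips per location from (origin, destination, trips) records.

    Returns two dictionaries: weighted out-degree and weighted in-degree.
    """
    out_degree = defaultdict(float)
    in_degree = defaultdict(float)
    for origin, destination, trips in od_flows:
        out_degree[origin] += trips      # people leaving the origin
        in_degree[destination] += trips  # people arriving at the destination
    return dict(out_degree), dict(in_degree)


# Hypothetical daily-aggregated flows, for illustration only.
example_flows = [
    ("home", "hub", 900), ("hub", "home", 880),
    ("hub", "cbd", 700), ("cbd", "hub", 710),
    ("home", "cbd", 100), ("cbd", "home", 90),
]
out_degree, in_degree = weighted_degrees(example_flows)
```

In this toy example the residential location sends out 1000 trips and receives 970, which mirrors the near balance of commuting flows discussed above: with daily aggregation, the two directed degrees of a location are expected to be of the same order.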
Besides the local incoming and outgoing flow intensities, this study also considers two critical neighborhood diversities of these networks: the zone-entropy and the coreness-entropy. The diversity of the neighborhood is especially important when identifying multiple super-spreaders in a network 18,37. The zone-entropy measures whether the outgoing flows are spread over many zones within the city-state. For instance, if the outgoing flows from a given place converge to one zone only, this place can only affect one of the zones of Singapore, and its influential power is thus clearly weak. Conversely, if human movement originating from one place flows to many zones across the country, its influential power is relatively high. In addition, the coreness-entropy captures the diversity of flows to or from core or periphery areas. If the flows are all directed towards only the periphery or only the core, the influential power of the place is limited to this particular type of area. Conversely, if human movement flows to both core and periphery areas, an outbreak at this place could quickly affect and spread to both core and periphery areas. These two diversity metrics complement one another and are combined in the calculation framework to differentiate places with a high density of flows into strongly and weakly influential places (see Materials & Methods).
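The intuition behind the two diversity metrics can be sketched with a normalized Shannon entropy of flow shares. The exact definitions are given in Materials & Methods; the form below, including the normalization by the logarithm of the number of categories, the function name and the example labels, is an assumption made for illustration only.

```python
import math
from collections import defaultdict


def flow_entropy(flows_to_neighbors, category_of, n_categories):
    """Normalized Shannon entropy of flows aggregated by neighbor category.

    flows_to_neighbors: {neighbor: trips}
    category_of: {neighbor: category label}, e.g. a zone or a core/periphery label
    n_categories: total number of possible categories
    Returns 0.0 when all flows fall in one category and 1.0 when they are
    spread evenly over all categories.
    """
    totals = defaultdict(float)
    for neighbor, trips in flows_to_neighbors.items():
        totals[category_of[neighbor]] += trips
    grand_total = sum(totals.values())
    if grand_total == 0 or n_categories < 2:
        return 0.0
    shares = [t / grand_total for t in totals.values() if t > 0]
    entropy = sum(-p * math.log(p) for p in shares)
    return entropy / math.log(n_categories)


# Hypothetical neighbors and zone labels, for illustration only.
zone_of = {"n1": "west", "n2": "west", "n3": "north", "n4": "central", "n5": "east"}
concentrated = flow_entropy({"n1": 500, "n2": 300}, zone_of, n_categories=4)
spread_out = flow_entropy({"n1": 200, "n3": 200, "n4": 200, "n5": 200}, zone_of, n_categories=4)
```

The same function can be applied to core/periphery labels to obtain a coreness-type entropy, and to incoming instead of outgoing flows to obtain the susceptible-side counterpart.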
This study enables us to establish a list of subzones that have a strong capability for disease spreading, as well as a list of subzones that are more vulnerable, i.e. places at high risk of contagion. In summary, the identified subzones are found to be mainly in the core area of residential and transportation hubs. These places have high population density and activity, such as transportation hubs or community hubs. Therefore, these places should be targeted by public health agencies, with higher resource allocations and disease monitoring aimed at prevention and intervention. For example, public health agencies could consider these locations when planning to set up body temperature checkpoints, to provide personal hygiene toolkits, or to display advertisements promoting appropriate behaviors to counteract an ongoing epidemic. Moreover, since these locations are more vulnerable and more influential, they should receive more attention when setting up differentiated policies, such as the temporary closure of some businesses or restrictions on large-scale human activities, as opposed to a blanket lockdown across the country. The proposed network analysis framework rests upon the integration of the local flow intensity with neighborhood diversity measures (zone and coreness) to assess the effective spreading ability of particular locations. From the theoretical perspective, the proposed framework considers weighted and directed interactions between nodes (places) to identify super-spreaders and super-susceptibles. From the practical perspective, this study presents a quantitative and systematic framework to identify the key influential and vulnerable locations based on public transport flow data, which are usually available from most transportation agencies in metropolitan areas.
Figure 11. The spatial distribution of (a) spreader index (SPI), and (b) susceptible index (SUI) for weekdays. The subzones with purple border in (a,b) respectively indicate the super-susceptible (SUI ≥ Q3 + 1.5 × IQR) and super-spreader (SPI ≥ Q3 + 1.5 × IQR). Generated with Python (3.7.5), Matplotlib (3.2.1) and GeoPandas (0.7.0).
It is worth noting that there are several limitations to this study. First, our analysis is limited to human flow associated with the use of public transportation, which is high in places like Singapore or continental European cities but could be much lower in urban areas with far less developed public transportation networks, such as in the United States. In addition, our data only include ridership of buses and trains and miss out on other important means of transportation, including taxis, private-for-hire vehicles (cars, motorcycles, shuttle buses or vans), and active transportation (walking, bicycle, skateboard, scooter, personal mobility devices, etc.). Some of the subzones currently do not have bus stops or train stations. However, as mentioned previously, the share of public transportation by bus and train in Singapore is fairly high (more than 60% of daily commuting), which supports the importance of the obtained results as being representative of key human movement patterns. Second, Singapore is an island country with its northern national border connected to Malaysia through two land checkpoints. Unfortunately, these cross-border flows are not included in this study. Many workers and students commute daily between Singapore and the state of Johor in Malaysia. There are some dedicated bus services directly connecting stations in Johor Bahru, Malaysia, and various places across Singapore, including Woodlands in the North Region, Jurong East in the West Region, and Bugis in the Central Region. Since these data were not included, the in- and out-flows of these places in Singapore are certainly underestimated.
Third, inter-mode trip transfers and bus transfers are not captured in the dataset used for our study. Trip transfers between Mass Rapid Transit (MRT) lines are captured by the tap-in and tap-out records, i.e. passengers changing lines at some interchanges. However, the OD data for buses only record the direct flow between bus stops, i.e. the records contain only the tap-in and tap-out bus information, and transfers between bus services are not captured in the data. Likewise, data on transfers from bus to train and vice versa are unfortunately not available. Therefore, we can only capture direct services, which naturally limits the observed movement of travelers to the existing direct bus/train services.
Fourth, the short-time-scale dynamics throughout a day are ignored, since we considered daily-aggregated data. A higher temporal resolution (say, on an hourly basis) could reveal different patterns of SSP and SSS. The temporal evolution of the SUI and SPI indexes will be the topic of a future study. On the other hand, a long-term analysis may also provide insights into the evolution of SPI and SUI, or of spatial super-spreaders and super-susceptibles, over time. The bi-monthly analysis of SPI and SUI (Figs. S5-S8) and the identified SSP and SSS (Figs. S9-S14) are given in the Supplementary Material. However, the distribution pattern for the long-term analysis requires a more in-depth investigation and discussion owing to a possibly large number of unidentified factors that may affect the overall human movement structure.
In summary, we have developed for the first time a framework allowing the identification of spatial super-spreader and super-susceptible locations. We believe that our results and analysis could be extended in two key directions. First, our analysis would benefit from being complemented by work with epidemiologists specialized in simulations of disease spreading through human contact networks. This would integrate our results with differential spreading across more or less vulnerable places. Specifically, the dynamic patterns of disease propagation could be observed from the simulation models, and the effects of the SSP and SSS could thus be quantified in terms of their actual contamination rate in the population. Second, the geographic, demographic, and socio-economic characteristics of the spatial super-spreaders and super-susceptibles could be included in our analysis using statistical models, in order to identify the potential social and physical environmental factors that made these locations super-spreaders and super-susceptibles.
In conclusion, it is well known that dealing with the reopening of economies and cities after a blanket lockdown requires a finely calibrated approach from governments. Although we used the Singapore public transport flow data to build these networks as a case study, similar analyses can readily be carried out using the exact same process in order to uncover the SSP and SSS in any large urban center. Our data-driven methodology, analysis and results offer an effective way of devising targeted and localized preventive measures when lifting stay-at-home orders. Such targeted measures for vulnerable locations are also critical in order to optimize government resources in the face of economic decline.

Data availability
The datasets used for this study, generated from the Singapore LTA database 57, are available from the following Spatial_Spreader_Susceptible_data repository: https://github.com/wcchin/Spatial_Spreader_Susceptible_data.