Multiscale core-periphery structure in a global liner shipping network

Maritime transport accounts for a majority of trades in volume, of which 70% in value is carried by container ships that transit regular routes on fixed schedules in the ocean. In the present paper, we analyse a data set of global liner shipping as a network of ports. In particular, we construct the network of the ports as the one-mode projection of a bipartite network composed of ports and ship routes. Like other transportation networks, global liner shipping networks may have core-periphery structure, where a core and a periphery are groups of densely and sparsely interconnected nodes, respectively. Core-periphery structure may have practical implications for understanding the robustness, efficiency and uneven development of international transportation systems. We develop an algorithm to detect core-periphery pairs in a network, which allows one to find core and peripheral nodes on different scales and uses a configuration model that accounts for the fact that the network is obtained by the one-mode projection of a bipartite network. We also found that most ports are core (as opposed to peripheral) ports and that ports in some countries in Europe, America and Asia belong to a global core-periphery pair across different scales, whereas ports in other countries do not.

Scientific REPORtS | (2019) 9:404 | DOI: 10.1038/s41598-018-35922-2 correspond to either regional or global (or intermediate) groups of ports in each of which some ports may serve as core ports whereas the others may play a role of peripheral ports. Second, in our previous studies 5, 6 , the algorithm found CP pairs more accurately than other algorithms did in artificial networks with planted CP pairs. We construct the GLSN from the empirical data on the liner shipping services operated by world's top 100 liner shipping companies in terms of fleet capacity (i.e., the twenty-foot equivalent unit capacity of the fleet). The data altogether account for over 92% of the total fleet capacity in the world.
To reveal the CP structure in the GLSN, we extend our previous algorithm in the following three manners. First, we adopt a null model that is compatible with the way we construct the GLSN from the data. Specifically, the original data set is regarded as a bipartite network composed of a layer of port nodes and a layer of shipping route nodes (Fig. 2). Edges represent which ports belong to which shipping routes. Our null model discounts the effects induced by the one-mode projection of an originally bipartite network. Second, our previous algorithms have a resolution limit, with which one can not find CP structure smaller than a threshold size 6,20 . To circumvent this problem, we use a multiresolution method for community detection 21,22 to extend the algorithm. Third, our previous algorithms provide different CP structures in the different runs of the same algorithm even if the initial condition is the same. In the present study, we run the algorithm 100 times and look at the consensus of the results obtained from the different runs. Figure 1. Adjacency matrix of a network with two CP pairs. The filled cell or empty cell indicates the presence or absence of an edge, respectively. The solid line indicates the partition of nodes into the two CP pairs. The dashed lines within each CP pair indicate the subpartition of nodes into the core and periphery. Each core block (top-left block in each CP pair) and periphery block (bottom right block in each CP pair) consist of 20 nodes and 40 nodes, respectively. The probability that each pair of nodes is adjacent by an edge is equal to 0.95 within each core block. The same probability is equal to 0.8 between the core and periphery blocks within each CP pair. The same probability is equal to 0.05 within each periphery block and between different CP pairs. We draw edges according to these probabilities, independently for the different node pairs. The present algorithm is applicable to networks constructed from a one-mode projection of bipartite networks. Examples of such networks include human disease networks 23 , metabolic networks 24 and mutualistic networks 25 . The Python code of the present algorithm is available on GitHub 26 .

Results
Number of calling ports, number of serving routes, and node strength. The distribution of the container capacity of a route (i.e., the sum of the maximum volume of containers that shipping companies deploy on the shipping route) is shown in Fig. 3(a). The container capacity is heterogeneously distributed; a majority of the shipping routes has a capacity less than 10 2 , while 2% of the routes has a capacity larger than 10 5 . Degree d i port of ports in the bipartite network is also heterogeneously distributed ( Fig. 3(b)). A majority (56%) of ports is shared by less than five routes, whereas 13 ports (1.3%) including Shanghai and Singapore are shared by more than 100 routes. Degree d r route of routes in the bipartite network is more homogeneously distributed than d i port . A majority (52%) of routes contains less than five calling ports. The largest number of calling ports in a route is 31, which covers only 3.2% of the N = 977 ports.
The degree of each port in the GLSN is shown in Fig. 3(c). A majority of ports (540 ports; 55%) has a degree less than 25 in the GLSN, while 60 (6%) ports have a degree larger than 100. We define node strength (i.e., weighted degree) of each port by the sum of the weight of edges attached to the port. As is the case for the container capacity, node strength is heterogeneously distributed (Fig. 3(d)). Most ports (813 ports; 83%) have a strength less than 2 × 10 5 , while 51 ports (5%) have a strength larger than 10 6 .
Multiscale CP structure. We identify consensus CP pairs (we call them CP pairs for short in the following text) using the algorithm presented in the Multiresolution algorithm section. The present algorithm is equipped with a resolution parameter γ, with which one can control the characteristic size of CP pairs to be detected. Different γ values may yield considerably different results. Therefore, we examine CP pairs across a range of γ, i.e., γ ∈ {0.01, 0.1, 0.2, 0.3, …, 4}.
We show the CP pairs detected at some γ values in Figs. 4-6. There are at most five CP pairs. For 0.01 ≤ γ ≤ 1.9, the algorithm identifies a unique CP pair containing ports in various geographical regions (Fig. 4). We refer to this CP pair as CP pair 1. The number of ports in CP pair 1 decreases from 951 ports at γ = 0.01 to 76 ports at  For 2 ≤ γ ≤ 3, the algorithm identifies three CP pairs (Fig. 5). As is the case for 0.01 ≤ γ ≤ 1.9, CP pair 1 contains the ports across many regions. At γ = 2.0, the algorithm identifies CP pair 2 that branches from CP pair 1 ( Fig. 5(a)). CP pair 2 contains most ports in the East Coast of the US, a Canadian port (Halifax) and an Egyptian port (Suez). At γ = 2.1, the algorithm identifies CP pair 3 located in the South Africa ( Fig. 5(a)). CP pair 2 persists and enlarges in most cases as γ increases. In contrast, CP pair 3 is absent for γ ≥ 2.2 ( Fig. 5(b)).
For 3.1 ≤ γ ≤ 4, the algorithm identifies four CP pairs. Each of CP pairs 1 and 2 spans different continents ( Fig. 6). At γ = 3.1, CP pair 1 contains a majority of Chinese ports, the only port in Singapore and ports in the West Coast of the US. CP pair 2 contains most ports in the East Coast of the US, two ports in the Mediterranean   Sea, a port in Sri Lanka. The other CP pairs 4 and 5 also branch from CP pair 1 and are composed of geographically close ports. In fact, CP pairs 4 and 5 mostly consist of the Mediterranean ports and North European ports, respectively. The membership of each port at each γ value is shown in Fig. 7. The number of ports in CP pair 1 decreases as γ increases. CP pairs 2, 4 and 5 detected for 2 ≤ γ ≤ 4 are part of CP pair 1 detected for smaller γ values. CP pair 4 is absent for some γ values for 2.6 ≤ γ ≤ 3 but persists for 3.1 ≤ γ ≤ 4. As γ increases, CP pairs 2, 4 and 5 largely expand by absorbing ports that belong to CP pair 1 at small γ values.
The distribution of the coreness values of ports in any CP pair is shown in Fig. 8. For all γ values, most ports have a coreness value larger than 0.9. Therefore, the algorithm has classified most ports as core ports in most runs. If a CP pair only consists of core nodes, then the CP pair is a group of nodes that are densely interconnected with each other, which is equivalent to the usual notion of community. Therefore, the current result indicates that the detected CP pairs are close to communities. This property holds true for all γ values that we have examined.
Persistence of ports. CP pair 1 considered across different resolutions (i.e., γ) has a nested relation. In other words, CP pair 1 at resolution γ contains CP pair 1 at all larger γ values in a majority of cases. This is the case for all but two ports when one varies γ in the range 0.01 ≤ γ ≤ 4. Based on this observation, we define the persistence of a port as the smallest γ value above which the port does not belong to CP pair 1 for the first time as one increases γ. In other words, the persistence is the largest value of γ such that the port belongs to CP pair 1 for all resolution values up to that γ value. We note that the persistence is independent of γ. The persistence of each port is represented by the size of the circle in Fig. 9. In the figure, only the ports belonging to CP pair 1 at γ = 0.01 are shown. Highly persistent ports (e.g., persistence value larger than 3) are concentrated in China, the North Sea, the Mediterranean Sea, the Malay Peninsula, the Red Sea, and the West Coast of the US. The two highly persistent ports in the Malay Peninsula, Singapore and Tanjung Pelepas, face the Strait of Malacca, which is an important shipping lane in the world 27 . There are few highly persistent ports in the Caribbean Sea, Japan, Oceania, the East Coast of the South America, the East Africa and the West Africa. Therefore, these regions may be relatively segregated from the main international shipping trade networks.
We show the ports with the persistence value larger than 2.8 in Table 1. Highly persistent ports have a relatively large node strength (i.e., weighted degree). More precisely, the persistence and node strength are positively correlated with the Spearman correlation coefficient being equal to 0.83. We find that 497 ports (51%) have a persistence value less than or equal to 0.1, while 64 ports (7%) have a persistence value larger than 2.

Discussion
We developed a multiscale algorithm to identify CP structure in a one-mode projection of bipartite networks, which intends to reveal multiscale CP pairs across different scales. We applied the algorithm to a GLSN and revealed the inequality of regions in terms of the extent to which they are integrated into the global maritime transportation system. Specifically, our algorithm uncovered the following properties of the CP structure in the GLSN.
First, at a coarse resolution, we detected a unique CP pair (CP pair 1) that mainly consists of ports in Asia, Europe and North America (Fig. 4(c)). As major production and consumption centres on a global scale, these three regions have long been seen as dominating poles in global trade and container shipping activities 28 . Container shipping services that connect Asia and Europe, Asia and North America, and Europe and North America constitute the world's main East-West trading lanes, well-known as "East-West Corridor" in the maritime shipping industry 29 . Our result also provides some information on the integration of the economy in different regions into the global markets. For instance, the ports in CP pair 1 are located in leading countries in trades (e.g., China, France, Germany, the United Kingdom and the United States) but not in Japan. The absence of Japanese ports indicates that the integration of Japan into the global maritime transportation system may be insufficient, despite its status as the world's fourth-largest export economy in value. This situation might have a negative influence on the country's international trade development in the long run.
Second, for finer resolutions, the algorithm identified four small CP pairs that branch from CP pair 1 (Fig. 6). These CP pairs involve main regional liner shipping markets of North Europe, Mediterranean, East Asia and North America, respectively. Two out of the four CP pairs, which are composed of major container ports in Northern Europe (CP pair 4) and the Mediterranean (CP pair 5), respectively, are geographically concentrated. In the liner shipping industry, they are highly developed conventional markets of intra-regional seaborne trade in Europe. In contrast, the other two CP pairs extend across distinct geographical regions, corresponding to two inter-regional shipping routes in the West-East direction: North American East Coast-Mediterranean Sea-Indian Subcontinent shipping route via Suez Canal (CP pair 2) and North American West Coast-East Asia shipping route across the Pacific Ocean (CP pair 1). In particular, the dominance of China and the US in CP pair 1 is consistent with the high intensity of the bilateral trade between China and the US, the world's two largest countries in commodity trades 30 .
Third, the present algorithm classified a majority of ports in the GLSN as core ports as opposed to peripheral nodes (Fig. 8), as indicated by their high coreness values. Although we do not know why this is the case, the result underlines the specificity of the GLSN. In fact, in worldwide airport networks, more than half of the airports were classified as peripheral nodes 5,6 . This comparison indicates that the GLSN may be better regarded as a collection of communities, which is in agreement with the previous work reporting the community structure of global maritime shipping networks 31 . It should be noted that we found that CP pairs in the GLSN were similar to communities because we actually ran CP analysis.
Fourth, the persistence that we calculated for each port might be useful in evaluating the extent to which a port is integrated into the main international seaborne trade markets. The majority of the most persistent ports are regional load centres in the container shipping markets, i.e., world's leading container ports in terms of the yearly container throughput volume 32 . Examples include East Asian ports of Busan, Guangzhou, Hong Kong, Ningbo-Zhoushan, Qingdao, Shanghai, and Shenzhen, Southeast Asian ports of Singapore and Tanjung Pelepas, North American West Coastal ports of Long Beach and Los Angeles, and European ports of Antwerp, Hamburg and Rotterdam.
Our study has the following limitations. First, we did not inform the edge weight by the actual container traffic between ports due to the commercial confidentiality. Instead, we used traffic capacity deployment data provided by shipping companies to approximate the actual traffic, assuming that the traffic capacity between any port pair on a same shipping service route was equal and bidirectional. Second, one-mode projection discards much information about the original bipartite network composed of the ports and routes. To mitigate this problem, one can use other one-mode projection methods that reflect some properties of the bipartite network to the projected networks [33][34][35] . Another approach is to study the original bipartite network without one-mode projection. Third, we did not analyse another family of CP structure, i.e., transportation-based CP structure 3,7,8,11 . Transportation-based CP structure dictates that a core is a group of nodes that are frequently used in paths connecting nodes, e.g., nodes with high betweenness centrality. Because GLSNs underlie maritime transportation, analysis of transportation-based CP structure may yield useful knowledge of the flow of cargo across the world.

Methods
Data set. We use an empirical data set provided by Alphaliner 36 , which reports the statistics of R = 1,631 major liner shipping service routes in the world for the year 2015. On each liner shipping service route (hereafter, shortened as service route), container ships call at a sequence of ports with a fixed service schedule. Cargo ships may call at ports for bunkering and maintenance, which are not directly associated with trade. The present data set contains only the calling ports for cargo loading and unloading, ensuring a high relevance to world seaborne trade. There are N = 977 ports in total. We denote by d r route the number of calling ports for route r. Additionally, we denote by d i port the number of routes that port i serves. The container capacity of route r, denoted by φ r , is given by the sum of the maximum volume of containers (counted in Twenty Equivalent Unit; TEU) deployed on shipping route r by world shipping companies.
The data set does not contain the amount of containers transported between ports owing to the commercial confidentiality. Therefore, we assume that the same amount of containers is transported between any pair of ports belonging to the same route. This procedure is equivalent to the following one-mode projection of the bipartite network.
We represent the data as a bipartite network composed of ports and routes, where a port i and a shipping route r are adjacent if and only if port i is a calling port of route r (Fig. 2). We denote by B = (B ir ) the N × R adjacency matrix of the bipartite network, where B ir = 1 or B ir = 0 indicates that port i and route r are adjacent or not adjacent, respectively.
We construct the GLSN composed of ports by projecting the bipartite network to a one-mode network (Fig. 2). For example, in collaboration networks between academic authors, one connects all pairs of authors of a paper by an edge, resulting in a clique. Because a larger clique (i.e., a paper involving more authors) implies that the pairwise relationships between each pair of authors would be weaker, one often normalises the edge weight by dividing it by d−1 37,38 , where d is the number of authors of the paper. We apply the same method to the GLSN because the pairwise relationship between ports on a route would be relatively weak if the route involves many ports. We assume that a route is worth a summed edge weight of unity for each port. Then, we obtain ⋅) is Kronecker delta. The sum of the weight of edges incident to each port (i.e., node strength) is equal to the sum of the container capacity deployed in all the individual service routes in which the port is involved. This quantity is used for calculating the well-known country-level liner shipping connectivity index (LSCI) 39 . We note that the GLSN is a weighted network and does not contain self-loops (i.e., edges whose endpoints are the same node).

Multiresolution algorithm.
We regard a network as a collection of C non-overlapping CP pairs (Fig. 1).
Each CP pair consists of one core block (i.e., group of nodes) and one periphery block. By construction, there are many edges within each core block, whereas there are relatively few edges within each periphery block. One may assume that there are many edges between the core and periphery blocks 9,13 or few edges 40,41 . We assume that there are many edges between the core and periphery blocks because we need to pair each periphery block with a particular core block. The present algorithm is an extension of our previous algorithm, which we call the KM algorithm 5,6 . Therefore, we start by explaining the KM algorithm. The algorithm identifies multiple CP pairs in networks, which many previous algorithms do not. In the KM algorithm, we quantify the intensity of CP structure of a network by ∑∑ ∑∑ where c i is the index of the CP pair to which node i belongs, and x i = 1 or x i = 0 indicates that node i is a core node or a peripheral node, respectively. The first and second terms on the right-hand side of Eq. (2) are the fraction of the weight of edges confined within the core blocks and that connecting the core and periphery blocks within a CP pair, respectively. Quantity is the sum of the edge weight in the entire network, which normalises the value of S between 0 and 1. The KM algorithm seeks CP pairs by maximising where  S is the value of S in a sample network generated from a null model. The adjacency matrix of the sampled network is denoted by . The expectation with respect to the null model is denoted by ⋅ [ ]. We note that Q CP is equivalent to the modularity 21,42 when all nodes are core nodes, i.e., x i = 1 (1 ≤ i ≤ N). This algorithm has a resolution limit 6 . In other words, CP pairs whose size is smaller than a threshold cannot be detected. The modularity maximisation for finding communities in networks also shares this shortcoming 43 . To discuss the CP structure at different resolutions, here we extend the algorithm 6,20 using multiresolution methods 21,22 . In the new algorithm presented in this study, we seek CP pairs by maximising where γ (γ ≥ 0) is a resolution parameter that controls the effect of the null model term (i.e,  W [ ] ij ). The value of γ affects the size of the CP pairs. A detected CP pair is typically large if γ is small. It should be noted that Q CP γ is equivalent to Q CP when γ = 1.
The KM algorithm accepts various null models. We exploit this property to mitigate the artificial effect induced by the one-mode projection of bipartite networks such as the abundance of large cliques in the projected network. In our previous algorithms 5, 6 , we have adopted the Erdős-Rényi random graph 44 or the configuration model 45 as the null model. With the configuration model, we rewire the edges by preserving the degree of each node; the Erdős-Rényi random graph does not preserve the degree of each node. Here we use the configuration model as the null model because it is a standard null model in community detection 42 , rich-club detection 46 and motif analysis 47 . However, applying the configuration model directly to the GLSN is problematic because the GLSN is obtained as the one-mode projection of a bipartite network (i.e., Eq. (1)). To circumvent this problem, we incorporate the effect of the one-mode projection into the configuration model, similar to a previous study on community detection 37 , as follows.
We generate a randomised bipartite network, whose adjacency matrix is denoted by ~= B B ( ) ir , using the configuration model. In other words, the randomised network preserves the degree of each node and the bipartiteness; otherwise, the network is uniformly randomly generated. We allow multi-edges (i.e., multiple edges between the same pair of nodes) in the randomised bipartite networks for computational ease. We carry out the one-mode projection of B to obtain a randomised unipartite network. The expected edge weight of the randomised unipar- , is given by~W The randomised bipartite network (whose adjacency matrix is B) preserves the degree d r route of each route r. Therefore, Eq. (5) simplifies to [ ] ir jr represents the probability that ports i and j are adjacent to route r in the randomised bipartite network. With the configuration model, the probability that ports i and j are adjacent to route r is equal to 37 Figure 10. Schematic illustration of the variant of the Louvain algorithm. At the beginning of the current round, we have an input network of nodes (i.e., (a)). In the first step, we detect CP pairs in the input network using a label switching heuristic (i.e., (b)). In the second step, we construct a coarse-grained network by contracting the nodes in the input network having the same label into a super-node (i.e., (c)). Then, we perform the next round of which the input network is the coarse-grained network of the current round (i.e., (d)). We iterate the rounds until the value of γ Q CP stops increasing. (a) Input network for the current round. (b) CP pairs detected in the first step. The colour of each node indicates the CP pair to which the node belongs, i.e., c i (1 ≤ i ≤ N). The filled and blank circles indicate core and peripheral nodes, respectively, i.e., x i . (c) Coarsegrained network constructed in the second step. The colour and openness of circles indicate the label (c i , x i ) of super-node i. The thickness of the edge between super-nodes indicates the weight of the edge, i.e., the sum of the weight of the edges between a node in the input network belonging to one super-node and a node in the input network belonging to the other super-node. (d) The input network for the next round.
Scientific REPORtS | (2019) 9:404 | DOI:10.1038/s41598-018-35922-2 ir jr i j r r port port route route route is the number of edges in the randomised bipartite network. Substitution of Eq. (7) into Eq. (6) yields ij i j r R r r port port 1 route By substituting Eq. (8) into Eq. (4), we obtain the quality function We used a label switching heuristic to maximise γ Q CP in our previous algorithms 6,20 . In our preliminary analysis, we found that the label switching heuristic in the present case detected multiple CP pairs in the GLSN for γ = 0, whereas a single CP pair is natural anticipation in this case. This result suggests that the label switching heuristic may return notably suboptimal results for various γ values. Therefore, we implemented the following Louvain algorithm 48 to maximise the Q CP γ , which in fact yielded larger values of Q CP γ than the label switching heuristic for all γ values that we investigated.
We iterate rounds, each of which consists of two steps (Fig. 10). In the first step, we identify CP pairs in a network using a label switching heuristic. In the second step, we coarse-grain the network by contracting the nodes belonging to the same CP pair detected in the first step into a super-node. (To avoid the confusion with the nodes in the original GLSN, here we use the term super-node to refer to a node in the coarse-grained network.) Then, we apply another round of the two steps to the coarse-grained network. We iterate the rounds of the two steps until the value of Q CP γ stops increasing. Then, we set the label of each node in the original network (i.e., W) to the label of the super-node to which it belongs in the final coarse-grained network.
The details of each step are as follows. Let W be an N′ × N′ weighted adjacency matrix of the network in the beginning of the rth round, where N′ is the number of super-nodes in the beginning of the rth round. We note that = W W and N′ = N in r = 1. In the first step of each round, we initialise the label of each super-node i by (c i , Then, we inspect each super-node in a random order. For each inspected super-node i, we propose a new label (c i , x i ) = (c j , 0), where super-node j is a neighbour of super-node i in the network specified by W. We also propose new label (c i , x i ) = (c j , 1). After carrying out this procedure for all neighbours of super-node i, we adopt the proposed label that yields the largest increment in Q CP γ . If the largest increment in γ Q CP is negative, then we do not change the label of super-node i. The increment in γ Q CP caused by changing the label of super-node i from (c, x) to (c′, x′) is given by is the sum of d i over the super-nodes with label (c, x). We note that W ii is the edge weight of the self-loop of super-node i. If no label has changed in the process of inspecting the N′ super-nodes, then we proceed to the second step. Otherwise, we repeat to draw a new random order of the N′ super-nodes and inspect the N′ super-nodes for possible label switching, until no further increase in Q CP γ occurs. In the second step, we coarse-grain the network by contracting the super-nodes having the same label as a result of the first step into one super-node. In the new network, the edge weight between two super-nodes representing labels (c, x) and (c′, x′) is given by the sum of the weight of the edges between a super-node with label (c, x) before the coarse-graining and a super-node with label (c′, x′) before the coarse graining. We note that the super-nodes may have self-loops (Fig. 10).
Statistical test. We examine the statistical significance of individual CP pairs using the so-called (q, s)-test 6,20 that we previously proposed. The (q, s)-test evaluates the significance of individual CP pairs. For a CP pair in question, the (q, s)-test computes the quality of a CP pair composed of the same number of nodes in randomised networks. Then, the (q, s)-test judges the CP pair in question as significant if its quality value is statistically larger than that of the CP pair of the same number of nodes in randomised networks. The (q, s)-test requires a quality function q for individual CP pairs. We compute the quality of the CP pair c, denoted by q c , by the contribution of the cth CP pair to γ Q CP , i.e., We note that the sum of q c over all CP pairs is equal to Q CP γ . The value of q c would be positively correlated with the number n c of nodes in the cth CP pair 6 . In other words, a large q c value may be caused by a large number of nodes in the CP pair. To discount the effect of the correlation, the (q, s)-test assesses the significance of the cth CP pair using the conditional probability ≥ | P q q n ( ) c c that the quality  q of a CP pair of the same size n c detected in a randomised network is larger than q c . If ≥ | P q q n ( ) c c is smaller than a significance level α (0 < α ≤ 1), then one judges the CP pair in question to be significant. Otherwise, the CP pair is insignificant.
In the (q, s)-test, one infers ≥ | P q q n ( ) c c as follows. First, we generate 500 randomised networks using the null model discussed in the Multiresolution algorithm section. Second, we detect the CP pairs in the randomised networks using the present algorithm with the same resolution parameter used for finding the CP pair in question. For each c th detected CP pair in the 500 randomised networks, we compute the quality  q c ( ) and the number  n c ( ) of nodes in the CP pair. Third, we infer a joint probability   P q n ( , ) using the Gaussian kernel density estimator 49 , i.e., is the cumulative function of the standard normal distribution. We note that the Gaussian kernel estimator converges to any form of the probability distribution as the number of samples, C, increases 50 . Parameter h is a free parameter that affects the speed of the convergence. We use Scott's rule of thumb 51 , i.e., h C 1/6 = − . We adopt the Šidák correction 52 to evade the multiple comparisons problem. In other words, we test each CP pair in the original network at a significance level of α α = − − ′ 1 (1 ) C 1/ , where α′ is the targeted significance. We set α′ = 0.05. Consensus CP pairs. Even one starts with the same initial condition, the present algorithm yields different significant CP structures in different runs due to the stochasticity of the algorithm. We address this issue by gathering the consensus of the results of different runs, which is regarded as a type of consensus clustering of data points [53][54][55] .
To this end, we first run the present algorithm 100 times for a given value of γ. (We show the results for 6 runs at each γ value in the Supplementary Figures S1-S9). Second, for each pair of ports i and j, we compute the fraction of runs in which ports i and j belong to the same CP pair, which we denote by P ij . Third, we construct an undirected and unweighted network composed of the N = 977 ports, where two ports i and j are adjacent if and only if P ij ≥ θ. We set θ = 0.9. Finally, we regard each connected component of the network as a consensus CP pair. We refer to the ports that do not belong to any consensus CP pair as homeless ports. We define the coreness of each port i in the consensus CP pair as the fraction of runs in which port i is classified as core port.
Matching CP pairs across resolutions. Given consensus CP pairs calculated at different resolutions, we match consensus CP pairs detected at two consecutive resolutions γ and γ′ as follows. For each consensus CP pair c at resolution γ and each consensus CP pair c′ at resolution γ′, we compute the similarity τ c,c′ between them using the Jaccard index, i.e.,