Introduction

Transportation networks such as airways, railways and roadways underpin how the goods and people flow. An understanding of the structure of transportation networks is crucial in finding bottleneck of transportation and vulnerable parts, contributing one to improve its efficiency and resilience1. Maritime transport is by far the most cost-effective way to move goods and raw materials across the globe. More than 80% of global trade by volume is carried by ships and handled by seaports2. The most dominant type of global maritime transport in terms of seaborne trade value is the global liner shipping. To date, container ships carry over 70% value of the world trade2, making the global liner shipping network (GLSN) indispensable to the development of international trade and the world economy.

Core-periphery (CP) structure is a meso-scale structure of networks that has been found in many networks including transportation networks such as airport networks3,4,5,6, railway networks7 and road networks3,8. With CP structure based on edge density, a network is decomposed into a set of core nodes and that of peripheral nodes5,6,7,8,9,10,11,12,13. The nodes within the core are densely interconnected, those in the periphery are sparsely interconnected, and a node in the core and one in the periphery are connected with some probability depending on the assumption. Previous studies suggested that transportation networks with CP structure would be robust against random failures (e.g., closure) of nodes14 and realise a competitive trade-off between the cost and profit15. Moreover, the existence of a core may contribute to the functional stability of networks11,16.

The portrait of core-periphery dichotomy was postulated as a means to explain the uneven trade development and economic growth of nations in the process of globalization17. Maritime shipping serves as the primary transportation mode for international trade. As such, investigating the CP structure of the GLSN may help us to understand heterogeneous international trade among world regions and countries18,19. Specifically, there are many practical questions one can address by uncovering CP structure in maritime networks. How can we plan shipping routes to improve the stability and economic efficiency of seaborne trade? Which are the ports playing key roles in regional trade and those in international trades? How are ports integrated to global trade markets? Therefore, we analyse the CP structure in the GLSN. Crucially, we use the extension of our previous algorithm, Kojaku-Masuda (KM) algorithm5,6. The algorithm generally detects multiple CP pairs in networks (Fig. 1), which many other algorithms do not. We use this algorithm for two reasons. First, individual CP pairs are expected to correspond to either regional or global (or intermediate) groups of ports in each of which some ports may serve as core ports whereas the others may play a role of peripheral ports. Second, in our previous studies5,6, the algorithm found CP pairs more accurately than other algorithms did in artificial networks with planted CP pairs. We construct the GLSN from the empirical data on the liner shipping services operated by world’s top 100 liner shipping companies in terms of fleet capacity (i.e., the twenty-foot equivalent unit capacity of the fleet). The data altogether account for over 92% of the total fleet capacity in the world.

Figure 1
figure 1

Adjacency matrix of a network with two CP pairs. The filled cell or empty cell indicates the presence or absence of an edge, respectively. The solid line indicates the partition of nodes into the two CP pairs. The dashed lines within each CP pair indicate the subpartition of nodes into the core and periphery. Each core block (top-left block in each CP pair) and periphery block (bottom right block in each CP pair) consist of 20 nodes and 40 nodes, respectively. The probability that each pair of nodes is adjacent by an edge is equal to 0.95 within each core block. The same probability is equal to 0.8 between the core and periphery blocks within each CP pair. The same probability is equal to 0.05 within each periphery block and between different CP pairs. We draw edges according to these probabilities, independently for the different node pairs.

To reveal the CP structure in the GLSN, we extend our previous algorithm in the following three manners. First, we adopt a null model that is compatible with the way we construct the GLSN from the data. Specifically, the original data set is regarded as a bipartite network composed of a layer of port nodes and a layer of shipping route nodes (Fig. 2). Edges represent which ports belong to which shipping routes. Our null model discounts the effects induced by the one-mode projection of an originally bipartite network. Second, our previous algorithms have a resolution limit, with which one can not find CP structure smaller than a threshold size6,20. To circumvent this problem, we use a multiresolution method for community detection21,22 to extend the algorithm. Third, our previous algorithms provide different CP structures in the different runs of the same algorithm even if the initial condition is the same. In the present study, we run the algorithm 100 times and look at the consensus of the results obtained from the different runs.

Figure 2
figure 2

The construction of the GLSN. The width of edges in the one-mode network indicates the edge weight.

The present algorithm is applicable to networks constructed from a one-mode projection of bipartite networks. Examples of such networks include human disease networks23, metabolic networks24 and mutualistic networks25. The Python code of the present algorithm is available on GitHub26.

Results

Number of calling ports, number of serving routes, and node strength

The distribution of the container capacity of a route (i.e., the sum of the maximum volume of containers that shipping companies deploy on the shipping route) is shown in Fig. 3(a). The container capacity is heterogeneously distributed; a majority of the shipping routes has a capacity less than 102, while 2% of the routes has a capacity larger than 105. Degree \({d}_{i}^{{\rm{port}}}\) of ports in the bipartite network is also heterogeneously distributed (Fig. 3(b)). A majority (56%) of ports is shared by less than five routes, whereas 13 ports (1.3%) including Shanghai and Singapore are shared by more than 100 routes. Degree \({d}_{r}^{{\rm{route}}}\) of routes in the bipartite network is more homogeneously distributed than \({d}_{i}^{{\rm{port}}}\). A majority (52%) of routes contains less than five calling ports. The largest number of calling ports in a route is 31, which covers only 3.2% of the N = 977 ports.

Figure 3
figure 3

Distributions of (a) container capacity φr of each route, (b) the node’s degree in the bipartite network, (c) the port’s (unweighted) degree in the GLSN and (d) the port’s weighted degree (i.e., node strength) in the GLSN.

The degree of each port in the GLSN is shown in Fig. 3(c). A majority of ports (540 ports; 55%) has a degree less than 25 in the GLSN, while 60 (6%) ports have a degree larger than 100. We define node strength (i.e., weighted degree) of each port by the sum of the weight of edges attached to the port. As is the case for the container capacity, node strength is heterogeneously distributed (Fig. 3(d)). Most ports (813 ports; 83%) have a strength less than 2 × 105, while 51 ports (5%) have a strength larger than 106.

Multiscale CP structure

We identify consensus CP pairs (we call them CP pairs for short in the following text) using the algorithm presented in the Multiresolution algorithm section. The present algorithm is equipped with a resolution parameter γ, with which one can control the characteristic size of CP pairs to be detected. Different γ values may yield considerably different results. Therefore, we examine CP pairs across a range of γ, i.e., γ {0.01, 0.1, 0.2, 0.3, …, 4}.

We show the CP pairs detected at some γ values in Figs. 46. There are at most five CP pairs. For 0.01 ≤ γ ≤ 1.9, the algorithm identifies a unique CP pair containing ports in various geographical regions (Fig. 4). We refer to this CP pair as CP pair 1. The number of ports in CP pair 1 decreases from 951 ports at γ = 0.01 to 76 ports at γ = 1.9. At γ = 1.9, the CP pair 1 contains many ports in China, the North Sea, the Mediterranean Sea and North America. Few ports in Oceania, the South America, the West Africa and the East Africa belong to CP pair 1.

Figure 4
figure 4

Consensus CP pairs in the GLSN. The resolution is equal to (a) γ = 0.01, (b) γ = 0.1, (c) γ = 1.9. The filled circles indicate the ports with a coreness value larger than 0.5. The open circles indicate the ports with a coreness value less than or equal to 0.5. The open squares indicate homeless ports.

Figure 5
figure 5

Consensus CP pairs in the GLSN. The resolution is equal to (a) γ = 2, (b) γ = 2.1, (c) γ = 3.

Figure 6
figure 6

Consensus CP pairs in the GLSN. The resolution is equal to (a) γ = 3.1, (b) γ = 3.5 and (c) γ = 4.

For 2 ≤ γ ≤ 3, the algorithm identifies three CP pairs (Fig. 5). As is the case for 0.01 ≤ γ ≤ 1.9, CP pair 1 contains the ports across many regions. At γ = 2.0, the algorithm identifies CP pair 2 that branches from CP pair 1 (Fig. 5(a)). CP pair 2 contains most ports in the East Coast of the US, a Canadian port (Halifax) and an Egyptian port (Suez). At γ = 2.1, the algorithm identifies CP pair 3 located in the South Africa (Fig. 5(a)). CP pair 2 persists and enlarges in most cases as γ increases. In contrast, CP pair 3 is absent for γ ≥ 2.2 (Fig. 5(b)).

For 3.1 ≤ γ ≤ 4, the algorithm identifies four CP pairs. Each of CP pairs 1 and 2 spans different continents (Fig. 6). At γ = 3.1, CP pair 1 contains a majority of Chinese ports, the only port in Singapore and ports in the West Coast of the US. CP pair 2 contains most ports in the East Coast of the US, two ports in the Mediterranean Sea, a port in Sri Lanka. The other CP pairs 4 and 5 also branch from CP pair 1 and are composed of geographically close ports. In fact, CP pairs 4 and 5 mostly consist of the Mediterranean ports and North European ports, respectively.

The membership of each port at each γ value is shown in Fig. 7. The number of ports in CP pair 1 decreases as γ increases. CP pairs 2, 4 and 5 detected for 2 ≤ γ ≤ 4 are part of CP pair 1 detected for smaller γ values. CP pair 4 is absent for some γ values for 2.6 ≤ γ ≤ 3 but persists for 3.1 ≤ γ ≤ 4. As γ increases, CP pairs 2, 4 and 5 largely expand by absorbing ports that belong to CP pair 1 at small γ values.

Figure 7
figure 7

Membership of each port. The colour at (γ, i) indicates the index of the CP pair to which port i belongs at resolution γ. The colour code is the same as that used in Figs. 46.

The distribution of the coreness values of ports in any CP pair is shown in Fig. 8. For all γ values, most ports have a coreness value larger than 0.9. Therefore, the algorithm has classified most ports as core ports in most runs. If a CP pair only consists of core nodes, then the CP pair is a group of nodes that are densely interconnected with each other, which is equivalent to the usual notion of community. Therefore, the current result indicates that the detected CP pairs are close to communities. This property holds true for all γ values that we have examined.

Figure 8
figure 8

Distribution of coreness values of the ports in the consensus CP pairs.

Persistence of ports

CP pair 1 considered across different resolutions (i.e., γ) has a nested relation. In other words, CP pair 1 at resolution γ contains CP pair 1 at all larger γ values in a majority of cases. This is the case for all but two ports when one varies γ in the range 0.01 ≤ γ ≤ 4. Based on this observation, we define the persistence of a port as the smallest γ value above which the port does not belong to CP pair 1 for the first time as one increases γ. In other words, the persistence is the largest value of γ such that the port belongs to CP pair 1 for all resolution values up to that γ value. We note that the persistence is independent of γ.

The persistence of each port is represented by the size of the circle in Fig. 9. In the figure, only the ports belonging to CP pair 1 at γ = 0.01 are shown. Highly persistent ports (e.g., persistence value larger than 3) are concentrated in China, the North Sea, the Mediterranean Sea, the Malay Peninsula, the Red Sea, and the West Coast of the US. The two highly persistent ports in the Malay Peninsula, Singapore and Tanjung Pelepas, face the Strait of Malacca, which is an important shipping lane in the world27. There are few highly persistent ports in the Caribbean Sea, Japan, Oceania, the East Coast of the South America, the East Africa and the West Africa. Therefore, these regions may be relatively segregated from the main international shipping trade networks.

Figure 9
figure 9

Persistence of each port, i.e., the largest resolution at which the port belongs to CP pair 1. The radius of the circle is proportional to the persistence of the port.

We show the ports with the persistence value larger than 2.8 in Table 1. Highly persistent ports have a relatively large node strength (i.e., weighted degree). More precisely, the persistence and node strength are positively correlated with the Spearman correlation coefficient being equal to 0.83. We find that 497 ports (51%) have a persistence value less than or equal to 0.1, while 64 ports (7%) have a persistence value larger than 2.

Table 1 Ports with the largest persistence values.

Discussion

We developed a multiscale algorithm to identify CP structure in a one-mode projection of bipartite networks, which intends to reveal multiscale CP pairs across different scales. We applied the algorithm to a GLSN and revealed the inequality of regions in terms of the extent to which they are integrated into the global maritime transportation system. Specifically, our algorithm uncovered the following properties of the CP structure in the GLSN.

First, at a coarse resolution, we detected a unique CP pair (CP pair 1) that mainly consists of ports in Asia, Europe and North America (Fig. 4(c)). As major production and consumption centres on a global scale, these three regions have long been seen as dominating poles in global trade and container shipping activities28. Container shipping services that connect Asia and Europe, Asia and North America, and Europe and North America constitute the world’s main East-West trading lanes, well-known as “East-West Corridor” in the maritime shipping industry29. Our result also provides some information on the integration of the economy in different regions into the global markets. For instance, the ports in CP pair 1 are located in leading countries in trades (e.g., China, France, Germany, the United Kingdom and the United States) but not in Japan. The absence of Japanese ports indicates that the integration of Japan into the global maritime transportation system may be insufficient, despite its status as the world’s fourth-largest export economy in value. This situation might have a negative influence on the country’s international trade development in the long run.

Second, for finer resolutions, the algorithm identified four small CP pairs that branch from CP pair 1 (Fig. 6). These CP pairs involve main regional liner shipping markets of North Europe, Mediterranean, East Asia and North America, respectively. Two out of the four CP pairs, which are composed of major container ports in Northern Europe (CP pair 4) and the Mediterranean (CP pair 5), respectively, are geographically concentrated. In the liner shipping industry, they are highly developed conventional markets of intra-regional seaborne trade in Europe. In contrast, the other two CP pairs extend across distinct geographical regions, corresponding to two inter-regional shipping routes in the West-East direction: North American East Coast-Mediterranean Sea-Indian Subcontinent shipping route via Suez Canal (CP pair 2) and North American West Coast-East Asia shipping route across the Pacific Ocean (CP pair 1). In particular, the dominance of China and the US in CP pair 1 is consistent with the high intensity of the bilateral trade between China and the US, the world’s two largest countries in commodity trades30.

Third, the present algorithm classified a majority of ports in the GLSN as core ports as opposed to peripheral nodes (Fig. 8), as indicated by their high coreness values. Although we do not know why this is the case, the result underlines the specificity of the GLSN. In fact, in worldwide airport networks, more than half of the airports were classified as peripheral nodes5,6. This comparison indicates that the GLSN may be better regarded as a collection of communities, which is in agreement with the previous work reporting the community structure of global maritime shipping networks31. It should be noted that we found that CP pairs in the GLSN were similar to communities because we actually ran CP analysis.

Fourth, the persistence that we calculated for each port might be useful in evaluating the extent to which a port is integrated into the main international seaborne trade markets. The majority of the most persistent ports are regional load centres in the container shipping markets, i.e., world’s leading container ports in terms of the yearly container throughput volume32. Examples include East Asian ports of Busan, Guangzhou, Hong Kong, Ningbo-Zhoushan, Qingdao, Shanghai, and Shenzhen, Southeast Asian ports of Singapore and Tanjung Pelepas, North American West Coastal ports of Long Beach and Los Angeles, and European ports of Antwerp, Hamburg and Rotterdam.

Our study has the following limitations. First, we did not inform the edge weight by the actual container traffic between ports due to the commercial confidentiality. Instead, we used traffic capacity deployment data provided by shipping companies to approximate the actual traffic, assuming that the traffic capacity between any port pair on a same shipping service route was equal and bidirectional. Second, one-mode projection discards much information about the original bipartite network composed of the ports and routes. To mitigate this problem, one can use other one-mode projection methods that reflect some properties of the bipartite network to the projected networks33,34,35. Another approach is to study the original bipartite network without one-mode projection. Third, we did not analyse another family of CP structure, i.e., transportation-based CP structure3,7,8,11. Transportation-based CP structure dictates that a core is a group of nodes that are frequently used in paths connecting nodes, e.g., nodes with high betweenness centrality. Because GLSNs underlie maritime transportation, analysis of transportation-based CP structure may yield useful knowledge of the flow of cargo across the world.

Methods

Data set

We use an empirical data set provided by Alphaliner36, which reports the statistics of R = 1,631 major liner shipping service routes in the world for the year 2015. On each liner shipping service route (hereafter, shortened as service route), container ships call at a sequence of ports with a fixed service schedule. Cargo ships may call at ports for bunkering and maintenance, which are not directly associated with trade. The present data set contains only the calling ports for cargo loading and unloading, ensuring a high relevance to world seaborne trade. There are N = 977 ports in total. We denote by \({d}_{r}^{{\rm{route}}}\) the number of calling ports for route r. Additionally, we denote by \({d}_{i}^{{\rm{port}}}\) the number of routes that port i serves. The container capacity of route r, denoted by ϕr, is given by the sum of the maximum volume of containers (counted in Twenty Equivalent Unit; TEU) deployed on shipping route r by world shipping companies.

The data set does not contain the amount of containers transported between ports owing to the commercial confidentiality. Therefore, we assume that the same amount of containers is transported between any pair of ports belonging to the same route. This procedure is equivalent to the following one-mode projection of the bipartite network.

We represent the data as a bipartite network composed of ports and routes, where a port i and a shipping route r are adjacent if and only if port i is a calling port of route r (Fig. 2). We denote by B = (Bir) the N × R adjacency matrix of the bipartite network, where Bir = 1 or Bir = 0 indicates that port i and route r are adjacent or not adjacent, respectively.

We construct the GLSN composed of ports by projecting the bipartite network to a one-mode network (Fig. 2). For example, in collaboration networks between academic authors, one connects all pairs of authors of a paper by an edge, resulting in a clique. Because a larger clique (i.e., a paper involving more authors) implies that the pairwise relationships between each pair of authors would be weaker, one often normalises the edge weight by dividing it by d−137,38, where d is the number of authors of the paper. We apply the same method to the GLSN because the pairwise relationship between ports on a route would be relatively weak if the route involves many ports. We assume that a route is worth a summed edge weight of unity for each port. Then, we obtain

$${W}_{ij}\equiv [1-\delta (i,j)]\sum _{r=1}^{R}\frac{{\varphi }_{r}}{{d}_{r}^{{\rm{route}}}-1}{B}_{ir}{B}_{jr},$$
(1)

where δ(,) is Kronecker delta. The sum of the weight of edges incident to each port (i.e., node strength) is equal to the sum of the container capacity deployed in all the individual service routes in which the port is involved. This quantity is used for calculating the well-known country-level liner shipping connectivity index (LSCI)39. We note that the GLSN is a weighted network and does not contain self-loops (i.e., edges whose endpoints are the same node).

Multiresolution algorithm

We regard a network as a collection of C non-overlapping CP pairs (Fig. 1). Each CP pair consists of one core block (i.e., group of nodes) and one periphery block. By construction, there are many edges within each core block, whereas there are relatively few edges within each periphery block. One may assume that there are many edges between the core and periphery blocks9,13 or few edges40,41. We assume that there are many edges between the core and periphery blocks because we need to pair each periphery block with a particular core block.

The present algorithm is an extension of our previous algorithm, which we call the KM algorithm5,6. Therefore, we start by explaining the KM algorithm. The algorithm identifies multiple CP pairs in networks, which many previous algorithms do not. In the KM algorithm, we quantify the intensity of CP structure of a network by

$$S\equiv \frac{1}{2{\rm{\Omega }}}\sum _{i=1}^{N}\sum _{j=1}^{N}{W}_{ij}{x}_{i}{x}_{j}\delta ({c}_{i},{c}_{j})+\frac{1}{2{\rm{\Omega }}}\sum _{i=1}^{N}\sum _{j=1}^{N}{W}_{ij}[\mathrm{(1}-{x}_{i}){x}_{j}+{x}_{i}({x}_{j}-\mathrm{1)}]\delta ({c}_{i},{c}_{j}),$$
(2)

where ci is the index of the CP pair to which node i belongs, and xi = 1 or xi = 0 indicates that node i is a core node or a peripheral node, respectively. The first and second terms on the right-hand side of Eq. (2) are the fraction of the weight of edges confined within the core blocks and that connecting the core and periphery blocks within a CP pair, respectively. Quantity \({\rm{\Omega }}={\sum }_{i\mathrm{=1}}^{N}{\sum }_{j=1}^{N}{W}_{ij}\mathrm{/2}\) is the sum of the edge weight in the entire network, which normalises the value of S between 0 and 1. The KM algorithm seeks CP pairs by maximising

$${Q}^{{\rm{CP}}}\equiv S-{\mathbb{E}}[\tilde{S}]=\frac{1}{2{\rm{\Omega }}}\sum _{i=1}^{N}\sum _{j=1}^{N}({W}_{ij}-{\mathbb{E}}[{\tilde{W}}_{ij}])({x}_{i}+{x}_{j}-{x}_{i}{x}_{j})\delta ({c}_{i},{c}_{j}),$$
(3)

where \(\tilde{S}\) is the value of S in a sample network generated from a null model. The adjacency matrix of the sampled network is denoted by \(\tilde{{\bf{W}}}=({\tilde{W}}_{ij})\). The expectation with respect to the null model is denoted by \({\mathbb{E}}[\,\cdot \,]\). We note that QCP is equivalent to the modularity21,42 when all nodes are core nodes, i.e., xi = 1 (1 ≤ i ≤ N).

This algorithm has a resolution limit6. In other words, CP pairs whose size is smaller than a threshold cannot be detected. The modularity maximisation for finding communities in networks also shares this shortcoming43. To discuss the CP structure at different resolutions, here we extend the algorithm6,20 using multiresolution methods21,22. In the new algorithm presented in this study, we seek CP pairs by maximising

$${Q}_{\gamma }^{{\rm{CP}}}\equiv \frac{1}{2{\rm{\Omega }}}\sum _{i=1}^{N}\sum _{j=1}^{N}({W}_{ij}-\gamma {\mathbb{E}}[{\tilde{W}}_{ij}])({x}_{i}+{x}_{j}-{x}_{i}{x}_{j})\delta ({c}_{i},{c}_{j}),$$
(4)

where γ (γ ≥ 0) is a resolution parameter that controls the effect of the null model term (i.e, \({\mathbb{E}}[{\mathop{W}\limits^{ \sim }}_{ij}]\)). The value of γ affects the size of the CP pairs. A detected CP pair is typically large if γ is small. It should be noted that \({Q}_{\gamma }^{{\rm{CP}}}\) is equivalent to QCP when γ = 1.

The KM algorithm accepts various null models. We exploit this property to mitigate the artificial effect induced by the one-mode projection of bipartite networks such as the abundance of large cliques in the projected network. In our previous algorithms5,6, we have adopted the Erdős-Rényi random graph44 or the configuration model45 as the null model. With the configuration model, we rewire the edges by preserving the degree of each node; the Erdős-Rényi random graph does not preserve the degree of each node. Here we use the configuration model as the null model because it is a standard null model in community detection42, rich-club detection46 and motif analysis47. However, applying the configuration model directly to the GLSN is problematic because the GLSN is obtained as the one-mode projection of a bipartite network (i.e., Eq. (1)). To circumvent this problem, we incorporate the effect of the one-mode projection into the configuration model, similar to a previous study on community detection37, as follows.

We generate a randomised bipartite network, whose adjacency matrix is denoted by \(\mathop{{\bf{B}}}\limits^{ \sim }=({\mathop{B}\limits^{ \sim }}_{ir})\), using the configuration model. In other words, the randomised network preserves the degree of each node and the bipartiteness; otherwise, the network is uniformly randomly generated. We allow multi-edges (i.e., multiple edges between the same pair of nodes) in the randomised bipartite networks for computational ease. We carry out the one-mode projection of \(\mathop{{\boldsymbol{B}}}\limits^{ \sim }\) to obtain a randomised unipartite network. The expected edge weight of the randomised unipartite network, \({\mathbb{E}}\,[{\tilde{W}}_{ij}]\), is given by

$${\mathbb{E}}[{\mathop{W}\limits^{ \sim }}_{ij}]=[1-\delta (i,j)]{\mathbb{E}}\,[\sum _{r=1}^{R}\frac{{\varphi }_{r}}{{d}_{r}^{{\rm{r}}{\rm{o}}{\rm{u}}{\rm{t}}{\rm{e}}}-1}{\mathop{B}\limits^{ \sim }}_{ir}{\mathop{B}\limits^{ \sim }}_{jr}].$$
(5)

The randomised bipartite network (whose adjacency matrix is B) preserves the degree \({d}_{r}^{{\rm{route}}}\) of each route r. Therefore, Eq. (5) simplifies to

$${\mathbb{E}}[{\tilde{W}}_{ij}]=[1-\delta (i,j)]\sum _{r=1}^{R}\frac{{\varphi }_{r}}{{d}_{r}^{{\rm{route}}}-1}{\mathbb{E}}\,[{\tilde{B}}_{ir}{\tilde{B}}_{jr}].$$
(6)

The term \({\mathbb{E}}\,[{\tilde{B}}_{ir}{\tilde{B}}_{jr}]\) represents the probability that ports i and j are adjacent to route r in the randomised bipartite network. With the configuration model, the probability that ports i and j are adjacent to route r is equal to37

$${\mathbb{E}}\,[{\tilde{B}}_{ir}{\tilde{B}}_{jr}]={d}_{i}^{{\rm{port}}}{d}_{j}^{{\rm{port}}}\frac{{d}_{r}^{{\rm{route}}}({d}_{r}^{{\rm{route}}}-\mathrm{1)}}{M(M-\mathrm{1)}},$$
(7)

where \(M={\sum }_{r{\prime} \mathrm{=1}}^{R}{d}_{r{\prime} }^{{\rm{route}}}\) is the number of edges in the randomised bipartite network. Substitution of Eq. (7) into Eq. (6) yields

$${\mathbb{E}}[{\tilde{W}}_{ij}]=[1-\delta (i,j)]{d}_{i}^{{\rm{port}}}{d}_{j}^{{\rm{port}}}\sum _{r=1}^{R}\frac{{\varphi }_{r}{d}_{r}^{{\rm{route}}}}{M(M-\mathrm{1)}}.$$
(8)

By substituting Eq. (8) into Eq. (4), we obtain the quality function

$${Q}_{\gamma }^{{\rm{C}}{\rm{P}}}=\frac{1}{2{\rm{\Omega }}}\sum _{i=1}^{N}\sum _{j=1}^{N}({W}_{ij}-\gamma {d}_{i}^{{\rm{p}}{\rm{o}}{\rm{r}}{\rm{t}}}{d}_{j}^{{\rm{p}}{\rm{o}}{\rm{r}}{\rm{t}}}\sum _{r=1}^{R}\frac{{\varphi }_{r}{d}_{r}^{{\rm{r}}{\rm{o}}{\rm{u}}{\rm{t}}{\rm{e}}}}{M(M-1)})({x}_{i}+{x}_{j}-{x}_{i}{x}_{j})\delta ({c}_{i},{c}_{j}).$$
(9)

Maximisation of \({Q}_{\gamma }^{{\rm{CP}}}\)

We used a label switching heuristic to maximise \({Q}_{\gamma }^{{\rm{CP}}}\) in our previous algorithms6,20. In our preliminary analysis, we found that the label switching heuristic in the present case detected multiple CP pairs in the GLSN for γ = 0, whereas a single CP pair is natural anticipation in this case. This result suggests that the label switching heuristic may return notably suboptimal results for various γ values. Therefore, we implemented the following Louvain algorithm48 to maximise the \({Q}_{\gamma }^{{\rm{CP}}}\), which in fact yielded larger values of \({Q}_{\gamma }^{{\rm{CP}}}\) than the label switching heuristic for all γ values that we investigated.

We iterate rounds, each of which consists of two steps (Fig. 10). In the first step, we identify CP pairs in a network using a label switching heuristic. In the second step, we coarse-grain the network by contracting the nodes belonging to the same CP pair detected in the first step into a super-node. (To avoid the confusion with the nodes in the original GLSN, here we use the term super-node to refer to a node in the coarse-grained network.) Then, we apply another round of the two steps to the coarse-grained network. We iterate the rounds of the two steps until the value of \({Q}_{\gamma }^{{\rm{CP}}}\) stops increasing. Then, we set the label of each node in the original network (i.e., W) to the label of the super-node to which it belongs in the final coarse-grained network.

Figure 10
figure 10

Schematic illustration of the variant of the Louvain algorithm. At the beginning of the current round, we have an input network of nodes (i.e., (a)). In the first step, we detect CP pairs in the input network using a label switching heuristic (i.e., (b)). In the second step, we construct a coarse-grained network by contracting the nodes in the input network having the same label into a super-node (i.e., (c)). Then, we perform the next round of which the input network is the coarse-grained network of the current round (i.e., (d)). We iterate the rounds until the value of \({Q}_{\gamma }^{{\rm{CP}}}\) stops increasing. (a) Input network for the current round. (b) CP pairs detected in the first step. The colour of each node indicates the CP pair to which the node belongs, i.e., ci (1 ≤ i ≤ N). The filled and blank circles indicate core and peripheral nodes, respectively, i.e., xi. (c) Coarse-grained network constructed in the second step. The colour and openness of circles indicate the label (ci, xi) of super-node i. The thickness of the edge between super-nodes indicates the weight of the edge, i.e., the sum of the weight of the edges between a node in the input network belonging to one super-node and a node in the input network belonging to the other super-node. (d) The input network for the next round.

The details of each step are as follows. Let \(\overline{{\bf{W}}}\) be an N′ × N′ weighted adjacency matrix of the network in the beginning of the rth round, where N′ is the number of super-nodes in the beginning of the rth round. We note that \(\bar{{\bf{W}}}={\bf{W}}\) and N′ = N in r = 1. In the first step of each round, we initialise the label of each super-node i by (ci, xi) = (i, 1), where 1 ≤ i ≤ N′. Then, we inspect each super-node in a random order. For each inspected super-node i, we propose a new label (ci, xi) = (cj, 0), where super-node j is a neighbour of super-node i in the network specified by \(\overline{{\bf{W}}}\). We also propose new label (ci, xi) = (cj, 1). After carrying out this procedure for all neighbours of super-node i, we adopt the proposed label that yields the largest increment in \({Q}_{\gamma }^{{\rm{CP}}}\). If the largest increment in \({Q}_{\gamma }^{{\rm{CP}}}\) is negative, then we do not change the label of super-node i. The increment in \({Q}_{\gamma }^{{\rm{CP}}}\) caused by changing the label of super-node i from (c, x) to (c′, x′) is given by

$$\begin{array}{c}\frac{1}{{\rm{\Omega }}}[{\bar{W}}_{i,(c{\prime} ,1)}+x{\prime} {\bar{W}}_{i,(c{\prime} ,0)}-{\bar{W}}_{i,(c,1)}-x{\bar{W}}_{i,(c,0)}+(x{\prime} -x){\bar{W}}_{ii}-\gamma {\bar{d}}_{i}({\bar{D}}_{(c{\prime} ,1)}+x{\prime} {\bar{D}}_{(c{\prime} ,0)}-{\bar{D}}_{(c,1)}-x{\bar{D}}_{(c,0)})(\sum _{r=1}^{R}\frac{{\varphi }_{r}{d}_{r}^{{\rm{r}}{\rm{o}}{\rm{u}}{\rm{t}}{\rm{e}}}}{M(M-1)})],\end{array}$$
(10)

where \({\overline{d}}_{i}\) is the sum of \({d}_{j}^{{\rm{port}}}\) values of the nodes belonging to super-node i, \({\overline{W}}_{i,(c,x)}={\sum }_{j=\mathrm{1,}j\ne i}^{N{\prime} }{\overline{W}}_{ij}\delta (c,{c}_{j})\delta (x,{x}_{j})\) is the sum of the weight of the edges between super-node i and other super-nodes with label (c, x), and \({\overline{D}}_{(c,x)}={\sum }_{j=1}^{N{\prime} }{\overline{d}}_{j}\delta (c,{c}_{j})\delta (x,{x}_{j})\) is the sum of \({\overline{d}}_{i}\) over the super-nodes with label (c, x). We note that \({\overline{W}}_{ii}\) is the edge weight of the self-loop of super-node i. If no label has changed in the process of inspecting the N′ super-nodes, then we proceed to the second step. Otherwise, we repeat to draw a new random order of the N′ super-nodes and inspect the N′ super-nodes for possible label switching, until no further increase in \({Q}_{\gamma }^{{\rm{CP}}}\) occurs.

In the second step, we coarse-grain the network by contracting the super-nodes having the same label as a result of the first step into one super-node. In the new network, the edge weight between two super-nodes representing labels (c, x) and (c′, x′) is given by the sum of the weight of the edges between a super-node with label (c, x) before the coarse-graining and a super-node with label (c′, x′) before the coarse graining. We note that the super-nodes may have self-loops (Fig. 10).

Statistical test

We examine the statistical significance of individual CP pairs using the so-called (q, s)-test6,20 that we previously proposed. The (q, s)-test evaluates the significance of individual CP pairs. For a CP pair in question, the (q, s)-test computes the quality of a CP pair composed of the same number of nodes in randomised networks. Then, the (q, s)-test judges the CP pair in question as significant if its quality value is statistically larger than that of the CP pair of the same number of nodes in randomised networks. The (q, s)-test requires a quality function q for individual CP pairs. We compute the quality of the CP pair c, denoted by qc, by the contribution of the cth CP pair to \({Q}_{\gamma }^{{\rm{CP}}}\), i.e.,

$${q}_{c}\equiv \frac{1}{2{\rm{\Omega }}}\sum _{i=1}^{N}\sum _{j=1}^{N}({W}_{ij}-\gamma {d}_{i}^{{\rm{p}}{\rm{o}}{\rm{r}}{\rm{t}}}{d}_{j}^{{\rm{p}}{\rm{o}}{\rm{r}}{\rm{t}}}\sum _{r=1}^{R}\frac{{\varphi }_{r}{d}_{r}^{{\rm{r}}{\rm{o}}{\rm{u}}{\rm{t}}{\rm{e}}}}{M(M-1)})({x}_{i}+{x}_{j}-{x}_{i}{x}_{j})\delta ({c}_{i},{c}_{j})\delta ({c}_{i},c).$$
(11)

We note that the sum of qc over all CP pairs is equal to \({Q}_{\gamma }^{{\rm{CP}}}\).

The value of qc would be positively correlated with the number nc of nodes in the cth CP pair6. In other words, a large qc value may be caused by a large number of nodes in the CP pair. To discount the effect of the correlation, the (q, s)-test assesses the significance of the cth CP pair using the conditional probability \(P(\mathop{q}\limits^{ \sim }\ge {q}_{c}|{n}_{c})\) that the quality \(\tilde{q}\) of a CP pair of the same size nc detected in a randomised network is larger than qc. If \(P(\mathop{q}\limits^{ \sim }\ge {q}_{c}|{n}_{c})\)  is smaller than a significance level α (0 < α ≤ 1), then one judges the CP pair in question to be significant. Otherwise, the CP pair is insignificant.

In the (q, s)-test, one infers \(P(\mathop{q}\limits^{ \sim }\ge {q}_{c}|{n}_{c})\)  as follows. First, we generate 500 randomised networks using the null model discussed in the Multiresolution algorithm section. Second, we detect the CP pairs in the randomised networks using the present algorithm with the same resolution parameter used for finding the CP pair in question. For each \(\overline{c}\)th detected CP pair in the 500 randomised networks, we compute the quality \({\tilde{q}}^{(\overline{c})}\) and the number \({\tilde{n}}^{(\overline{c})}\) of nodes in the CP pair. Third, we infer a joint probability \(P(\tilde{q},\tilde{n})\) using the Gaussian kernel density estimator49, i.e.,

$$P(\mathop{q}\limits^{ \sim },\mathop{n}\limits^{ \sim })=\sum _{\bar{c}=1}^{\bar{C}}\,f(\frac{\mathop{q}\limits^{ \sim }-{\mathop{q}\limits^{ \sim }}^{(\bar{c})}}{h{\sigma }_{\mathop{q}\limits^{ \sim }}},\frac{\mathop{n}\limits^{ \sim }-{\mathop{n}\limits^{ \sim }}^{(\bar{c})}}{h{\sigma }_{\mathop{n}\limits^{ \sim }}})/\bar{C},$$
(12)

where \(\bar{C}\) is the sum of the number of CP pairs detected in the 500 randomised networks, and \({\sigma }_{\tilde{q}}\) and \({\sigma }_{\tilde{n}}\) are the unbiased estimation of the standard deviation for {\({\tilde{q}}^{(\overline{c})}\)} and \(\{{\mathop{n}\limits^{ \sim }}^{(\bar{c})}\}\) (\(1\le \overline{c}\le \overline{C}\)), respectively. Function f(,) is the bivariate standard normal distribution given by

$$f({y}_{1},{y}_{2})\equiv \frac{1}{2\pi \sqrt{1\,-\,{\rho }^{2}}}\exp (\,-\,\frac{{y}_{1}^{2}-2\rho {y}_{1}{y}_{2}+{y}_{2}^{2}}{2(1-{\rho }^{2})}),$$
(13)

where ρ is the Pearson correlation coefficient between \(\{{\tilde{q}}^{(\overline{c})}\}\) and \(\{{\mathop{n}\limits^{ \sim }}^{(\bar{c})}\}\) (\(1\le \bar{c}\le \bar{C}\)). Using Eq. (12), we obtain

$$\begin{array}{cc}P(\mathop{q}\limits^{ \sim }\ge {q}_{c}|{n}_{c}) & =\,\displaystyle \frac{{\int }_{{q}_{c}}^{{\rm{\infty }}}P(\mathop{q}\limits^{ \sim },{n}_{c}){\rm{d}}\mathop{q}\limits^{ \sim }}{{\int }_{{\rm{\infty }}}^{{\rm{\infty }}}P(\mathop{q}\limits^{ \sim },{n}_{c}){\rm{d}}\mathop{q}\limits^{ \sim }}\\ & =1-\displaystyle \frac{\displaystyle \sum _{\bar{c}=1}^{\bar{C}}\exp (\,-\,\frac{{({n}_{c}-{\mathop{n}\limits^{ \sim }}^{(\bar{c})})}^{2}}{2{\sigma }_{\mathop{n}\limits^{ \sim }}^{2}{h}^{2}}){\rm{\Phi }}(\frac{{\sigma }_{\mathop{n}\limits^{ \sim }}({q}_{c}-{\mathop{q}\limits^{ \sim }}^{(\bar{c})})-\rho {\sigma }_{\mathop{q}\limits^{ \sim }}({n}_{c}\,-\,{\mathop{n}\limits^{ \sim }}^{(\bar{c})})}{{\sigma }_{\mathop{n}\limits^{ \sim }}{\sigma }_{\mathop{q}\limits^{ \sim }}h\sqrt{1-{\rho }^{2}}})}{\displaystyle \sum _{\bar{c}=1}^{\bar{C}}\exp (\,-\,\frac{{({n}_{c}-{\mathop{n}\limits^{ \sim }}^{(\bar{c})})}^{2}}{2{\sigma }_{\mathop{n}\limits^{ \sim }}^{2}{h}^{2}})},\end{array}$$
(14)

where \({\rm{\Phi }}(y)={\mathrm{(2}\pi )}^{-\mathrm{1/2}}{\int }_{-\infty }^{y}\exp (\,-\,{u}^{2}\mathrm{/2)}{\rm{d}}u\) is the cumulative function of the standard normal distribution.

We note that the Gaussian kernel estimator converges to any form of the probability distribution as the number of samples, \(\bar{C}\), increases50. Parameter h is a free parameter that affects the speed of the convergence. We use Scott’s rule of thumb51, i.e., \(h={\bar{C}}^{-\mathrm{1/6}}\). We adopt the Šidák correction52 to evade the multiple comparisons problem. In other words, we test each CP pair in the original network at a significance level of \(\alpha =1-{\mathrm{(1}-\alpha {\prime} )}^{\mathrm{1/}C}\), where α′ is the targeted significance. We set α′ = 0.05.

Consensus CP pairs

Even one starts with the same initial condition, the present algorithm yields different significant CP structures in different runs due to the stochasticity of the algorithm. We address this issue by gathering the consensus of the results of different runs, which is regarded as a type of consensus clustering of data points53,54,55.

To this end, we first run the present algorithm 100 times for a given value of γ. (We show the results for 6 runs at each γ value in the Supplementary Figures S1S9). Second, for each pair of ports i and j, we compute the fraction of runs in which ports i and j belong to the same CP pair, which we denote by Pij. Third, we construct an undirected and unweighted network composed of the N = 977 ports, where two ports i and j are adjacent if and only if Pij ≥ θ. We set θ = 0.9. Finally, we regard each connected component of the network as a consensus CP pair. We refer to the ports that do not belong to any consensus CP pair as homeless ports. We define the coreness of each port i in the consensus CP pair as the fraction of runs in which port i is classified as core port.

Matching CP pairs across resolutions

Given consensus CP pairs calculated at different resolutions, we match consensus CP pairs detected at two consecutive resolutions γ and γ′ as follows. For each consensus CP pair c at resolution γ and each consensus CP pair c′ at resolution γ′, we compute the similarity τc,c between them using the Jaccard index, i.e.,

$${\tau }_{c,c{\prime} }\equiv \frac{|{V}_{c}\cap {V}_{c{\prime} }|}{|{V}_{c}\cup {V}_{c{\prime} }|},$$
(15)

where Vc and Vc are the sets of ports in consensus CP pairs c and c′, respectively. We match c and c′ if \({\tau }_{c,c{\prime} } > {{\rm{\max }}}_{\bar{c}\ne c}{\tau }_{\bar{c},c{\prime} }\) and \({\tau }_{c,c{\prime} } > {{\rm{\max }}}_{\bar{c}\ne c{\prime} }{\tau }_{c,\bar{c}}\). We note that some consensus CP pairs at resolution γ may not be matched with any consensus CP pair at γ′ or vice versa. We did not find ties in the τc,c value during the matching procedure.

As a result of the matching, we found seven consensus CP pairs across the resolution values. In fact, three of them (shown in green in Figs. 46 and 7) are composed of almost the same set of nodes and reside in different ranges of γ separated by gaps (therefore not contiguous in terms of the γ value). Therefore, we regard these three consensus CP pairs as a single consensus CP pair.