Introduction

Experience reveals that species forming complex ecological or economic ecosystems are organized in hierarchies. The ranks of such species, namely their position in the hierarchy, are functions of the interactions encoded in the adjacency matrix of the ecological or economic network. While several genetic algorithms1,2 and simple heuristics3,4,5 exist to rank species in complex ecosystems, capturing analytically the relationship between species’ ranking and the underlying adjacency matrix has remained elusive so far.

In economic and ecological ecosystems, ranking rows and columns of the adjacency matrix has revealed the existence of nested structures: neighbors of low-rank nodes are subsets of the neighbors of high-rank nodes1,6,7. For example, nested patterns are found in world trade, in which products exported by low-fitness countries constitute a subset of those exported by high-fitness countries3. In fragmented habitats, species found in the least hospitable islands are a subset of species in the most hospitable islands1. Nestedness in real-world interaction networks has captured cross-disciplinary interest for three main reasons. First, nested patterns are ubiquitous among complex systems, ranging from ecological networks1,6 and the human gut microbiome8 to socioeconomic systems3,9 and online social media and collaboration networks10,11. Second, the ubiquity of nested patterns has triggered intensive debates about the reasons behind the emergence of nestedness in mutualistic systems12,13,14,15 and socioeconomic networks9,11. Third, nestedness may have profound implications for the stability and dynamics of ecological and economic communities: highly nested rankings of the nodes have revealed vulnerable species in mutualistic networks16,17 and competitive actors in the world trade5,18.

The ubiquity of nestedness and its implications in shaping the structure of biotas have motivated the formulation of the nestedness maximization problem (NMP). This problem can be stated in the following way: find the permutation (i.e. ranking) of the rows and columns of the adjacency matrix of the network resulting in a maximally nested layout of the matrix elements. Originally introduced by Atmar and Patterson1, the problem has been widely studied in ecology, leading to several algorithms for measuring the nestedness of a matrix, e.g. the popular nestedness temperature calculator and its variants1,2,19,20. Yet these methods do not attempt to optimize directly the actual cost of a nested solution but exploit some simple heuristics that are deemed to be correlated with nestedness. Another method, called BINMATNEST2, optimizes a nestedness cost following a genetic algorithm but lacks the theoretical insight contained in an analytic solution to the problem. More generally, we lack a formal theory to derive the ranking of the nodes and the degree of nestedness of a network from the structure of the adjacency matrix.

Here we introduce an analytic framework to calculate the ranking positions of nodes in bipartite interaction networks. In the proposed framework, the observed interactions are associated with an energy (or cost) function that depends on both the nodes’ ranks and the adjacency matrix of the network. Under this general assumption, the task of ranking species can be cast in the problem of finding a suitable permutation of the rows and columns of the adjacency matrix, and this problem is, fundamentally, a combinatorial one. We solve it through statistical physics techniques for an energy function that captures the nestedness maximization problem, which has attracted long-standing interest in ecology1,2 and, more recently, has played a central role in the economic complexity field and its policy implications5,21,22,23. We map the NMP onto the quadratic assignment problem (QAP)24, thereby directly tackling the problem of finding the optimal permutation of rows and columns that maximizes the nestedness of the adjacency matrix. In our formulation, the degree of nestedness is measured by a cost function over the space of all possible rows and column permutations, whose global minimum corresponds to a matrix layout having maximum nestedness. Roughly speaking, the cost function is designed to reward permutations that move the maximum number of non-zero elements of the matrix in the upper left corner and to penalize those permutations that move non-zero elements in the bottom right corner. Next, we set up the theoretical framework, which allows us to obtain the mean-field solution to the NMP as a leading order approximation and, in principle, also calculate next-to-leading order corrections. Lastly, we stress that our theoretical framework easily generalizes also to higher-order interaction networks.

Results

Problem formulation

To formulate the problem, we shall focus in the following discussion on bipartite networks, although we anticipate that the theoretical framework and the algorithm we present here can be applied to any square or rectangular matrix, bipartite or not, directed or undirected, with non-negative entries. We consider bipartite networks where nodes of one kind, representing, for example, plants indexed by a variable i = 1, . . . , N, can only be connected with nodes of another kind, e.g. pollinators indexed by another variable a = 1, . . . , M, as seen in Fig. 1a. We denote by Aia the element of the network’s N × M adjacency matrix: Aia ≠ 0 if i and a are connected, and Aia = 0 otherwise. Besides connectivity, the adjacency matrix encodes the interaction strength between nodes such that whenever i and a are connected, the strength of their interaction is Aia = wia > 0. A ranking of the rows is represented by a permutation of the integers {1, 2, . . . , N}, denoted r ≡ {r1, r2, . . . , rN}; a ranking of the columns is represented by a (different) permutation of the integers {1, 2, . . . , M}, denoted c ≡ {c1, c2, . . . , cM}. More precisely, the r sequence arranges rows in ascending order of their ordinal rankings ri such that row i is ranked higher than row j if ri < rj. Similarly, the c sequence arranges columns such that column a ranks higher than column b if ca < cb.

Fig. 1: Modeling the nestedness maximization problem.
figure 1

a A bipartite network models the interactions between, e.g., plants i, represented by purple circles, and pollinators a, represented by cyan squares, through the adjacency matrix A. The interaction is mutualistic, i.e. Aia = 1 > 0 if i interacts with a and Aia = 0 otherwise. b A nested network has a hierarchical structure wherein the neighbors of low-rank nodes (the specialist species at the bottom) are a subset of the neighbors of high-rank nodes (the generalists at the top). The rank of a node is encoded in the variables ri (for plants) and ca (for pollinators). Top-rank nodes have r = c = 1, while low-rank ones have r = c = 4. The adjacency matrix of a nested network shows a peculiar pattern with all non-zero entries clustered in the upper left corner. c Maximizing network nestedness amounts to minimizing the cost function E(r, c) over the ranking vectors r and c, which, in turn, is equivalent to optimizing the cost E(P, Q) with respect to the permutation matrices P and Q. The optimal permutation matrices bring the adjacency matrix to its maximally nested form PtAQ = Anested, which is complementary to the layout of matrix B.

To model the problem, one more concept is needed: network nestedness. Nestedness is the property whereby if j ranks lower than i, then the neighbors of j form a subset of the neighbors of i, as illustrated in Fig. 1b. Different rankings, i.e. different sequences r and c, produce different nested patterns, that is, nestedness is a function of the rankings. Therefore, any cost (energy) function that seeks to quantify matrix nestedness must be a function of the rankings r and c. The simplest energy function that does the job, aside from trivial cases (see Supplementary Note 1), is

$$E(r,c)=\mathop{\sum }\limits_{i=1}^{N}\mathop{\sum }\limits_{a=1}^{M}{A}_{ia}{r}_{i}{c}_{a}.$$
(1)

The product Aiarica penalizes strong interactions between low-rank nodes since they contribute a large amount to the cost function; thus, low-rank nodes typically interact weakly. Strong interactions are only allowed between high-rank nodes because, when Aia is large, the product Aiarica can be made small by choosing ri and ca to be small. Furthermore, high-rank nodes can have moderate interactions with low-rank nodes because the product riAiaca can still be relatively small when ri is large and ca is small (or vice versa) provided Aia is not too large (hence the name ‘moderate’ interaction).
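As a minimal numerical illustration of Eq. (1), the following sketch (the toy matrix and rankings are made up for this example) shows that a layout ranked from generalists to specialists yields a lower cost than the reversed ranking:

```python
import numpy as np

def nestedness_cost(A, r, c):
    """Energy E(r, c) = sum_{i,a} A_ia * r_i * c_a of Eq. (1)."""
    r = np.asarray(r, dtype=float)
    c = np.asarray(c, dtype=float)
    return float(r @ A @ c)

# Toy 3x4 bipartite adjacency matrix (hypothetical example).
A = np.array([[1, 1, 1, 1],
              [1, 1, 0, 0],
              [1, 0, 0, 0]], dtype=float)

# Ranking generalists first (rank 1) is cheaper than the reversed ranking.
print(nestedness_cost(A, r=[1, 2, 3], c=[1, 2, 3, 4]))  # 19.0 (low cost)
print(nestedness_cost(A, r=[3, 2, 1], c=[4, 3, 2, 1]))  # 48.0 (high cost)
```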

The assumptions of our model are relevant to diverse scenarios where nestedness has been observed. In bipartite networks of countries connected to their exported products, we could interpret ri as the fitness of country i and ca as the inverse of the complexity of product a. In this scenario, high-energy links riAiaca represent the higher barriers faced by underdeveloped countries to produce and export sophisticated products3, whereas low-energy links represent competitive countries exporting ubiquitous products. In mutualistic ecological networks, high-energy links represent the higher extinction risk for specialist pollinators to be connected with specialist plants, whereas low-energy links represent connections within the core of generalist nodes6 as depicted in Fig. 1b.

With this premise, it is clear that to maximize nestedness, we have to minimize the energy function in Eq. (1). More precisely, nestedness maximization is the mathematical optimization problem in which we seek to find the optimal sequences r* and c* that minimize the energy function, i.e. \(\mathop{\min }\limits_{r,c}E(r,c)=E({r}^{* },{c}^{* })\). Since the sequence r is a permutation of the ordered sequence {1, 2, . . . , N}, we can always write \({r}_{i}=\mathop{\sum }\nolimits_{n = 1}^{N}{P}_{in}n\), where P is an N × N permutation matrix. Similarly, we can write \({c}_{a}=\mathop{\sum }\nolimits_{m = 1}^{M}{Q}_{am}m\), where Q is an M × M permutation matrix. Therefore, the energy function, considered as a function of the permutation matrices P and Q, can be rewritten in the form

$$E(r,c)=E(P,Q)={{{{{{{\rm{Tr}}}}}}}}\left({P}^{t}AQ{B}^{t}\right),$$
(2)

where B is an N × M matrix with entries Bia = ia, as shown in Fig. 1c. In this language, the NMP is simply the problem of finding the permutations P* and Q* that minimize the energy function given by Eq. (2), which mathematically reads

$$({P}^{*},{Q}^{*})=\mathop{{{{{{\mathrm{arg}}}}}} \, {{{{{\mathrm{min}}}}}} }\limits_{P,Q}E(P,Q).$$
(3)

The geometric meaning of the optimal permutations P* and Q* is clear if we apply them to the adjacency matrix as PtAQ = Anested: the nested structure of A is visually manifest in Anested, as illustrated in Fig. 1c. The optimization problem defined by Eqs. (2) and (3) can be recognized as an instance of the Quadratic Assignment Problem in the Koopmans-Beckmann form24, one of the most important problems in combinatorial optimization, which is known to be NP-hard. The formal mathematical mapping of the NMP onto an instance of QAP represents our first and most important result. Having formulated the NMP in the language of permutation matrices, we move next to solve it using a statistical physics approach.
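The equivalence between Eq. (1) and the trace form in Eq. (2) can be verified numerically. The following minimal sketch (in Python, with a randomly generated toy matrix) builds the permutation matrices from a pair of rankings and checks that the two expressions of the energy coincide:

```python
import numpy as np

def perm_matrix(seq):
    """Permutation matrix P with P[i, seq[i]-1] = 1, so that r_i = sum_n P_in * n."""
    n = len(seq)
    P = np.zeros((n, n))
    P[np.arange(n), np.asarray(seq) - 1] = 1.0
    return P

rng = np.random.default_rng(0)
N, M = 5, 7
A = rng.integers(0, 2, size=(N, M)).astype(float)      # random toy adjacency matrix
r = rng.permutation(np.arange(1, N + 1))                # random row ranking
c = rng.permutation(np.arange(1, M + 1))                # random column ranking

P, Q = perm_matrix(r), perm_matrix(c)
B = np.outer(np.arange(1, N + 1), np.arange(1, M + 1))  # B_ia = i * a

E_rank = float(r @ A @ c)                               # Eq. (1)
E_trace = float(np.trace(P.T @ A @ Q @ B.T))            # Eq. (2)
assert np.isclose(E_rank, E_trace)
```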

Solving the NMP with statistical physics

Our basic tool to study the NMP is the partition function Z(β) defined by

$$Z(\beta )=\mathop{\sum}\limits_{P,Q}{e}^{-\beta E(P,Q)},$$
(4)

where β is an external control parameter akin to the inverse temperature in the statistical physics language. The partition function Z(β) provides a tool to determine the global minimum of the energy function via the limit

$$E({P}^{* },{Q}^{* })=-\mathop{\lim }\limits_{\beta \to \infty }\frac{1}{\beta }\ln Z(\beta )$$
(5)

Calculating the partition function may seem hopeless since it requires evaluating and summing N!M! terms. Nonetheless, the calculation is greatly simplified in the limit of large β, since we can evaluate Z(β) via the steepest descent method. The strategy consists of two main steps. The first step is to work out an integral representation of Z(β) of the form

$$Z(\beta )=\int\,DXDY\,{{\rm {e}}}^{-\beta F(X,Y)},$$
(6)

where the integral is over the space of N × N doubly-stochastic (DS) matrices X and M × M DS matrices Y, that converge onto permutation matrices P and Q when β → ∞; and F(X, Y) is an effective cost function that coincides with E(P, Q) for β → ∞. The second step is to find the stationary points of F(X, Y) by zeroing the derivatives ∂F/∂X = ∂F/∂Y = 0, resulting in a set of self-consistent equations for X and Y, called saddle point equations. All steps of the calculation are explained in great detail in the “Methods” subsection “Derivation of the saddle point equations”. The resulting saddle point equations are given by

$${X}_{ij} ={u}_{i}\exp \left[-\beta {(AY{B}^{t})}_{ij}\right]{v}_{j},\\ {Y}_{ab} ={\mu }_{a}\exp \left[-\beta {({A}^{t}XB)}_{ab}\right]{\nu }_{b},$$
(7)

where u, v are N-dimensional vectors and μ, ν are M-dimensional vectors determined by imposing that all row and column sums of X and Y are equal to 1. At this point, we can exploit the specific form of matrix B, i.e. Bia = ia, to further simplify Eq. (7). Specifically, we define the stochastic rankings ρi and σa as

$${\rho }_{i}=\mathop{\sum }\limits_{k=1}^{N}{X}_{ik}\,k,\quad {\sigma }_{a}=\mathop{\sum }\limits_{b=1}^{M}{Y}_{ab}\,b,$$
(8)

whereby we can cast Eq. (7) in the following vectorial form (details in the “Methods” subsection “Derivation of the saddle point equations”)

$${\rho }_{i} = \frac{{\sum }_{k}k\,{v}_{k}\,{{\rm {e}}}^{-\beta k{\sum }_{a}{A}_{ia}{\sigma }_{a}}}{{\sum }_{k}{v}_{k}\,{{\rm {e}}}^{-\beta k{\sum }_{a}{A}_{ia}{\sigma }_{a}}},\\ {\sigma }_{a} = \frac{{\sum }_{c}c\,{\nu }_{c}\,{{\rm {e}}}^{-\beta c{\sum }_{i}{A}_{ia}{\rho }_{i}}}{{\sum }_{c}{\nu }_{c}\,{{\rm {e}}}^{-\beta c{\sum }_{i}{A}_{ia}{\rho }_{i}}},$$
(9)

where the normalizing vectors v and ν satisfy

$$\frac{1}{{v}_{j}} = \mathop{\sum}\limits_{i}{\left[\mathop{\sum}\limits_{k}{v}_{k}{{\rm {e}}}^{-\beta (k-j)\mathop{\sum}\limits_{a}{A}_{ia}{\sigma }_{a}}\right]}^{-1},\\ \frac{1}{{\nu }_{b}} = \mathop{\sum}\limits_{a}{\left[\mathop{\sum}\limits_{c}{\nu }_{c}{{\rm {e}}}^{-\beta (c-b)\mathop{\sum}\limits_{i}{A}_{ia}{\rho }_{i}}\right]}^{-1}.$$
(10)

Equations (9) and (10) represent our second most important result and, when interpreted as iterative equations, provide a simple algorithm to solve the NMP, whose implementation is discussed in detail in the “Methods” subsection “Algorithm”. Note that ρ and σ converge to the actual ranking r and c for β → ∞. However, in practice, we solve Eqs. (9) and (10) iteratively at finite β. Once we reach convergence, we estimate r and c by simply sorting the entries of ρ and σ. We observe that larger values of β give better results, i.e., lower values of the cost E(r, c), as seen in Fig. 2a. A full discussion of convergence and bounds of our algorithm will be published elsewhere (see also discussion in Supplementary Note 2). Here, we test its performance by applying it to many real mutualistic and economic networks and show that we obtain better results than state-of-the-art network metrics and genetic algorithms, as discussed next.
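As a concrete illustration of how Eqs. (9) and (10) can be iterated at fixed β, we provide a minimal Python sketch below. It interleaves one update of the normalizing vectors with one update of the stochastic rankings, rather than following the nested-loop schedule detailed in the Methods, and the toy matrix is made up for illustration.

```python
import numpy as np

def nmp_rankings(A, beta=0.05, n_iter=2000, seed=0):
    """Bare-bones fixed-beta iteration of Eqs. (9) and (10).
    Returns the stochastic rankings rho (rows) and sigma (columns)."""
    rng = np.random.default_rng(seed)
    N, M = A.shape
    k_row = np.arange(1, N + 1, dtype=float)   # candidate row ranks 1..N
    k_col = np.arange(1, M + 1, dtype=float)   # candidate column ranks 1..M
    rho = rng.uniform(1, N, size=N)            # random initial rankings
    sigma = rng.uniform(1, M, size=M)
    v, nu = np.ones(N), np.ones(M)             # normalizing vectors of Eq. (10)
    for _ in range(n_iter):
        W_row = A @ sigma                      # sum_a A_ia sigma_a
        W_col = A.T @ rho                      # sum_i A_ia rho_i
        Kr = np.exp(-beta * np.outer(W_row, k_row))   # Kr[i, k] = e^{-beta k W_row_i}
        Kc = np.exp(-beta * np.outer(W_col, k_col))
        # Eq. (10): one fixed-point sweep for the normalizing vectors.
        v = 1.0 / (Kr / (Kr @ v)[:, None]).sum(axis=0)
        nu = 1.0 / (Kc / (Kc @ nu)[:, None]).sum(axis=0)
        # Eq. (9): stochastic rankings.
        rho = (Kr @ (k_row * v)) / (Kr @ v)
        sigma = (Kc @ (k_col * nu)) / (Kc @ nu)
    return rho, sigma

# Toy nested matrix: sorting the converged rankings recovers the generalist-to-specialist order.
A = np.array([[1, 1, 1, 1], [1, 1, 0, 0], [1, 0, 0, 0]], dtype=float)
rho, sigma = nmp_rankings(A)
print(np.argsort(rho) + 1, np.argsort(sigma) + 1)   # node indices from top rank to bottom rank
```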

Fig. 2: Numerical solution and comparison with other methods in ecological networks.
figure 2

a Optimal cost E returned by our algorithm on the mutualistic network named ML-PL-001 in the Web-of-Life database for several choices of the parameter β. Larger values of β give lower costs. In particular, for sufficiently large β, our algorithm returns a lower cost than the best off-the-shelf algorithm for nestedness maximization (BINMATNEST, red line). b Comparison of our algorithm with state-of-the-art methods in the literature: Degree (bi), Fitness-Complexity (bii), Minimal-Extremal-Metric (biii), and BINMATNEST (biv). In each panel, we plot the cost returned by each algorithm divided by the cost returned by our algorithm (denoted \(\frac{{E}_{{\rm {Degree}}}}{{E}_{{\rm {NMP}}}}\) for the Degree method, \(\frac{{E}_{{\rm {FC}}}}{{E}_{{\rm {NMP}}}}\) for the Fitness-Complexity method, \(\frac{{E}_{{\rm {MEM}}}}{{E}_{{\rm {NMP}}}}\) for the Minimal-Extremal-Metric, and \(\frac{{E}_{{\rm {BIN}}}}{{E}_{{\rm {NMP}}}}\) for BINMATNEST) for each ecological interaction network considered in this work. More precisely, we compute the ordinate in each plot by taking the rankings (r, c) returned by each method, substituting them in the expression of E(r, c) given by Eq. (1), and dividing this number by the minimum cost returned by our algorithm, denoted ENMP. A value E/ENMP > 1 means that our algorithm returns a better, i.e. lower, cost. We find that our algorithm returns a lower cost in 100% of the networks when compared to degree, Fitness-Complexity, and Minimal-Extremal-Metric; and in 80% of the networks when compared to BINMATNEST (see also Table 1). c Similarity transformation applied to the adjacency matrix A of network ML-PL-001 that brings A into its maximally nested form PtAQ, where P and Q are the optimal permutation matrices constructed from the optimal ranking vectors r* and c*.

Nestedness maximization in empirical matrices

We apply our algorithm to two classes of empirical bipartite networks: plant-pollinator networks relevant to the study of mutualistic networks in ecology, and country-activity networks relevant to the economic complexity field, where the activities represent either technological outputs on green innovations (green technology networks) or export flows of physical goods (international trade networks). The 47 ecological mutualistic networks are freely downloadable at https://www.web-of-life.es/; their filenames are listed in the first column of Table 1. The green technology networks that link countries and green technologies are sourced from REGPAT, accessible upon request at https://www.oecd.org/sti/inno/intellectual-property-statistics-and-analysis.htm#ip-data. These green technologies fall under the Y02/Y04S classes of the Cooperative Patent Classification (CPC), which encompass technologies linked to climate change mitigation and adaptation. Further, the international trade networks that link countries with the goods they competitively export are based on the COMTRADE data collected by the United Nations (the database is not freely available, but sample data can be found at https://comtradeplus.un.org/). The exported products are classified following the SITC version-2 classification, which includes 184 classes of goods. On the geographical dimension, we consider the top 65 countries in terms of trade volume, which together account for more than 90% of the total volume.

Table 1 Numerical results on real mutualistic networks from the Web of Life database

To standardize the comparison with existing methods, we first consider binary matrices. Subsequently, we shall study the impact of links’ weights on our algorithm’s performance. We binarize the adjacency matrices of the mutualistic networks by setting Aij = 1 if nodes i and j are connected and zero otherwise, thus ignoring the weights; for the economic networks, we binarize the matrices with the same thresholding criterion used in previous works3,5, based on the countries’ revealed comparative advantage (RCA) on the technologies or products. Despite this simplification, we emphasize that our algorithm can be applied, as is, to any bipartite network with nonnegative weights of the most general form, as shown below. We run four benchmark methods: the naive degree ranking25, fitness-complexity (FC)3, the minimal extremal metric (MEM)4, and BINMATNEST2. While BINMATNEST is the state-of-the-art algorithm in ecology for nestedness maximization26, the effectiveness of the FC27,28 and MEM4 has been demonstrated in recent works in economic complexity, which also connected the FC to the Sinkhorn algorithm from optimal transport28,29,30.
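For readers wishing to reproduce the preprocessing of the economic networks, the following sketch illustrates the RCA-based binarization, assuming the standard Balassa definition of revealed comparative advantage used in refs. 3,5 with the common threshold RCA ≥ 1; the data layout and numbers are illustrative.

```python
import numpy as np

def rca_binarize(X, threshold=1.0):
    """Binarize a country-by-product export matrix X via revealed comparative advantage:
    RCA_cp = (X_cp / sum_p' X_cp') / (sum_c' X_c'p / sum_c'p' X_c'p'); keep links with RCA >= threshold."""
    X = np.asarray(X, dtype=float)
    country_share = X / X.sum(axis=1, keepdims=True)  # share of product p in country c's export basket
    world_share = X.sum(axis=0) / X.sum()             # share of product p in world exports
    return (country_share / world_share >= threshold).astype(int)

# Toy export-volume matrix (hypothetical numbers): rows = countries, columns = products.
X = np.array([[90.0, 10.0,  0.0],
              [30.0, 50.0, 20.0],
              [ 5.0,  5.0, 90.0]])
print(rca_binarize(X))
```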

We compute the value of the cost function E(r, c) for the r and c returned by each of the analyzed methods and compare it to the value ENMP returned by our algorithm (see Supplementary Note 1 for implementation details of the analyzed methods). We find that the proposed algorithm outperforms all the state-of-the-art methods in finding a better (i.e. lower) cost of the nodes’ ranking. In ecological mutualistic networks, our method finds a lower cost than degree, FC, and MEM on 100% of the empirical mutualistic networks (Fig. 2b). When compared to BINMATNEST (the state-of-the-art method in ecology for the NMP), we find a better (or equal) cost in 80% of the instances, as seen in Fig. 2b and Table 1. In the green technology and international trade networks, our algorithm finds a lower cost than FC—the state-of-the-art method in economic complexity—across all the analyzed years (see Fig. 3). In Fig. 2c, we show an application of the similarity transformation that brings the adjacency matrix to its maximally nested form. We call P and Q the optimal permutations that solve the QAP in Eq. (3) (details in the “Methods” subsection “Algorithm”), and we perform the similarity transformation

$$A\to {P}^{{t}}AQ,$$
(11)

which reveals the nested structure of the adjacency matrix (see Fig. 2c).

Fig. 3: Comparison with other methods in economic networks.
figure 3

Optimal cost E(r, c) returned by our algorithm (red line) as opposed to the optimal cost returned by the fitness-complexity algorithm (blue line) in (a) the green technology production networks and (b) the country-product export networks in international trade. Over all years considered in this study, our algorithm returns a better cost than Fitness-Complexity, which is the state-of-the-art algorithm in the economic complexity field.

Next, we find that including the links’ weights in the optimal ranking calculation does not substantially alter the rankings or the convergence time. We compare the optimal rankings obtained from the binarized versus the weighted adjacency matrices on a subset of 12 networks that are weighted. For each of these 12 networks, we run our algorithm first on the weighted adjacency matrix and then on its binarized version. Then, we measure the correlation between the rankings obtained in the two cases, separately for the rows’ and columns’ rankings. In Fig. 4a, we report the square of the correlation coefficient R2 between the rankings of the weighted and binarized matrices and show that including the weights of the interaction links does not change the rankings substantially: high correlations are found in all data sets (Fig. 4b) but one (Fig. 4c).

Fig. 4: Comparison of the rankings in binary versus weighted adjacency matrices.
figure 4

a Each point represents the square of the correlation coefficient between the rankings of a weighted adjacency matrix and its binarized version. For each of the 12 networks there are two points, representing the R2 between the rankings of the matrix rows (light blue points) and of matrix columns (dark blue points). To make a fair comparison, we normalize both the weighted and binary adjacency matrices by their norm, Aia → Aia/∥A∥1, before running our algorithm. Furthermore, we use the same value of β in both cases, chosen to be the value \({\beta }_{\max }\) also used in the plots in Fig. 2. High values of R2 are found consistently in all datasets, except for Net33. b Scatter plot of the rankings of the weighted versus binary adjacency matrices of Net17 that display the highest correlation. c Scatter plot of the rankings of the weighted versus binary adjacency matrices of Net33 that have the lowest correlation. d Ratio Tb/Tw of the execution times of our algorithm in the binary vs. weighted case for the same 12 networks analyzed in (a). The ratio roughly hovers around 1 for all networks, indicating that the runtime is nearly the same in both cases.

Furthermore, we asked whether the weights have any impact on the convergence time of the algorithm. Therefore, we measured the execution time of the algorithm in both the binary and weighted cases. In Fig. 4d we plot the ratio Tb/Tw, and we observe that, for the majority of networks, it is <1, meaning that convergence is faster in the weighted case than in the binary one. More generally, all data points are contained in the strip Tb/Tw ∈ [0.5, 1.5], which means that both execution times are of the same order of magnitude.

Finally, we suggest that deviations between the ranks by degree and our method (Fig. 4a) may reveal structurally important or vulnerable species. For example, in Net31, pollinator (row) 31 has a much better rank by NMP than by degree (upward blue arrow in Fig. 5b); the reverse is true for the higher-degree pollinator 2 (downward red arrow in Fig. 5b). Pollinator 2 indeed has a higher degree than pollinator 31 (7 vs. 2 interaction partners). However, both of pollinator 31’s interaction partners are specialists that only interact with pollinator 31. As a result, pollinator 31’s extinction would trigger their extinction in a co-extinction cascade process16,31. The same is not true for pollinator 2’s interaction partners, which all have at least one extra interaction partner besides pollinator 2, which makes them less vulnerable to pollinator 2’s potential extinction. Therefore, pollinator 31’s higher NMP rank reflects its higher importance for the robustness of the system compared to the higher-degree pollinator 2. Similar examples can be found among the column ranks (Fig. 5c), and they point to the better ability of the NMP to identify structurally important species compared to the degree, a hypothesis that can be tested in future empirical work.

Fig. 5: Comparison of Nestedness maximization vs ranking by degree.
figure 5

a A comparison between the adjacency matrix (of Net31) reordered by solving the NMP (blue squares) versus the one reordered according to the node degree (red squares), showing how the ranking by degree completely fails in identifying the high level of nestedness of the network. The additional plots illustrate the different rankings of rows (b) and columns (c) by degree and NMP, as well as a few highlighted nodes that gain (blue arrows) or lose (red arrows) a substantial number of rank positions with the NMP method. These discrepancies can be due to the different structural importance of the nodes16: for example, while row 2 has a higher degree than row 31, row 31’s interaction partners only interact with row 31, which makes them vulnerable to row 31’s extinction. The same is not true for row 2 (see main text).

Discussion

In this work we introduced a framework to calculate analytically the ranks of the nodes of a bipartite network. The proposed approach requires the specification, for each link, of a cost function that depends on the rankings of the interacting nodes. This formulation allowed us to recast the Nestedness Maximization Problem as an instance of the Quadratic Assignment Problem, which we tackled with statistical physics techniques. In particular, we obtained a mean-field solution by using the steepest-descent approximation of the partition function. The corresponding saddle-point equations depend on a single hyper-parameter (the inverse temperature β) and can be solved by iteration to find the optimal rankings of the rows and columns of the adjacency matrix that result in a maximally nested layout. We benchmarked our algorithm against other methods on several real ecological and economic networks and showed that our algorithm outperforms the best existing algorithm in 80% of the instances.

We conclude by outlining the potential applications of our work in economic complexity and ecology and discuss how the proposed method could be generalized to higher-order interactions. In economic complexity, algorithms such as the fitness-complexity and its variants provide heuristic solutions to the NMP; these algorithms have been recently validated via their ability to forecast the future development trajectories of nations5 and provided policy-relevant insights21,22,23. Our work provides an analytic ranking algorithm based on first principles, which helps move toward a microfoundation of the methods routinely used in the field28 and could potentially inspire the next generation of economic-complexity algorithms.

In ecology, measures of nestedness based on the NMP have been widely popular, especially in biogeography1. The use of the energy function derived here as a measure of nestedness comes with two important caveats. First, aligned with the nestedness temperature and BINMATNEST, using the optimal nestedness energy as a measure of nestedness would assume that the relevant degree of nestedness is the one provided by the optimal ranking. Under this assumption, compared to genetic algorithms that provide no insight into the ranking mechanism and act as black boxes31, our method is more interpretable as it explicitly links the ranking variables with the cost of each interaction. At the same time, ecologists have often been interested not in the nestedness by optimal ranking but in the level of nestedness when rows and columns are ranked by degree32,33 or by ecological properties of interest such as islands’ areas and isolation in biogeography34, species’ abundance in interaction networks35, and more7,36. These different perspectives generally lead to different quantitative insights: for example, the optimal energy derived here correlates positively with the standard NODF metric based on the ranking by degree (Pearson’s r = 0.45 in the 47 analyzed mutualistic networks), yet the correlation is far from one, which indicates that the two metrics convey different information.

We note that these different perspectives on nestedness can be incorporated into our framework as a non-interacting problem where the energy function couples the ranking variables with a suitable external field (see Supplementary Note 1). We emphasize that our work does not provide a criterion on whether to use the optimal ranking by a non-interacting energy function (which leads to the ranking by degree used by the popular NODF33 or other ecological properties of interest, e.g., those used in ecological gradient analyses) or the optimal ranking by the quadratic energy function (which leads to the ranking calculated analytically here). Our work introduces the general framework that directly connects the ranking variables, the entries of the adjacency matrix, and the global energy of the network. Which energy function (and nestedness metric) ought to be used is ultimately dependent on the research question of interest. At any rate, even when one is not interested in the maximal nestedness per se, the maximal value provided by our interacting model still has value as a benchmark to understand how far the observed nestedness by a given ecological property of interest (e.g., species abundance) is from the theoretical maximum attainable, which can be used to test which ecological property explains most of the system’s nestedness35.

Second, ecologists have long realized that measures of nestedness may exhibit dependencies on matrix size and density20,33. For this reason, it is a standard practice in ecology to compare the observed degree of nestedness with that of suitably randomized matrices, which typically embed passive sampling mechanisms one wishes to test36. We note here that the same null model analyses performed for other nestedness metrics can also be applied to the energy function derived here, and the choice of the null model should be determined by ecological considerations as prescribed by the relevant literature (see ref. 37 for the most recent review).

From a network science standpoint, we note that by changing the definition of the matrix B, i.e. using measures other than a sequence of ordinal numbers, one can repurpose our algorithm to rank rows and columns of a matrix according to geometric patterns other than nestedness38,39. Therefore, the proposed framework holds promise for the effective detection of a wide range of network structural patterns beyond the nestedness considered here. Finally, the present framework can be easily extended and applied to solve the ranking problem in networks with higher-order interactions. For example, given the adjacency tensor Aiaγ for a system with 3-body interactions, we can define the energy function E(P, Q, R) to be optimized over three permutation matrices P, Q, and R following exactly the same steps outlined in this paper for the case of pairwise interactions. This may be especially relevant in world trade for ranking countries according to both exported and imported goods.
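As an illustration of this higher-order extension, the 3-body analogue of Eq. (1) couples three ranking vectors to the adjacency tensor. The sketch below shows this natural generalization of the cost function; it is an assumption about how the extension would look, not an implementation used in this work.

```python
import numpy as np

def nestedness_cost_3body(A, r, c, s):
    """E(r, c, s) = sum_{i,a,g} A_iag * r_i * c_a * s_g, a 3-body analogue of Eq. (1)."""
    r, c, s = (np.asarray(x, dtype=float) for x in (r, c, s))
    return float(np.einsum('iag,i,a,g->', A, r, c, s))

# Toy 2x2x2 adjacency tensor (hypothetical example).
A = np.zeros((2, 2, 2))
A[0, 0, 0] = A[0, 1, 0] = 1.0
print(nestedness_cost_3body(A, r=[1, 2], c=[1, 2], s=[1, 2]))  # 3.0
```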

Methods

Derivation of the saddle point equations

In this section, we discuss in detail how to derive the saddle-point equations, Eq. (7), given in the main text. We consider the minimization problem defined by

$$({r}^{* },{c}^{* })=\mathop{{{{{{\mathrm{arg}}}}}} \, {{{{{\mathrm{min}}}}}} }\limits_{r\in {{{{{{{{\mathcal{R}}}}}}}}}_{N},c\in {{{{{{{{\mathcal{R}}}}}}}}}_{M}}E(r,c),$$
(12)

where the cost (energy) function is given by

$$E(r,c)=\mathop{\sum }\limits_{i=1}^{N}\mathop{\sum }\limits_{a=1}^{M}{A}_{ia}\,{r}_{i}\,{c}_{a},$$
(13)

and \({{{{{{{{\mathcal{R}}}}}}}}}_{N}\) and \({{{{{{{{\mathcal{R}}}}}}}}}_{M}\) are the sets of all vectors r and c obtained by permuting the entries of the representative vectors r0 and c0 defined as

$${r}^{0} \equiv (1,2,3,...,N),\\ {c}^{0} \equiv (1,2,3,...,M).$$
(14)

Therefore, we can write any two vectors r and c as

$${r}_{i} =\mathop{\sum }\limits_{j=1}^{N}{P}_{ij}{r}_{j}^{0},\\ {c}_{a} =\mathop{\sum }\limits_{b=1}^{M}{Q}_{ab}{c}_{b}^{0},$$
(15)

where P and Q are arbitrary permutation matrices of size N × N and M × M, respectively. Furthermore, we introduce the N × M matrix B, defined as the tensor product of r0 and c0, whose components are explicitly given by

$${B}_{ia}={({r}^{0}\otimes {c}^{0})}_{ia}=ia.$$
(16)

With these definitions, we can rewrite the energy function as the trace of a product of matrices in the following way:

$$E\equiv E(P,Q)={{{{{{{\rm{Tr}}}}}}}}({P}^{{\rm {t}}}AQ{B}^{{\rm {t}}}).$$
(17)

The minimization problem in Eq. (12) can be reformulated as a minimization problem in the space of permutation matrices as follows:

$$({P}^{* },{Q}^{* })=\mathop{{{{{{\rm{arg}}}}}} \, {{{{{\rm{min}}}}}} }\limits_{(P\in {{{{{{{{\mathcal{S}}}}}}}}}_{N},Q\in {{{{{{{{\mathcal{S}}}}}}}}}_{M})}E(P,Q),$$
(18)

where \({{{{{{{{\mathcal{S}}}}}}}}}_{N}\) and \({{{{{{{{\mathcal{S}}}}}}}}}_{M}\) denote the symmetric groups on N and M elements, respectively.

Next we discuss a relaxation of the problem in Eq. (18) that amounts to extending the spaces \({{{{{{{{\mathcal{S}}}}}}}}}_{N}\) and \({{{{{{{{\mathcal{S}}}}}}}}}_{M}\) of permutation matrices onto the spaces of doubly-stochastic (DS) matrices \({{{{{{{{\mathcal{D}}}}}}}}}_{N}\) and \({{{{{{{{\mathcal{D}}}}}}}}}_{M}\). The space \({{{{{{{{\mathcal{D}}}}}}}}}_{N}\) (\({{{{{{{{\mathcal{D}}}}}}}}}_{M}\)) is a superset of the original space \({{{{{{{{\mathcal{S}}}}}}}}}_{N}\) (\({{{{{{{{\mathcal{S}}}}}}}}}_{M}\)). Solving the problem on the \({{{{{{{\mathcal{D}}}}}}}}\)-space means to find two doubly-stochastic matrices X* and Y* that minimize an ‘effective’ cost function F, i.e.

$$F({X}^{* },{Y}^{* })=\mathop{\min }\limits_{(X\in {{{{{{{{\mathcal{D}}}}}}}}}_{N},Y\in {{{{{{{{\mathcal{D}}}}}}}}}_{M})}F(X,Y),$$
(19)

and are only ‘slightly different’ from the permutation matrices P* and Q* (we will specify later what ‘slightly different’ means in mathematical terms and what F actually is). The quantity which plays the fundamental role in the relaxation procedure of the original problem is the partition function, Z(β), defined by

$$Z(\beta )=\mathop{\sum}\limits_{P\in {{{{{{{{\mathcal{S}}}}}}}}}_{N}}\mathop{\sum}\limits_{Q\in {{{{{{{{\mathcal{S}}}}}}}}}_{M}}{{\rm {e}}}^{-\beta E(P,Q)}.$$
(20)

The connection between Z(β) and the original problem in Eq. (18) is established by the following limit:

$$\mathop{\lim }\limits_{\beta \to \infty }-\frac{1}{\beta }\log Z(\beta )=\mathop{\min }\limits_{(P\in {{{{{{{{\mathcal{S}}}}}}}}}_{N},Q\in {{{{{{{{\mathcal{S}}}}}}}}}_{M})}E(P,Q).$$
(21)

The optimization problem in Eq. (18) is thus equivalent to the problem of calculating the partition function in Eq. (20). Ideally, we would like to compute exactly Z(β) for arbitrary β and then take the limit β → ∞. Although an exact calculation of the partition function is, in general, out of reach, in practice we may well expect that the better we estimate Z(β), the closer the limit in Eq. (21) will be to the true optimal solution. In fact, the procedure of relaxation is basically a procedure to assess the partition function for large but finite β. Mathematically, this procedure is called the method of steepest descent. By estimating the partition function via the steepest descent method, we will obtain a system of non-linear equations, called saddle-point equations, whose solution is a pair of doubly-stochastic matrices X*, Y* that solve the relaxed problem given by Eq. (19). Eventually, the solution to the original problem in Eq. (18) can be obtained formally by projecting X*, Y* onto the subspaces \({{{{{{{{\mathcal{S}}}}}}}}}_{N},{{{{{{{{\mathcal{S}}}}}}}}}_{M}\subset {{{{{{{{\mathcal{D}}}}}}}}}_{N},{{{{{{{{\mathcal{D}}}}}}}}}_{M}\) via the limit

$$\mathop{\lim }\limits_{\beta \to \infty }{X}^{* }(\beta ) ={P}^{* },\\ \mathop{\lim }\limits_{\beta \to \infty }{Y}^{* }(\beta ) ={Q}^{* }.$$
(22)

Having explained the rationale for the introduction of the partition function, we move next to discuss the details of the calculation leading to the saddle point equations.

In order to cast the partition function in a form suitable for the steepest-descent evaluation, we need the following preliminary result.

Definition

Semi-permutation matrix: An N × N square matrix \(\tilde{P}\) is called a semi-permutation matrix if all its entries are either 0 or 1 and each row sums to one, i.e. \(\mathop{\sum }\nolimits_{j = 1}^{N}{\tilde{P}}_{ij}=1\) for i = 1, . . . , N, but no further constraint on the column sums is imposed.

We denote the space of N × N semi-permutation matrices by

$${\tilde{{{\mathcal{S}}}}}_{N}=\left\{\tilde{P}\in {\{0,1\}}^{N\times N}\,:\,\mathop{\sum }\limits_{j=1}^{N}{\tilde{P}}_{ij}=1\ \ {{\rm{for}}}\ i=1,\ldots ,N\right\}.$$
(23)

Lemma

Consider an arbitrary N × N square matrix G and the function W(G) defined by

$$W(G)=\ln \mathop{\sum}\limits_{\tilde{P}\in {\tilde{{{\mathcal{S}}}}}_{N}}{{\rm {e}}}^{{{\rm{Tr}}}(\tilde{P}{G}^{t})}.$$
(24)

Then, W(G) is explicitly given by the following formula

$$W(G)=\mathop{\sum }\limits_{i=1}^{N}\ln \left(\mathop{\sum }\limits_{j=1}^{N}{{\rm {e}}}^{{G}_{ij}}\right).$$
(25)

Proof

Let us write the sum inside the logarithm on the right-hand side of Eq. (24) as

$$\mathop{\sum}\limits_{\tilde{P}\in {\tilde{{{\mathcal{S}}}}}_{N}}{{\rm {e}}}^{{{\rm{Tr}}}(\tilde{P}{G}^{t})}=\mathop{\sum}\limits_{{\tilde{P}}_{1}}\cdots \mathop{\sum}\limits_{{\tilde{P}}_{N}}\mathop{\prod }\limits_{i=1}^{N}{{\rm {e}}}^{{\tilde{P}}_{i}\cdot {G}_{i}}=\mathop{\prod }\limits_{i=1}^{N}\mathop{\sum}\limits_{{\tilde{P}}_{i}}{{\rm {e}}}^{{\tilde{P}}_{i}\cdot {G}_{i}},$$
(26)

where \({\tilde{P}}_{i}\) is the ith row of \(\tilde{P}\) (and thus is a vector) having one component equal to 1 and the remaining N−1 components equal to 0, and \({G}_{i}\) is the ith row of G. The sum \({\sum }_{{\tilde{P}}_{i}}\) denotes a summation over all possible choices of the vector \({\tilde{P}}_{i}\): there are N possible such choices, namely \({\tilde{P}}_{i}\in \{(1,0,\ldots ,0),(0,1,\ldots ,0),\ldots ,(0,\ldots ,0,1)\}\). Hence, each sum in the right-hand side of Eq. (26) evaluates to

$$\mathop{\sum}\limits_{{\tilde{P}}_{i}}{{\rm {e}}}^{{\tilde{P}}_{i}\cdot {G}_{i}}=\mathop{\sum }\limits_{j=1}^{N}{{\rm {e}}}^{{G}_{ij}}.$$
(27)

Thus, the left-hand side of Eq. (26) is equal to

$$\mathop{\sum}\limits_{\tilde{P}\in {\tilde{{{\mathcal{S}}}}}_{N}}{{\rm {e}}}^{{{\rm{Tr}}}(\tilde{P}{G}^{t})}=\mathop{\prod }\limits_{i=1}^{N}\mathop{\sum }\limits_{j=1}^{N}{{\rm {e}}}^{{G}_{ij}}.$$
(28)

Eventually, by taking the logarithm of both sides of Eq. (28), we prove Eq. (25).
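The Lemma is easy to verify numerically for small N by enumerating all N^N semi-permutation matrices; a minimal sketch:

```python
import numpy as np
from itertools import product

def W_closed_form(G):
    """Closed form of Eq. (25): W(G) = sum_i log(sum_j exp(G_ij))."""
    return float(np.log(np.exp(G).sum(axis=1)).sum())

def W_brute_force(G):
    """Definition of Eq. (24): log of the sum of exp(Tr(P~ G^t)) over all semi-permutation matrices."""
    N = G.shape[0]
    total = 0.0
    for cols in product(range(N), repeat=N):          # one chosen column per row
        total += np.exp(sum(G[i, j] for i, j in enumerate(cols)))
    return float(np.log(total))

G = np.random.default_rng(1).normal(size=(3, 3))
assert np.isclose(W_closed_form(G), W_brute_force(G))
```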

With these tools at hand, we move to derive the integral representation of Z(β). We use the definition of the Dirac δ-function to write the partition function in Eq. (20) as follows

$$Z(\beta )=\mathop{\sum}\limits_{P\in {{{{{{{{\mathcal{S}}}}}}}}}_{N}}\mathop{\sum}\limits_{Q\in {{{{{{{{\mathcal{S}}}}}}}}}_{M}}\int\,DX\int\,DY{{\rm {e}}}^{-\beta E(X,Y)}\mathop{\prod }\limits_{i,j=1}^{N}\delta ({X}_{ij}-{P}_{ij})\mathop{\prod }\limits_{a,b=1}^{M}\delta ({Y}_{ab}-{Q}_{ab}),$$
(29)

where the integration measures are defined by DX ≡ ∏i,jdXij and DY ≡ ∏a,bdYab. The next step is to transform the sum over permutation matrices P, Q into a sum over semi-permutation matrices and then to perform this sum explicitly using the Lemma in Eq. (25). In order to achieve this goal, we insert into Eq. (29) N delta functions \(\mathop{\prod }\nolimits_{j = 1}^{N}\delta \left({\sum }_{i}{X}_{ij}-1\right)\) and M delta functions \(\mathop{\prod }\nolimits_{b = 1}^{M}\delta \left({\sum }_{a}{Y}_{ab}-1\right)\) to enforce the conditions that the columns of X and Y do sum up to one. By inserting these delta functions, we can then replace the sum over P, Q by a sum over semi-permutation matrices \(\tilde{P}\in {\tilde{{{\mathcal{S}}}}}_{N}\) and \(\tilde{Q}\in {\tilde{{{\mathcal{S}}}}}_{M}\), thus obtaining

$$Z(\beta )=\mathop{\sum}\limits_{\tilde{P}\in {\tilde{{{\mathcal{S}}}}}_{N}}\mathop{\sum}\limits_{\tilde{Q}\in {\tilde{{{\mathcal{S}}}}}_{M}}\int\,DX\int\,DY\,{{\rm {e}}}^{-\beta E(X,Y)}\mathop{\prod }\limits_{i,j=1}^{N}\delta ({X}_{ij}-{\tilde{P}}_{ij})\mathop{\prod }\limits_{a,b=1}^{M}\delta ({Y}_{ab}-{\tilde{Q}}_{ab})\mathop{\prod }\limits_{j=1}^{N}\delta \left(\mathop{\sum}\limits_{i}{X}_{ij}-1\right)\mathop{\prod }\limits_{b=1}^{M}\delta \left(\mathop{\sum}\limits_{a}{Y}_{ab}-1\right).$$
(30)

To proceed further in the calculation, we insert into Eq. (30) the following integral representation for each of the delta-functions:

$$\delta (x)=\int\nolimits_{-i\infty }^{+i\infty }\frac{{{\rm {d}}}\hat{x}}{2\pi i}\,{{\rm {e}}}^{-\hat{x}x},$$
(31)

and we get

$$Z(\beta )=\mathop{\sum}\limits_{\tilde{P}\in {\tilde{{{\mathcal{S}}}}}_{N}}\mathop{\sum}\limits_{\tilde{Q}\in {\tilde{{{\mathcal{S}}}}}_{M}}\int\,DXDYD\hat{X}D\hat{Y}DzDw\,{{\rm {e}}}^{-\beta E(X,Y)}\,{{\rm {e}}}^{-{{\rm{Tr}}}(\hat{X}{(X-\tilde{P})}^{t})-{{\rm{Tr}}}(\hat{Y}{(Y-\tilde{Q})}^{t})}\,{{\rm {e}}}^{-\mathop{\sum}\limits_{j}{z}_{j}\left(\mathop{\sum}\limits_{i}{X}_{ij}-1\right)}\,{{\rm {e}}}^{-\mathop{\sum}\limits_{b}{w}_{b}\left(\mathop{\sum}\limits_{a}{Y}_{ab}-1\right)},$$
(32)

where we defined the integration measures \(D\hat{X}\equiv {\prod }_{i,j}d{\hat{X}}_{ij}/2\pi i\), \(D\hat{Y}\equiv {\prod }_{a,b}d{\hat{Y}}_{ab}/2\pi i\), Dz ≡ ∏jdzj/2πi, and Dw ≡ ∏bdwb/2πi. Performing the sums over \(\tilde{P}\) and \(\tilde{Q}\) using Eq. (25), we obtain

$$Z(\beta ) = \int \,DXDYD\hat{X}D\hat{Y}DzDw\,{{\rm {e}}}^{-\beta E(X,Y)}{{\rm {e}}}^{-{{{{{{{\rm{Tr}}}}}}}}(\hat{X}{X}^{t})+W(\hat{X})-{{{{{{{\rm{Tr}}}}}}}}(\hat{Y}{Y}^{t})+W(\hat{Y})}\\ \times {{\rm {e}}}^{-\mathop{\sum}\limits_{j}{z}_{j}\left(\mathop{\sum}\limits_{i}{X}_{ij}-1\right)}{{\rm {e}}}^{-\mathop{\sum}\limits_{b}{w}_{b}\left(\mathop{\sum}\limits_{a}{Y}_{ab}-1\right)}.$$
(33)

Next we introduce the effective cost function \(F(X,\hat{X},Y,\hat{Y},z,w)\) defined as

$$F(X,\hat{X},Y,\hat{Y},z,w) = \,E(X,Y)+\frac{1}{\beta }{{{{{{{\rm{Tr}}}}}}}}(\hat{X}{X}^{t})+\frac{1}{\beta }{{{{{{{\rm{Tr}}}}}}}}(\hat{Y}{Y}^{t})-\frac{1}{\beta }W(\hat{X})-\frac{1}{\beta }W(\hat{Y})\\ +\frac{1}{\beta }\mathop{\sum}\limits_{j}{z}_{j}\left(\mathop{\sum}\limits_{i}{X}_{ij}-1\right)+\frac{1}{\beta }\mathop{\sum}\limits_{b}{w}_{b}\left(\mathop{\sum}\limits_{a}{Y}_{ab}-1\right) \\ \equiv \,E(X,Y)-\frac{1}{\beta }S(X,\hat{X},Y,\hat{Y},z,w)$$
(34)

whereby we can write the partition function as

$$Z(\beta )=\int\,DXDYD\hat{X}D\hat{Y}DzDw\,{{\rm {e}}}^{-\beta F(X,\hat{X},Y,\hat{Y},z,w)},$$
(35)

which can be evaluated by the steepest descent method when β → ∞, as we explain next.

In the limit of large β the integral in Eq. (35) is dominated by the saddle point where E(X, Y) is minimized and \(S(X,\hat{X},Y,\hat{Y},z,w)\) is stationary (in order for the oscillating contributions not to cancel out). In order to find the saddle point, we have to set the derivatives of \(F(X,\hat{X},Y,\hat{Y},z,w)\) to zero, thus obtaining the following saddle point equations

$$\frac{\partial F}{\partial {X}_{ij}} = \frac{\partial E}{\partial {X}_{ij}}+\frac{1}{\beta }\left({\hat{X}}_{ij}+{z}_{j}\right) = 0,\\ \frac{\partial F}{\partial {\hat{X}}_{ij}} = \frac{1}{\beta }{X}_{ij}-\frac{1}{\beta }\frac{\partial W}{\partial {\hat{X}}_{ij}} = 0,\\ \frac{\partial F}{\partial {z}_{j}} =\mathop{\sum}\limits_{i}{X}_{ij}-1=0,$$
(36)

and similar equations for the triplet \((Y,\hat{Y},w)\). The derivative of E with respect to Xij gives

$$\frac{\partial E}{\partial {X}_{ij}}={(AY{B}^{t})}_{ij},$$
(37)

and the derivative of W with respect to \({\hat{X}}_{ij}\) gives

$$\frac{\partial W}{\partial {\hat{X}}_{ij}}=\frac{{{\rm {e}}}^{{\hat{X}}_{ij}}}{{\sum }_{k}{{\rm {e}}}^{{\hat{X}}_{ik}}}.$$
(38)

Solving Eq. (36) with respect to Xij we get

$${X}_{ij}=\frac{{{\rm {e}}}^{-\beta {(AY{B}^{t})}_{ij}-{z}_{j}}}{{\sum }_{k}{{\rm {e}}}^{-\beta {(AY{B}^{t})}_{ik}-{z}_{k}}}.$$
(39)

Analogously, solving with respect to Yab we get

$${Y}_{ab}=\frac{{e}^{-\beta {({A}^{t}XB)}_{ab}-{w}_{b}}}{{\sum }_{c}{e}^{-\beta {({A}^{t}XB)}_{ac}-{w}_{c}}}.$$
(40)

It is worth noticing that Eqs. (39) and (40) are invariant under the transformations

$$\begin{array}{rcl}{z}_{j}\,&\to &\,{z}_{j}+\zeta ,\\ {w}_{b}\,&\to &\,{w}_{b}+\xi ,\end{array}$$
(41)

for arbitrary values of ζ and ξ. This translational symmetry is due to the fact that the 2N constraints on the row and column sums of P are not linearly independent (only 2N−1 of them are), since the sum of all entries of P must be equal to N, i.e. ∑ijPij = N. The same reasoning applies to the 2M constraints on the row and column sums of Q, of which only 2M−1 are linearly independent since ∑abQab = M. Furthermore, we notice that the solution matrices X and Y in Eqs. (39), (40) automatically satisfy the condition of having row sums equal to one. Next, we derive the equations to determine the Lagrange multipliers zj and wb. To this end, we first introduce the vectors v and \(\nu\) with components

$${v}_{j} ={{\rm {e}}}^{-{z}_{j}},\\ {\nu }_{b} ={{\rm {e}}}^{-{w}_{b}}.$$
(42)

Then, we define the vectors u and μ as

$${u}_{i} ={\left(\mathop{\sum}\limits_{k}{{\rm {e}}}^{-\beta {(AY{B}^{t})}_{ik}}{v}_{k}\right)}^{-1},\\ {\mu }_{a} ={\left(\mathop{\sum}\limits_{c}{{\rm {e}}}^{-\beta {({A}^{t}XB)}_{ac}}{\nu }_{c}\right)}^{-1},$$
(43)

so that we can write the solution matrices X and Y in Eqs. (39), (40) as

$${X}_{ij} =\, {u}_{i}\,{{\rm {e}}}^{-\beta {(AY{B}^{t})}_{ij}}\,{v}_{j},\\ {Y}_{ab} = \, {\mu }_{a}\,{{\rm {e}}}^{-\beta {({A}^{t}XB)}_{ab}}\,{\nu }_{b}.$$
(44)

Finally, imposing the conditions on X and Y to have column sums equal to one, we find the equations to be satisfied by v and \(\nu\)

$${v}_{j} ={\left(\mathop{\sum}\limits_{i}{u}_{i}{{\rm {e}}}^{-\beta {(AY{B}^{t})}_{ij}}\right)}^{-1},\\ {\nu }_{b} ={\left(\mathop{\sum}\limits_{a}{\mu }_{a}{{\rm {e}}}^{-\beta {({A}^{t}XB)}_{ab}}\right)}^{-1},$$
(45)

Equations (43)–(45) are the constitutive equations for the relaxed nestedness-maximization problem corresponding to Eq. (7) given in the main text.
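Equations (43)–(45) have the structure of a Sinkhorn-type alternating normalization of the kernel \({{\rm {e}}}^{-\beta (AY{B}^{t})}\). The sketch below illustrates this structure for a fixed Y (taken here, purely for illustration, to be the uniform doubly-stochastic matrix): alternating the updates of u and v drives X toward double stochasticity.

```python
import numpy as np

rng = np.random.default_rng(2)
N, M, beta = 4, 5, 0.1
A = rng.integers(0, 2, size=(N, M)).astype(float)       # toy adjacency matrix
B = np.outer(np.arange(1, N + 1), np.arange(1, M + 1))   # B_ia = i * a
Y = np.full((M, M), 1.0 / M)                             # uniform DS matrix standing in for Y

K = np.exp(-beta * (A @ Y @ B.T))                        # kernel e^{-beta (A Y B^t)_ij}
u, v = np.ones(N), np.ones(N)
for _ in range(200):                                     # alternate Eq. (43) and Eq. (45)
    u = 1.0 / (K @ v)                                    # enforces unit row sums of X
    v = 1.0 / (K.T @ u)                                  # enforces unit column sums of X
X = np.diag(u) @ K @ np.diag(v)                          # Eq. (44)
print(X.sum(axis=1), X.sum(axis=0))                      # both close to vectors of ones
```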

We conclude this section by deriving the self-consistent equations for the stochastic rankings corresponding to Eqs. (9) and (10) given in the main text. We define the stochastic rankings as the two vectors

$${\rho }_{i} = \mathop{\sum }\limits_{k=1}^{N}{X}_{ik}\,k,\\ {\sigma }_{a} = \mathop{\sum }\limits_{b=1}^{M}{Y}_{ab}\,b,$$
(46)

where the term stochastic emphasizes their implied dependence on the doubly stochastic matrices X and Y. Clearly, we have

$$\mathop{\lim }\limits_{\beta \to \infty }{\rho }_{i}={r}_{i},\\ \mathop{\lim }\limits_{\beta \to \infty }{\sigma }_{a}={c}_{a}.$$
(47)

Next, let’s consider the argument of the exponentials in Eq. (44), which we can rewrite as

$${(AY{B}^{t})}_{ij} =\mathop{\sum}\limits_{a}{A}_{ia}\left(\mathop{\sum}\limits_{b}{Y}_{ab}\,b\right)j=j\mathop{\sum}\limits_{a}{A}_{ia}{\sigma }_{a},\\ {({A}^{t}XB)}_{ab} =\mathop{\sum}\limits_{i}{A}_{ia}\left(\mathop{\sum}\limits_{j}{X}_{ij}\,j\right)b=b\mathop{\sum}\limits_{i}{A}_{ia}{\rho }_{i}.$$
(48)

At this point, it is sufficient to multiply both sides of Eq. (44) by j and b, and sum over j and b, respectively, to obtain

$$\mathop{\sum}\limits_{j}{X}_{ij}\,j ={\rho }_{i}={u}_{i}\mathop{\sum}\limits_{j}{{\rm {e}}}^{-\beta {(AY{B}^{t})}_{ij}}\,{v}_{j}\,j={u}_{i}\mathop{\sum}\limits_{j}{{\rm {e}}}^{-\beta j\mathop{\sum}\limits_{a}{A}_{ia}{\sigma }_{a}}\,{v}_{j}\,j,\\ \mathop{\sum}\limits_{b}{Y}_{ab}\,b ={\sigma }_{a}={\mu }_{a}\mathop{\sum}\limits_{b}{{\rm {e}}}^{-\beta {({A}^{t}XB)}_{ab}}\,{\nu }_{b}\,b={\mu }_{a}\mathop{\sum}\limits_{b}{{\rm {e}}}^{-\beta b\mathop{\sum}\limits_{i}{A}_{ia}{\rho }_{i}}\,{\nu }_{b}\,b.$$
(49)

Using the definition of ui and μa in Eq. (43) we obtain

$${\rho }_{i} =\frac{{\sum }_{j}{{\rm {e}}}^{-\beta j{\sum }_{a}{A}_{ia}{\sigma }_{a}}\,{v}_{j}\,j}{{\sum }_{j}{{\rm {e}}}^{-\beta j{\sum }_{a}{A}_{ia}{\sigma }_{a}}\,{v}_{j}},\\ {\sigma }_{a} = \frac{{\sum }_{b}{{\rm {e}}}^{-\beta b{\sum }_{i}{A}_{ia}{\rho }_{i}}\,{\nu }_{b}\,b}{{\sum }_{b}{{\rm {e}}}^{-\beta b{\sum }_{i}{A}_{ia}{\rho }_{i}}\,{\nu }_{b}},$$
(50)

which are the self-consistent Eq. (9) for ρ and σ given in the main text. There are still two unknown vectors in the previous equations: vectors v and \(\nu\). In order to determine them, we consider Eq. (45) and eliminate ui and μa using Eq. (43), thus obtaining

$${v}_{j} ={\left(\mathop{\sum}\limits_{i}{\left[\mathop{\sum}\limits_{k}{v}_{k}{e}^{-\beta (k-j)\mathop{\sum}\limits_{a}{A}_{ia}{\sigma }_{a}}\right]}^{-1}\right)}^{-1},\\ {\nu }_{b} ={\left(\mathop{\sum}\limits_{a}{\left[\mathop{\sum}\limits_{c}{\nu }_{c}{e}^{-\beta (c-b)\mathop{\sum}\limits_{i}{A}_{ia}{\rho }_{i}}\right]}^{-1}\right)}^{-1},$$
(51)

which are the self-consistent Eq. (10) for v and ν given in the main text.

Algorithm

The algorithm to solve Eqs. (50) and (51) consists of four basic steps, explained below.

  1.

    Initialize ρi uniformly at random in [1, N]; similarly, initialize σa uniformly at random in [1, M]. Also, initialize vj and νb uniformly at random in (0, 1].

  2.

    Choose an initial value for β. To start, initialize β using the following formula:

    $$\beta = {\beta}_{{{{{\rm{init}}}}}}= \frac{1}{{{{{{\rm{max}}}}}} \left[{N}\, {{{{{{\rm{max}}}}}}}_{i} \left\{{k}_{i}\right\},{M}\, {{{{{{\rm{max}}}}}}}_{a} \left\{{k}_{a} \right\}\right]},$$
    (52)

    where ki = ∑aAia, and ka = ∑iAia.

  3.

    Set τ = 1 and a tolerance 0 < TOL ≪ 1. Then run the following subroutine.

    (a)

      Iterate Eq. (51) according to the following updating rules:

      $${v}_{j}(t+1) ={\left(\mathop{\sum}\limits_{i}{\left[\mathop{\sum}\limits_{k}{v}_{k}(t){{\rm {e}}}^{-\beta (k-j)\mathop{\sum}\limits_{a}{A}_{ia}{\sigma }_{a}}\right]}^{-1}\right)}^{-1},\\ {\nu }_{b}(t+1) ={\left(\mathop{\sum}\limits_{a}{\left[\mathop{\sum}\limits_{c}{\nu }_{c}(t){{\rm {e}}}^{-\beta (c-b)\mathop{\sum}\limits_{i}{A}_{ia}{\rho }_{i}}\right]}^{-1}\right)}^{-1},$$
      (53)

      until ∣vj(t + 1)−vj(t)∣ < TOL for all j AND ∣νb(t + 1)−νb(t)∣ < TOL for all b.

    (b)

      Iterate Eq. (50) according to the following updating rules:

      $${\rho }_{i}(t+1) = \frac{{\sum }_{j}{{\rm {e}}}^{-\beta j{\sum }_{a}{A}_{ia}{\sigma }_{a}(t)}\,{v}_{j}\,j}{{\sum }_{j}{{\rm {e}}}^{-\beta j{\sum }_{a}{A}_{ia}{\sigma }_{a}(t)}\,{v}_{j}},\\ {\sigma }_{a}(t+1) = \frac{{\sum }_{b}{{\rm {e}}}^{-\beta b{\sum }_{i}{A}_{ia}{\rho }_{i}(t)}\,{\nu }_{b}\,b}{{\sum }_{b}{{\rm {e}}}^{-\beta b{\sum }_{i}{A}_{ia}{\rho }_{i}(t)}\,{\nu }_{b}},$$
      (54)

      until ∣ρi(t + 1)−ρi(t)∣ < TOL for all i AND ∣σa(t + 1)−σa(t)∣ < TOL for all a.

      Call \({\rho }_{i}^{(\tau )}\) and \({\sigma }_{a}^{(\tau )}\) the converged vectors and compute

      $${{{{{{{\rm{MAXDIFF}}}}}}}}\equiv \max \left\{\mathop{\max }\limits_{i}\left|{\rho }_{i}^{(\tau )}-{\rho }_{i}^{(\tau -1)}\right|,\mathop{\max }\limits_{a}\left|{\sigma }_{a}^{(\tau )}-{\sigma }_{a}^{(\tau -1)}\right|\right\}.$$
      (55)
    (c)

      If MAXDIFF < TOL, then RETURN \({\rho }_{i}^{(\tau )}\) and \({\sigma }_{a}^{(\tau )}\); otherwise increase τ by 1 and repeat from (a).

  4.

    Increase β → β + dβ and repeat from step 3, or terminate if the returned vectors did not change from the previous iteration.

Having found the solution vectors ρ and σ, we convert them into integer rankings as follows. The smallest value of ρi is assigned rank 1, the second smallest is assigned rank 2, and so on. This procedure generates a mapping from 1, 2, . . . , N to i1, i2, . . . , iN that can be represented by an N × N permutation matrix P. The same procedure, applied to σa, generates an M × M permutation matrix Q. Matrices P and Q represent the optimal permutations that solve the nestedness maximization problem. Eventually, the application of the similarity transformation

$$A\to {P}^{t}AQ,$$
(56)

brings the adjacency matrix into its maximally nested form having all nonzero entries clustered in the upper left corner, as seen in Fig. 2c of the main text.
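A minimal sketch of this final conversion step, with made-up converged vectors ρ and σ standing in for the output of the iteration:

```python
import numpy as np

def to_permutation(scores):
    """Convert converged scores into integer ranks (1 = smallest score) and the permutation matrix."""
    n = len(scores)
    ranks = np.empty(n, dtype=int)
    ranks[np.argsort(scores)] = np.arange(1, n + 1)
    P = np.zeros((n, n))
    P[np.arange(n), ranks - 1] = 1.0
    return ranks, P

# Made-up converged stochastic rankings for a 3x4 toy matrix.
rho = np.array([2.1, 1.2, 2.7])
sigma = np.array([1.4, 3.6, 2.2, 2.9])
A = np.array([[1, 0, 1, 0], [1, 1, 1, 1], [1, 0, 0, 0]], dtype=float)

(r, P), (c, Q) = to_permutation(rho), to_permutation(sigma)
A_nested = P.T @ A @ Q   # Eq. (56): nonzero entries are pushed toward the upper left corner
print(A_nested)
```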

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.