Introduction

Experience reveals that species forming complex ecological or economic ecosystems are organized in hierarchies. The ranks of such species, namely their position in the hierarchy, are functions of the interactions encoded in the adjacency matrix of the ecological or economic network. While several genetic algorithms1,2 and simple heuristics3,4,5 exist to rank species in complex ecosystems, capturing analytically the relationship between species’ ranking and the underlying adjacency matrix has remained elusive so far.

In economic and ecological ecosystems, ranking rows and columns of the adjacency matrix has revealed the existence of nested structures: neighbors of low-rank nodes are subsets of the neighbors of high-rank nodes1,6,7. For example, nested patterns are found in world trade, in which products exported by low-fitness countries constitute a subset of those exported by high-fitness countries3. In fragmented habitats, species found in the least hospitable islands are a subset of species in the most hospitable islands1. Nestedness in real-world interaction networks has captured cross-disciplinary interest for three main reasons. First, nested patterns are ubiquitous among complex systems, ranging from ecological networks1,6 and the human gut microbiome8 to socioeconomic systems3,9 and online social media and collaboration networks10,11. Second, the ubiquity of nested patterns has triggered intensive debates about the reasons behind the emergence of nestedness in mutualistic systems12,13,14,15 and socioeconomic networks9,11. Third, nestedness may have profound implications for the stability and dynamics of ecological and economic communities: highly nested rankings of the nodes have revealed vulnerable species in mutualistic networks16,17 and competitive actors in the world trade5,18.

The ubiquity of nestedness and its implications in shaping the structure of biotas have motivated the formulation of the nestedness maximization problem (NMP). This problem can be stated in the following way: find the permutation (i.e. ranking) of the rows and columns of the adjacency matrix of the network resulting in a maximally nested layout of the matrix elements. Originally introduced by Atmar and Patterson1, the problem has been widely studied in ecology, leading to several algorithms for measuring the nestedness of a matrix, e.g. the popular nestedness temperature calculator and its variants1,2,19,20. Yet these methods do not attempt to optimize directly the actual cost of a nested solution but exploit some simple heuristics that are deemed to be correlated with nestedness. Another method, called BINMATNEST2, optimizes a nestedness cost following a genetic algorithm but lacks the theoretical insight contained in an analytic solution to the problem. More generally, we lack a formal theory to derive the ranking of the nodes and the degree of nestedness of a network from the structure of the adjacency matrix.

Here we introduce an analytic framework to calculate the ranking positions of nodes in bipartite interaction networks. In the proposed framework, the observed interactions are associated with an energy (or cost) function that depends on both the nodes’ ranks and the adjacency matrix of the network. Under this general assumption, the task of ranking species can be cast in the problem of finding a suitable permutation of the rows and columns of the adjacency matrix, and this problem is, fundamentally, a combinatorial one. We solve it through statistical physics techniques for an energy function that captures the nestedness maximization problem, which has attracted long-standing interest in ecology1,2 and, more recently, has played a central role in the economic complexity field and its policy implications5,21,22,23. We map the NMP onto the quadratic assignment problem (QAP)24, thereby directly tackling the problem of finding the optimal permutation of rows and columns that maximizes the nestedness of the adjacency matrix. In our formulation, the degree of nestedness is measured by a cost function over the space of all possible rows and column permutations, whose global minimum corresponds to a matrix layout having maximum nestedness. Roughly speaking, the cost function is designed to reward permutations that move the maximum number of non-zero elements of the matrix in the upper left corner and to penalize those permutations that move non-zero elements in the bottom right corner. Next, we set up the theoretical framework, which allows us to obtain the mean-field solution to the NMP as a leading order approximation and, in principle, also calculate next-to-leading order corrections. Lastly, we stress that our theoretical framework easily generalizes also to higher-order interaction networks.

Results

Problem formulation

To formulate the problem, we shall focus in the following discussion on bipartite networks, although we anticipate that the theoretical framework and the algorithm we present here can be applied to any square or rectangular matrix, bipartite or not, directed or undirected, with non-negative entries. We consider bipartite networks where nodes of one kind, representing, for example, plants indexed by a variable i = 1, . . . , N, can only be connected with nodes of another kind, e.g. pollinators indexed by another variable a = 1, . . . , M, as seen in Fig. 1a. We denote by Aia the element of the network’s N × M adjacency matrix: Aia ≠ 0 if i and a are connected, and Aia = 0 otherwise. Besides connectivity, the adjacency matrix encodes the interaction strength between nodes such that whenever i and a are connected, the strength of their interaction is Aia = wia > 0. A ranking of the rows is represented by a permutation of the integers {1, 2, . . . , N}, denoted r ≡ {r1, r2, . . . , rN}; a ranking of the columns is represented by a (different) permutation of the integers {1, 2, . . . , M}, denoted c ≡ {c1, c2, . . . , cM}. More precisely, the r sequence arranges rows in ascending order of their ordinal rankings ri such that row i is ranked higher than row j if ri < rj. Similarly, the c sequence arranges columns such that column a ranks higher than column b if ca < cb.

Fig. 1: Modeling the nestedness maximization problem.
figure 1

a A bipartite network models the interactions between, e.g., plants i, represented by purple circles, and pollinators a, represented by cyan squares, through the adjacency matrix A. The interaction is mutualistic, i.e. Aia = 1 > 0 if i interacts with a and Aia = 0 otherwise. b A nested network has a hierarchical structure wherein the neighbors of low-rank nodes (the specialist species at the bottom) are a subset of the neighbors of high-rank nodes (the generalists at the top). The rank of a node is encoded in the variables ri (for plants) and ca (for pollinators). Top-rank nodes have r = c = 1, while low-rank ones have r = c = 4. The adjacency matrix of a nested network shows a peculiar pattern with all non-zero entries clustered in the upper left corner. c Maximizing network nestedness amounts to minimizing the cost function E(r, c) over the ranking vectors r and c, which, in turn, is equivalent to optimizing the cost E(P, Q) with respect to the permutation matrices P and Q. The optimal permutation matrices bring the adjacency matrix to its maximally nested form PtAQ = Anested, which is complementary to the layout of matrix B.

To model the problem, one more concept is needed: network nestedness. Nestedness is the property whereby if j ranks lower than i, then the neighbors of j form a subset of the neighbors of i, as illustrated in Fig. 1b. Different rankings, i.e. different sequences r and c, produce different nested patterns, that is, nestedness is a function of the rankings. Therefore, any cost (energy) function that seeks to quantify matrix nestedness must be a function of the rankings r and c. The simplest energy function that does the job, aside from trivial cases (see Supplementary Note 1), is

$$E(r,c)=\mathop{\sum }\limits_{i=1}^{N}\mathop{\sum }\limits_{a=1}^{M}{A}_{ia}{r}_{i}{c}_{a}.$$
(1)

The product Aiarica penalizes strong interactions between low-rank nodes since they contribute a large amount to the cost function; thus, low-rank nodes typically interact weakly. Strong interactions are only allowed between high-rank nodes because, when Aia is large, the product Aiarica can be made small by choosing ri and ca to be small. Furthermore, high-rank nodes can have moderate interactions with low-rank nodes because the product riAiaca can still be relatively small when ri is large and ca is small (or vice versa) provided Aia is not too large (hence the name ‘moderate’ interaction).
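As a minimal numerical illustration of Eq. (1), the following sketch (the toy matrix and rankings are made up for this example) shows that a layout ranked from generalists to specialists yields a lower cost than the reversed ranking:

```python
import numpy as np

def nestedness_cost(A, r, c):
    """Energy E(r, c) = sum_{i,a} A_ia * r_i * c_a of Eq. (1)."""
    r = np.asarray(r, dtype=float)
    c = np.asarray(c, dtype=float)
    return float(r @ A @ c)

# Toy 3x4 bipartite adjacency matrix (hypothetical example).
A = np.array([[1, 1, 1, 1],
              [1, 1, 0, 0],
              [1, 0, 0, 0]], dtype=float)

# Ranking generalists first (rank 1) is cheaper than the reversed ranking.
print(nestedness_cost(A, r=[1, 2, 3], c=[1, 2, 3, 4]))  # 19.0 (low cost)
print(nestedness_cost(A, r=[3, 2, 1], c=[4, 3, 2, 1]))  # 48.0 (high cost)
```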

The assumptions of our model are relevant to diverse scenarios where nestedness has been observed. In bipartite networks of countries connected to their exported products, we could interpret ri as the fitness of country i and ca as the inverse of the complexity of product a. In this scenario, high-energy links riAiaca represent the higher barriers faced by underdeveloped countries to produce and export sophisticated products3, whereas low-energy links represent competitive countries exporting ubiquitous products. In mutualistic ecological networks, high-energy links represent the higher extinction risk for specialist pollinators to be connected with specialist plants, whereas low-energy links represent connections within the core of generalist nodes6 as depicted in Fig. 1b.

With this premise, it is clear that to maximize nestedness, we have to minimize the energy function in Eq. (1). More precisely, nestedness maximization is the mathematical optimization problem in which we seek to find the optimal sequences r* and c* that minimize the energy function, i.e. \(\mathop{\min }\limits_{r,c}E(r,c)=E({r}^{* },{c}^{* })\). Since the sequence r is a permutation of the ordered sequence {1, 2, . . . , N}, we can always write \({r}_{i}=\mathop{\sum }\nolimits_{n = 1}^{N}{P}_{in}n\), where P is an N × N permutation matrix. Similarly, we can write \({c}_{a}=\mathop{\sum }\nolimits_{m = 1}^{M}{Q}_{am}m\), where Q is an M × M permutation matrix. Therefore, the energy function, considered as a function of the permutation matrices P and Q, can be rewritten in the form

$$E(r,c)=E(P,Q)={{{{{{{\rm{Tr}}}}}}}}\left({P}^{t}AQ{B}^{t}\right),$$
(2)

where B is an N × M matrix with entries Bia = ia, as shown in Fig. 1c. In this language, the NMP is simply the problem of finding the permutations P* and Q* that minimize the energy function given by Eq. (2), which mathematically reads

$$({P}^{*},{Q}^{*})=\mathop{{{{{{\mathrm{arg}}}}}} \, {{{{{\mathrm{min}}}}}} }\limits_{P,Q}E(P,Q).$$
(3)

The geometric meaning of the optimal permutations P* and Q* is clear if we apply them to the adjacency matrix as PtAQ = Anested: the nested structure of A is visually manifest in Anested, as illustrated in Fig. 1c. The optimization problem defined by Eqs. (2) and (3) can be recognized as an instance of the Quadratic Assignment Problem in the Koopmans-Beckmann form24, one of the most important problems in combinatorial optimization, which is known to be NP-hard. The formal mathematical mapping of the NMP onto an instance of QAP represents our first and most important result. Having formulated the NMP in the language of permutation matrices, we move next to solve it using a statistical physics approach.
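The equivalence between Eq. (1) and the trace form in Eq. (2) can be verified numerically. The following minimal sketch (in Python, with a randomly generated toy matrix) builds the permutation matrices from a pair of rankings and checks that the two expressions of the energy coincide:

```python
import numpy as np

def perm_matrix(seq):
    """Permutation matrix P with P[i, seq[i]-1] = 1, so that r_i = sum_n P_in * n."""
    n = len(seq)
    P = np.zeros((n, n))
    P[np.arange(n), np.asarray(seq) - 1] = 1.0
    return P

rng = np.random.default_rng(0)
N, M = 5, 7
A = rng.integers(0, 2, size=(N, M)).astype(float)      # random toy adjacency matrix
r = rng.permutation(np.arange(1, N + 1))                # random row ranking
c = rng.permutation(np.arange(1, M + 1))                # random column ranking

P, Q = perm_matrix(r), perm_matrix(c)
B = np.outer(np.arange(1, N + 1), np.arange(1, M + 1))  # B_ia = i * a

E_rank = float(r @ A @ c)                               # Eq. (1)
E_trace = float(np.trace(P.T @ A @ Q @ B.T))            # Eq. (2)
assert np.isclose(E_rank, E_trace)
```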

Solving the NMP with statistical physics

Our basic tool to study the NMP is the partition function Z(β) defined by

$$Z(\beta )=\mathop{\sum}\limits_{P,Q}{e}^{-\beta E(P,Q)},$$
(4)

where β is an external control parameter akin to the inverse temperature in the statistical physics language. The partition function Z(β) provides a tool to determine the global minimum of the energy function via the limit

$$E({P}^{* },{Q}^{* })=-\mathop{\lim }\limits_{\beta \to \infty }\frac{1}{\beta }\ln Z(\beta )$$
(5)

Calculating the partition function may seem hopeless since it requires evaluating and summing N!M! terms. Nonetheless, the calculation is greatly simplified in the limit of large β, since we can evaluate Z(β) via the steepest descent method. The strategy consists of two main steps. The first step is to work out an integral representation of Z(β) of the form

$$Z(\beta )=\int\,DXDY\,{{\rm {e}}}^{-\beta F(X,Y)},$$
(6)

where the integral is over the space of N × N doubly-stochastic (DS) matrices X and M × M DS matrices Y, that converge onto permutation matrices P and Q when β → ∞; and F(X, Y) is an effective cost function that coincides with E(P, Q) for β → ∞. The second step is to find the stationary points of F(X, Y) by zeroing the derivatives ∂F/∂X = ∂F/∂Y = 0, resulting in a set of self-consistent equations for X and Y, called saddle point equations. All steps of the calculation are explained in great detail in the “Methods” subsection “Derivation of the saddle point equations”. The resulting saddle point equations are given by

$${X}_{ij} ={u}_{i}\exp \left[-\beta {(AY{B}^{t})}_{ij}\right]{v}_{j},\\ {Y}_{ab} ={\mu }_{a}\exp \left[-\beta {({A}^{t}XB)}_{ab}\right]{\nu }_{b},$$
(7)

where u, v are N-dimensional vectors and μ, ν are M-dimensional vectors determined by imposing that all row and column sums of X and Y are equal to 1. At this point, we can exploit the specific form of matrix B, i.e. Bia = ia, to further simplify Eq. (7). Specifically, we define the stochastic rankings ρi and σa as

$${\rho }_{i}=\mathop{\sum }\limits_{k=1}^{N}{X}_{ik}\,k,\quad {\sigma }_{a}=\mathop{\sum }\limits_{b=1}^{M}{Y}_{ab}\,b,$$
(8)

whereby we can cast Eq. (7) in the following vectorial form (details in the “Methods” subsection “Derivation of the saddle point equations”)

$${\rho }_{i} = \frac{{\sum }_{k}k\,{v}_{k}\,{{\rm {e}}}^{-\beta k{\sum }_{a}{A}_{ia}{\sigma }_{a}}}{{\sum }_{k}{v}_{k}\,{{\rm {e}}}^{-\beta k{\sum }_{a}{A}_{ia}{\sigma }_{a}}},\\ {\sigma }_{a} = \frac{{\sum }_{c}c\,{\nu }_{c}\,{{\rm {e}}}^{-\beta c{\sum }_{i}{A}_{ia}{\rho }_{i}}}{{\sum }_{c}{\nu }_{c}\,{{\rm {e}}}^{-\beta c{\sum }_{i}{A}_{ia}{\rho }_{i}}},$$
(9)

where the normalizing vectors v and ν satisfy

$$\frac{1}{{v}_{j}} = \mathop{\sum}\limits_{i}{\left[\mathop{\sum}\limits_{k}{v}_{k}{{\rm {e}}}^{-\beta (k-j)\mathop{\sum}\limits_{a}{A}_{ia}{\sigma }_{a}}\right]}^{-1},\\ \frac{1}{{\nu }_{b}} = \mathop{\sum}\limits_{a}{\left[\mathop{\sum}\limits_{c}{\nu }_{c}{{\rm {e}}}^{-\beta (c-b)\mathop{\sum}\limits_{i}{A}_{ia}{\rho }_{i}}\right]}^{-1}.$$
(10)

Equations (9) and (10) represent our second most important result and, when interpreted as iterative equations, provide a simple algorithm to solve the NMP, whose implementation is discussed in detail in the “Methods” subsection “Algorithm”. Note that ρ and σ converge to the actual ranking r and c for β → ∞. However, in practice, we solve Eqs. (9) and (10) iteratively at finite β. Once we reach convergence, we estimate r and c by simply sorting the entries of ρ and σ. We observe that larger values of β give better results, i.e., lower values of the cost E(r, c), as seen in Fig. 2a. A full discussion of convergence and bounds of our algorithm will be published elsewhere (see also discussion in Supplementary Note 2). Here, we test its performance by applying it to many real mutualistic and economic networks and show that we obtain better results than state-of-the-art network metrics and genetic algorithms, as discussed next.
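As a concrete illustration of how Eqs. (9) and (10) can be iterated at fixed β, we provide a minimal Python sketch below. It interleaves one update of the normalizing vectors with one update of the stochastic rankings, rather than following the nested-loop schedule detailed in the Methods, and the toy matrix is made up for illustration.

```python
import numpy as np

def nmp_rankings(A, beta=0.05, n_iter=2000, seed=0):
    """Bare-bones fixed-beta iteration of Eqs. (9) and (10).
    Returns the stochastic rankings rho (rows) and sigma (columns)."""
    rng = np.random.default_rng(seed)
    N, M = A.shape
    k_row = np.arange(1, N + 1, dtype=float)   # candidate row ranks 1..N
    k_col = np.arange(1, M + 1, dtype=float)   # candidate column ranks 1..M
    rho = rng.uniform(1, N, size=N)            # random initial rankings
    sigma = rng.uniform(1, M, size=M)
    v, nu = np.ones(N), np.ones(M)             # normalizing vectors of Eq. (10)
    for _ in range(n_iter):
        W_row = A @ sigma                      # sum_a A_ia sigma_a
        W_col = A.T @ rho                      # sum_i A_ia rho_i
        Kr = np.exp(-beta * np.outer(W_row, k_row))   # Kr[i, k] = e^{-beta k W_row_i}
        Kc = np.exp(-beta * np.outer(W_col, k_col))
        # Eq. (10): one fixed-point sweep for the normalizing vectors.
        v = 1.0 / (Kr / (Kr @ v)[:, None]).sum(axis=0)
        nu = 1.0 / (Kc / (Kc @ nu)[:, None]).sum(axis=0)
        # Eq. (9): stochastic rankings.
        rho = (Kr @ (k_row * v)) / (Kr @ v)
        sigma = (Kc @ (k_col * nu)) / (Kc @ nu)
    return rho, sigma

# Toy nested matrix: sorting the converged rankings recovers the generalist-to-specialist order.
A = np.array([[1, 1, 1, 1], [1, 1, 0, 0], [1, 0, 0, 0]], dtype=float)
rho, sigma = nmp_rankings(A)
print(np.argsort(rho) + 1, np.argsort(sigma) + 1)   # node indices from top rank to bottom rank
```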

Fig. 2: Numerical solution and comparison with other methods in ecological networks.
figure 2

a Optimal cost E returned by our algorithm on the mutualistic network named ML-PL-001 in the Web-of-Life database for several choices of the parameter β. Larger values of β give lower costs. In particular, for sufficiently large β, our algorithm returns a lower cost than the best off-the-shelf algorithm for nestedness maximization (BINMATNEST, red line). b Comparison of our algorithm with state-of-the-art methods in the literature: Degree (bi), Fitness-Complexity (bii), Minimal-Extremal-Metric (biii), and BINMATNEST (biv). In each panel, we plot the cost returned by each algorithm divided by the cost returned by our algorithm (denoted \(\frac{{E}_{{\rm {Degree}}}}{{E}_{{\rm {NMP}}}}\) for the Degree method, \(\frac{{E}_{{\rm {FC}}}}{{E}_{{\rm {NMP}}}}\) for the Fitness-Complexity method, \(\frac{{E}_{{\rm {MEM}}}}{{E}_{{\rm {NMP}}}}\) for the Minimal-Extremal-Metric, and \(\frac{{E}_{{\rm {BIN}}}}{{E}_{{\rm {NMP}}}}\) for BINMATNEST) for each ecological interaction network considered in this work. More precisely, we compute the ordinate in each plot by taking the rankings (r, c) returned by each method, substituting them in the expression of E(r, c) given by Eq. (1), and dividing this number by the minimum cost returned by our algorithm, denoted ENMP. A value E/ENMP > 1 means that our algorithm returns a better, i.e. lower, cost. We find that our algorithm returns a lower cost in 100% of the networks when compared to degree, Fitness-Complexity, and Minimal-Extremal-Metric; and in 80% of the networks when compared to BINMATNEST (see also Table 1). c Similarity transformation applied to the adjacency matrix A of network ML-PL-001 that brings A into its maximally nested form PtAQ, where P and Q are the optimal permutation matrices constructed from the optimal ranking vectors r* and c*.

Nestedness maximization in empirical matrices

We apply our algorithm to two classes of empirical bipartite networks: plant-pollinator networks relevant to the study of mutualistic networks in ecology, and country-activity networks relevant to the economic complexity field, where the activities represent either technological outputs on green innovations (green technology networks) or export flows of physical goods (international trade networks). The 47 ecological mutualistic networks are freely downloadable at https://www.web-of-life.es/; their filenames are listed in the first column of Table 1. The green technology networks that link countries and green technologies are sourced from REGPAT, accessible upon request at https://www.oecd.org/sti/inno/intellectual-property-statistics-and-analysis.htm#ip-data. These green technologies fall under the Y02/Y04S classes of the Cooperative Patent Classification (CPC), which encompass technologies linked to climate change mitigation and adaptation. Further, the international trade networks that link countries with the goods they competitively export are based on the COMTRADE data collected by the United Nations (the database is not freely available, but sample data can be found at https://comtradeplus.un.org/). The exported products are classified following the SITC version-2 classification, which includes 184 classes of goods. On the geographical dimension, we consider the top 65 countries in terms of trade volume, which together account for more than 90% of the total volume.

Table 1 Numerical results on real mutualistic networks from the Web of Life database

To standardize the comparison with existing methods, we first consider binary matrices. Subsequently, we shall study the impact of links’ weights on our algorithm’s performance. We binarize the adjacency matrices of the mutualistic networks by setting Aij = 1 if nodes i and j are connected and zero otherwise, thus ignoring the weights; for the economic networks, we binarize the matrices with the same thresholding criterion used in previous works3,5, based on the countries’ revealed comparative advantage (RCA) on the technologies or products. Despite this simplification, we emphasize that our algorithm can be applied, as is, to any bipartite network with nonnegative weights of the most general form, as shown below. We run four benchmark methods: the naive degree ranking25, fitness-complexity (FC)3, the minimal extremal metric (MEM)4, and BINMATNEST2. While BINMATNEST is the state-of-the-art algorithm in ecology for nestedness maximization26, the effectiveness of the FC27,28 and MEM4 has been demonstrated in recent works in economic complexity, which also connected the FC to the Sinkhorn algorithm from optimal transport28,29,30.
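For readers wishing to reproduce the preprocessing of the economic networks, the following sketch illustrates the RCA-based binarization, assuming the standard Balassa definition of revealed comparative advantage used in refs. 3,5 with the common threshold RCA ≥ 1; the data layout and numbers are illustrative.

```python
import numpy as np

def rca_binarize(X, threshold=1.0):
    """Binarize a country-by-product export matrix X via revealed comparative advantage:
    RCA_cp = (X_cp / sum_p' X_cp') / (sum_c' X_c'p / sum_c'p' X_c'p'); keep links with RCA >= threshold."""
    X = np.asarray(X, dtype=float)
    country_share = X / X.sum(axis=1, keepdims=True)  # share of product p in country c's export basket
    world_share = X.sum(axis=0) / X.sum()             # share of product p in world exports
    return (country_share / world_share >= threshold).astype(int)

# Toy export-volume matrix (hypothetical numbers): rows = countries, columns = products.
X = np.array([[90.0, 10.0,  0.0],
              [30.0, 50.0, 20.0],
              [ 5.0,  5.0, 90.0]])
print(rca_binarize(X))
```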

We compute the value of the cost function E(r, c) for the r and c returned by each of the analyzed methods and compare it to the value ENMP returned by our algorithm (see Supplementary Note 1 for implementation details of the analyzed methods). We find that the proposed algorithm outperforms all the state-of-the-art methods in finding a better (i.e. lower) cost of the nodes’ ranking. In ecological mutualistic networks, our method finds a lower cost than degree, FC, and MEM on 100% of the empirical mutualistic networks (Fig. 2b). When compared to BINMATNEST (the state-of-the-art method in ecology for the NMP), we find a better (or equal) cost in 80% of the instances, as seen in Fig. 2b and Table 1. In the green technology and international trade networks, our algorithm finds a lower cost than FC—the state-of-the-art method in economic complexity—across all the analyzed years (see Fig. 3). In Fig. 2c, we show an application of the similarity transformation that brings the adjacency matrix to its maximally nested form. We call P and Q the optimal permutations that solve the QAP in Eq. (3) (details in the “Methods” subsection “Algorithm”), and we perform the similarity transformation

$$A\to {P}^{{t}}AQ,$$
(11)

which reveals the nested structure of the adjacency matrix (see Fig. 2c).

Fig. 3: Comparison with other methods in economic networks.
figure 3

Optimal cost E(r, c) returned by our algorithm (red line) as opposed to the optimal cost returned by the fitness-complexity algorithm (blue line) in (a) the green technology production networks and (b) the country-product export networks in international trade. Over all years considered in this study, our algorithm returns a better cost than Fitness-Complexity, which is the state-of-the-art algorithm in the economic complexity field.

Next, we find that including the links’ weights in the optimal ranking calculation does not substantially alter the rankings or the convergence time. We compare the optimal rankings obtained from the binarized versus the weighted adjacency matrices on a subset of 12 networks that are weighted. For each of these 12 networks, we run our algorithm first on the weighted adjacency matrix and then on its binarized version. Then, we measure the correlation between the rankings obtained in the two cases, separately for the rows’ and columns’ rankings. In Fig. 4a, we report the square of the correlation coefficient R2 between the rankings of the weighted and binarized matrices and show that including the weights of the interaction links does not change the rankings substantially: high correlations are found in all data sets (Fig. 4b) but one (Fig. 4c).

Fig. 4: Comparison of the rankings in binary versus weighted adjacency matrices.
figure 4

a Each point represents the square of the correlation coefficient between the rankings of a weighted adjacency matrix and its binarized version. For each of the 12 networks there are two points, representing the R2 between the rankings of the matrix rows (light blue points) and of matrix columns (dark blue points). To make a fair comparison, we normalize both the weighted and binary adjacency matrices by their norm, Aia → Aia/∥A∥1, before running our algorithm. Furthermore, we use the same value of β in both cases, chosen to be the value \({\beta }_{\max }\) also used in the plots in Fig. 2. High values of R2 are found consistently in all datasets, except for Net33. b Scatter plot of the rankings of the weighted versus binary adjacency matrices of Net17 that display the highest correlation. c Scatter plot of the rankings of the weighted versus binary adjacency matrices of Net33 that have the lowest correlation. d Ratio Tb/Tw of the execution times of our algorithm in the binary vs. weighted case for the same 12 networks analyzed in (a). The ratio roughly hovers around 1 for all networks, indicating that the runtime is nearly the same in both cases.

Furthermore, we asked whether the weights have any impact on the convergence time of the algorithm. Therefore, we measured the execution time of the algorithm in both the binary and weighted cases. In Fig. 4d we plot the ratio Tb/Tw, and we observe that, for the majority of networks, it is <1, meaning that convergence is faster in the weighted case than in the binary one. More generally, all data points are contained in the strip Tb/Tw ∈ [0.5, 1.5], which means that both execution times are of the same order of magnitude.

Finally, we suggest that deviations between the ranks by degree and our method (Fig. 4a) may reveal structurally important or vulnerable species. For example, in Net31, pollinator (row) 31 has a much better rank by NMP than by degree (upward blue arrow in Fig. 5b); the reverse is true for the higher-degree pollinator 2 (downward red arrow in Fig. 5b). Pollinator 2 indeed has a higher degree than pollinator 31 (7 vs. 2 interaction partners). However, both of pollinator 31’s interaction partners are specialists that only interact with pollinator 31. As a result, pollinator 31’s extinction would trigger their extinction in a co-extinction cascade process16,31. The same is not true for pollinator 2’s interaction partners, which all have at least one extra interaction partner besides pollinator 2, which makes them less vulnerable to pollinator 2’s potential extinction. Therefore, pollinator 31’s higher NMP rank reflects its higher importance for the robustness of the system compared to the higher-degree pollinator 2. Similar examples can be found among the column ranks (Fig. 5c), and they point to the better ability of the NMP to identify structurally important species compared to the degree, a hypothesis that can be tested in future empirical work.

Fig. 5: Comparison of Nestedness maximization vs ranking by degree.
figure 5

a A comparison between the adjacency matrix (of Net31) reordered by solving the NMP (blue squares) versus the one reordered according to the node degree (red squares), showing how the ranking by degree completely fails in identifying the high level of nestedness of the network. The additional plots illustrate the different rankings of rows (b) and columns (c) by degree and NMP, as well as a few highlighted nodes that gain (blue arrows) or lose (red arrows) a substantial number of rank positions with the NMP method. These discrepancies can be due to the different structural importance of the nodes16: for example, while row 2 has a higher degree than row 31, row 31’s interaction partners only interact with row 31, which makes them vulnerable to row 31’s extinction. The same is not true for row 2 (see main text).

Discussion

In this work we introduced a framework to calculate analytically the ranks of the nodes of a bipartite network. The proposed approach requires the specification, for each link, of a cost function that depends on the rankings of the interacting nodes. This formulation allowed us to recast the Nestedness Maximization Problem as an instance of the Quadratic Assignment Problem, which we tackled with statistical physics techniques. In particular, we obtained a mean-field solution by using the steepest-descent approximation of the partition function. The corresponding saddle-point equations depend on a single hyper-parameter (the inverse temperature β) and can be solved by iteration to find the optimal rankings of the rows and columns of the adjacency matrix that result in a maximally nested layout. We benchmarked our algorithm against other methods on several real ecological and economic networks and showed that our algorithm outperforms the best existing algorithm in 80% of the instances.

We conclude by outlining the potential applications of our work in economic complexity and ecology and discuss how the proposed method could be generalized to higher-order interactions. In economic complexity, algorithms such as the fitness-complexity and its variants provide heuristic solutions to the NMP; these algorithms have been recently validated via their ability to forecast the future development trajectories of nations5 and provided policy-relevant insights21,22,23. Our work provides an analytic ranking algorithm based on first principles, which helps move toward a microfoundation of the methods routinely used in the field28 and could potentially inspire the next generation of economic-complexity algorithms.

In ecology, measures of nestedness based on the NMP have been widely popular, especially in biogeography1. The use of the energy function derived here as a measure of nestedness comes with two important caveats. First, aligned with the nestedness temperature and BINMATNEST, using the optimal nestedness energy as a measure of nestedness would assume that the relevant degree of nestedness is the one provided by the optimal ranking. Under this assumption, compared to genetic algorithms that provide no insight into the ranking mechanism and act as black boxes31, our method is more interpretable as it explicitly links the ranking variables with the cost of each interaction. At the same time, ecologists have often been interested not in the nestedness by optimal ranking but in the level of nestedness when rows and columns are ranked by degree32,33 or by ecological properties of interest such as islands’ areas and isolation in biogeography34, species’ abundance in interaction networks35, and more7,36. These different perspectives generally lead to different quantitative insights: for example, the optimal energy derived here correlates positively with the standard NODF metric based on the ranking by degree (Pearson’s r = 0.45 in the 47 analyzed mutualistic networks), yet the correlation is far from one, which indicates that the two metrics convey different information.

We note that these different perspectives on nestedness can be incorporated into our framework as a non-interacting problem where the energy function couples the ranking variables with a suitable external field (see Supplementary Note 1). We emphasize that our work does not provide a criterion on whether to use the optimal ranking by a non-interacting energy function (which leads to the ranking by degree used by the popular NODF33 or other ecological properties of interest, e.g., those used in ecological gradient analyses) or the optimal ranking by the quadratic energy function (which leads to the ranking calculated analytically here). Our work introduces the general framework that directly connects the ranking variables, the entries of the adjacency matrix, and the global energy of the network. Which energy function (and nestedness metric) ought to be used is ultimately dependent on the research question of interest. At any rate, even when one is not interested in the maximal nestedness per se, the maximal value provided by our interacting model still has value as a benchmark to understand how far the observed nestedness by a given ecological property of interest (e.g., species abundance) is from the theoretical maximum attainable, which can be used to test which ecological property explains most of the system’s nestedness35.

Second, ecologists have long realized that measures of nestedness may exhibit dependencies on matrix size and density20,33. For this reason, it is a standard practice in ecology to compare the observed degree of nestedness with that of suitably randomized matrices, which typically embed passive sampling mechanisms one wishes to test36. We note here that the same null model analyses performed for other nestedness metrics can also be applied to the energy function derived here, and the choice of the null model should be determined by ecological considerations as prescribed by the relevant literature (see ref. 37 for the most recent review).

From a network science standpoint, we note that by changing the definition of the matrix B, i.e. using measures other than a sequence of ordinal numbers, one can repurpose our algorithm to rank rows and columns of a matrix according to geometric patterns other than nestedness38,39. Therefore, the proposed framework holds promise for the effective detection of a wide range of network structural patterns beyond the nestedness considered here. Finally, the present framework can be easily extended and applied to solve the ranking problem in networks with higher-order interactions. For example, given the adjacency tensor Aiaγ for a system with 3-body interactions, we can define the energy function E(P, Q, R) to be optimized over three permutation matrices P, Q, and R following exactly the same steps outlined in this paper for the case of pairwise interactions. This may be especially relevant in world trade for ranking countries according to both exported and imported goods.
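As an illustration of this higher-order extension, the 3-body analogue of Eq. (1) couples three ranking vectors to the adjacency tensor. The sketch below shows this natural generalization of the cost function; it is an assumption about how the extension would look, not an implementation used in this work.

```python
import numpy as np

def nestedness_cost_3body(A, r, c, s):
    """E(r, c, s) = sum_{i,a,g} A_iag * r_i * c_a * s_g, a 3-body analogue of Eq. (1)."""
    r, c, s = (np.asarray(x, dtype=float) for x in (r, c, s))
    return float(np.einsum('iag,i,a,g->', A, r, c, s))

# Toy 2x2x2 adjacency tensor (hypothetical example).
A = np.zeros((2, 2, 2))
A[0, 0, 0] = A[0, 1, 0] = 1.0
print(nestedness_cost_3body(A, r=[1, 2], c=[1, 2], s=[1, 2]))  # 3.0
```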

Methods

Derivation of the saddle point equations

In this section, we discuss in detail how to derive the saddle-point equations, Eq. (7), given in the main text. We consider the minimization problem defined by

$$({r}^{* },{c}^{* })=\mathop{{{{{{\mathrm{arg}}}}}} \, {{{{{\mathrm{min}}}}}} }\limits_{r\in {{{{{{{{\mathcal{R}}}}}}}}}_{N},c\in {{{{{{{{\mathcal{R}}}}}}}}}_{M}}E(r,c),$$
(12)

where the cost (energy) function is given by

$$E(r,c)=\mathop{\sum }\limits_{i=1}^{N}\mathop{\sum }\limits_{a=1}^{M}{A}_{ia}\,{r}_{i}\,{c}_{a},$$
(13)

and \({{{{{{{{\mathcal{R}}}}}}}}}_{N}\) and \({{{{{{{{\mathcal{R}}}}}}}}}_{M}\) are the sets of all vectors r and c obtained by permuting the entries of the representative vectors r0 and c0 defined as

$${r}^{0} \equiv (1,2,3,...,N),\\ {c}^{0} \equiv (1,2,3,...,M).$$
(14)

Therefore, we can write any two vectors r and c as

$${r}_{i} =\mathop{\sum }\limits_{j=1}^{N}{P}_{ij}{r}_{j}^{0},\\ {c}_{a} =\mathop{\sum }\limits_{b=1}^{M}{Q}_{ab}{c}_{b}^{0},$$
(15)

where P and Q are arbitrary permutation matrices of size N × N and M × M, respectively. Furthermore, we introduce the N × M matrix B, defined as the tensor product of r0 and c0, whose components are explicitly given by

$${B}_{ia}={({r}^{0}\otimes {c}^{0})}_{ia}=ia.$$
(16)

With these definitions, we can rewrite the energy function as the trace of a product of matrices in the following way:

$$E\equiv E(P,Q)={{{{{{{\rm{Tr}}}}}}}}({P}^{{\rm {t}}}AQ{B}^{{\rm {t}}}).$$
(17)

The minimization problem in Eq. (12) can be reformulated as a minimization problem in the space of permutation matrices as follows:

$$({P}^{* },{Q}^{* })=\mathop{{{{{{\rm{arg}}}}}} \, {{{{{\rm{min}}}}}} }\limits_{(P\in {{{{{{{{\mathcal{S}}}}}}}}}_{N},Q\in {{{{{{{{\mathcal{S}}}}}}}}}_{M})}E(P,Q),$$
(18)

where \({{{{{{{{\mathcal{S}}}}}}}}}_{N}\) and \({{{{{{{{\mathcal{S}}}}}}}}}_{M}\) denote the symmetric groups on N and M elements, respectively.

Next we discuss a relaxation of the problem in Eq. (18) that amounts to extending the spaces \({{{{{{{{\mathcal{S}}}}}}}}}_{N}\) and \({{{{{{{{\mathcal{S}}}}}}}}}_{M}\) of permutation matrices onto the spaces of doubly-stochastic (DS) matrices \({{{{{{{{\mathcal{D}}}}}}}}}_{N}\) and \({{{{{{{{\mathcal{D}}}}}}}}}_{M}\). The space \({{{{{{{{\mathcal{D}}}}}}}}}_{N}\) (\({{{{{{{{\mathcal{D}}}}}}}}}_{M}\)) is a superset of the original space \({{{{{{{{\mathcal{S}}}}}}}}}_{N}\) (\({{{{{{{{\mathcal{S}}}}}}}}}_{M}\)). Solving the problem on the \({{{{{{{\mathcal{D}}}}}}}}\)-space means to find two doubly-stochastic matrices X* and Y* that minimize an ‘effective’ cost function F, i.e.

$$F({X}^{* },{Y}^{* })=\mathop{\min }\limits_{(X\in {{{{{{{{\mathcal{D}}}}}}}}}_{N},Y\in {{{{{{{{\mathcal{D}}}}}}}}}_{M})}F(X,Y),$$
(19)

and are only ‘slightly different’ from the permutation matrices P* and Q* (we will specify later what ‘slightly different’ means in mathematical terms and what F actually is). The quantity which plays the fundamental role in the relaxation procedure of the original problem is the partition function, Z(β), defined by

$$Z(\beta )=\mathop{\sum}\limits_{P\in {{{{{{{{\mathcal{S}}}}}}}}}_{N}}\mathop{\sum}\limits_{Q\in {{{{{{{{\mathcal{S}}}}}}}}}_{M}}{{\rm {e}}}^{-\beta E(P,Q)}.$$
(20)

The connection between Z(β) and the original problem in Eq. (18) is established by the following limit:

$$\mathop{\lim }\limits_{\beta \to \infty }-\frac{1}{\beta }\log Z(\beta )=\mathop{\min }\limits_{(P\in {{{{{{{{\mathcal{S}}}}}}}}}_{N},Q\in {{{{{{{{\mathcal{S}}}}}}}}}_{M})}E(P,Q).$$
(21)

The optimization problem in Eq. (18) is thus equivalent to the problem of calculating the partition function in Eq. (20). Ideally, we would like to compute exactly Z(β) for arbitrary β and then take the limit β → ∞. Although an exact calculation of the partition function is, in general, out of reach, in practice we may well expect that the better we estimate Z(β), the closer the limit in Eq. (21) will be to the true optimal solution. In fact, the procedure of relaxation is basically a procedure to assess the partition function for large but finite β. Mathematically, this procedure is called the method of steepest descent. By estimating the partition function via the steepest descent method, we will obtain a system of non-linear equations, called saddle-point equations, whose solution is a pair of doubly-stochastic matrices X*, Y* that solve the relaxed problem given by Eq. (19). Eventually, the solution to the original problem in Eq. (18) can be obtained formally by projecting X*, Y* onto the subspaces \({{{{{{{{\mathcal{S}}}}}}}}}_{N},{{{{{{{{\mathcal{S}}}}}}}}}_{M}\subset {{{{{{{{\mathcal{D}}}}}}}}}_{N},{{{{{{{{\mathcal{D}}}}}}}}}_{M}\) via the limit

$$\mathop{\lim }\limits_{\beta \to \infty }{X}^{* }(\beta ) ={P}^{* },\\ \mathop{\lim }\limits_{\beta \to \infty }{Y}^{* }(\beta ) ={Q}^{* }.$$
(22)

Having explained the rationale for the introduction of the partition function, we move next to discuss the details of the calculation leading to the saddle point equations.

In order to cast the partition function in a form suitable for the steepest-descent evaluation, we need the following preliminary result.

Definition

Semi-permutation matrix: An N × N square matrix \(\tilde{P}\) is called a semi-permutation matrix if all its entries are either 0 or 1 and each row sums to one, i.e. \(\mathop{\sum }\nolimits_{j = 1}^{N}{\tilde{P}}_{ij}=1\) for i = 1, . . . , N, but no further constraint on the column sums is imposed.

We denote the space of N × N semi-permutation matrices by

$${\tilde{{{\mathcal{S}}}}}_{N}=\left\{\tilde{P}\in {\{0,1\}}^{N\times N}\,:\,\mathop{\sum }\limits_{j=1}^{N}{\tilde{P}}_{ij}=1\ \ {{\rm{for}}}\ i=1,\ldots ,N\right\}.$$
(23)

Lemma

Consider an arbitrary N × N square matrix G and the function W(G) defined by

$$W(G)=\ln \mathop{\sum}\limits_{\tilde{P}\in {\tilde{{{\mathcal{S}}}}}_{N}}{{\rm {e}}}^{{{\rm{Tr}}}(\tilde{P}{G}^{t})}.$$
(24)

Then, W(G) is explicitly given by the following formula

$$W(G)=\mathop{\sum }\limits_{i=1}^{N}\ln \left(\mathop{\sum }\limits_{j=1}^{N}{{\rm {e}}}^{{G}_{ij}}\right).$$
(25)

Proof

Let us write the sum inside the logarithm on the right-hand side of Eq. (24) as

$$\mathop{\sum}\limits_{\tilde{P}\in {\tilde{{{\mathcal{S}}}}}_{N}}{{\rm {e}}}^{{{\rm{Tr}}}(\tilde{P}{G}^{t})}=\mathop{\sum}\limits_{{\tilde{P}}_{1}}\cdots \mathop{\sum}\limits_{{\tilde{P}}_{N}}\mathop{\prod }\limits_{i=1}^{N}{{\rm {e}}}^{{\tilde{P}}_{i}\cdot {G}_{i}}=\mathop{\prod }\limits_{i=1}^{N}\mathop{\sum}\limits_{{\tilde{P}}_{i}}{{\rm {e}}}^{{\tilde{P}}_{i}\cdot {G}_{i}},$$
(26)

where \({\tilde{P}}_{i}\) is the ith row of \(\tilde{P}\) (and thus is a vector) having one component equal to 1 and the remaining N−1 components equal to 0, and \({G}_{i}\) is the ith row of G. The sum \({\sum }_{{\tilde{P}}_{i}}\) denotes a summation over all possible choices of the vector \({\tilde{P}}_{i}\): there are N possible such choices, namely \({\tilde{P}}_{i}\in \{(1,0,\ldots ,0),(0,1,\ldots ,0),\ldots ,(0,\ldots ,0,1)\}\). Hence, each sum in the right-hand side of Eq. (26) evaluates to

$$\mathop{\sum}\limits_{{\tilde{P}}_{i}}{{\rm {e}}}^{{\tilde{P}}_{i}\cdot {G}_{i}}=\mathop{\sum }\limits_{j=1}^{N}{{\rm {e}}}^{{G}_{ij}}.$$
(27)

Thus, the left-hand side of Eq. (26) is equal to

$$\mathop{\sum}\limits_{\tilde{P}\in {\tilde{{{\mathcal{S}}}}}_{N}}{{\rm {e}}}^{{{\rm{Tr}}}(\tilde{P}{G}^{t})}=\mathop{\prod }\limits_{i=1}^{N}\mathop{\sum }\limits_{j=1}^{N}{{\rm {e}}}^{{G}_{ij}}.$$
(28)

Eventually, by taking the logarithm of both sides of Eq. (28), we prove Eq. (25).
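The Lemma is easy to verify numerically for small N by enumerating all N^N semi-permutation matrices; a minimal sketch:

```python
import numpy as np
from itertools import product

def W_closed_form(G):
    """Closed form of Eq. (25): W(G) = sum_i log(sum_j exp(G_ij))."""
    return float(np.log(np.exp(G).sum(axis=1)).sum())

def W_brute_force(G):
    """Definition of Eq. (24): log of the sum of exp(Tr(P~ G^t)) over all semi-permutation matrices."""
    N = G.shape[0]
    total = 0.0
    for cols in product(range(N), repeat=N):          # one chosen column per row
        total += np.exp(sum(G[i, j] for i, j in enumerate(cols)))
    return float(np.log(total))

G = np.random.default_rng(1).normal(size=(3, 3))
assert np.isclose(W_closed_form(G), W_brute_force(G))
```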

With these tools at hand, we move to derive the integral representation of Z(β). We use the definition of the Dirac δ-function to write the partition function in Eq. (20) as follows

$$Z(\beta )=\mathop{\sum}\limits_{P\in {{{{{{{{\mathcal{S}}}}}}}}}_{N}}\mathop{\sum}\limits_{Q\in {{{{{{{{\mathcal{S}}}}}}}}}_{M}}\int\,DX\int\,DY{{\rm {e}}}^{-\beta E(X,Y)}\mathop{\prod }\limits_{i,j=1}^{N}\delta ({X}_{ij}-{P}_{ij})\mathop{\prod }\limits_{a,b=1}^{M}\delta ({Y}_{ab}-{Q}_{ab}),$$
(29)

where the integration measures are defined by DX ≡ ∏i,jdXij and DY ≡ ∏a,bdYab. The next step is to transform the sum over permutation matrices P, Q into a sum over semi-permutation matrices and then to perform this sum explicitly using the Lemma in Eq. (25). In order to achieve this goal, we insert into Eq. (29) N delta functions \(\mathop{\prod }\nolimits_{j = 1}^{N}\delta \left({\sum }_{i}{X}_{ij}-1\right)\) and M delta functions \(\mathop{\prod }\nolimits_{b = 1}^{M}\delta \left({\sum }_{a}{Y}_{ab}-1\right)\) to enforce the conditions that the columns of X and Y do sum up to one. By inserting these delta functions, we can then replace the sum over P, Q by a sum over semi-permutation matrices \(\tilde{P}\in {\tilde{{{\mathcal{S}}}}}_{N}\) and \(\tilde{Q}\in {\tilde{{{\mathcal{S}}}}}_{M}\), thus obtaining

$$Z(\beta )=\mathop{\sum}\limits_{\tilde{P}\in {\tilde{{{\mathcal{S}}}}}_{N}}\mathop{\sum}\limits_{\tilde{Q}\in {\tilde{{{\mathcal{S}}}}}_{M}}\int\,DX\int\,DY\,{{\rm {e}}}^{-\beta E(X,Y)}\mathop{\prod }\limits_{i,j=1}^{N}\delta ({X}_{ij}-{\tilde{P}}_{ij})\mathop{\prod }\limits_{a,b=1}^{M}\delta ({Y}_{ab}-{\tilde{Q}}_{ab})\mathop{\prod }\limits_{j=1}^{N}\delta \left(\mathop{\sum}\limits_{i}{X}_{ij}-1\right)\mathop{\prod }\limits_{b=1}^{M}\delta \left(\mathop{\sum}\limits_{a}{Y}_{ab}-1\right).$$
(30)

To proceed further in the calculation, we insert into Eq. (30) the following integral representation for each of the delta-functions:

$$\delta (x)=\int\nolimits_{-i\infty }^{+i\infty }\frac{{{\rm {d}}}\hat{x}}{2\pi i}\,{{\rm {e}}}^{-\hat{x}x},$$
(31)

and we get

$$Z(\beta )=\mathop{\sum}\limits_{\tilde{P}\in {\tilde{{{\mathcal{S}}}}}_{N}}\mathop{\sum}\limits_{\tilde{Q}\in {\tilde{{{\mathcal{S}}}}}_{M}}\int\,DXDYD\hat{X}D\hat{Y}DzDw\,{{\rm {e}}}^{-\beta E(X,Y)}\,{{\rm {e}}}^{-{{\rm{Tr}}}(\hat{X}{(X-\tilde{P})}^{t})-{{\rm{Tr}}}(\hat{Y}{(Y-\tilde{Q})}^{t})}\,{{\rm {e}}}^{-\mathop{\sum}\limits_{j}{z}_{j}\left(\mathop{\sum}\limits_{i}{X}_{ij}-1\right)}\,{{\rm {e}}}^{-\mathop{\sum}\limits_{b}{w}_{b}\left(\mathop{\sum}\limits_{a}{Y}_{ab}-1\right)},$$
(32)

where we defined the integration measures \(D\hat{X}\equiv {\prod }_{i,j}d{\hat{X}}_{ij}/2\pi i\), \(D\hat{Y}\equiv {\prod }_{a,b}d{\hat{Y}}_{ab}/2\pi i\), Dz ≡ ∏jdzj/2πi, and Dw ≡ ∏bdwb/2πi. Performing the sums over \(\tilde{P}\) and \(\tilde{Q}\) using Eq. (25), we obtain

$$Z(\beta ) = \int \,DXDYD\hat{X}D\hat{Y}DzDw\,{{\rm {e}}}^{-\beta E(X,Y)}{{\rm {e}}}^{-{{{{{{{\rm{Tr}}}}}}}}(\hat{X}{X}^{t})+W(\hat{X})-{{{{{{{\rm{Tr}}}}}}}}(\hat{Y}{Y}^{t})+W(\hat{Y})}\\ \times {{\rm {e}}}^{-\mathop{\sum}\limits_{j}{z}_{j}\left(\mathop{\sum}\limits_{i}{X}_{ij}-1\right)}{{\rm {e}}}^{-\mathop{\sum}\limits_{b}{w}_{b}\left(\mathop{\sum}\limits_{a}{Y}_{ab}-1\right)}.$$
(33)

Next we introduce the effective cost function \(F(X,\hat{X},Y,\hat{Y},z,w)\) defined as

$$F(X,\hat{X},Y,\hat{Y},z,w) = \,E(X,Y)+\frac{1}{\beta }{{{{{{{\rm{Tr}}}}}}}}(\hat{X}{X}^{t})+\frac{1}{\beta }{{{{{{{\rm{Tr}}}}}}}}(\hat{Y}{Y}^{t})-\frac{1}{\beta }W(\hat{X})-\frac{1}{\beta }W(\hat{Y})\\ +\frac{1}{\beta }\mathop{\sum}\limits_{j}{z}_{j}\left(\mathop{\sum}\limits_{i}{X}_{ij}-1\right)+\frac{1}{\beta }\mathop{\sum}\limits_{b}{w}_{b}\left(\mathop{\sum}\limits_{a}{Y}_{ab}-1\right) \\ \equiv \,E(X,Y)-\frac{1}{\beta }S(X,\hat{X},Y,\hat{Y},z,w)$$
(34)

whereby we can write the partition function as

$$Z(\beta )=\int\,DXDYD\hat{X}D\hat{Y}DzDw\,{{\rm {e}}}^{-\beta F(X,\hat{X},Y,\hat{Y},z,w)},$$
(35)

which can be evaluated by the steepest descent method when β → ∞, as we explain next.

In the limit of large β the integral in Eq. (35) is dominated by the saddle point where E(X, Y) is minimized and \(S(X,\hat{X},Y,\hat{Y},z,w)\) is stationary (in order for the oscillating contributions not to cancel out). In order to find the saddle point, we have to set the derivatives of \(F(X,\hat{X},Y,\hat{Y},z,w)\) to zero, thus obtaining the following saddle point equations

$$\frac{\partial F}{\partial {X}_{ij}} = \frac{\partial E}{\partial {X}_{ij}}+\frac{1}{\beta }\left({\hat{X}}_{ij}+{z}_{j}\right) = 0,\\ \frac{\partial F}{\partial {\hat{X}}_{ij}} = \frac{1}{\beta }{X}_{ij}-\frac{1}{\beta }\frac{\partial W}{\partial {\hat{X}}_{ij}} = 0,\\ \frac{\partial F}{\partial {z}_{j}} =\mathop{\sum}\limits_{i}{X}_{ij}-1=0,$$
(36)

and similar equations for the triplet \((Y,\hat{Y},w)\). The derivative of E with respect to Xij gives

$$\frac{\partial E}{\partial {X}_{ij}}={(AY{B}^{t})}_{ij},$$
(37)

and the derivative of W with respect to \({\hat{X}}_{ij}\) gives

$$\frac{\partial W}{\partial {\hat{X}}_{ij}}=\frac{{{\rm {e}}}^{{\hat{X}}_{ij}}}{{\sum }_{k}{{\rm {e}}}^{{\hat{X}}_{ik}}}.$$
(38)

Solving Eq. (36) with respect to Xij we get

$${X}_{ij}=\frac{{{\rm {e}}}^{-\beta {(AY{B}^{t})}_{ij}-{z}_{j}}}{{\sum }_{k}{{\rm {e}}}^{-\beta {(AY{B}^{t})}_{ik}-{z}_{k}}}.$$
(39)

Analogously, solving with respect to Yab we get

$${Y}_{ab}=\frac{{e}^{-\beta {({A}^{t}XB)}_{ab}-{w}_{b}}}{{\sum }_{c}{e}^{-\beta {({A}^{t}XB)}_{ac}-{w}_{c}}}.$$
(40)

It is worth noticing that Eqs. (39) and (40) are invariant under the transformations

$$\begin{array}{rcl}{z}_{j}\,&\to &\,{z}_{j}+\zeta ,\\ {w}_{b}\,&\to &\,{w}_{b}+\xi ,\end{array}$$
(41)

for arbitrary values of ζ and ξ. This translational symmetry is due to the fact that the 2N constraints on the row and column sums of P are not linearly independent (only 2N−1 of them are), since the sum of all entries of P must be equal to N, i.e. ∑ijPij = N. The same reasoning applies to the 2M constraints on the row and column sums of Q, of which only 2M−1 are linearly independent since ∑abQab = M. Furthermore, we notice that the solution matrices X and Y in Eqs. (39), (40) automatically satisfy the condition of having row sums equal to one. Next, we derive the equations to determine the Lagrange multipliers zj and wb. To this end, we first introduce the vectors v and \(\nu\) with components

$${v}_{j} ={{\rm {e}}}^{-{z}_{j}},\\ {\nu }_{b} ={{\rm {e}}}^{-{w}_{b}}.$$
(42)

Then, we define the vectors u and μ as

$${u}_{i} ={\left(\mathop{\sum}\limits_{k}{{\rm {e}}}^{-\beta {(AY{B}^{t})}_{ik}}{v}_{k}\right)}^{-1},\\ {\mu }_{a} ={\left(\mathop{\sum}\limits_{c}{{\rm {e}}}^{-\beta {({A}^{t}XB)}_{ac}}{\nu }_{c}\right)}^{-1},$$
(43)

so that we can write the solution matrices X and Y in Eqs. (39), (40) as

$${X}_{ij} =\, {u}_{i}\,{{\rm {e}}}^{-\beta {(AY{B}^{t})}_{ij}}\,{v}_{j},\\ {Y}_{ab} = \, {\mu }_{a}\,{{\rm {e}}}^{-\beta {({A}^{t}XB)}_{ab}}\,{\nu }_{b}.$$
(44)

Finally, imposing the conditions on X and Y to have column sums equal to one, we find the equations to be satisfied by v and \(\nu\)

$${v}_{j} ={\left(\mathop{\sum}\limits_{i}{u}_{i}{{\rm {e}}}^{-\beta {(AY{B}^{t})}_{ij}}\right)}^{-1},\\ {\nu }_{b} ={\left(\mathop{\sum}\limits_{a}{\mu }_{a}{{\rm {e}}}^{-\beta {({A}^{t}XB)}_{ab}}\right)}^{-1},$$
(45)

Equations (43)–(45) are the constitutive equations for the relaxed nestedness-maximization problem corresponding to Eq. (7) given in the main text.
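Equations (43)–(45) have the structure of a Sinkhorn-type alternating normalization of the kernel \({{\rm {e}}}^{-\beta (AY{B}^{t})}\). The sketch below illustrates this structure for a fixed Y (taken here, purely for illustration, to be the uniform doubly-stochastic matrix): alternating the updates of u and v drives X toward double stochasticity.

```python
import numpy as np

rng = np.random.default_rng(2)
N, M, beta = 4, 5, 0.1
A = rng.integers(0, 2, size=(N, M)).astype(float)       # toy adjacency matrix
B = np.outer(np.arange(1, N + 1), np.arange(1, M + 1))   # B_ia = i * a
Y = np.full((M, M), 1.0 / M)                             # uniform DS matrix standing in for Y

K = np.exp(-beta * (A @ Y @ B.T))                        # kernel e^{-beta (A Y B^t)_ij}
u, v = np.ones(N), np.ones(N)
for _ in range(200):                                     # alternate Eq. (43) and Eq. (45)
    u = 1.0 / (K @ v)                                    # enforces unit row sums of X
    v = 1.0 / (K.T @ u)                                  # enforces unit column sums of X
X = np.diag(u) @ K @ np.diag(v)                          # Eq. (44)
print(X.sum(axis=1), X.sum(axis=0))                      # both close to vectors of ones
```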

We conclude this section by deriving the self-consistent equations for the stochastic rankings corresponding to Eqs. (9) and (10) given in the main text. We define the stochastic rankings as the two vectors

$${\rho }_{i} = \mathop{\sum }\limits_{k=1}^{N}{X}_{ik}\,k,\\ {\sigma }_{a} = \mathop{\sum }\limits_{b=1}^{M}{Y}_{ab}\,b,$$
(46)

where the term stochastic emphasizes their implied dependence on the doubly stochastic matrices X and Y. Clearly, we have

$$\mathop{\lim }\limits_{\beta \to \infty }{\rho }_{i}={r}_{i},\\ \mathop{\lim }\limits_{\beta \to \infty }{\sigma }_{a}={c}_{a}.$$
(47)

Next, let’s consider the argument of the exponentials in Eq. (44), which we can rewrite as

$${(AY{B}^{t})}_{ij} =\mathop{\sum}\limits_{a}{A}_{ia}\left(\mathop{\sum}\limits_{b}{Y}_{ab}\,b\right)j=j\mathop{\sum}\limits_{a}{A}_{ia}{\sigma }_{a},\\ {({A}^{t}XB)}_{ab} =\mathop{\sum}\limits_{i}{A}_{ia}\left(\mathop{\sum}\limits_{j}{X}_{ij}\,j\right)b=b\mathop{\sum}\limits_{i}{A}_{ia}{\rho }_{i}.$$
(48)

At this point, it is sufficient to multiply both sides of Eq. (44) by j and b, and sum over j and b, respectively, to obtain

$$\mathop{\sum}\limits_{j}{X}_{ij}\,j ={\rho }_{i}={u}_{i}\mathop{\sum}\limits_{j}{{\rm {e}}}^{-\beta {(AY{B}^{t})}_{ij}}\,{v}_{j}\,j={u}_{i}\mathop{\sum}\limits_{j}{{\rm {e}}}^{-\beta j\mathop{\sum}\limits_{a}{A}_{ia}{\sigma }_{a}}\,{v}_{j}\,j,\\ \mathop{\sum}\limits_{b}{Y}_{ab}\,b ={\sigma }_{a}={\mu }_{a}\mathop{\sum}\limits_{b}{{\rm {e}}}^{-\beta {({A}^{t}XB)}_{ab}}\,{\nu }_{b}\,b={\mu }_{a}\mathop{\sum}\limits_{b}{{\rm {e}}}^{-\beta b\mathop{\sum}\limits_{i}{A}_{ia}{\rho }_{i}}\,{\nu }_{b}\,b.$$
(49)

Using the definition of ui and μa in Eq. (43) we obtain

$${\rho }_{i} =\frac{{\sum }_{j}{{\rm {e}}}^{-\beta j{\sum }_{a}{A}_{ia}{\sigma }_{a}}\,{v}_{j}\,j}{{\sum }_{j}{{\rm {e}}}^{-\beta j{\sum }_{a}{A}_{ia}{\sigma }_{a}}\,{v}_{j}},\\ {\sigma }_{a} = \frac{{\sum }_{b}{{\rm {e}}}^{-\beta b{\sum }_{i}{A}_{ia}{\rho }_{i}}\,{\nu }_{b}\,b}{{\sum }_{b}{{\rm {e}}}^{-\beta b{\sum }_{i}{A}_{ia}{\rho }_{i}}\,{\nu }_{b}},$$
(50)

which are the self-consistent Eq. (9) for ρ and σ given in the main text. There are still two unknown vectors in the previous equations: vectors v and \(\nu\). In order to determine them, we consider Eq. (45) and eliminate ui and μa using Eq. (43), thus obtaining

$${v}_{j} ={\left(\mathop{\sum}\limits_{i}{\left[\mathop{\sum}\limits_{k}{v}_{k}{e}^{-\beta (k-j)\mathop{\sum}\limits_{a}{A}_{ia}{\sigma }_{a}}\right]}^{-1}\right)}^{-1},\\ {\nu }_{b} ={\left(\mathop{\sum}\limits_{a}{\left[\mathop{\sum}\limits_{c}{\nu }_{c}{e}^{-\beta (c-b)\mathop{\sum}\limits_{i}{A}_{ia}{\rho }_{i}}\right]}^{-1}\right)}^{-1},$$
(51)

which are the self-consistent Eq. (10) for v and ν given in the main text.

Algorithm

The algorithm to solve Eqs. (50) and (51) consists of four basic steps, explained below.

  1.

    Initialize ρi uniformly at random in [1, N]; similarly, initialize σa uniformly at random in [1, M]. Also, initialize vj and νb uniformly at random in (0, 1].

  2.

    Choose an initial value for β. To start, initialize β using the following formula:

    $$\beta = {\beta}_{{{{{\rm{init}}}}}}= \frac{1}{{{{{{\rm{max}}}}}} \left[{N}\, {{{{{{\rm{max}}}}}}}_{i} \left\{{k}_{i}\right\},{M}\, {{{{{{\rm{max}}}}}}}_{a} \left\{{k}_{a} \right\}\right]},$$
    (52)

    where ki = ∑aAia, and ka = ∑iAia.

  3.

    Set τ = 1 and a tolerance 0 < TOL ≪ 1. Then run the following subroutine.

    (a)

      Iterate Eq. (51) according to the following updating rules:

      $${v}_{j}(t+1) ={\left(\mathop{\sum}\limits_{i}{\left[\mathop{\sum}\limits_{k}{v}_{k}(t){{\rm {e}}}^{-\beta (k-j)\mathop{\sum}\limits_{a}{A}_{ia}{\sigma }_{a}}\right]}^{-1}\right)}^{-1},\\ {\nu }_{b}(t+1) ={\left(\mathop{\sum}\limits_{a}{\left[\mathop{\sum}\limits_{c}{\nu }_{c}(t){{\rm {e}}}^{-\beta (c-b)\mathop{\sum}\limits_{i}{A}_{ia}{\rho }_{i}}\right]}^{-1}\right)}^{-1},$$
      (53)

      until ∣vj(t + 1)−vj(t)∣ < TOL for all j AND ∣νb(t + 1)−νb(t)∣ < TOL for all b.

    (b)

      Iterate Eq. (50) according to the following updating rules:

      $${\rho }_{i}(t+1) = \frac{{\sum }_{j}{{\rm {e}}}^{-\beta j{\sum }_{a}{A}_{ia}{\sigma }_{a}(t)}\,{v}_{j}\,j}{{\sum }_{j}{{\rm {e}}}^{-\beta j{\sum }_{a}{A}_{ia}{\sigma }_{a}(t)}\,{v}_{j}},\\ {\sigma }_{a}(t+1) = \frac{{\sum }_{b}{{\rm {e}}}^{-\beta b{\sum }_{i}{A}_{ia}{\rho }_{i}(t)}\,{\nu }_{b}\,b}{{\sum }_{b}{{\rm {e}}}^{-\beta b{\sum }_{i}{A}_{ia}{\rho }_{i}(t)}\,{\nu }_{b}},$$
      (54)

      until ∣ρi(t + 1)−ρi(t)∣ < TOL for all i AND ∣σa(t + 1)−σa(t)∣ < TOL for all a.

      Call \({\rho }_{i}^{(\tau )}\) and \({\sigma }_{a}^{(\tau )}\) the converged vectors and compute

      $${{{{{{{\rm{MAXDIFF}}}}}}}}\equiv \max \left\{\mathop{\max }\limits_{i}\left|{\rho }_{i}^{(\tau )}-{\rho }_{i}^{(\tau -1)}\right|,\mathop{\max }\limits_{a}\left|{\sigma }_{a}^{(\tau )}-{\sigma }_{a}^{(\tau -1)}\right|\right\}.$$
      (55)
    (c)

      If MAXDIFF < TOL, then RETURN \({\rho }_{i}^{(\tau )}\) and \({\sigma }_{a}^{(\tau )}\); otherwise increase τ by 1 and repeat from (a).

  4.

    Increase β → β + dβ and repeat from step 3, or terminate if the returned vectors did not change from the previous iteration.

Having found the solution vectors ρ and σ, we convert them into integer rankings as follows. The smallest value of ρi is assigned rank 1, the second smallest is assigned rank 2, and so on. This procedure generates a mapping from 1, 2, . . . , N to i1, i2, . . . , iN that can be represented by an N × N permutation matrix P. The same procedure, applied to σa, generates an M × M permutation matrix Q. Matrices P and Q represent the optimal permutations that solve the nestedness maximization problem. Eventually, the application of the similarity transformation

$$A\to {P}^{t}AQ,$$
(56)

brings the adjacency matrix into its maximally nested form having all nonzero entries clustered in the upper left corner, as seen in Fig. 2c of the main text.
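A minimal sketch of this final conversion step, with made-up converged vectors ρ and σ standing in for the output of the iteration:

```python
import numpy as np

def to_permutation(scores):
    """Convert converged scores into integer ranks (1 = smallest score) and the permutation matrix."""
    n = len(scores)
    ranks = np.empty(n, dtype=int)
    ranks[np.argsort(scores)] = np.arange(1, n + 1)
    P = np.zeros((n, n))
    P[np.arange(n), ranks - 1] = 1.0
    return ranks, P

# Made-up converged stochastic rankings for a 3x4 toy matrix.
rho = np.array([2.1, 1.2, 2.7])
sigma = np.array([1.4, 3.6, 2.2, 2.9])
A = np.array([[1, 0, 1, 0], [1, 1, 1, 1], [1, 0, 0, 0]], dtype=float)

(r, P), (c, Q) = to_permutation(rho), to_permutation(sigma)
A_nested = P.T @ A @ Q   # Eq. (56): nonzero entries are pushed toward the upper left corner
print(A_nested)
```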

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.