Introduction

Assessing the stability and robustness of complex ecosystems is a fundamental problem in conservation ecology1,2,3,4,5. The loss of an individual “keystone” species can induce cascade effects –i.e. a series of secondary extinctions triggered by the primary one– propagating the damage through the network. Thus, the relative “importance” of a given species within an ecological network can be gauged by the eventual size of the cascade of extinctions its loss would cause. A successful ranking of species importance should rank first those species that trigger the largest extinction cascades.

In the context of food webs, species rankings have long been sought (see e.g. Refs. 6, 7). For example, Allesina and Pascual8 successfully applied Google's PageRank algorithm9 to order species within food webs, much as Google ranks webpages.

Mutualistic ecological communities, such as those formed by plants and their pollinators, plant seeds and their dispersers, or anemones and the fish that inhabit them, constitute another broadly studied set of ecological networks. These comprise two different sets of living beings that benefit from each other and as such can be represented as bipartite networks10. Mutualistic networks turn out to have a very particular “nested” architecture11,12,13 in which specialist species –interacting with only a few mutualistic partners– tend to be connected with generalists (Figure 1). Such a nested design is believed to confer robustness against species loss and other systemic damage, thus fostering biodiversity14,15.

Figure 1

Example of two different bipartite networks with different levels of nestedness.

For simplicity, we focus on binary networks: blue squares correspond to existing interactions while empty ones describe absent links. A perfectly nested network (A) shows a characteristic interaction matrix in which specialist species –with low connectivity– interact only with generalist ones. The matrix in (B) has a lesser degree of nestedness (see Refs. 16, 13 and 17 for quantification of nestedness).
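As a small illustration of what "perfectly nested" means for a binary matrix, the sketch below checks whether the partner sets of the rows form a chain of subsets once rows are sorted by decreasing degree. This subset criterion is one common formalisation and is an assumption here, as is the helper name `is_perfectly_nested`; it is not one of the nestedness measures of Refs. 16, 13 and 17.

```python
def is_perfectly_nested(M):
    """Return True if the rows of the binary matrix M form a chain of subsets.

    M[a][p] is truthy when row species a interacts with column species p.
    Assumed criterion: after sorting rows by decreasing degree, the partner
    set of each row is contained in the partner set of the previous row.
    """
    partner_sets = [frozenset(p for p, x in enumerate(row) if x) for row in M]
    partner_sets.sort(key=len, reverse=True)
    return all(small <= big for big, small in zip(partner_sets, partner_sets[1:]))


nested = [[1, 1, 1],
          [1, 1, 0],
          [1, 0, 0]]
not_nested = [[1, 1, 0],
              [0, 1, 1]]
print(is_perfectly_nested(nested), is_perfectly_nested(not_nested))
```

If the rows form such a chain, the columns do as well, so checking one side is enough.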

Determining a ranking of species importance in mutualistic communities poses an important practical challenge, as it would be highly desirable to know which species are more crucial for the long-term stability of the community. The goal would be to establish a proper ordering of species, ranking them in order of decreasing importance for the community. This would facilitate the design of sound conservation policies protecting the most important species.

Following the experience from food webs we could, in principle, employ the PageRank algorithm to rank mutualistic species in bipartite networks. PageRank8,9 is an iterative linear-algebra algorithm which, in a nutshell, computes the “importance” of a given node as the linear superposition of the importance of the nodes connecting to it, in a recursive and self-consistent way. However, in this work, taking inspiration from a recent breakthrough in economics/econometrics18,19, we propose to employ a novel non-linear algorithm specially designed for bipartite networks.

Tacchella et al.18,19 analyzed economic data from the world trade network (i.e. the bipartite network of countries and the products they export). The goal was to infer an objective ranking of countries in terms of their “fitness” and a classification of the products in terms of increasing “complexity”. Inspection of such economic data reveals that rich (high fitness) countries are not specialized exclusively in producing complex products (such as high-tech devices). Rather, they export a highly diversified variety of goods, including less complex ones (e.g. cereals). On the other hand, poor (low fitness) countries produce only low-complexity merchandise. These facts are reflected in the nested structure of the corresponding bipartite network18,19, with a shape similar to that in Figure 1. The main idea behind the novel algorithm of Tacchella et al. is that while the fitness of a country can be safely defined as the linear average of the complexity of the products it exports, the reverse does not make sense. Indeed, the complexity of a given product cannot be meaningfully estimated as the average fitness of the countries producing it, but is much better characterized by the minimal fitness required to produce it18. To implement this idea Tacchella et al. proposed an iterative non-linear algorithm (see below) and were able to compute the fitness of all countries and the complexity of all products in a self-consistent way, using solely information contained in the matrix of economic transactions. The novel algorithm clearly outperforms PageRank and leads to striking implications for understanding the global trade market18,19.

Here, we consider a set of 63 real mutualistic networks –all of them with a characteristic nested structure– taken from the literature (45 pollination networks, 16 frugivore seed-dispersal networks and 2 other networks; see Table 1) and rank the species according to different criteria (such as node connectivity, betweenness centrality, PageRank, etc.), including the novel non-linear algorithm. Each of the employed criteria leads to a different ranking of species. We analyze the quality of each of these orderings by monitoring how fast the network collapses if species are sequentially removed in order of decreasing ranking. The best ranking is the one for which the network breaks down most rapidly. Our conclusion is that the non-linear algorithm clearly outperforms all others, thus providing us with an efficient and powerful scheme to gauge the relative importance of species in mutualistic communities.

Table 1 Dataset of different mutualistic networks used throughout the study, with Amax active and Pmax passive species

Results

The non-linear ranking algorithm for mutualistic networks

Inspired by the work of Tacchella et al.18,19, we propose a novel ranking algorithm for mutualistic networks of ecological relevance. We shall refer to it as mutualistic species rank (MusRank). To establish a common terminology for plant-pollinator, seed-disperser and anemone-fish networks, we refer to plants, seeds and anemones as “passive” (P) elements, while pollinators, dispersers and fishes are their “active” (A) partners; rather than fitness and complexity we now use the terms importance and vulnerability, respectively, for the two emerging species rankings. It is natural to identify products with passive components and countries with active ones (but the opposite identification can also be made; see below). We assume that the importance of an active species is determined by the number of its mutualistic passive partners, each one weighted with its own vulnerability: the more partners and the more vulnerable they are, the more important an active element is.

On the other hand, the vulnerability of a passive element is bounded by the least important species it interacts with. The rationale behind this is that, given that mutualistic networks are nested, specialized species tend to interact with generalists. If a passive element interacts only with generalists it is most certainly a specialist and therefore highly vulnerable, as it can disappear if a few generalists go extinct.

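The non-linear algorithm encoding these ideas is summarized in Eq. (1). The importance of active elements, $I_A$, and the vulnerability of passive ones, $V_P$, are computed at iteration n as a function of their values at iteration n − 1, using the interaction (or adjacency) matrix $M_{AP}$ as the only input:

$$
\begin{aligned}
\tilde{I}_A^{(n)} &= \sum_{P=1}^{P_{\max}} M_{AP}\, V_P^{(n-1)}
&\quad\longrightarrow\quad
I_A^{(n)} &= \frac{\tilde{I}_A^{(n)}}{\langle \tilde{I}_A^{(n)} \rangle_A} \\
\tilde{V}_P^{(n)} &= \frac{1}{\sum_{A=1}^{A_{\max}} M_{AP}\, \dfrac{1}{I_A^{(n-1)}}}
&\quad\longrightarrow\quad
V_P^{(n)} &= \frac{\tilde{V}_P^{(n)}}{\langle \tilde{V}_P^{(n)} \rangle_P}
\end{aligned}
\tag{1}
$$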

Here, as in the work by Tacchella et al., the adjacency matrix is considered to take binary values, but generalization to allow for real values is straightforward. In a first step (left), intermediate values of the importance and vulnerability are calculated for each species: the first as the sum of the vulnerabilities of its partners and the second as the inverse of the sum of its partners' inverse importances18. In a second step (right), both values are normalized to their mean values. In this way, starting from arbitrary initial conditions (e.g. $I_A^{(0)} = 1$ for all A and $V_P^{(0)} = 1$ for all P), the two-step transformation above is iterated until a fixed point is reached (let us remark that we make no attempt here to prove that such a fixed point actually exists nor to investigate the conditions for the convergence of the method; see ref. 20). Such a fixed point, which does not depend on initial conditions, defines the output of the algorithm: a ranking of importances and vulnerabilities for active and passive species, respectively.
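The two-step iteration described above can be sketched in a few lines of plain Python. This is only an illustrative sketch under stated assumptions: the intermediate importance is taken as the sum of the partners' vulnerabilities and the intermediate vulnerability as the inverse of the sum of the partners' inverse importances (following the fitness-complexity scheme of Tacchella et al.), every species is assumed to have at least one partner, and the function name `musrank`, the iteration cap and the stopping tolerance are hypothetical choices.

```python
def musrank(M, n_iter=200, tol=1e-9):
    """Iterate importance (active, rows) and vulnerability (passive, columns).

    M[a][p] is 1 when active species a interacts with passive species p.
    Assumes every row and every column contains at least one interaction.
    """
    n_active, n_passive = len(M), len(M[0])
    imp = [1.0] * n_active       # initial condition: all importances equal 1
    vul = [1.0] * n_passive      # initial condition: all vulnerabilities equal 1
    for _ in range(n_iter):
        # Step 1: intermediate values computed from the previous iteration.
        new_imp = [sum(M[a][p] * vul[p] for p in range(n_passive))
                   for a in range(n_active)]
        new_vul = [1.0 / sum(M[a][p] / imp[a] for a in range(n_active))
                   for p in range(n_passive)]
        # Step 2: normalise each set of values to its mean.
        mean_imp = sum(new_imp) / n_active
        mean_vul = sum(new_vul) / n_passive
        new_imp = [x / mean_imp for x in new_imp]
        new_vul = [x / mean_vul for x in new_vul]
        # Stop once successive iterations barely differ (assumed criterion).
        change = max(max(abs(x - y) for x, y in zip(new_imp, imp)),
                     max(abs(x - y) for x, y in zip(new_vul, vul)))
        imp, vul = new_imp, new_vul
        if change < tol:
            break
    return imp, vul


# Toy perfectly nested network: active 0 is the generalist,
# passive 2 interacts only with that generalist.
M = [[1, 1, 1],
     [1, 1, 0],
     [1, 0, 0]]
importance, vulnerability = musrank(M)
print(importance, vulnerability)
```

In this toy network the generalist active species receives the highest importance, and the passive species with a single partner receives the highest vulnerability.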

Assessing the quality of a given ranking

In order to evaluate the quality of any possible ranking of species for a given mutualistic network we computationally implement the following protocol (see Figure 2A). Active species are removed progressively following the ordering prescribed by any specified ranking algorithm. The ranking is kept fixed along this process, i.e. the ranking is not re-evaluated once species are removed. Secondary extinctions are monitored (a species is declared extinct when it no longer has any mutualistic partners to interact with). The process is iterated until all the species in the network have gone extinct. The total fraction of extinct species as a function of the number of deleted species defines an extinction curve8. For each possible sequence of eradications the extinction area is obtained as the integral of the extinction curve (see Figure 2B). This procedure allows for a quantitative discrimination of species rankings: the best possible ordering of species is the one for which the largest extinction area is obtained upon progressively removing active species in order of decreasing rank.
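The protocol above can be sketched as follows. The sketch makes several assumptions that are not spelled out in the text: the curve tracks the fraction of extinct passive species (as in the right panel of Figure 2), the horizontal axis is rescaled to the fraction of removed active species so that the area lies between 0 and 1, and the integral is approximated by a simple rectangle rule. The function name `extinction_area` is hypothetical.

```python
def extinction_area(M, order):
    """Area under the extinction curve when active species are removed in `order`.

    M[a][p] is truthy when active species a interacts with passive species p;
    `order` lists the indices of all active species, highest ranked first.
    """
    n_active, n_passive = len(M), len(M[0])
    # Number of surviving partners of each passive species.
    partners = [sum(1 for a in range(n_active) if M[a][p]) for p in range(n_passive)]
    extinct = sum(1 for k in partners if k == 0)
    area = 0.0
    for a in order:
        for p in range(n_passive):
            if M[a][p]:
                partners[p] -= 1
                if partners[p] == 0:
                    extinct += 1          # secondary extinction
        area += extinct / n_passive       # height of the curve after this removal
    return area / n_active                # rectangle rule on a unit-width axis


M = [[1, 1, 1],
     [1, 1, 0],
     [1, 0, 0]]
print(extinction_area(M, [0, 1, 2]))  # generalist removed first
print(extinction_area(M, [2, 1, 0]))  # generalist removed last
```

Removing the generalist first makes passive species go extinct earlier, so the first ordering yields the larger area.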

Figure 2

Left: schematic representation of the extinction protocol for an empirical mutualistic network (Arctic community21) with 18 active (pollinators) and 11 passive (plants) species.

Both active (left) and passive (right) species are ordered following some prescribed ranking; from the highest ranked species (top) to the lower-ranked ones (bottom). The (blue and red) lines represent mutualistic interactions as encoded in the interaction (or adjacency) matrix. Active species are progressively removed from the community, their corresponding (red) links are erased and passive species are declared extinct whenever they lose all their connections. Right: extinction curve, showing the fraction of extinct passive species as a function of the number of sequentially removed active ones for a given specified ranking. The shaded region is the extinction area for the ranking under consideration. Different rankings lead to different extinction areas. The larger the area the better the ranking.

An exhaustive search for the optimal ranking (in the space of all possible orderings) can be performed for relatively small networks but becomes infeasible for larger ones. To estimate the optimal ranking we implemented a genetic algorithm (GA) (see Methods) devised to obtain the maximal possible extinction area by searching in the space of all possible orderings. For some of the largest networks we studied (in particular, for Montane forest and grassland, Beech forest and Phryganic ecosystem, with 275, 678 and 666 active nodes respectively; see Table 1) the computational time required for the genetic algorithm to converge is exceedingly large and satisfactory results were not found.

Let us finally mention that we have also implemented a slightly modified version of the extinction protocol in which the ranking is re-evaluated after each species extinction. Besides being computationally much more expensive, this modified protocol leads, in general, to slightly worse results than the original one; however, even in this form MusRank outperforms all other rankings.

Algorithm testing and comparison with other rankings

We compared different rankings based on (see Methods): a) decreasing closeness centrality (CLOS), b) decreasing eigenvector centrality (EIG), c) decreasing betweenness centrality (BTW), d) decreasing degree centrality (DEG), e) increasing contribution to nestedness (NES) as described in ref. 16, f) decreasing PageRank (PAGE) and g) decreasing importance as measured by MusRank (MUS).

The average extinction area of the different algorithms was obtained for all networks in the dataset. In the frequent case in which the ordering is degenerate (more than one node is rated with the same value), we considered 10^3 different random ways of breaking the ties and computed the averaged extinction area.
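The averaging over degenerate orderings can be sketched as below. The helper names `order_with_random_ties` and `average_over_tiebreaks` are hypothetical, and `quality` stands for any function that maps an ordering to a number (for instance an extinction area); shuffling before a stable sort is one simple way, assumed here, of breaking ties uniformly at random.

```python
import random
import statistics


def order_with_random_ties(scores, rng):
    """Indices sorted by decreasing score, with ties broken at random."""
    idx = list(range(len(scores)))
    rng.shuffle(idx)                                   # random order among equals
    idx.sort(key=lambda i: scores[i], reverse=True)    # stable sort keeps it
    return idx


def average_over_tiebreaks(scores, quality, n_samples=1000, seed=0):
    """Mean and standard deviation of `quality` over random tie-breakings."""
    rng = random.Random(seed)
    values = [quality(order_with_random_ties(scores, rng)) for _ in range(n_samples)]
    return statistics.mean(values), statistics.pstdev(values)


# Nodes 1 and 2 share the same score, so their relative order is random.
scores = [2, 1, 1, 0]
mean_pos, std_pos = average_over_tiebreaks(scores, lambda order: order.index(1))
print(mean_pos, std_pos)
```

Here the quality function simply returns the position of node 1, which is 1 or 2 with equal probability, so the mean comes out close to 1.5.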

For the sake of completeness we have also repeated the whole protocol above, but exchanging in Eq. (1) the roles of active and passive species, i.e. assigning importances to passive species and vulnerabilities to active ones. We refer to this as the “reversed” algorithm. We have also studied extinction areas by progressively removing passive species (rather than active ones) and monitoring secondary extinctions of active species.

Computational results

Figure 3 illustrates the performance of the different rankings/algorithms for three different instances of mutualistic networks. Extinction areas are plotted for each of the considered ranking algorithms. In the three cases MusRank gives results closest to the corresponding optimal solutions as derived from the genetic algorithm. In almost all of the 63 studied cases, results are much better for the novel ranking than for any of the other ones (see Figure 3). PageRank gives similar results to MusRank in a few cases (including a relatively large network with 102 nodes). Apart from this, only for very small networks (with fewer than 17 active species) does some method other than PageRank give extinction areas similar to those of the novel algorithm. In about one third of the networks, the ranking provided by MusRank is as good as the one found by the GA, and in some cases (networks for which the GA could not converge in a reasonable time) extinction areas are larger for MusRank than for the GA.

Figure 3

Extinction areas for three different mutualistic networks (names and sizes, specified above) as obtained employing the different ranking schemes described in the text.

The upper dashed line shows the optimal performance corresponding to the ranking found by the genetic algorithm (GA) search and the lower one the null expectation, that is, the averaged area obtained when targeting nodes in a random order. The different algorithms used to rank the nodes are: closeness centrality (CLOS), eigenvector centrality (EIG), betweenness centrality (BTW), degree centrality (DEG), nestedness centrality (NES), PageRank (PAGE) and importance as measured by MusRank (MUS). MUSrev corresponds to the reversed version of the algorithm in which the roles of active and passive species are exchanged. The height of the boxes corresponds to the standard deviation of the results when averaging over 10^3 random ways to break degeneracies in the orderings.

Figure 4 gives a global picture of the performance of the different rankings. It shows the difference, averaged over 60 mutualistic networks, between the optimal solution as found by the GA and that of each specific ranking (the 3 networks for which the GA does not converge are excluded from this analysis). Figure 4A illustrates that the ranking provided by MusRank, in either the direct or the reversed form, greatly outperforms all others.

Figure 4

Averaged deviation of the extinction area obtained for each of the employed rankings (or algorithms) from the maximal possible value as determined using the genetic algorithm (average over 60 networks in the database).

Panel A (left) shows results when active species are targeted and passive species undergo secondary extinctions; panel B (right) shows the opposite case, in which passive species are targeted and active species undergo secondary extinctions. Results are consistently much better for MusRank, in either the direct or the reversed version, than for any other ranking scheme.

The same conclusion can be reached when progressively removing passive rather than active species, ordered in a sequence of increasing vulnerability (rather than decreasing importance), see Figure 4B. Therefore, both targeting strategies and both the direct and the reversed versions of the algorithm provide results of similar quality.

Optimally packed matrices

The ranking provided by MusRank, in which nodes are arranged by their level of importance or vulnerability, permits us to obtain a highly packed matrix as illustrated in Figure 5. By “packed” we mean that a neat curve separates densely occupied and empty parts of the matrix. It could be thought that this ordering might be somewhat similar to the one that allegedly packs the matrix in the most efficient way (as defined by existing algorithms usually employed in the literature to measure nestedness17). However, as Figure 5 vividly illustrates, the ordering provided by MusRank gives a more packed matrix than that obtained by the standard method employed by nestedness calculators17. This suggests that MusRank, rather than the existing methods, should be used to measure nestedness in bipartite matrices.
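Reordering a matrix according to the two rankings is a simple permutation of rows and columns. The sketch below assumes that rows are active species and columns passive ones, and that importance and vulnerability scores are already available; the helper names `pack_matrix` and `show` and the toy scores are hypothetical.

```python
def pack_matrix(M, importance, vulnerability):
    """Reorder rows by decreasing importance and columns by increasing vulnerability."""
    rows = sorted(range(len(M)), key=lambda a: importance[a], reverse=True)
    cols = sorted(range(len(M[0])), key=lambda p: vulnerability[p])
    return [[M[a][p] for p in cols] for a in rows]


def show(M):
    """Text rendering: '#' for an interaction, '.' for an absent link."""
    return "\n".join("".join("#" if x else "." for x in row) for row in M)


# A nested matrix with rows and columns in scrambled order (toy data).
M = [[0, 1, 0],
     [1, 1, 1],
     [0, 1, 1]]
importance = [0.2, 2.0, 0.8]       # illustrative scores for the three rows
vulnerability = [1.8, 0.3, 0.9]    # illustrative scores for the three columns
print(show(pack_matrix(M, importance, vulnerability)))
```

With these toy scores the reordered matrix has all its interactions gathered in the upper-left corner.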

Figure 5

Interaction matrix of a mutualistic community in the Andes22 composed of 42 pollinators and 61 plants ordered by decreasing importance and increasing vulnerability respectively, as measured by MusRank.

Panels A and B show two different shots of the iteration process: the initial random condition and the final (fixed-point) ranking obtained after iteration. Panel C shows the same matrix but with nodes labeled in an order which gives the maximally packed matrix according to the nestedness calculator of Atmar and Patterson17. The novel algorithm provides a much more “packed” matrix than this frequently employed method.

Discussion

In this paper we have presented a novel framework to assess the relative importance of species in mutualistic networks. Inspired by recent work in economics/econometrics, we employ an algorithm, similar in spirit to Google's PageRank but of non-linear character, that we have named MusRank. The algorithm provides two complementary rankings: one for active species (such as insects, birds, fish,…) in terms of their importance and one for passive species (plants and their seeds, anemones, etc.) in terms of their vulnerability. We also propose a criterion to assess the quality of any given ranking of species: good rankings lead to a fast breakdown of the corresponding mutualistic network when species are progressively removed in decreasing ranking order.

In most of the empirical mutualistic networks we have analyzed, our novel framework yields a ranking which clearly outperforms all the alternative ones used as benchmarks. Results are robust in the sense that different implementations lead to similar rankings. In many cases, the resulting ordering coincides with or is very close to the optimal one as found by a (computationally very costly) genetic algorithm. Moreover, MusRank is much faster and finds excellent rankings even for large mutualistic networks for which the genetic algorithm is not able to find optimal solutions in a reasonable computational time. Therefore, the emerging ranking allows for assessing the importance of individual species within the whole system in a meaningful, efficient and robust way. We conclude that rankings of species importance in mutualistic networks should be constructed employing MusRank.

Furthermore, as a by-product, the excellent packing of nested matrices provided by this non-linear approach (see Figure 5) calls for a redefinition of the way in which nestedness is measured. In particular we suggest that nestedness calculators should use the ranking provided by the present algorithm, which clearly outperforms others in making the nested architecture evident. Indeed, we believe that the nested structure of mutualistic networks is essential for the success of MusRank; it remains to be seen how this scheme performs for bipartite networks without a nested architecture.

The novel approach –introduced here for the first time in the context of mutualistic ecological networks– may prove of practical use for ecosystem management and biodiversity preservation, where decisions on what aspects of ecosystems to explicitly protect need to be made.

Methods

Algorithms

  • CLOS: Nodes are sorted in order of decreasing closeness centrality. The closeness centrality of a node is measured as the inverse of the average shortest distance to all other nodes in the network. We computed it using the closeness_centrality function of the bipartite section of algorithms of the Python package NetworkX.

  • EIG: Nodes are sorted in decreasing order of their component in the eigenvector associated with the largest eigenvalue of the adjacency matrix. To calculate the eigenvector centrality of the bipartite network we used the GSL (GNU Scientific Library) routines for non-symmetric eigenvalue problems.

  • BTW: Nodes are sorted in order of decreasing betweenness centrality. The betweenness centrality of a node measures the fraction of shortest paths, over all possible node pairs in the network, that pass through it. We used the betweenness_centrality function of the bipartite section of algorithms in the Python package NetworkX.

  • DEG: Nodes are sorted in order of decreasing number of connections.

  • NES: Nodes are sorted in order of the inverse contribution to network nestedness. We calculate the total nestedness of a given bipartite matrix and the contribution of each species to the total as described in ref. 16. Species that contribute most to the community nestedness are the most vulnerable ones23. In order to look for the fastest community collapse we target them in order of increasing contribution to nestedness.

  • PAGE: Nodes are sorted in decreasing order of Google's PageRank. The ranking is given by the component on each node of the leading eigenvector (the one associated with the largest eigenvalue) of the matrix H, whose elements are defined as

The constant d is a “damping factor” needed to guarantee that the matrix is irreducible, and $a_{ij}$ are the elements of the adjacency matrix. The value of d has been set to 0.999, but results are not very sensitive to this choice.
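As an illustration of the PAGE entry, the sketch below computes a damped PageRank vector by power iteration. It assumes the common column-normalised form H_ij = d a_ij / k_j + (1 − d)/N, with k_j the degree of node j and N the total number of nodes, and it treats the bipartite network as a single undirected graph. This specific form, the handling of isolated nodes, the stopping rule and the function name `pagerank_bipartite` are all assumptions and may differ from the definition used in the study.

```python
def pagerank_bipartite(M, d=0.999, tol=1e-12, max_iter=100000):
    """Damped PageRank on the undirected graph defined by the bipartite matrix M.

    Returns (ranks of active species, ranks of passive species).
    """
    n_active, n_passive = len(M), len(M[0])
    n = n_active + n_passive
    # Active nodes are numbered 0..n_active-1, passive ones follow.
    nbrs = [[] for _ in range(n)]
    for a in range(n_active):
        for p in range(n_passive):
            if M[a][p]:
                nbrs[a].append(n_active + p)
                nbrs[n_active + p].append(a)
    rank = [1.0 / n] * n
    for _ in range(max_iter):
        new = [(1.0 - d) / n] * n                 # uniform "teleportation" term
        for j in range(n):
            if nbrs[j]:
                share = d * rank[j] / len(nbrs[j])
                for i in nbrs[j]:
                    new[i] += share               # d * a_ij / k_j contribution
            else:
                for i in range(n):                # isolated node: spread evenly
                    new[i] += d * rank[j] / n
        change = sum(abs(x - y) for x, y in zip(new, rank))
        rank = new
        if change < tol:
            break
    return rank[:n_active], rank[n_active:]


M = [[1, 1, 1],
     [1, 1, 0],
     [1, 0, 0]]
active_rank, passive_rank = pagerank_bipartite(M)
print(active_rank, passive_rank)
```

With d close to 1 the resulting values stay close to being proportional to node degree, so in this toy network the generalist species obtain the largest values.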

Genetic algorithm

GA

The genetic algorithm is designed to seek those extinction sequences that maximize the extinction area. We start with 10^4 different random orderings of the Amax active species. At each iteration step two of these orderings are randomly selected. Each one beats the other with a probability proportional to its associated extinction area (normalized to the sum of both extinction areas). The losing sequence is erased from the set and a copy of the winner occupies its place. With a small probability, μ = 0.005, this copy suffers a mutation, meaning that two random nodes exchange their positions in the ordering. The algorithm is iterated until no better solutions are found in a sufficiently large time window, that is, until no appreciable changes are seen in the extinction area with increasing time. If the network is too large, this algorithm might not be able to find a stationary optimal solution within a reasonable computation time.
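The tournament scheme just described can be sketched as follows. Several details are assumptions made for illustration: the search runs for a fixed number of steps instead of monitoring a convergence window, the default population is much smaller than the one quoted above, the best ordering ever seen is tracked separately, and `fitness` stands for any function returning a non-negative score for an ordering (for instance an extinction area). The function name `genetic_search` is hypothetical.

```python
import random


def genetic_search(n, fitness, pop_size=200, steps=20000, mu=0.005, seed=0):
    """Search orderings of n nodes that maximise `fitness` by pairwise tournaments."""
    rng = random.Random(seed)
    population = [rng.sample(range(n), n) for _ in range(pop_size)]
    scores = [fitness(order) for order in population]
    best_idx = max(range(pop_size), key=scores.__getitem__)
    best, best_score = list(population[best_idx]), scores[best_idx]
    for _ in range(steps):
        i, j = rng.sample(range(pop_size), 2)      # two distinct competitors
        total = scores[i] + scores[j]
        p_i = scores[i] / total if total > 0 else 0.5
        winner, loser = (i, j) if rng.random() < p_i else (j, i)
        child, child_score = list(population[winner]), scores[winner]
        if n > 1 and rng.random() < mu:            # mutation: swap two positions
            x, y = rng.sample(range(n), 2)
            child[x], child[y] = child[y], child[x]
            child_score = fitness(child)
        population[loser], scores[loser] = child, child_score
        if child_score > best_score:
            best, best_score = list(child), child_score
    return best, best_score


# Toy fitness: reward nodes that sit at the position equal to their label.
def toy_fitness(order):
    return 1.0 + sum(1 for pos, node in enumerate(order) if node == pos)


print(genetic_search(3, toy_fitness))
```

Because winners are chosen stochastically, a good ordering can be lost from the population; keeping the best one seen so far is a convenience of this sketch and not part of the description above.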