Feasibility and coexistence of large ecological communities

The role of species interactions in controlling the interplay between the stability of ecosystems and their biodiversity is still not well understood. The ability of ecological communities to recover after small perturbations of the species abundances (local asymptotic stability) has been well studied, whereas the likelihood of a community to persist when the conditions change (structural stability) has received much less attention. Our goal is to understand the effects of diversity, interaction strengths and ecological network structure on the volume of parameter space leading to feasible equilibria. We develop a geometrical framework to study the range of conditions necessary for feasible coexistence. We show that feasibility is determined by few quantities describing the interactions, yielding a nontrivial complexity–feasibility relationship. Analysing more than 100 empirical networks, we show that the range of coexistence conditions in mutualistic systems can be analytically predicted. Finally, we characterize the geometric shape of the feasibility domain, thereby identifying the direction of perturbations that are more likely to cause extinctions.

where n i is the population abundance of species i and r i is its intrinsic growth rate, and A ij is the effect of a unit change in species j's density on species i's per capita growth rate. For notational convenience, we collect the coefficients A ij into the interaction matrix A, and n i and r i into the vectors n and r, respectively.
In principle, the interaction matrix A may depend on n. We discuss this more general case in section Supplementary Note 13. In the following, we consider the simpler case of A being independent of n; then, equation (1) is a general system of Lotka-Volterra population equations.
A fixed point is feasible if n * i > 0 for all i. A feasible fixed point (if it exists) is then a solution to the equation and therefore, assuming A is invertible, A fixed point n * i is locally stable if the system returns to it following any sufficiently small perturbation of the population abundances. Introducing n i = n * i + δn i in equation 1 and assuming that δn i is small, we obtain, by expanding around δn i = 0, where M ij is the (i, j)th entry of the Jacobian evaluated at the fixed point (also called the community matrix), which, in the case of equation 1, reduces to Substituting into equation 5, we get There are two possible scenarios for the dynamics of equation 5. If all eigenvalues of M have negative real parts, then the perturbation δn decays exponentially to zero and n * i is locally stable. If at least one eigenvalue of M has a positive real part, then there exists an infinitesimal perturbation such that the system does not return to equilibrium. If we order the eigenvalues λ i of M according to their real parts, i.e., (λ 1 ) > (λ 2 ) > · · · > (λ S ), then stability depends exclusively on (λ 1 ): if it is negative, n * i is dynamically locally stable; otherwise, it is unstable [1].
A fixed point is globally stable if it is the final outcome of the dynamics from any initial condition involving strictly positive population abundances.

Supplementary Note 2 Disentangling stability and feasibility
As we can see from equations 4 and 7, both feasibility and stability depend on both r and A and, at least in principle, a fixed point can be stable or unstable, independently of the fact that it is feasible or not.
We want to study the proportion of conditions (i.e., the number of combinations of the growth rates r out of all possible combinations) leading to coexistence, i.e., leading to stable and feasible equilibria. Therefore in principle we should, for a fixed matrix A, look for growth rates r that satisfy both stability and feasibility. In probabilistic terms, we want to measure the likelihood that a random combination of the intrinsic growth rates corresponds to a stable and feasible solution.
In the case of equation 1, it is possible to disentangle feasibility and stability by applying a mild condition on the interaction matrix A. To this end, we introduce some terminology [2, section 2.1.2]: • Stability. A real matrix B is stable if all its eigenvalues have negative real parts.
• D-stability. A real matrix B is D-stable if D B is stable for any diagonal matrix D with strictly positive diagonal entries.
• Diagonal stability. A real matrix B is diagonally stable if there exists a positive diagonal matrix D such that D B + B T D is stable (where B T is the transpose of B).
We also consider • Negative definiteness (in a generalized sense). A real matrix B is negative definite if ij x i B ij x j < 0 for any non-zero vector x [3].
These properties are closely related to each other [2,4]: Negative definiteness =⇒ Diagonal stability =⇒ D-stability =⇒ Stability (8) • Negative definiteness =⇒ Diagonal stability. A matrix B is negative definite if and only if all the eigenvalues of B + B T are negative [3]. If this condition hold, then the positive diagonal matrix satisfying the definition of diagonal stability is simply the identity matrix. • D-stability =⇒ Stability. This follows from the definition of D-stability when D is the identity matrix.
In the case of equation 1, those conditions applied to the matrix A are related to the stability of the system. One can use the definition of the community matrix (equation 6) to show that Dstability of A implies the local asymptotic stability of any feasible fixed point. This is because the community matrix with entries M ij = n * i A ij can be written as N A, where N is the diagonal matrix with N ii = n * i . If the fixed point is feasible and A is D-stable, then local asymptotic stability is guaranteed. Moreover it is possible to show [5,6] that diagonal stability of A =⇒ global stability.
Thus, we have a condition on A that makes it possible to disentangle the problems of stability and feasibility: A is negative definite =⇒ global stability of the feasible fixed point [7].
Therefore, if we assume A is negative definite, then feasibility of the equilibrium is sufficient to guarantee its global stability as well, i.e., feasibility guarantees globally stable coexistence.
Consistently with this, it is known that the largest eigenvalue of (A + A T )/2 is always larger than or equal to the real part of A's leading eigenvalue [8], i.e. negative definiteness implies stability.
While this was indeed observed before, it is important to underline that, in the case of ref. [8], this property was considered on the community matrix M (which also depends on the fixed point's position in phase space) and not on the interaction matrix A.
Since we are interested in studying how interactions (i.e., the matrix A) determine coexistence, and which properties of the former determine the latter, we will restrict our analysis to negative definite matrices A and focus only on the problem of feasibility. This condition has the advantage of being analytically computable for large random matrices (see section Supplementary Note 5.1).

Supplementary Note 3 Geometrical properties of the feasibility domain
In section Supplementary Note 2 we showed how to separate feasibility and stability, i.e., we have a sufficient condition on the interaction matrix that guarantees (global) stability of the feasible fixed point. The problem of determining the size of the coexistence domain is therefore reduced to that of determining the size of the feasibility domain. The ecological interpretation of this volume is the proportion of different conditions leading to feasible equilibria out of all possible conditions.
The larger this volume is, the higher the probability that the system is able to sustain biodiversity.
In terms of equation 1, we want to quantify the proportion of growth rate vectors r corresponding to a feasible fixed point.
This geometrical approach was pioneered in [1] where the space of feasible solution was studied for dissipative systems, and the size of that domain was computed in the case S = 3 (see section Supplementary Note 12).
At this point, it is important to observe that if a vector r corresponds to a feasible solution, then cr, c being an arbitrary positive constant, also corresponds to a feasible solution. This is because the equilibrium solution n * i is given by equation 4, which is linear in r i . Therefore, the equilibrium corresponding to cr i is simply cn * i , and since c is positive, cn * i is also feasible.
This fact implies that, given a large number of growth rate vectors r, the expected proportion of vectors corresponding to a feasible fixed point is independent of r's norm. In other words, r is feasible if and only if r/ r is feasible, where r = i r 2 i is the Euclidean norm of r. The proportion of feasible growth rates among all possible ones is therefore equal to the proportion of feasible growth rates calculated using only growth rate vectors with r = 1; i.e., those lying on the unit sphere.
Before proceeding with the mathematical definition of the size of the feasibility domain, we discuss the geometrical interpretation of equation 4. From this equation, the feasibility condition This equation defines a convex polyhedral cone in the S-dimensional space of growth rates. A convex polyhedral cone [9] is a subset of R S whose elements x can be written as positive linear combinations of N G different S-dimensional vectors g k called the generators of the cone: where the λ k are arbitrary positive constants. Due to this arbitrariness, if g k is a generator of a given convex polyhedral cone, then also cg k (where we rescale just the kth generator with the positive constant c, leaving the others unchanged) will be a generator of the same cone [1]. In the case of equation 3, each and every growth rate vector belonging to the feasibility domain can be written as where, by definition, n * k is feasible and therefore a positive constant. One can easily see that this equation corresponds to equation 10 where the number of generators N G is equal to S and the ith component of the vector g k is proportional to −A ik . As the lengths of the generators can be set to any positive value, we will normalize them to one, i.e., The generators completely define the feasibility domain in the space of growth rates. A growth rate vector corresponds to a feasible equilibrium if and only if it can be written as a linear combination of the generators with positive coefficients. Biologically the generators correspond to the growth rate vectors that bound the coexistence domain. They correspond to nonfeasible equilibria with just one species with positive abundance (and all the others with zero abundance), such that there exist arbitrarily small perturbations of the growth rate vector that make the equilibrium feasible.
The set of all the growth rate vectors leading to a feasible equilibrium is therefore a convex polyhedral cone, defined by Equivalently, it can be defined in terms of the generators: where the generators g k (A) are defined in equation 12. In section Supplementary Note 12 we show explicitly how these concepts pan out in the case of S = 3.
This geometrical definition and characterization of the feasibility domain allows us to identify classes of matrices having the exact same feasibility domain: they are simply matrices having the same set of generators. In particular, there are two basic transformations of the matrix A (and their combinations) that leave the set of generators unchanged: permutations and positive rescaling. A square matrix P is a permutation matrix if each row and column has one and only one nonzero entry and the value of that entry is equal to one. A positive rescaling is performed by a positive diagonal matrix D. The set of generators of A is the same as those of P A and D A.
This can be seen by observing that a permutation of the rows just changes the order of the generators but not the generators themselves. In the same way, a generator with the same direction but different length generates the same cone, and so any positive constant that rescales a row of the matrix leaves the feasibility domain unchanged. It is important to note however that these two transformations do not leave the properties of the matrix A unchanged: both exchanging rows of a matrix and rescaling rows by different constants will in general change the structure of the matrix.
Using this geometrical framework, one can easily identify the center of the feasibility domain (also known as structural vector [6]). There are several possible ways to define the center of a hypervolume and, without additional assumptions, all the definitions are different. One natural choice is the barycenter ("center of mass") of the domain of feasible intrinsic growth rates. Any plane passing through the barycenter divides the volume into two subvolumes of equal size. The barycenter is equivalent to the center of mass of the volume (in the case of constant density). Then, the vector x b pointing from the origin to the barycenter is given by where ∩ is the intersection of two sets, and S S = {r ∈ R S | r = 1} represents the surface of the S-dimensional unit sphere. The variable y is therefore integrated over the feasibility do-main restricted to the unit sphere's surface. All points in the feasibility domain are positive linear combinations of the generators, i.e., where the λ k are positive constants. The fact that we consider only the points lying on the unit sphere, i.e., y = 1, can be expressed as a constraint on λ (the vector of λs). Thus, we can write equation 15 as where q is an appropriate distribution, introduced to take into account three different constraints: all the components of λ must be positive; the vector k λ k g k must lie on the unit sphere; and those vectors must be sampled uniformly on the feasibility domain. One can show that the distribution q(λ) has the following form where the proportionality constant is given by the normalization, the exponential term guarantees that the vectors k λ k g k are sampled uniformly on the sphere and the Heaviside function Θ(λ k ) constrains all the coefficients λ k to be positive. Therefore, by defining, we obtain Supplementary Note 4 Definition and calculation of Ξ As explained in section Supplementary Note 3, the proportion of feasible growth rates can be calculated considering only growth rate vectors of length one, i.e., r = 1. This proportion can be interpreted as the volume of the intersection of a convex cone and the surface of a sphere.
Equivalently, it is the solid angle of the convex polyhedral cone [10,11].
We define the quantity Ξ as Ξ = 2 S # growth rate vectors corresponding to a feasible fixed point total # growth rate vectors .
The factor 2 S that appears in this equation is an arbitrary choice, and it has been introduced to have Ξ = 1 when species are not interacting (A ij = 0 if i = j). In this case equation 1 reduces to S independent logistic equations with equilibrium densities n * i = −r i /A ii . Taking each A ii to be negative (otherwise each species would have an unstoppable positive feedback on itself), this equilibrium is feasible if and only if each r i is positive. For a single species then, the probability of randomly drawing a feasible (i.e., positive) growth rate out of all possible growth rates is one half. For two species, both growth rates must have the correct sign to have the two species with positive abundance, and therefore the proportion of growth rate vectors satisfying this condition is 1/4. For S species the combinations of the growth rates leading to a feasible fixed point is 2 −S . Ξ, defined as in equation 21, is therefore equal to one when species do not interact.
In terms of geometrical properties and the convex polyhedral cone, Ξ can be defined as where K(A) is defined in equation 13, S S is the unit sphere in R S , while vol S (·) means volume in S dimensions. This definition is equivalent to the one in equation 21 [10,11].
These two equivalent definitions can be expressed in terms of an integral in the space of the growth rate vectors: where vol S−1 (S S ) is the volume of the unit sphere's surface in S dimensions, Θ(·) is the Heaviside function (equal to 1 is the argument is positive and to zero otherwise), and δ(·) is the Dirac delta function. In this expression, we integrate over the surface of the S-dimensional unit sphere. The integral of a function f (x) on the unit sphere is given by where the term δ( x 2 − 1) that appears in the integration constrains x on the surface of the unit sphere, and the factor 2 x is the derivative of the delta function's argument, which is needed because the Dirac delta is nonlinear in r . The factor vol S−1 (S S ), the surface of sphere in S dimensions, can be obtained by setting f (x) = 1: where Γ(·) is the Gamma function. Finally, the term S i=1 Θ(n * i (r)) in equation 23  Unfortunately, direct numerical computation of Ξ is inefficient when the number of species S is large. To evaluate the integral in equation 23, e.g., via Monte Carlo integration, we should draw intrinsic growth rates at random and count how many of them, out of the total, lead to a feasible equilibrium. In order to have a reliable estimate of this proportion, we should sample the space in such a way that the number of feasible growth rates found is large. This goal requires an exponentially increasing sampling effort as S increases. In this section we provide an alternative, much faster and reliable, way of estimating Ξ.
The equilibrium solution and the growth rates are linearly related via Our strategy is to use this to perform a change of variables in equation 23, and integrate over n * instead of r. Since A is negative definite (and thus stable and not singular), it is invertible, and so it is always possible to perform this change of variables. Note that, more generally, the change of variables can be performed if A is nonsingular (i.e., det(A) = 0). We then where | det(A)| is the determinant of A, which is also the Jacobian of the change of variables.
After the change of variables, the integration is now performed over the feasible equilibrium points and so the condition of feasibility is automatically implemented.
It is still difficult to evaluate the previous expression numerically, because of the constraint that appears in the delta function. We can further simplify it by introducing polar coordinates. In particular, we write the vector n as n = nu, where n = n and u is a vector of unit length. We can perform a new change of variables, passing from n to n and u. Specifically, for any function Using this expression in equation 26, we obtain where we used the fact that Θ(n i ) = Θ(u i ) (since n i = nu i , and n is positive by definition), and we have introduced the matrix G ij = k A ki A kj . We can now perform the integration over n, and therefore the integral of equation 23 finally reads where we have used the fact that det(G) = det(A T A) = det(A) 2 . In terms of the interaction matrix, the equation reads Equation 30 shows explicitly the role of the generators. The matrix G can indeed be rewritten as Unfortunately, the integral in equation 30 cannot be computed analytically. As mentioned before, when the integral is written in the form of equation 23 it is impractical to evaluate it numerically, since it would require an exponentially increasing sampling to get a reasonable precision.
Fortunately, this is not the case when the integral is written as in equations 30 and 31. The main difference is that, after changing variables, we are directly sampling the space of feasible solutions, without losing computational time in randomly exploring the space of intrinsic growth rates looking for feasible solutions.
To evaluate the integral, we use the usual approach of Monte Carlo algorithms. In particular, it is possible to write the integral as an average over random points: when T → ∞. In this expression u a are independently drawn random vectors uniformly distributed on the unit sphere and with only positive components. These two conditions are introduced to satisfy the constraints S i=1 Θ(u i ) and 2δ( u 2 − 1) that appear in the integral. T is the sample size, and the average on the left hand side of equation 33 converges to the right hand side in the large T limit.
One always has a finite sample size T , used to approximate the integral. It is therefore important to have an estimate of the error made due to T < ∞. Since the left hand side of equation 33 is an average of a function over random vectors, this error can be estimated by simply using the variance of the function's values. In particular, the error σ MC is defined as The numerical simulation presented in the work where obtained were obtained with different sampling effort T . Instead of fixing T a priori, we determined a precision goal, that we measured in terms of the relative error σ M C /Ξ. We ran the simulations until σ M C /Ξ < 0.05. In order to avoid artificially small samples and to have enough statistical power not to undershoot to much σ MC , we ran 10 × S Monte Carlo steps before checking the condition for the first time.
Supplementary Note 5 Stability, negative definiteness, and feasibility in random matrices Random matrices are a useful tool in ecology, and have been studied since May's seminal paper [12]. Mostly, they have been used to model the community matrix [12,13]. In the context of this work, we use random matrices to model interaction matrices A. We consider random matrices constructed in the following way: • Each pair (A ij , A ji ) is set equal to a pair of random variables drawn from a joint distribution with probability density function q(x, y).
• The random variables are exchangeable-i.e., the probability distribution function is symmetric in its arguments: q(x, y) = q(y, x)-and all the moments are finite.
We show that the three most important quantities for our problem are the moments In the limit of large S, they can be computed as proper sample means of A's entries: The parameterization used by May [12] would correspond to where δ(·) is the Dirac delta function and p(x) is an arbitrary distribution with mean zero and variance σ 2 . The connectance C sets the probability that each entry is equal to zero (with probability 1 − C) or randomly drawn from the probability distribution p(x) with probability C. In this case In the following, we summarize known results on the spectra, negative definiteness conditions, and properties of Ξ for these matrices.
Supplementary Note 5.1 Known results on the spectra of random matrices Under the assumptions of the previous section, the eigenvalues of A in the limit of large S are uniformly distributed in an ellipse in the complex plane. If E 1 = 0 there is always an eigenvalue λ m whose value is approximately independently of the rest of the eigenvalue distribution. The ellipse is centered at −d − E 1 , its axes are aligned with the real and imaginary axes, and their lengths are If λ m = 0, the eigenvalue with the largest real part(s) is approximated by the rightmost point of the ellipse. The system is stable if its real part is negative. In the most general case, this condition is equivalent to In section Supplementary Note 2 we introduced the concept of negative definiteness. In particular, we showed that when the matrix is negative definite then it is possible to disentangle stability and feasibility. The matrix is negative definite if the eigenvalues of A + A T are all negative. This condition reads [8] − Figure 1 shows the values of parameters leading to the possible combinations of stability and negative definiteness in random matrices for the case E 1 = 0. Since we imposed that A is negative definite, the region of parameters we explore is the one above the negative definiteness line. One can see that in this way we are missing some parameterizations, corresponding to those that lead to a stable but not negative definite matrices. From equations 45 and 46 one can see that the case E 1 < 0 is very similar to the case E 1 = 0. More interestingly, for E 1 > 0, the conditions for stability and negative definiteness converge in the large S limit, implying that we are considering all the possible cases.
What is remarkable in these conditions and in the distribution of eigenvalues is that they are universal [14][15][16][17]. Universality means that they depend only on S, E 1 , E 2 , and E c (and d, but via a trivial dependence). The spectrum of eigenvalues does not depend on the detailed form of the distribution q(x, y).
For instance, consider the case q(x, y) = p(x)p(y), where the upper and lower triangular entries A ij and A ji are independent random variables. In this case E c = 0 and E 1 and E 2 are the mean and standard deviation of the distribution p(x). The distribution of eigenvalues and the conditions for stability and negative definiteness are the same for any probability distribution p(x) as long as their mean E 1 and standard deviation E 2 are the same (provided some mild conditions on higher moments hold). For instance, a Lognormal distribution, a Gaussian distribution and an exponential distribution, having same mean and standard deviation, produce the same eigenvalue distribution, and therefore the same conditions for stability [18].
From an ecological perspective, one can consider different interaction matrices corresponding to different interaction types. The interaction type is given by the signs of the pairs (A ij , A ji ): competitive interactions will have both entries with a negative sign, while in trophic interactions the entries will have opposite sign. The interaction pairs (A ij , A ji ) for competitive interactions can for instance be obtained from the following distribution: where h − is a probability distribution function with support on the negative axis (i.e., the random variables are always negative), and C is the connectance (a pair is different from zero with probability C). In the case of trophic interactions we could consider where p + and p − are two probability distribution functions with positive and negative support, respectively. Suppose that the moments of h − , p + , and p − are chosen in such a way that q comp (x, y) and q troph (x, y) have the same values of E 1 , E 2 , and E c . The interaction matrices will still look very different in the two cases: one describes a foodweb and the other a competitive system. Despite this difference, the two will have the same stability properties. In other words, different interaction types influence the stability properties of the system only via E 1 , E 2 and E c .
Supplementary Note 5.2 Universality of Ξ In this section we show that, apart from their spectral distribution, Ξ is also a universal quantity in large random matrices. That is, in the large S limit, its value does not depend on the entire distribution of the coefficients, but only on the three moments E 1 , E 2 , and E c . It is important to remark that this result applies to the large S limit: the sub-leading corrections depend in principle on all the moments.
In order to show that Ξ is universal, we parameterized random networks with different distributions and checked whether Ξ depends only on E 1 , E 2 , E c , and S, but not on other properties.
To do this, we constructed several S × S matrices. Each individual matrix had its entries drawn from some fixed distribution, but the shape of the distribution was different across matrices. However, regardless of the distribution's shape, their moments were fixed at E 1 , E 2 , and E c . We then checked whether these matrices led to the same value of Ξ.
In our simulations we considered a distribution of the pairs (A ij , A ji ) of the form where the connectance C is the probability that two species i and j interact. The probability distribution p(x, y) in equation 49 depends on three parameters µ, σ, and ρ, which define the mean, variance, and correlation of the pairs drawn from p(x, y). Given the values of E 1 , E 2 , and E c , we can arbitrary choose C and tune µ, σ, and ρ to obtain any desired E 1 , E 2 , and E c . If Ξ is universal, then different matrices built with different values of C, µ, σ, and ρ but the same values of E 1 , E 2 , and E c will lead to the same Ξ.
We considered five parameterizations of the distribution p(x, y): • Random signs, normal distribution: The distribution BN (x, y|µ, σ, ρ) is a bivariate normal distribution with marginal means equal to µ, marginal variances equal to σ 2 , and correlation equal to ρσ 2 . The pairs can in principle assume all possible combinations of signs.
• Random signs, four corners: The pairs (x, y) can take on only four different, discrete values, potentially corresponding to all combinations on signs. The probability distribution depends on three parameters µ and σ 2 are means an variances of the distribution, while the correlation ρσ 2 can be obtained from ρ = 2q − 1.
• (+, +), Lognormal: The distribution LBN (x, y|µ, σ, ρ) is a bivariate lognormal distribution with marginal means equal to µ > 0, marginal variances equal to σ 2 , and correlation equal to ρσ 2 . The pairs can in principle assume only positive signs. Note that not all values of ρ between −1 and 1 can be obtained when a Lognormal distribution is considered.
It has marginal means equal to µ < 0, marginal variances equal to σ 2 , and correlation equal to ρσ 2 . The pairs assume only negative signs. Note that not all values of ρ between −1 and 1 can be obtained when a Lognormal distribution is considered.
In ecological terms, the first two distributions correspond to a random community (where the signs of the interaction strength are random), the (+, +) case corresponds to a mutualistic community, (−, −) to a competitive community, while (+, −) corresponds to a food web. The mutualistic/competitive matrices can lead only to positive/negative means E 1 , respectively, while the other settings can produce arbitrarily values of E 1 . Figure 2 shows the value of Ξ and of the largest eigenvalue λ for interaction matrices constructed with different connectances C and distributions, but with the same values of E 1 , E 2 , and E c . As seen from the figure, the values of Ξ and λ in any particular case match up precisely with the average values over several different realizations, strongly suggesting that these two quantities are indeed universal.

Supplementary Note 6 Mean-field approximation of Ξ
The goal of this section is to compute an approximation for Ξ in the limit of large S. The volume Ξ is defined (see section Supplementary Note 4) as where the matrix G can be obtained from the generators of the polytope (see equations 12 and 32), and therefore from the interaction matrix A.
We can introduce a Gaussian function in equation 55 using the fact that, for any positive constant c, Introducing this Gaussian integral in equation 55 by letting which can be rewritten as where z i = ru i . We can rewrite this equation as where we used the fact that the diagonal entries of G, when expressed in terms of the normalized generators, are equal to one.
The reader familiar with statistical mechanics will notice that equation 59, which can be written as has the form of a partition function. For instance one can recover the Ising model [19] with the choice q(z) = i δ(z 2 i = 1) or the spherical model [20] when q(z) = δ(S − i z 2 i ). The term z i G ij z j in particular plays the role of the interactions of the system.
Integrals of the form 60 are the most studied objects of statistical mechanics, and yet in most cases are not analytically solvable. There are, on the other hand, many techniques that can be used to obtain good approximations to 60. The most celebrated one is probably the mean-field approximation [19] and it is the one we are using in this section. In particular, the idea of the mean-field approximation is to replace the interactions of an entity (spins in the case of the Ising model or species in our case) with an average "effective" interaction. This reduces a many-body problem, where all interactions of spins or populations are coupled, into an effective one-body problem.
If the system is large enough (in our case if S → ∞), the mean-field approximation is know to be exact in the case of "fully connected" interactions. In terms of equation 60, this corresponds to a matrix G with the same constant in all its offdiagonal entries. The matrix G is constant when A has constant offdiagonal entries. We will consider therefore the case of A's diagonal entries being equal to −1 and its offdiagonal entries to a constant E 1 . Using equation 12, the ith component of the kth generator is then (62) Using equation 32, we therefore obtain that the diagonal entries of G are equal to 1, while the offdiagonal ones are constant and equal to We define the constant β as and therefore we have G ii = 1 and G ij = β/S for i = j. The determinant of G in this case turns out to be where the last form holds for large S. In this case of constant interactions, we obtain, from equation 59, up to subleading terms in S.

Equation 66
can be written as where where erfc(·) is the complementary error function, defined as The average · h is defined as Using Jensen's inequality in equation 70 we have that In the following we will approximate the first expression with the second one. It is possible to prove that, in the large S limit, the second expression converges to the first one.
Applying the mean-field approximation we neglect fluctuations of the variables, i.e. we have where By introducing equation 72 in equation 71 we have This equation is a function of h, which is a free parameter. Since it is a lower bound for the actual value of Ξ, the best approximation would correspond to the value of h which maximizes the approximation. We have therefore that h is a solution of the following equation where m is given by equation 73. We obtain therefore m = h/(2β) and then, by neglecting sub-leading terms in S and introducing m = h/(2β) in equation 74 By maximizing this equation respect to h we obtain Equation 77 cannot be solved exactly. By expanding around h = 0 we obtain which is solved by .
One can observe that the solution h = 0 corresponds to β = 0, i.e. to a non-interacting ecosystem.
which is our final result. In figure 3 we compare this equation with the volume computed numerically in the case of constant interactions, finding a very good match.
In the most general case of an interaction matrix with nonconstant offdiagonal entries, we can consider equation 72 as an approximation valid in the case of E 2 → 0. As β was defined in terms of the generators, we can extend the approximation to the case E 2 > 0 by considering β as the expected value of G's entries, which corresponds to the average overlap of two rows of the interaction matrix cos(η) , defined in equation 109. In this more general case the mean-field value of Ξ is expected to be a good approximation when var(cos(η)) is small enough. By substituting β = cos(η) , using equation 112, into equation 72 we obtain When var(cos(η)) is not small, we observed that the empirical formula explains well the values obtained in simulations. This is the formula we used to make figure 2 in the main text.
In order to simplify the expression and make it more readable, we can expand equation 80 around β = 0, i.e., when the interactions between species are small. By expanding (Ξ M F ) 1/S around β = 0 and taking the logarithm of the expression, we obtain Equation 2 of the main text was obtained by substituting β = cos(η) , using equation 112, in the case of E 2 = 0.

Supplementary Note 7 Feasibility of consumer-resource communities
This section considers explicitly a community with two trophic levels and consumer-resource interactions. While empirical communities have a more complicated interaction structure, this example is particularly relevant to better understand how Ξ should be interpreted.
We consider a system with S R resource and S C consumer (S R +S C = S) populations, whose dynamics is described by equation 1 with the interaction matrix where C is an S R × S R nonnegative matrix, B is an S R × S C nonnegative matrix, while Z and W are two positive diagonal matrices of dimension S C × S C and S R × S R , respectively.
If C is a positive diagonal matrix, any feasible fixed point is globally asymptotically stable [21]. When C is not diagonal, one can prove that any feasible fixed point is globally asymptotically stable if CW −1 is positive definite (i.e., −CW −1 is negative definite). Assuming that this condition holds, stability of feasible fixed points is ensured and we can study feasibility alone.
Using equation 3, we obtain the equations where r R and r C are the intrinsic growth rates of resources and consumers, while n R * and n C Assuming that the determinant of A is different from zero, we can use equation 26 obtaining Given the structure of the matrix A, it is convenient to write z = (v, u), where v and u are two vectors with S R and S C components respectively. The argument of the exponential can be rewritten By integrating over the variables u, we finally obtain .
(89) Figure 4 shows the size of the feasibility domain of a consumer-resource community, computed using Monte Carlo integration as explained in section Supplementary Note 4. We consider an interaction matrix with the structure of equation 84, with a diagonal C (i.e., C ij = 1 if i = j and zero otherwise) and scalar matrices Z and W (i.e., The elements of the rectangular matrix B were independently drawn from a lognormal distribution with mean µ and variance c 2 v µ 2 , where c v is the coefficient of variation. Since C is equal to the identity matrix, then the interaction matrix is diagonally stable and therefore any feasible point is globally stable [21]. Figure 4 shows the effect of η, µ and c v on the size Ξ of the feasibility domain. Interestingly, η and µ have a small effect on Ξ, while the coefficient of variation has a strong influence on it. It is important to notice that, as explained above, as the interspecific interaction goes to zero (and therefore both c v and µ tend to zero), Ξ → 0 as well.

Supplementary Note 8 Empirical networks and randomizations
We considered 89 mutualistic networks and 15 food webs. Empirical networks are encoded in terms of adjacency matrices L, with L ij = 1 if species j affects species i and zero otherwise. where references to the original works can be found. When the original network was not fully connected, we considered the largest connected component.
In the case of mutualistic networks, the adjacency matrix L is bipartite, i.e., it has the struc- where L b is a S A × S P matrix (S A and S P being the number of animals and plants respectively).

The adjacency matrix contains information only about the interactions between animals and plants,
but not about competition within plants or animals.
We parameterized the interaction matrix in the following way: where the symbol • indicates the Hadamard or entrywise product (i.e., (A • B) ij = A ij B ij ), while W A , W AP , W P A , and W P are all random matrices. W A and W P are both square matrices (of dimension S A × S A and S P × S P ), while W AP and W P A are rectangular matrices of size S A × S P and S P × S A respectively. The diagonal elements W A ii and W P ii were set to −1, while the pairs (W A ij , W A ji ) and (W P ij , W P ji ) were drawn from a bivariate normal distribution with mean µ − , variance σ 2 + = cµ 2 − , and correlation ρσ 2 + . Since these two matrices represent competitive interactions, µ − < 0. The the pairs (W AP ij , W P A ji ) were extracted from a bivariate normal distribution with mean µ + , variance σ 2 − = cµ 2 + , and correlation ρσ 2 − , where µ + > 0.
We analyze more than 600 parameterizations, obtained by considering different values of µ − , µ + , c, and ρ. For each network and parametrization we computed the size of feasibility domain Ξ.
The bottom panel of Figure 2 in the main text was obtained by comparing Ξ obtained in this way with the analytical prediction obtained in equation 81.

Supplementary Note 8.2 Food webs A summary of the properties and reference of the food
webs can be found in table 1. In the case of food webs the adjacency matrix L is not symmetric, and an entry L ij = 1 indicates that species j consumes species i. We removed all cannibalistic loops. Since both L ij and L ji are never simultaneously equal to one (there are no loops of length two), we parameterized the offdiagonal entries of A as while the diagonal was fixed at −1. Both W + and W − are random matrices, where the pairs  Figure 5 shows the comparison between Ξ of random and empirical networks. As expected from the fact that the analytical prediction for random matrices works well, the empirical values and the values obtained with randomizations are compatible.
Comparing this figure with figure 2 of the main text we observe that the empirical values and the ones obtained with randomizations match also in the cases were the analytical approximation failed. This implies that the reason of the mismatch is due to the difference between the analytical approximation and the randomizations, and it is not due to the specific structure of the empirical interactions. There are two main sources of errors in this case. On one hand, ours analytical prediction is expected to work is the number of species is large enough and if the variance of interactions is not to high (that is not always true for the parametrizations used). On the other hand, our approximation was formulated for random matrices, while randomizations of mutualistic networks still conserve a bipartite structure.
The randomization procedure explained above and figure 5 show that the size of the coexistence domain obtained with empirical network structure is well predicted by the one obtained with random structure. This result does not imply that structure has no effect on Ξ, but it shows that, if this effect exists, it must be relatively small (compared for instance to the variation of Ξ obtained by changing the interaction strengths), i.e. the relative error made by approximating empirical networks with random structure must be small.
Since the effect of structure is small, it is also expected to be very sensible to the interaction strengths. When we parametrized empirical networks and their randomizations to obtain figure 5, we drawn the interaction strengths several times from a given distributions. The realized coefficients were therefore different across different networks, and the values of Ξ shown in figure 5 were averaged over these independent extractions. Since the difference between randomizations and empirical structure is small, it might be impossible to detect any difference with this procedure.
In order to explore and quantify the effect of the empirical structure on the size of feasibility domain, we adopted a different parametrization and randomization method. Given an empirical network, we drawn the interaction strengths only once from a given distribution (as described in As the competition level is increased and once variation in the interaction strengths is introduced, the effect of the network topology on the total size of feasibility domain becomes negligible.
Supplementary Note 9.2 Food webs We compared the size of feasibility domain of empirical networks with their corresponding randomizations and a network generated accordingly to the cascade model [23].
For each network, we randomized the adjacency matrix L 100 times, by generating connected networks with the same size and number of links.
We also generated networks generated accordingly to the cascade model (using the same method explained in [24]). In this case the adjacency matrix was obtained by generating connected networks with the same size and number of links, by assigning a link between species i and j only if i > j. The volume Ξ quantifies how many growth rate vectors are compatible with coexistence.
Let us consider a feasible growth rate vector, and perturb it in a random direction. What is the probability that the new vector is still feasible? This is not just a function of the size Ξ of the feasibility domain. Indeed, one can imagine that the feasibility domain is about equally spread in every direction-or that, for the exact same value of Ξ, the feasibility domain is streched in some directions but is very narrow in some other ones. A perturbation in one of the "narrow" directions is much more likely to lead out of the feasibility domain in the latter case than in the former.
To quantify this property, one strategy could be to measure the different responses on the perturbation (i.e., the probability of being feasible) depending on the direction of the perturbation (in which direction we change the growth rate vector). This choice has the big disadvantage of depending not only on the properties of interactions (the interaction matrix A), but also on the strength of the perturbation (the angular displacement between the initial and the final growth rate vector) and the growth rate vector before the perturbation (e.g., if the initial vector is close or far from the edge of the feasibility domain). We propose instead a purely geometrical method to quantify the response to different perturbations (see figure 1 of the main text). We can measure the distribution of the side lengths. Imagine we have two interaction matrices with the same Ξ, but with very different distributions of side lengths. One of them has all sides of equal length, while the other one has a more heterogeneous distribution. In the first case any direction of the perturbation is expected to have a similar effect, and there are no particularly dangerous directions. In the second case there are some directions of the perturbation that are much more dangerous than others, and even a small change of conditions along one of those dangerous direction can lead to the extinction of one or more species.
We know that the feasibility domain is a convex polyhedral cone (see section Supplementary Note 3). Its "corners" are identified by its generators and its sides are determined by all pairs of generators (see section Supplementary Note 12 for the S = 3 case).
Since we are considering growth rates on the unit (hyper)sphere, and the generators are normalized to one, any pair of generators will lie on the sphere's surface. The scalar product of two generators is the cosine of the angle between the two. Since the two generators are on the unit ball's surface, the arc between the two (which is the side length) is equal to the angle. We have therefore that the length of the side of the feasibility domain corresponding to a pair of generators g i and g j is Using equation 12, we can express the S(S − 1)/2 side lengths of the convex polytope explicitly in terms of the interaction matrix: (95) We are interested in the distribution of the side lengths, and in particular in its heterogeneity. In the following section we will calculate these quantities for random matrices.
Supplementary Note 10.1 The distribution of side lengths in random matrices In this section we obtain the distribution of sides length for large random matrices, whose entries are distributed accordingly to an arbitrarily bivariate distribution.
We assume that the diagonal elements of A are all equal to −d (this hypothesis can be easily generalized), while the offdiagonal pairs (A ij , A ji ) are random variables with distribution q(x, y).
Our goal is to find the distribution of the side lengths η in the large S limit, defined as Since we are summing over all i and j, and all the rows are identically distributed, we can remove the sum and consider just two rows: Since we are interested in the large S limit, we have that where E 1 and E 2 are the first and second marginal moments of q (equations 35 and 36). Let us call this quantity Z. In this limit we therefore obtain where q(z) is the marginal distribution of q(x, y): We can introduce the distribution of the sum: q s (t) = dxd yq(x, y)δ(t − (x + y)).
The term is the distribution of a sum of S − 2 uncorrelated random variables. These random variables are the product zw of two random variables whose distribution is q. Since the second moment of q(x) is finite, the central limit theorem holds and this distribution converges, in the large S limit, to a Gaussian distribution with mean and variance We have therefore The distribution of η is not universal as it depends on q s (t), which depends on the distribution of the coefficients. On the other hand, the dependence is explicit, and it is possible to calculate P (η) for any distribution q(x, y).
We show explicitly the case of q(x, y) being a bivariate normal distribution, i.e., In this case q s (t) is a normal distribution, and can be obtained from eq 101 Substituting into equation 105, we see that P (η) has the form of a convolution of two Gaussians, and turns out to be equal to The mean cos(η) and variance var(cos(η)) will be computed in the next section in the most general case of an arbitrary interaction distribution.
Supplementary Note 10.2 Moments for random matrices As explained in the previous section, the distribution of the side lengths is not a universal quantity, as it depends on the distribution of interaction strengths. In this section we compute the mean and the variance in the general case, showing that they depends only on E 1 , E 2 and E c .
Here and in the main text we do not report the moments of the side length η, but the moments of its cosine. The cosine of the side length measures the overlap between two rows of the interaction matrix (or the scalar product of two generators of the convex polytope). As its value gets close to one, the side length approaches zero.
Starting from equation 95, we have that Since we are interested in the large S limit, we can write the denominator as in equation 98 and obtain cos(η) = 1 and then . (111) In the large S limit, this becomes to leading order in S.
In a similar way, we can write the second moment as In the large S limit we obtain We can compute the averages of the different terms, obtaining and We finally get that, in the large S limit, var(cos(η)) = cos(η) 2 − cos(η) (118)

Supplementary Note 11 Side heterogeneity for different structures and empirical networks
In figure 13 we considered the effect of four nonrandom structures on the mean and variance of the side lengths. The interaction strengths were drawn from a normal distribution with given mean, variance, and correlation. For some structures we considered multiple interaction types and therefore multiple means (one positive and one negative), in which case the coefficient of variation of the interactions and the correlation was constant and independent of the mean. Networks were parametrized as explained in section Supplementary Note 8.
• Modular. In this case we considered interaction matrices with a perfect block structure (to generate figure 3 we considered four blocks of equal size).
• Bipartite. In this case we considered an interaction matrix with two bipartite blocks of equal size. The mean interaction of the offdiagonal blocks was set to be negative, while the one of the in-diagonal blocks was positive.
• Nested. The interaction matrix had a bipartite structure. The diagonal blocks had a random structure with negative mean interaction strength. In the offdiagonal blocks, we consider a connectance equal to one half and we built a perfectly nested matrix. The mean interaction strength was positive in the offdiagonal blocks.
• Cascade. We build a matrix using the cascade model, and parameterize it with a positive and a negative mean depending on the role of the species in the interaction.
In the case of empirical structures, figure 3 of the main text, was obtained considering the same networks and the same parameterizations considered in section Supplementary Note 8. We compared var(cos(η)) with the values expected in the random case. Figure 14 shows the comparison between cos(η) obtained for empirical networks with the null prediction. Its value is well predicted by the null expectation for mutualistic networks, while the null expectations underestimates this value for food webs. This is consistent with the fact that the size of feasibility domain of random networks is larger that the one of empirical networks.

Supplementary Note 12 Feasibility domain for S = 3
When S = 3, it is possible to visualize in three dimensions a convex polyhedral cone and the feasibility domain [1]. In figure 15 we show a convex polyhedral cone in three dimensions and its generators.
An important feature of convex polyhedral cones is that if r belongs to the cone, then so does cr for any positive constant c. As explained in section Supplementary Note 3, this is a consequence of the linearity of equation 1. It is relevant therefore to limit our analysis to the growth rate vectors on the unit sphere, i.e., to vectors r such that When we consider the vector in the feasibility domain on the surface of a unit sphere we obtain the areas of figure 1 in the main text. In this case, the quantity Ξ is the area of the triangle, while the side lengths are the three sides of the triangle. Note that the polygon is not a triangle (as it lies on a sphere), but rather a spherical triangle. Its sides are arcs of a circumference, while its corners are identified by the three generators of the convex polyhedral cone.
In the S = 3 case it is possible to obtain a closed expression for the area Ξ [11]: where the second term adds one to the first term when the argument of the arctangent is negative, while the matrix G is defined as Equation 120 can be expressed directly in terms of the matrix A using equation 12.

Supplementary Note 13 Nonlinear per capita growth rates
In general, the effect of a species on the per capita growth rate of other species is not linear.
Equation 1 assumes this to be linear and the results presented in this paper were obtained under this assumption. Nonlinearity of the per capita growth rates can be thought of as a dependence of the interaction matrix A on n: For instance, in the case of predator-prey interactions with a Holling type II functional response, it would have the form where the h ij are the handling times.
The presence of nonlinearity has strong consequences for both feasibility and stability. It is no longer possible to disentangle feasibility and stability with a simple condition on A 0 ij . This means that feasibility will depend not only on the direction of r, but also on its length.
The results presented here are a necessary stepping stone for assessing the feasibility of nonlinear systems. When the degree of nonlinearity is small (e.g., h ij ≈ 0), one can use our results, valid for the case h ij = 0, to find the center of the feasibility domain and the generators. that these two quantities are equal and the bottom panels quantify their deviations. We know that λ is universal, and since Ξ has a similar behavior, we conclude that Ξ is also universal.   M_PL_008   M_PL_011  M_PL_024  M_PL_032  M_PL_033  M_PL_036  M_PL_042  M_PL_045  M_PL_050  M_PL_059  M_SD_001  M_SD_002  M_SD_003  M_SD_005  M_SD_006  M_SD_008  M_SD_009  M_SD_011  M_SD_014  M_SD_015  M_SD_017  M_SD_021  M_SD_023  M_SD_024  M_SD_025  M_SD_026  M_SD_027  M_SD_028  M_SD_029 M_PL_008   M_PL_011  M_PL_024  M_PL_032  M_PL_033  M_PL_036  M_PL_042  M_PL_045  M_PL_050  M_PL_059  M_SD_001  M_SD_002  M_SD_003  M_SD_005  M_SD_006  M_SD_008  M_SD_009  M_SD_011  M_SD_014  M_SD_015  M_SD_017  M_SD_021  M_SD_023  M_SD_024  M_SD_025  M_SD_026  M_SD_027  M_SD_028  M_SD_029 M_PL_008   M_PL_011  M_PL_024  M_PL_032  M_PL_033  M_PL_036  M_PL_042  M_PL_045  M_PL_050  M_PL_059  M_SD_001  M_SD_002  M_SD_003  M_SD_005  M_SD_006  M_SD_008  M_SD_009  M_SD_011  M_SD_014  M_SD_015  M_SD_017  M_SD_021  M_SD_023  M_SD_024  M_SD_025  M_SD_026  M_SD_027  M_SD_028  M_SD_029 M_PL_008   M_PL_011  M_PL_024  M_PL_032  M_PL_033  M_PL_036  M_PL_042  M_PL_045  M_PL_050  M_PL_059  M_SD_001  M_SD_002  M_SD_003  M_SD_005  M_SD_006  M_SD_008  M_SD_009  M_SD_011  M_SD_014  M_SD_015  M_SD_017  M_SD_021  M_SD_023  M_SD_024  M_SD_025  M_SD_026  M_SD_027  M_SD_028  M_SD_029 figure 12 but with µ − = 0.25µ max and µ + = 2µ − q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q Null <cos(η)> Observed <cos(η)> q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q  Null <cos(η)> Observed <cos(η)> Food webs q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q  surface. Since we are considering the unit sphere, the arc length η is equal to the angle between the two generators.