An Approach to Study Species Persistence in Unconstrained Random Networks

The connection between structure and stability of ecological networks has been widely studied in the last fifty years. A challenge that scientists continue to face is that in-depth mathematical model analysis is often difficult, unless the considered systems are specifically constrained. This makes it challenging to generalize results. Therefore, methods are needed that relax the required restrictions. Here, we introduce a novel heuristic approach that provides persistence estimates for random systems without limiting the admissible parameter range and system behaviour. We apply our approach to study persistence of species in random generalized Lotka-Volterra systems and present simulation results, which confirm the accuracy of our predictions. Our results suggest that persistence is mainly driven by the linkage density, whereby additional links can both favour and hinder persistence. In particular, we observed “persistence bistability”, a rarely studied feature of random networks, leading to a dependency of persistence on initial species densities. Networks with this property exhibit tipping points, in which species loss can lead to a cascade of extinctions. The methods developed in this paper may facilitate the study of more general models and thereby provide a step forward towards a unifying framework of network architecture and stability.


A Existence and uniqueness of solutions
In this appendix, we show that a unique solution exists for the initial value problem x(0) = x 0 ∈ [0, 1] n , with i ∈ I := {1, . . . , n}, x = (x 1 , . . . , x n ), g i (x) := r i + ∑ j∈I a ji x j . Note that unlike in the main text, g i defined here includes the term a ii x i .
It is easy to extend the presented results also for other initial conditions (with entries larger than 1) and for capacities K i = 1 (refer to system (2) in the main text).

A.1 Existence of solutions
As the right hand side of system (S1) is discontinuous, there will be be conditions at which the solution to (S1) is not differentiable.Therefore, we need a sufficiently general notion of solution of the differential equation. We say [1] that x is a solution to system (S1) if (S1) is satisfied almost everywhere and x is absolutely continuous. This is equivalent to claiming that x solves the integral version of (S1) [1]. Such solutions are called Caratheodory solutions [2].
An introduction to differential equations with discontinuous right hand sides is given in [1]. Refer to reference [2] for a more in-depth analysis of differential equations with discontinuous right hand sides.
Proof. To show existence of solutions of (S1), we consider first the differential inclusion A differential inclusion differs from a differential equation in that the right hand side returns sets instead of single values or vectors. A Caratheodory solution x to (S2) is an absolutely continuous map for which (S2) is satisfied almost everywhere.
Proposition S2 in [1] states that if is locally bounded and takes non-empty, compact, and convex values, the initial value probleṁ x = F (t, x(t)), with x(0) = x 0 has a Caratheodory solution if 1. for each x ∈ R n , the set-valued map t → F (t, x) is measurable and 2. for each t ∈ R, the set-valued map x → F (t, x) is upper semicontinuous.
Here, B(R n ) denotes the set of all subsets of R n .
Note that x i ≤ 1 for all i ∈ I if x i (0) ≤ 0. Therefore, min 0, r i + ∑ j∈I,a ji <0 a ji ≤ x i g i (x) ≤ max 0, r i + ∑ j∈I,a ji >0 a ji .
Therefore, G is bounded. Furthermore, G takes non-empty, compact, and convex values only. Since G is independent of t, it is measurable in t.
A set-valued map F is upper semicontinuous if for any argument x and any series x (k) → x, any series y (k) with y (k) ∈ F (x (k) ) converges to a value y ∈ F (x). This is true for G . Thus, the differential inclusion (S2) has a Caratheodory solution.
It remains to be shown that the solutions of (S2) also solve (S1). A classical differential equation is equivalent to a differential inclusion in which the right hand side consist of single-valued sets only. That is, system (S1) is equivalent to Hence, the only difference of a solution x of (S2) compared to a solution of (S1) is that for any i ∈ I,ẋ i can take non-zero values for a time of measure larger than zero if x i = 1 and g i (x) > 0. That is, there may exist a set T of times with a positive Lebesque measure λ (T ) > 0 such that x i (t) = 1, g i (x(t)) > 0, and d x i dt (t) > 0 for all t ∈ T . We argue that this cannot happen in practice. Suppose for contradiction a set T as described above exists. By construction of the Lebesque measure, there is a Borel set T B ⊇ T with Lebesque measure λ (T B ) = λ (T ). Since T B is a Borel set, it can be be written as the union of non-empty closed intervals T We know that ε > 0. We can write for all t ∈ T B . We know that x i (t 1 ) = 1 and x i (t) = 1. This is in contradiction with inequality (S4). Hence, d x i dt = 0 if x i = 1 and g i (x) > 0 almost every time and every solution x of (S2) is also a solution of (S1).

A.2 Uniqueness of solutions
Reference [1] also contains results on uniqueness of Caratheodory solutions. However, our system does not satisfy their requirements. Hence, we will use a proof based on well-known results for continuous differential equations to show that system (S1) has a unique solution.
Proof. We will conduct a proof by contradiction. Suppose there were two distinct solutions x = y to system (S1) with x(0) = y(0) = x 0 . As both x and y are continuous, there must be an open interval T = (t 1 ,t 2 ) with x(t 1 ) = y(t 1 ) and x(t) = y(t) for all t ∈ T . Now consider the following system: withx(t 1 ) = x(t 1 ). Note that the right hand side of system (S5) is continuous, because the condition does not depend onx or t, and min (·, ·) is a continuous function. In fact, the right hand side of system (S5) is Lipschitz continuous. Therefore, system (S5) has a unique solution.
Let us now consider a solutionx of system (S5). For all i ∈ I, let Observe that min T min Note thatt > t 1 . In the intervalT := [t 1 ,t), system (S1) is equivalent to system (S5). This however, contradicts the existence of two distinct solutions to system (S1) in the interval T , because T ∩T = / 0 and system (S5) has a unique solution. Therefore, system (S1) must have a unique solution, too.

B Average per-capita growth rate and persistence
We show how the non-truncated per-capita growth rate a ji x j with i ∈ I := {0, . . . , n} and I i := I\ {i} is related to survival and extinction of a species i in system (2) (main text). Throughout this appendix, we suppose that the existence assumption from the main text is satisfied; i.e., the limit exists. In our proofs we will consider the actual per-capita growth rate a ji x j < 0 0 else, and the averages In equations (S6) and (S7), we write g i (t) := g i (x(t)) and h i (t) := h i (x(t)) as functions of t. This is only possible if the solution x to system (2) is already known. That is, the techniques used in this appendix are not suitable to find a solution to the system.

3/30
We rather show that the solution must have certain properties. For example, the solution must satisfẏ for all i ∈ I. It follows that we can write the solution as with c = x i (0). We will use this form of the solution to relate it to properties ofh i andḡ i .
We proceed by proving two helpful lemmas aboutḡ i ,h i , andx i . Afterwards, we will show how the sign ofḡ i indicates persistence of a species.
Proof. Suppose for contradiction thath i > 0. By definition of the limit, there is for each ε > 0 a T such that for all t ≥ T it is Let us chooseε =h i 2 . Then with t ≥ T , That is, be the difference between the non-truncated and the truncated per-capita growth rate and

4/30
and that be the total time between 0 and T where x(t) = 1 holds. Here I · (·) is the indicator function. Definẽ and note thatx i (t) ≤ x(t). From our initial assumptions we get Hence, Theorem.x i > 0 if and only ifḡ i > 0.
Proof. To prove this equivalency, we consider the possible casesḡ i > 0,ḡ i = 0, andḡ i < 0 and show that the correspondingx i values satisfy our claim. Sinceḡ i must always assume some value, no further proofs are required for the reverse direction.
We start with the proof thatḡ i > 0 impliesx i > 0. Let us assume for contradiction thatḡ i > 0 andx i = 0. By Lemma 4, it is h i =ḡ i > 0. However, by Lemma 3, it ish i ≤ 0. This is a contradiction, and thus the statement is shown.
We proceed by showing thatḡ i ≤ 0 impliesx i = 0. Suppose for contradiction thatḡ i ≤ 0 andx i > 0. Then By the definition of the limit, we can find for each ε > 0 a T ≥ 0 so that for all t ≥ T

5/30
Since the left hand side converges to 0 exponentially fast, it must bex i = 0, which contradicts our initial assumption. Therefore, it must holdx i = 0. This concludes the proof.

C Covariance bounds
In this appendix, we derive bounds for the covariances between the model parameters r i , a ji and the average species densitiesx i . These covariances play a major role in the approximation of the mean and variance of the average non-truncated per-capita growth rateḡ with In the first subsection, we will introduce notation and basic assumptions used in the later sections. Then we describe the structure of the covariance matrices, before we derive bounds for the covariance terms by exploiting the structure of the covariance matrices and the fact that they must be positive semi-definite.

C.1 Basic assumptions and notation
Throughout this appendix, we suppose that the existence assumption from the main text holds; i.e., the limitx i := lim exists for all i ∈ I := {1, . . . .n}. Recall that in our model 0 ≤x i ≤ 1. Whenever not explicitly defined otherwise, we use the variables i, j, k for arbitrary pair-wise different indices i, j, k ∈ I.
Suppose f is the probability density function of the average species densitiesx i . Let us start by making a note on exchangeability: as the differential equations for the x i have all the same shape and the random parameters r i and a ji of these equations are drawn according to the same rules, respectively, the random variablesx i are exchangeable. That is, given any permutation π : I → I and an arbitrary vectorx ∈ R n , it is f (x 1 , . . . ,x n ) = f x π(1) , . . . ,x π(n) . This has consequences on the covariances between the model parameters and thex i . For all pair-wise different i, j, k ∈ I, it holds whereby the constants on the right hand sides are independent of the indices of the variables on the left hand sides. Note that we ignore the parameters a ii here, as they are irrelevant for the characterization of extinct species and not included in our survival Recall that the parameters a ji and r i are drawn independently from each other. Therefore, the covariances between these parameters are 0. Following the notation in the main text, we write Note the subtle difference between the parametersμ a ,μ x ,σ 2 a , andσ 2 x defined here and the parameters µ a , µ x , σ 2 a , and σ 2 x defined in the main text. While µ a = E(a ji |a ji = 0) and σ 2 a = V(a ji |a ji = 0) refer to the distribution of the interaction coefficients given that the interactions exist (i.e. a ji = 0),μ a andσ 2 a refer to the distribution of the interaction coefficients directly. Similarly, µ x andσ 2 x defined here differ to the µ x and σ 2 x specified in the main text in that the latter parameters are conditioned on survival:

C.2 Covariance matrices
After these considerations, we can build the covariance matrices. To simplify future computations, it is advisable to consider the covariances between the growth constants r i and the long time average species densitiesx i separately from the covariances between the interaction coefficients a ji and thex i .

C.2.1 Growth rates and long time average species densities
The covariance matrix C r for the growth rates r i and the average species densitiesx i can be written in block form: whereby the matrix C rr describes the covariances between the parameters r i , respectively; the matrix C rx contains the covariances between the parameters r i and the variablesx i , and the matrix C xx represents the covariances between the average densitiesx i .
To describe the blocks, let I be the n × n identity matrix and J an n × n matrix consisting of ones only.
Since all the r i are drawn independently, the matrix C rr is given by a diagonal matrix: The matrix C xx contains the varianceσ 2 x on the diagonal and is filled with ξ everywhere else: The matrix C rx contains the covariances Cov(r i ,x i ) = ρ on the diagonal and the covariances Cov(r i ,x j ) = κ in the remaining entries:

C.2.2 Interaction coefficients and long time average species densities
As we did for the covariance between the rates r i and the densitiesx i , we can write the covariance matrix C a for the interaction parameters a ji and the densitiesx i in block matrix form whereby the matrix C aa describes the covariances between the parameters a ji , the matrix C ax contains the covariances between the parameters a ji and the variablesx i , and the matrix C xx represents the covariances between the average densitiesx i . As above, let I be the n × n identity matrix and J an n × n matrix consisting of ones only. Furthermore, let P be the n × n cyclic permutation matrix. This matrix contains ones at the entries directly below the diagonal and in the upper right corner. That is, for n = 5 dimensions, P looks as follows: Multiplying P to a matrix M rotates the rows of M downwards. That is, the ith row becomes the (i + 1 mod n)th row. Consequently,

7/30
Now we can proceed deriving the blocks of C a . Since the parameters a ji are drawn independently, C aa is a diagonal n (n − 1) × n (n − 1) matrix withσ 2 a at all diagonal entries. The matrix, C xx is filled everywhere with ξ except for the diagonal, where the matrix is filled withσ 2 x . That is, The matrix C ax has a more complicated structure. To ease notation, we write the index 0 for the index n if convenient. That is, for example, a i0 := a in andx 0 :=x n . Let A i := (a 1,i+1 mod n , a 2,i+2 mod n , . . . , a n,i+n mod n ). For example, if n = 5, then A 2 := (a 1,3 , a 2,4 , a 3,5 , a 4,1 , a 5,2 ). Now we can form a vector A := (A 1 , . . . , A n−1 ) containing all parameters a ji with i = j. Note that the entries A 1 , . . . , A n−1 of A are vectors themselves, which are concatenated to form a long vector. The matrix C (i) ax describing the correlation between the variables in A i with the entries of the vectorx = (x 1 , . . . ,x n ) has the form That is, the diagonal entries of the matrix are given by Cov(A i j ,x j ) = Cov a j,i+ j mod n ,x j = β ; the entries of the shifted diagonal are Cov A i j ,x i+ j mod n = Cov a j,i+ j mod n ,x i+ j mod n = α, and the remaining entries of the matrix are γ. With the parameters a ji sorted as in the vector A, the block C ax is given by

C.3 Covariance bounds
Covariance matrices must be positive definite. We can use this property to bound the covariances between the per-capita growth rates, the interaction coefficients, and the long time average species densities.

C.3.1 Growth rates and long time average species densities
Lemma 5. The characteristic polynomial of the matrix C r is given by Proof. LetC r := C r − λ I 2n×2n ,C rr := C rr − λ I, andC xx := C xx − λ I. Using the determinant computation formula for block matrices, we get SinceC rr is a diagonal matrix, it is easy to see that

8/30
It is well known (and easy to show by induction) that the characteristic polynomial of a matrix of ones J is given by = (a + nb) a n−1 . Thus, Inserting the partial results in our formula (S9) yields Proof. As the matrix C r is positive semi-definite, all its eigenvalues must be non-negative. These eigenvalues are the zeros of the characteristic polynomial found in Lemma 5. Hence, the eigenvalues satisfy either The smaller solution of equation (S11) is given by Hence, ρ ∈ κ ± σ r σ 2 x − ξ . Note thatσ 2 x ≥ ξ by the Cauchy-Schwarz inequality. Let us now consider the smaller solution of equation (S10): Inserting our bounds for ρ yields Withσ 2 x ≥ ξ , we obtain |κ| ≤ σ r σx 1 n + 1 √ n . Asx ∈ [0, 1], we know by Popoviciu's inequality on variances that σx ≤ 1 2 . Furthermore, σ 2 r is a known constant. Hence, the absolute value of κ decreases at least in O 1 √ n as n increases.

C.3.2 Interaction coefficients and long time average species densities
Lemma 6. The characteristic polynomial of the matrix C a is given by Proof. We proceed similar to the proof for Lemma 5. LetC a := C a −λ I n 2 ×n 2 ,C aa := C aa −λ I n(n−1)×n(n−1) , andC xx := C xx −λ I.
Using the determinant formula for block matrices, we get (S12)

10/30
SinceC aa is a diagonal matrix, it is a −λ I n(n−1)×n(n−1) and therefore Hence, and thus Consequently, Recall from the proof of Lemma 5 that for a, b ∈ R, det (aI + bJ) = (a + nb) a n−1 .
Inserting and simplifying yields Proof. The proof is analogous to the proof of Corollary 1. As the matrix C a is positive semi-definite, all its eigenvalues are non-negative. These eigenvalues are the zeros of the characteristic polynomial found in Lemma 6. Hence, the eigenvalues satisfy either It is clear that the varianceσ 2 a is non-negative. Equation (S13) has no negative solution if and only if 0 ≤σ 2 Secondly, equation (S14) has non-negative solutions only if and only if In particular, (S17) From here it is a short exercise to bound the absolute values of α, β , and γ by terms dependent on constants and n. Inserting (S15) in (S16) yields

13/30
Repeating this procedure and inserting the result gives us Since |ξ | ≤σ 2 x is bounded, we obtain after backsubstitution Proof. The covariance matrix for the variables a 2 ji andx 2 k with i, k ∈ I, j ∈ I i has the same structure as the covariance matrix for the variables a ji andx k as constructed in section C.2.2. Hence, Lemma 6 holds with appropriate relabelling of the constants.
The distribution of a 2 ji along with its mean and variance are known. Furthermore,x 2 k ∈ [0, 1], which implies that V x 2 k ≤ 1 4 . Thus, the proof of this corollary is analog to the proof of Corollary 2.

D Approximation of mean and variance of the average non-truncated per-capita growth rateḡ i
In this appendix, we derive approximations for the mean and the variance of the non-truncated per-capita growth ratē g i = r i + ∑ j∈I i a jix j .

14/30
Let Φ i = I (0,1] (x i ) be the random variable modelling the survival of a species i. Here, I · (·) denotes the indicator function.
Let furthermore χ i =x i (x i > 0) be a random variable modelling the average density of species i if it is known to survive. We can writex i = Φ i χ i and regard the random variables χ i and Φ i as independent from each other. Consequently, By Popoviciu's inequality on variances, it is σ 2 with some constant ω 1 . Provided the intra-specific competition does not hinder species from growing to the sharp density bound, this approach leads to good quantitative estimates of σ 2 x if we choose ω 1 = 1 2 (see Appendix E). For the case of strong intra-specific competition. we suggest a different estimate for σ 2 x in Approximation 2 in the main text. However, without loss of generality, we will proceed with the estimate given above. In the case of strong self-damping, our estimates can be adjusted easily.
We note that σ 2 x is often small, as surviving species either are either likely to grow to their capacity, implying that µ x is large, or may be kept at a small density by intra-specific competition. Therefore, it can be of advantage to neglect σ 2 x completely and regardx i as a scaled Bernoulli random variable if we are not seeking for quantitative estimates of the varianceσ 2 x but rather for approximations with the right order of magnitude. That is, we may approximatex i as and hencẽ

D.2 Mean
In the main text, we have already noted that In appendix C we have shown that Cov(a ji ,x j ) = O 1 √ n . Hence, we may neglect the covariance term: If cµ a Pµ x ≈ 0, then the covariance terms can become significant. This cannot happen if µ a > 0, as this results in large values of P and µ x . Furthermore, if c = 0, all variables are independent of each other and the covariance is 0. Nonetheless, the lack of precise covariance terms can distract our model precision if µ a = 0 or if P is small. Below we will show, however, that the covariance is small in these cases.
Here, the ∆ denotes the centralized versions of the respective random variables. That is, for a random variable Y , it is The higher moments in equations (S23) and (S24) are hard to compute. However, it is a common practice to approximate covariances ignoring these higher moments. Hence, we can write Cov(a ji x j , a ki x k ) = O ξ + 1 n applying the covariance bounds we found in appendix C. Consequently,

16/30
For a useful variance estimate, it remains to be shown that the O n 2 ξ does not affect the variance significantly. To do this, we recall that we may approximatex i ≈ µ x Φ i and concentrate on estimating To do this, we make use of the following lemma and corollary.

Proof. It is
On the other hand, Consequently, as claimed.
Corollary 4. Let X, Y , and Z be random variables and Z ∼ Bernoulli(p). Then Proof. The covariance formula follows from inserting the result of Lemma 7 into and the variance formula follows from V(X|Z) = Cov(X, X|Z).

17/30
Rearranging the equation in Lemma 7 leads to being the difference between the probability that a species survives given that another species survives and the unconditioned probability that a species survives.
We argue that |ε| is small if n is large. If g as approximated in the main text, we need to determine for some j = i Ignoring terms with lower orders of n, it is with ζ being the constant representing the O 1 n terms in Cov(a ji x j , a ki x k ). Furthermore, ≈ n cµ a Pµ x + cµ a ξ Pµ x .
Here, we again applied Lemma 7 and neglected higher moments. Before approximating the conditional variance V ḡ i Φ j = 1

18/30
note that Note that |ξ | < 1 and therefore O ξ + ξ 2 = O(ξ ). With equation (S25), we may write whereby ν is a small constant accounting for the higher moments initially neglected in equation (S25). The reason to consider higher moments here is that the terms dominating higher moments in Cov(a kixk a lixl , Φ j ) and 2E(a kixk )Cov(a kixk , Φ j ) cancel each other out in (S26). It is reasonable to assume that ν ∈ [−1, 1]. As will be apparent later, ν < −1 would result in a negative

19/30
variance. With these observations, we can estimate the conditional variance for large n: Now we can attempt to estimate ξ for large n. Let us first assume that ξ > O 1 n . Then the n 2 terms dominate the variance and we obtain If µ a = 0, it follows directly that ξ = 0, which contradicts our claim ξ > O 1 n . Hence, we consider µ a = 0. Letξ := ξ under the constraint P = S N 0, sign (µ a ) P,ξ P .
We argue that this system of equations does not have a "significant" solution for reasonable choices of ν. Let us first note that ξ > 0, because V(ḡ i ) would turn negative for large n otherwise. Observe that since P,ξ > 0, the term S N 0, P +ξ ,ξ P(1 + ν) is decreasing as ν is increasing. We know that ν ≥ −1. Hence, F N is the cumulative density function of the normal distribution. It is a simple exercise to show that F N 0, √ P,ξ is decreasing if P increases. A plot of F N 0, √ P,ξ −ξ for P = 0.2 shows that (S30) does not have a solution for P ≥ 0.2. That is, if P becomes sufficiently large, then our assumption ξ > O 1 n must be wrong. Since µ a > 0 implies that the mean value µ g is positive for large n (see equation (S29)), it must be ξ = O 1 n if µ a > 0. Now let us consider the case of µ a < 0. Under this condition, equation system (S28)-(S29) does have a solution, which can be determined numerically for each given ν. We have already noted that S N 0, P +ξ ,ξ P(1 + ν) is decreasing in ν.
Consequently,ξ and therefore ξ is maximized if ν is chosen to be large. If ν ≤ 1 as deemed reasonable above, then the largest 20/30 ξ value we can obtain is attained with ν = 1. Solving system (S28)-(S29) numerically yields ξ ≈ 4.3 · 10 −4 µ x . For ν = 0.5, we obtain ξ ≈ 2.2 · 10 −5 µ x and for ν ≤ 0, the system does not have a solution. That is, if ξ does not vanish, it becomes important in large systems (1000 species in order of magnitude) only, and only if higher order moments are of same order of magnitude as lower order moments. This observation and numerical simulations (see Appendix E) show that our heuristic will do well in the systems we considered in this study.
In summary, the variance approximation is valid whenever µ a ≥ 0 and a reasonable approximation for µ a < 0, unless n is very large.
Remark. We conducted all the computations above under the assumption that there is a unique estimate for P. However, in systems that exhibit "persistence bistability", i.e. P depends on the initial condition, Φ j = 1 may imply a move from one persistence state to another. As a consequence, it is possible that in such systems either all species survive or all species die; i.e.
ξ ≈ µ 2 x P (1 − P). In this case, our persistence estimates can be far from the actual persistence values. Therefore, our heuristic performs poorly in the direct neighbourhood of its jump discontinuities. The size of this neighbourhood depends on the variance of the initial condition. This can be observed in Figure 3e in the main text, where our heuristic performs particularly poorly in the case with 60% invaders (an initial condition with large variance). Note also the respective figures in Appendix E.

E Numerical estimates of unknown quantities crucial for the derivation of the heuristic
Our derivation of a heuristic for the expected portion P of persisting species requires a number of assumptions: 2. The conditional variance σ 2 x = V(x i |x i > 0) is small and reasonably well approximated by σ 2 4. The covariance ξ = Cov(x i ,x j ), i = j is small.
In the main text, we have provided arguments supporting these claims, and in Appendix C, we have shown that the third claim is true for large systems. However, we did not provide any quantitative estimates for the approximated quantities. In this appendix, we present numerical estimates for µ x , σ 2 x , and ξ and add a short discussion of these results regarding their implications for our heuristic.
As the purpose of this appendix is to confirm that the approximations made in the main text and Appendix C are valid, we need to show that the ξ term does not play a dominant role in the variance estimate Therefore, it makes sense to consider the quantity nξ , as it is crucial that this product is bounded. Therefore, we will present the covariance results in terms of the quantity nξ below.

E.1 Method
To estimate any of the quantities of interest numerically, we need a sufficiently large sample of long-time average densitiesx.
We computed this sample with the same method described in the section 'Simulation method' in the main text. We drew the random model parameters according to their respective distributions and simulated the resulting systems for T = 1000 time 21/30 steps. To avoid a bias due to the "lock-in phase", in which the species densities are far from their average, we determinedx by averaging the species densities in the time interval 3 4 T, T . We repeated this procedure N = 1000 times. To save computing time, we used the samex sample here that we also used to obtain the results for P presented in the main text.
Letx (k) be the vector of average species densities in the k th simulation run. To estimate µ x , we simply took the average of allx i > ε = 0.001, whereby epsilon was the extinction threshold described in the main text. That is The conditional variance σ 2 x can can be computed accordingly: In the same manner, we computed the variance σ 2 P : The covariance ξ is defined as We computed the expected value E(x i ) = E(x j ) by taking the average over allx (k) i ; i ∈ I, k ∈ {1, . . . , N}. To compute the expected value E(x ix j ) of the product of different average species densities, we determined for each k ∈ {1, . . . , N} and each i ∈ I, j ∈ I i := I\ {i} the product φ j and computed the mean of the results. In summary, A main result of this paper is that the behaviour of the complete system can be deduced from looking at an "average" species.
Therefore, the proportion of persisting species is driven by the number of links per species. It is reasonable to hypothesize that the same applies to the quantities considered in this Appendix. Therefore, we will present our numerical results in relation to the linkage density l = c (n − 1). Also, this makes it easy to link the results presented in this appendix with the results presented in the main text.

E.2 Results
In this section, we briefly describe the simulation results that we obtained.
The results for the average density µ x of surviving species is shown in figure S1. The density of surviving species is mainly 22/30 driven by the linkage density l. For multiple combinations of the parameters µ r , σ 2 r , µ a , and σ 2 a , we see a steep initial ascent of the µ x curve, which then levels out. Except for the case with strongly negatively biased interactions ( Figure S1d) and the bistable case ( Figure S1f), the initial densities of the species do not affect µ x strongly. In systems with strong damping ( Figure   S1e), the average density of surviving species is small. In these systems, µ x could be estimated with high accuracy. Figure S2 shows the variance σ 2 x of the average densities of surviving species. Similar to the mean µ x , the variance of the average densities is driven by the linkage density. For some parameter combinations, the curves increase for small l. The larger the linkage density gets, however, the closer the curves approach a limiting value, which is often small. The initial densities have a small effect on σ 2 x except for the bistable case ( Figure S2e) and systems with strongly negatively biased interactions ( Figure S2d). In the latter case, the curve for the colonization scenario is qualitatively different to the respective curve for the scenario with high initial densities.
In most systems, the estimate of the variance based on the value of the mean µ x yielded good results. Though quantitative deviations are visible, the estimates had the right order of magnitude and had a qualitatively similar dependency on l as the computed exact variances.
In Figure S3, we present simulation results for the variance σ 2 P of the proportion of coexisting species. In most systems, there is no connection between the linkage density and σ 2 P visible. However, in all simulated scenarios, σ 2 P was very small in systems with large linkage density. In systems with persistence bistability (Figure S3e), the variance has a sharp spark in a neighbourhood of the jump discontinuity of the heuristic. The sharpness of the spike is related to the variance of the distribution of initial densities: the region of large σ 2 P is widest in the scenario with some invaders (low density) and some native species (high density).
The simulation results for the scaled inter-species correlation nξ are displayed in Figure S4. For some parameter combinations, we see a clear relationship between nξ and the linkage density l. However, in other systems, the values are more scattered. In all considered systems, nξ approaches a respective limit value below 0.5. The covariance is positive in all systems, even in those in which species affect each other negatively on average (Figures S4b and S4d). The quantity nξ becomes large in the systems with persistence bistability only ( Figure S4e). The covariance peaks around the jump discontinuity of our heuristic.
The spike is widest in scenarios with highly variant initial densities.

E.3 Discussion
Most of the quantities considered in this appendix are strongly dependent on the linkage density l. This suggests that it may be possible to derive heuristic estimates for these quantities in future and thereby refine the heuristic presented. Regardless of this potential for future improvement, most of the approximations our heuristic is based on turned out to be valid.
As assumed for the derivation of the heuristic, the mean average density µ x of surviving species is large and mostly constant for systems without strong intra-specific competition. In systems with strong self-damping ( Figure S1e), the heuristic estimates are highly accurate. The results in Figure S1f are scattered for small l due to the small sample size of persistent species, as surviving species are rare in systems with small persistence values P. Of higher concern are the initial rapid changes of µ x with l found in all other simulated systems. This change could make our heuristic imprecise. Note, however, that the distribution of the growth constants is the main driver of persistence in systems with small linkage density. That is, in systems with small linkage density, µ x has only a minor effect on persistence. Therefore, our heuristic approximation based on a constant and large value of µ x performs well in general.
To reduce model complexity, we approximated of the variance σ 2 x of the average densities of surviving species using the corresponding mean µ x . This approximation turned out to yield valuable estimates. Though the approximation is not always quantitatively precise, the order of magnitude of σ 2 x is well approximated. Different scaling factor than ω 1 = 1 2 and ω 2 = 2 for the estimates σ 2 x ≈ ω 1 µ x (1 − µ x ) and σ 2 x ≈ ω 2 3 µ 2 x , respectively may yield even better estimates. However, the value of this constant would differ from case to case. As we do not know a mechanistic model for σ 2 x , we suggest to keep the (admittedly somewhat arbitrary) choice of "intermediate" ω 1 = 1 2 and ω 2 = 2. The smaller the variance σ 2 P of the fraction of persisting species the more precise will our approximation of the mean of the 23/30  Figure S1. Mean long-time average density µ x of persistent species as function of the linkage density l. Each subfigure corresponds to networks with different distributions of growth and interaction constants, respectively. Each marker represents a network with specific species richness n and connectivity c. The blue dots refer to networks with high initial species densities (b = 1), and orange triangles to networks with very low initial densities (colonization scenario, b = 0). The green squares in Subfigure S1f depict an intermediate scenario with b = 0.4. Note that the estimates of µ x are imprecise in networks in which only few species survive. Therefore, the markers in Subfigure S1f are scattered. In the scenario with strong damping (Subfigure S1e), specific estimates for µ x could be computed. These estimates are plotted as a solid line. It is visible that µ x assumes high values in most networks except for those with strong damping (Subfigure S1e). Nonetheless, µ x changes with the linkage density l. x are imprecise in networks in which only few species survive. Therefore, the markers in Fig. S2e are scattered. It is visible that the variance estimate based on µ x is reasonably precise. Furthermore, σ 2

24/30
x is rather small in general. Each subfigure corresponds to networks with different distributions of growth and interaction constants, respectively. Each marker represents a network with specific species richness n and connectivity c. The blue dots refer to networks with high initial species densities (b = 1), and orange triangles to networks with very low initial densities (colonization scenario, b = 0). The green squares in Fig. S3e depict an intermediate scenario with b = 0.4. Our heuristic requires that σ 2 P is small if l is large and P is small. This is satisfied in general except for systems with persistence bistability ( Figure S3e) where the condition is violated in the neighbourhood of the jump discontinuity of the persistence values. As σ 2 P is strongly related to ξ , refer to the concluding remark D.3 of Appendix D.3 for an explanation of this system behaviour.  Figure S4. Covariance ξ between different species' average densities as function of the linkage density l. The covariances are displayed multiplied with the species richnesses of the respective networks, as this product appears in the variance estimate for the average per-capita growth rateḡ i . Our heuristic approximation requires that nξ is bounded. Each subfigure corresponds to networks with different distributions of growth and interaction constants, respectively. Each marker represents a network with specific species richness n and connectivity c. The blue dots refer to networks with high initial species densities (b = 1), and orange triangles to networks with very low initial densities (colonization scenario, b = 0). The green squares in Fig. S4e depict an intermediate scenario with b = 0.4. The covariance term is bounded by a small value as required by our heuristic. Only in the network with persistence bistability (Fig. S4e), the covariance term gets large around the jump point. We explained this behaviour in the concluding remark D.3 of Appendix D.3.

27/30
exponential term in our heuristic be. Though σ 2 P is consistently small for systems with large linkage density, it turns our to be too large to make our heuristic precise even in large, highly linked systems with few persisting species.
The spark of σ 2 P around jump points of the persistence value P in systems with persistence bistability can be well explained by the impact of the initial conditions on the result. Initial conditions have a strong impact on the behaviour of systems exhibiting persistence bistability. If the initial conditions are drawn from a distribution with large variance, it is as well possible that the initial conditions favour persistence as it is possible that they lead to a trajectory with many extinctions. The more the two possible persistence states are apart from each other the larger is the variance of the proportion of persisting species. In the most extreme case, either all species survive or all species go extinct. This can lead to the maximal possible variance P (P − 1).
Our heuristic for P is based on the assumption that the correlations between average species densities is small. In Appendix D we used an approximation to show that this requirement is met in the systems that we considered in this paper. Our numerical estimates support this claim. However, the correlation is large around persistence jump points in systems with persistence bistability. This does not contradict our argument for small covariances, as this effect is due to the randomness of the initial conditions as described above and in the concluding remark D.3 of Appendix D.3. Nonetheless, we can conclude that our heuristic is imprecise around jump points. The size of the region of imprecision depends on the variance of the initial conditions.
In summary, our numerical results support the claims and approximations made in the main paper. However, the results also reveal limitations of the heuristic in systems with persistence bistability.

F Supplementary information on Figure 1
In this Appendix, we provide supplementary information on the creation of Figure 1 (main text).
We used the same simulation approach as outlined in the section 'Simulation method' (main text) and Supplementary Appendix E to compute a sample of   Figure 1 is to depict the quality of our approximation ofḡ i given that species i is affected by at least one other species, we excluded allḡ (k) i withḡ (k) i = 0 from our sample. We created normalized histograms of G := ḡ (k) i |ḡ (k) i = 0 for the two simulated systems. To depict the observed cumulative density function, we created cumulative histograms, respectively.
From our simulations, we computed the unknown parameters µ x , σ 2 x , and P as described in the section 'Simulation method' (main text) and Supplementary Appendix E. Then we used the resulting values to compute the approximate mean µ g and variance σ 2 g according to equations (7) and (10) (main text), respectively. Thereby, we neglected covariance terms. Finally, we plotted the normal probability density function and cumulative density function with the obtained mean and variance in the corresponding figures.

G Sharp density bound and functional response
The model our study is based on involves a sharp threshold that limits the growth of each species i. It is All results presented in this paper can be directly applied to system (S33) if the variables are redefined so that µ x = E min(x i , 1) x i > 0 and σ 2 x = V min(x i , 1) x i > 0 . Though proper introduction of functional response would require that the minimum function in system (S33) would include the sum over all prey species, and the interactions with predators should be adjusted accordingly, the dynamics of system (S33) may provide some hints on how a system with legitimate functional response would behave qualitatively.
Though the proof of the above claim is done by applying the steps of our heuristic derivation to system (S33) with the given main adjustments, we can gain a better understanding of why the model change does not affect persistence by comparing the dynamics of the two systems (see Figure S5). Note that the dynamics of systems (S32) and (S33) are similar until a species in system (S32) leaves the threshold density for the first time. Until this point in time, the densities of the species in the two systems are equal except for species that reach the threshold in system (S32). These species will continue to grow in system (S33), though this will not change the effect of these species on other species due to the functional response. The difference between the systems appears when the growth of a species i with density x i ≥ 1 turns negative for the first time. While this change push the density of species i below 1 in system (S32) immediately, which directly affects the impact of species i on other species, the transition to a density below the threshold will occur later in system (S33). Hence, system (S33) can be understood as a delayed version of system (S32). It is intuitive that this delay does not have a strong effect on persistence.