Generalizing game-changing species across microbial communities

Microbes form multispecies communities that play essential roles in our environment and health. Not surprisingly, there is an increasing need for understanding if certain invader species will modify a given microbial community, producing either a desired or undesired change in the observed collection of resident species. However, the complex interactions that species can establish between each other and the diverse external factors underlying their dynamics have made constructing such understanding context-specific. Here we integrate tractable theoretical systems with tractable experimental systems to find general conditions under which non-resident species can change the collection of resident communities—game-changing species. We show that non-resident colonizers are more likely to be game-changers than transients, whereas game-changers are more likely to suppress than to promote resident species. Importantly, we find general heuristic rules for game-changers under controlled environments by integrating mutual invasibility theory with in vitro experimental systems, and general heuristic rules under changing environments by integrating structuralist theory with in vivo experimental systems. Despite the strong context-dependency of microbial communities, our work shows that under an appropriate integration of tractable theoretical and experimental systems, it is possible to unveil regularities that can then be potentially extended to understand the behavior of complex natural communities.


S1.1 In vitro soil communities
In vitro soil communities were compiled from Ref. [30]: experiments were performed by coinoculating species at varying initial fractions, and propagating them through five growth-dilution cycles. The overall interaction time was chosen such that species extinctions would have sufficient time to occur, while new mutants would typically not have time to arise and spread. The abundances are measured by calculating the optical density of the cultures as well as plating on solid agar media and counting colonies, which are unique to each species. All experiments were carried out in duplicate. To determine species extinction between a given set of species, the two replicates from the initial conditions were combined and classified as extinct any species whose median abundance was less than 1%, which was just above the limit of detection. For the readers' convenience, below we provide information about the species, media, competition experiments, and measurements as written in Ref. [30], who performed the experiments.
Competition experiments: Frozen stocks of individual species were streaked out on nutrient agar Petri plates, grown at room temperature for 48 h and then stored at 4 • C for up to two weeks. Before competition experiments, single colonies were picked and each species was grown separately in 50 ml Falcon tubes, first in 5 ml nutrient broth for 24 h and next in 5 ml of the experimental M9 media for 48 h. During the competition experiments, cultures were grown in Falcon flatbottom 96-well plates (BD Biosciences), with each well containing a 150 µl culture. Plates were incubated at 25 • C without shaking, and were covered with a lid and wrapped in Parafilm. For each growth-dilution cycle, the cultures were incubated for 48 h and then serially diluted into fresh growth media by a factor of 1500. Initial species mixtures were prepared by diluting each species separately to an optical density (OD) of 3 × 10 −4 . Different species were then mixed by volume to the desired composition. This mixture was further diluted to an OD of 10 −4 , from which all competitions were initialized. For each set of competing species, competitions were conducted from all the initial conditions in which each species was present at 5%, except for one more abundant species. For example, for each species pair there were two initial conditions with one species at 95% and the other at 5%, whereas for the eight species competition there were eight initial conditions each with a different species at 65% and the rest at 5%.
Measurement of cell density and species fractions: Cell densities were assessed by measuring OD at 600 nm using a Varioskan Flash plate reader. Relative abundances were measured by plating on nutrient agar plates. Each culture was diluted by a factor between 105 and 106 in phosphatebuffered saline, depending on the culture's OD. For each diluted culture, 75 µl was plated onto an agar plate. Colonies were counted after 48 h incubation at room temperature. A median number of 85 colonies per plate were counted.

S1.2 In vivo gut communities
In vivo gut communities were compiled from Ref. [34]: experiments were performed by coinoculating species overnight and maintaining bacterial colonization through frequent ingestion in different flies. Species collections were assessed by visually scoring and enumerating species using a standard curve. We classified as extinct any species whose relative abundance was less than 10% in at least 71% of all (47 to 49) replicates, which corresponds to less than 1% of cases under a binomial distribution with p = 0.5. For the readers' convenience, below we provide information about the species, media, competition experiments, and measurements as written in Ref.
Germ-free fly preparation: Wolbachia-free and virus-free Drosophila melanogaster Canton-S flies reared on a cornmeal-based medium were transferred to embryo collection cages and allowed to acclimate in the cage for at least one day before egg collection. On the morning of egg collection, a yeast paste was added on a grape juice agar plate. Flies were left to lay eggs on this grape juice agar plate for 5-6 h. Eggs were then collected into a 400 µm cell strainer. In a biosafety cabinet, fly eggs were rinsed twice in 10% bleach (0.6% sodium hypochlorite) for 2.5 min each, once in 70% ethanol for 30 s, and three times in sterile dH 2 O for 10 s each. Approximately 50 eggs were transferred using a sterile cotton swab to individual vials containing sterile fly medium (10% filtersterilized glucose, 5% autoclaved active dry yeast, 1.2% autoclaved agar, 0.42% filter sterilized propionic acid). The resulting adults were maintained germ-free for at least three generations to mitigate any parental effects.
Bacteria strains: Five unique species were identified: Lactobacillus plantarum, L. brevis, Acetobacter pasteurianus, A. tropicalis, and A. orientalis, which were then isolated from D. melanogaster flies and checked by standard Sanger sequencing the complete 16S region. To prepare the inoculum for flies, bacteria were grown overnight in MRS medium in a shaker set at 30 • C. The bacteria were resuspended at a concentration of 10 8 cells/mL in sterile phosphate-buffered saline (PBS) for fly gnotobiotic preparations so that constant numbers of CFUs were inoculated per fly vial. The 32 combinations of the 5 bacterial strains were mixed using a Beckman Coulter Biomek NXP workstation to standardize the inoculum. Vials were swabbed to ensure correct bacterial species were present without contaminants.
Adult gnotobiotic fly preparation: Germ-free mated flies 5-7 days post-eclosion were sorted into 10% glucose, 5% active dry yeast medium inoculated with a defined mixture of bacteria. Flies thus treated have been shown to have less variation in physiology and gut morphology. Inoculating in the adult phase allowed us to separate developmental differences from effects on adult flies. Each vial contained a total of 5 × 10 6 CFUs (50 µL of 10 8 bacteria/mL in 1× PBS). Ten female and ten male flies were transferred into each vial. Gnotobiotic flies were transferred to freshly inoculated medium every 3 days for the duration of the concurrent lifespan-fecundity-development experiment.
Bacterial ecology calculations: Two different treatments were applied after an initial 10 days of inoculation where flies were inoculated. Every three days the remaining live flies were transferred to fresh food vials inoculated with 5×10 6 CFUs/vial (as with the initial inoculation) in order to reduce the effects of microbial interactions on the fly media. Vial swabs were performed to check S3 that all inoculated species were still present on the third day. In the first treatment (n=24 flies per bacterial treatment), flies were immediately subjected to bacterial load quantification on the 10th day of inoculation. This experiment evaluates the load and relative abundance of bacteria during the fly fitness experiment. A second treatment was undertaken to measure the steady state of bacterial population sizes in the absence of new colonization. After the initial 10 days of inoculation, these flies were transferred daily to fresh germ-free food for 5 days before subsequent bacterial load quantification.

S2.1 Structuralist framework
To carry out our structuralist analysis, we consider a community of S species. We assume that the community structure and dynamics can be described by any ecological model topologically equivalent to the linear Lotka-Volterra (LV) system, including extensions to nonlinear functional responses and stochastic functions [32,48]: Above, N i is the abundance (or biomass) of species i, r i represents the intrinsic growth rate of species i, and A = (a ij ) is the interaction matrix representing the per-capita effect of species j on an individual of species i (the biotic context). The linear LV system in Eq. (S1) is a phenomenological model written in the r-formalism, implying that intrinsic growth rates represent the summary effect of external conditions on the per capita growth of populations.
The feasible interior equilibrium of Eq. (S1) is is the vector of abundances at equilibrium, r = (r 1 , . . . , r S ) is the vector of intrinsic growth rates, A −1 is the inverse of the interaction matrix. That is, for a given interaction matrix A, feasibility in Eq. (S1) will be satisfied as long as the r-vector falls inside the feasibility domain D F (A) = N * 1 > 0, . . . , N * S > 0 such that − N * 1 a 1 − · · · − N * S a S = r , where a i is the i-th column vector of the interaction matrix A [40]-this corresponds to the match between internal and external conditions (Figure 2 Panel F dark symbols). Because the feasibility domain is described by the positive linear combinations of the columns of matrix A, it corresponds to an algebraic cone in R S [37,40]. This geometric view of the feasibility domain leads to a probabilistic interpretation. The size of the feasibility domain is equivalent to the probability of randomly sampling a direction of the r-vector that falls inside the algebraic cone defined by D F (A).
To compute the normalized size of the feasibility domain, we first define the unit ball in R S as B S = {u ∈ R S , u 2 ≤ 1}. Then, the normalized size of the feasibility domain (i.e., probability of feasibility) can be computed as the volume of the algebraic cone that is inside the unit ball divided by the volume of the unit ball inside the positive quadrant R S >0 : where 2 S normalizes the feasibility domain to positive parameter values (i.e., r i > 0 ∀ i). Thus, Ω(A) = P (r ∈ D F (A)) ∈ [0, 1], where the upper bound represents a scenario with no interspecific interactions. Assuming that any direction of the r-vector is equally likely, the size of the feasibility domain can be numerically estimated by solving the following integral [60]: where we set A A = Σ −1 in order to obtain the cumulative distribution function of a multivariate normal distribution with mean µ = 0 and covariance matrix Σ. Equation (S3) can be computed efficiently for even relatively large communities by solving numerically the integration using a quasi-Monte Carlo method [40,61].
By assuming that the sampling of the directions of intrinsic growth rates is independent and identically distributed for every species, we can compute the probability that a randomly chosen species i is feasible (i.e., N * i > 0) as [32,40]: This species-level measure is bounded as p(i|A) ∈ [0, 1] and can be interpreted as the probability of feasibility under changing environments. Note that the intersection of the two feasibility domains D F (A) and D F (B) (the positive orthant of the unit ball ) can then be mathematically formalized as the linear combination N * i a i of the spanning vectors of A, where N * i satisfies and a 1 , .., a S and b 1 , .., b S are the spanning vectors of the cone D F (A) and D F (B), respectively. Then, we need to apply the triangularization of the intersected region into several cones, and the intersected volume can be computed by adding the volume of each cone together [40].
The dataset contains the equilibrium absolute abundance measured in Colony Forming Units (CFUs) for each of the 2 5 − 1 = 31 different communities that can be assembled. Each species collection has between 40 and 49 replicates. The goal is to obtain an estimateÂ = (â ij ) ∈ R S×S of the interaction matrix A = (a ij ) ∈ R S×S of the linear Lotka-Volterra (LV) system of Eq. (S1).
Here, a ij is the effect of species j on the per-capita growth rate of species i.
To infer the interaction matrix, we consider a subset of the above dataset containing only one or two species. Therefore, the available information is the abundance of species i in isolation, and how its abundance changes when species j is introduced. A small change in the abundance of species i due to the introduction of species j is evidence of a negligible a ij , such as when species S5 2 is added to species 1 (Fig. S1A). By contrast, a large change in abundance is evidence of a significant a ij , such as when species 1 is added to species 2 (Fig. S1B).
To quantify the value of a ij , let x * i denote the equilibrium abundance of the i-th species in isolation. Let (y * i , y * j ) denote the equilibrium abundance of species i and j when both are present. Then, for all data such that x * i = 0 and y * i = 0, the LV system of Eq. (S1) implies that the following equations must be satisfied: Note that, since only steady-state measurements are available, there are fundamental limitations in the information of a ij that can be inferred [62]. Namely, in the above equations we have three unknown (r i , a ii , a ij ) but only two equations, implying there is no sufficient information to constraint the three unknowns. To circumvent this limitation, we can assumeâ ii = −1 without loss of generality [63]. With this assumption, and subtracting the first from the last line of Eq.
(S8), we obtain which can be used to estimate a ij from the experimental data. Importantly, since there are several replicates for x * i and (y * i , y * j ), Eq. (S9) actually characterizes a distribution of possible values for a ij obtained by taking an arbitrary pair of replicates. To obtain the single value necessary for our predictions, we definedâ ij = Median(a ij ), i = j, estimated using a Bootstrap method with 100,000 pairs.
To illustrate the inference process, Figure S1C shows the distribution obtained for a 12 . Calculating the median value givesâ 12 = 0. This result is consistent with the fact that species 2 produces little change in the abundance of species 1, as observed in Fig. S1A. By contrast, the distribution for a 21 is charged to negative values withâ 21 = −1.119 (Fig. S1D). This result is consistent with the fact that species 1 decreases the abundance of species 2, as observed in Fig. S1B. By repeating this process for all pairs of species, we obtained the following estimate for the interaction matrix: To homogenize our structuralist analysis across in vitro and in vivo communities, we inferred the interaction matrix of the soil communities in the same way as we did for the gut communities (as mentioned above) using the data reported in the supplementary information of Ref. [30]. Note that in the original study [30], the interaction matrix was inferred by fitting the linear LV system to the data using a least-squares method. We also ran our analysis using the originally inferred matrix and results presented in Fig. 4 (main text) were qualitatively similar. Our inferred matrix S6 for the in vitro communities using median values is: