Steering ecological-evolutionary dynamics to improve artificial selection of microbial communities

Microbial communities often perform important functions that depend on inter-species interactions. To improve community function via artificial selection, one can repeatedly grow many communities to allow mutations to arise, and “reproduce” the highest-functioning communities by partitioning each into multiple offspring communities for the next cycle. Since improvement is often unimpressive in experiments, we study how to design effective selection strategies in silico. Specifically, we simulate community selection to improve a function that requires two species. With a “community function landscape”, we visualize how community function depends on species and genotype compositions. Due to ecological interactions that promote species coexistence, the evolutionary trajectory of communities is restricted to a path on the landscape. This restriction can generate counter-intuitive evolutionary dynamics, prevent the attainment of maximal function, and importantly, hinder selection by trapping communities in locations of low community function heritability. We devise experimentally-implementable manipulations to shift the path to higher heritability, which speeds up community function improvement even when landscapes are high dimensional or unknown. Video walkthroughs: https://go.nature.com/3GWwS6j; https://online.kitp.ucsb.edu/online/ecoevo21/shou2/.


Supplementary
: Large population size interferes with community selection. Evolutionary dynamics of artificial community selection when the Newborn volume and thus total biomass are scaled up by 10 times (top panels) and 100 times (bottom panels) compared to simulations shown in Figure 6c. As a result, the average total biomass in a Newborn community are 10 3 and 10 4 , respectively. Since cheaters take over more deterministically in large populations, f P and community function decline to very low levels within 40 cycles, although they are maintained at positive values due to inter-community selection. Black, cyan and gray curves represent three independent replicates. Dashed lines mark f * P optimal for community function or the maximal community function.
Supplementary Figure 3: Community functions P (T ) obtained from stochastic simulations can be adequately predicted by numerically integrating deterministic differential Equations 1-7. (a) We calculated community function P (T ) for 100 communities from the 2000th cycle of the simulation plotted in black curves in Figure  6c, using Equations 1-7 with φ M (0) as the initial condition and f P (0) as the f P parameter. We then compared calculated P (T ) with P (T ) from stochastic simulations, and observed a reasonable concordance. Note that community functions from simulations (where cheaters M cells with small cost f P increase in frequency) are generally lower than those from calculations. (b) We calculated P (T ) of 100 communities from the 2000th cycle of the simulation shown in black curves in Figure 7(a-b) with differential equations. For each community, we used its Newborn total biomass and Newborn species composition as initial conditions and the following within-Newborn averages as parameters: average M's cost f P (0), average maximal growth rates g Hmax and g M max , average affinity of M to Byproduct K M B , average affinity of M to Resource K M R , and average affinity of H to the Resource K HR . Calculated communities functions agree well with those obtained from stochastic simulations.
Supplementary Figure 4: Probability density functions (PDF) of normalized f P (0) and φ M (0) of offspring communities plotted in Figure 2e. Determinants are normalized within lineage before pooled together to yield the PDFs. Specifically, if the determinant of the jth offspring community descending from the ith parent community is denoted as x ij , the normalized determinant isx ij = (x ij − x i )/σ i where x i and σ i are respectively the mean and standard deviation of the determinants obtained from all offspring communities descending from the ith parent community. If the within-lineage distribution of x ij is normal, then the pooled distribution ofx ij satisfies a standard normal distribution, as is the case for normalized φ M (0). This is not true for normalized f P (0), whose distribution is skewed. Both PDFs are obtained from~6000 offspring communities (100 parent communities ×~60 offspring communities per parent community).
Supplementary Figure 5: Mixing Adults before reproduction interferes with selection. Evolutionary dynamics of artificial community selection when chosen Adult communities are not mixed (top) or mixed (bottom) before reproduction. Top 10 communities were chosen for reproduction via pipetting so that Newborn total biomass and species composition stochastically fluctuated. Measurement noise was not considered. Top (replicate runs of Figure 3 G and H from our previous work 1 ): Adults were not mixed before reproduction, and each gave birth to 10 Newborns. Bottom: Adults were mixed, and the mixture was randomly split into 100 Newborns. In the third and fourth columns, the slope of each magenta curve is obtained from least square linear regression. A larger absolute value of slope reflects a stronger dependence of P (T ) on the determinant. Notice that under 30%-H spiking, the dependence of community function P (T ) on the nonheritable determinant φ M (0) is the weakest, while the dependence on the heritable determinant f P (0) is the strongest. This means that for given variations in φ M (0) and f P (0), the heritability of community function is the highest under 30%-H spiking. (c) Heritability of community function estimated from the slope of linear regression between parent and offspring community functions. The 100 parents are the 100 communities from Cycle 24 of the simulation displayed by the black curve in (a). Circles represent the mean, and error bars extend from 25% to 75% quantile. Each circle and error bar is calculated from~60 offspring communities. The landscape is similar to Figure 5b iii corresponding to a higher heritability, as demonstrated in (c).
Supplementary Figure 10: Spiking H at a wide range of percentages improves selection outcome when chosen Adults are reproduced through pipetting. Adults ranking top 10 in measured community functions (= true community function + a normal random variable with zero mean and standard deviation of~10% of the community function of Cycle 1) were chosen to reproduce. Each of the 10 chosen Adults reproduced 10 Newborns. The total biomass of a Newborn fluctuated around the target value (BM target = 100), of which the indicated percentage was "pipetted" from H monoculture while the rest was "pipetted" from the parent Adult. Consequently, Newborn total biomass and species composition fluctuated stochastically according to sampling error. Top panels show the dynamics of heritable determinant f P (0), the cost paid by M averaged first within and then among chosen Adults. Bottom panels show P (T ), the average community function of chosen Adults. Figure  The 3 sets of 100 Newborns mature into Adults, and their functions define the "Parent function". Each Adult then gives birth to 6 offspring Newborns under the respective spiking strategy, and when these mature, the median of 6 functions defines the "Offspring function". Cycle C: At the end of cycle C, we can scatter plot the function of each parent against the median function of its 6 offspring (the median is used because it is less sensitive to outliers, which might work better for small number of offspring per parent). The heritability is estimated from the slope of the least squares regression line. In parallel, the current spiking strategy continues until Cycle C when spiking strategy is updated. In this example, the updated current strategy is spiking strategy 3. As a result, there is a slight correlation between parent communities' φ M (0) and offspring communities' φ M (0). In scatter plots of (b-e), circles mark the mean and error bars extend from 25% to 75% quantile. In (b, d), each circle is averaged from~8 offspring communities. In (c, e), each circle is averaged from~18 offspring communities. Individual data points are plotted in gray dots. (d, e) Heritability is increased (larger slope) with 30%-H spiking. (f, g) Evolutionary dynamics. The black dashed lines mark the global maximal community function and the corresponding f * P . The orange and teal dashed lines mark the the maximal function and the corresponding f P along the orange and teal restrictor, respectively. Community function climbs faster along the teal restrictor (due to higher heritability) compared to along the orange restrictor, but the maximal community function achievable along the teal restrictor is lower than that along the orange restrictor. In g, community function and f P (0) climbed higher than teal dashed lines due to fluctuations in Newborn species composition: instead of being confined onto the restrictor, the communities can access a strip along the restrictor and can thus reach a slightly higher community function. This is similar to Figure  5b (ii) and (iii) where selected communities (green circles) are slightly away from the restrictor and achieve functions higher than those expected from the restrictor.  Figure 7, to ensure comparable Newborn compositions), and compare community functions at the respective steady state species compositions marked by dashed lines. For this mutualistic community, community selection without spiking is already efficient. As a result, spiking based on periodic heritability check results in only a slight improvement. However, the slight improvement is significant: The 6 community functions at steady state species compositions from the red curves (spiking with periodic heritability check) are consistently higher than all 6 from the black curves (no spiking), indicating a significant difference between the two distributions (Mann-Whitney U test, n 1 = n 2 = 6, p = 10 −3 , one-tailed). Note that the dynamics of 3 out of 6 simulations are plotted in a and b to avoid overcrowding.  Figure 12. (c) To compare selection outcomes, we focus on the highest-functioning communities at the endpoint of each simulation (Cycle 3000). We calculate community functions based on their heritable determinants (f P (0), g M max (0), K M R (0), etc) using Equations 1-5, 7 and 9-10 under different φ M (0) assuming BM (0) = 10 2 . We also calculated the steady states φ M from the same equations and plot them as dashed lines. The 3 red and 3 black curves are calculated from the 3 replica simulations in (a) and (b), respectively. Compared to communities evolved with no heritability checks or spiking (black curves), communities evolved with spiking guided by periodic heritability checks (red curves) perform better over a wide range of φ M (0), including near the steady state species compositions (dashed lines).
Supplementary Figure 22: Evolutionary dynamics of community selection with 5 or 3 candidate spiking strategies based on periodic heritability checks. Spiking mix consists of 10 evolved H clones or 10 evolved M clones. (a) is copied from Supplementary Figure 16a. (b) Compared to community selection with 5 candidate spiking strategies, community function increases at a slightly faster pace when there are only 3 candidate spiking strategies. Furthermore, selection with 3 candidate spiking strategies reduces the workload of heritability check by 2/5. Note that in (a), the last stretch of community function ascent (from 2000~2500) closely follows the switching of spiking strategy from 60%-H to 30%-H. The legends are the same as Supplementary Figure 12.
Supplementary Figure 23: Heritability of community function remains similar over two consecutive cycles. When heritability of community function is low, it does not change much over two cycles (top panels, no spiking). When heritability of community function is high, it can fluctuate over two cycles, but remains high (bottom panels, 30%-H spiking). Circles represent the mean, and error bars extend from 25% to 75% quantile. In top panels and bottom panels, each error bar is calculated from~60 and 100 offspring communities, respectively. The 2 plots in the left column are from Figure 5c ii and iii. Communities of Cycle C are reproduced from Adults with top-2 highest functions in Cycle C − 1. Heritability of community function is estimated from the slope of the linear regression depicted by the green dashed line.
Supplementary Figure 24: Spiking a species at too high a fraction can make a nonheritable determinant partially heritable. Compare 60%-H spiking (a) versus 30%-H spiking (b). (i) Species composition dynamics of two communities (green and mustard curves). Starting at Newborns immediately after spiking (crosses; parent Newborns), species compositions moves toward the attractor (blue dashed line). In 30%-but not 60%-H spiking, species composition reaches the blue attractor by the end of maturation. After reproducing with respective spiking percentages, average offspring Newborn compositions are plotted with circles. In 30%-H spiking, average offspring Newborns have the same species composition (green circle overlapping with mustard circle in b(i)). In contrast, in 60%-H spiking, average offspring Newborns have different species composition (green circle above mustard circle in a(i)). (ii) In 60%-H but not 30%-H spiking, the determinant φ M (0) displays partial heritability. (iii) Consequently, although 60%-H spiking strategy confers similar community function heritability as 30%-H spiking strategy (similar slopes of the red dashed lines in a(iii) and b(iii)), community function improved faster under 30%-H spiking strategy (Supplementary Figure 22b) than under 60%-H spiking strategy (Supplementary Figure 22a). This is because in 60%-H spiking, a portion of community function heritability originates from the misleading heritability created by a nonheritable determinant.