Automated design of synthetic microbial communities

Microbial species rarely exist in isolation. In naturally occurring microbial systems there is strong evidence for a positive relationship between species diversity and productivity of communities. The pervasiveness of these communities in nature highlights possible advantages for genetically engineered strains to exist in cocultures as well. Building synthetic microbial communities allows us to create distributed systems that mitigate issues often found in engineering a monoculture, especially as functional complexity increases. Here, we demonstrate a methodology for designing robust synthetic communities that include competition for nutrients, and use quorum sensing to control amensal bacteriocin interactions in a chemostat environment. We computationally explore all two- and three- strain systems, using Bayesian methods to perform model selection, and identify the most robust candidates for producing stable steady state communities. Our findings highlight important interaction motifs that provide stability, and identify requirements for selecting genetic parts and further tuning the community composition.

T raditionally, in biotechnology and synthetic biology, a microbe is engineered and grown as a monoculture to perform a particular function. Novel functionality is imparted by introducing heterologous genetic processes that would not normally be found in the organism. Non-orthogonal interactions between the introduced heterologous processes can cause the engineered function to behave in an unintended manner [1][2][3] , while the increased metabolic burden imposed can significantly slow growth rates and encourage selection of mutants 4 . Limited cellular resource availability and unforseen interactions can cause the host organism and the introduced circuits to behave differently when expressed alongside one another [5][6][7] . Using microbial communities would enable us to allocate functional components between subpopulations of cells, creating physical barriers that insulate processes from one another and distribute the burden of heterologous expression between members of the community 7 . This allows us to scale complexity in a manner that could not be achieved under the limitations of a monoculture. In natural environments, we observe mixed-species microbial communities that exhibit competitive advantages over monocultures in productivity, resource efficiency, metabolic complexity and resistance to invasion 8,9 . Being able to predictably and reproducibly construct microbial communities for synthetic biology or biotechnology applications would allow us to harness these advantages.
The maintenance and control of microbial communities comes with its own challenges. Competitive exclusion occurs when multiple populations compete for a single limiting resource (in the absence of other interactions); a single population with the highest fitness will drive the others to extinction 10 . Evidence from microbial ecology has shown us that stability can arise through feedback between subpopulations. Cooperative and competitive interactions are both important for integrating feedback that can stabilise communities by manipulating growth or fitness of the subpopulations [11][12][13][14][15][16] . Synthetic microbial communities have been built using quorum sensing (QS) systems to regulate processes that manipulate the growth rate or fitness of a population. Fitness can be manipulated by the expression of lysis proteins, metabolic enzymes, toxins and anti-microbial peptides (AMPs) 14,[17][18][19][20][21][22][23] . Here, we focus on the use of bacteriocins to manipulate subpopulation fitness. Bacteriocins are gene-encoded AMPs that can be used to directly suppress the growth rate of a sensitive population 24 . They are exported into the extracellular environment, and generally use "Trojan horse" strategies to enter and kill sensitive strains. Expression of immunity genes provides protection against the bacteriocin, and can be expressed separately or in conjunction with the bacteriocin 25 . A single expressed bacteriocin can impact the growth of multiple other strains in the system, as opposed to intracellular toxins which require all strains to be engineered. Bacteriocins also offer variable spectrums of sensitivity, enabling broad or narrow targeting of microbial species 24 . Previously, we have demonstrated the use of bacteriocin MccV to improve plasmid maintenance in a population 26 and for building stable cocultures that overcome competitive exclusion 66 . Other bacteriocins, such as nisin, have also been used to produce stable communities 23 . Predicting how a system will behave before implementation is essential for the efficient use of lab resources and fully understanding the interactions that occur 27 . System design by intuition alone becomes increasingly challenging when dealing with multilevel interactions. We can use model selection to compare a set of candidate models and identify the most promising designs 28 . We have previously performed model selection and parameterisation using Approximate Bayesian computation with sequential Monte Carlo sampling (ABC SMC) 29 to design robust genetic oscillators 30 and multistable genetic switches 31 . Similar approaches have been used to compare the ability of genetic parts to produce logic gate behaviours 32 and to design regulatory networks from databases of characterised parts 33,34 . Automated circuit design has the potential to greatly improve the engineering process in synthetic biology.
Here, we build upon computational circuit design in synthetic biology, presenting automated synthetic community design. Our workflow automatically generates candidate systems from a set of parts which can be used to engineer a community. We use ABC SMC to perform model selection, identifying candidate systems that have the highest probability of producing stable communities in a chemostat bioreactor. Using these methods we reveal the optimal designs for two-strain and three-strain systems. This workflow also allows us to derive fundamental design principles for building stable communities and reveals critical parameters to control the community composition.

Results
Automated synthetic microbial Community Designer (AutoCD) workflow. Figure 1 illustrates AutoCD, the workflow developed and applied in this study. First, we set the available parts which can be used to build a stabilising system in a chemostat environment. This consists of the number of strains (N), bacteriocins (B), and QS systems (A). Any QS system can regulate the expression of any bacteriocin in the system by induction or repression. Strains in all models are dependent upon a single nutrient resource (S), which is consumed by strains and replenished through dilution of the chemostat with fresh media. Importantly, all models therefore include nutrient-based competition between subpopulations. Uniform distributions are used to encode our prior knowledge of biochemical rate parameters informed by literature, describing each part and their interactions with one another ( Table 1). The priors used are broad to allow the full range of possible part characteristics; in scenarios where the parts have already been selected and characterised, the prior parameters can be constrained. The available parts and prior parameter distributions serve as inputs to the model space generator, which conducts a series of combinatorial steps to produce all possible genetic circuits. The model space generator then builds unique combinations of strains expressing different genetic circuits, where each Fig. 1 Overview of AutoCD pipeline. Model selection workflow begins with definition of available parts and prior parameter distributions used to generate system models from all the possible interactions. We use ABC SMC to perform model selection for the desired population behaviour. The outputs of ABC SMC provide us with community designs, insight into underlying motifs, parameter requirements and information on composition tunability.
combination is a candidate model. Filtering steps remove unviable, redundant and mirror systems, yielding a set of unique candidates to be assessed. The model space generator produces an ordinary differential equation (ODE) model for each system in the context of the chemostat environment, and these models form our prior model space (for details, see the "Methods" section).
The final input is a mathematical description of the objective population behaviour, a stable steady state. We use three distance functions (d 1 , d 2 , d 3 ) to describe how far away a simulation is from the objective stable steady state (Eq. (1)). d 1 is the final gradient of a strain population (N x ), capturing the most fundamental characteristic of stable steady state, where the population level of a strain is unchanging. d 2 is the standard deviation of a population, quantifying unstable behaviours such as oscillations, favouring simulations that reach stable steady state quickly. d 3 is the reciprocal of the strain population at the end of the simulation, allowing us to define a minimum population density. Given the three distances, ϵ F defines thresholds below which a simulation meets the requirements of our stable steady state objective. The distances of all strain populations in a simulation must be below these thresholds to satisfy the objective behaviour. ϵ F 1 was chosen to match the error tolerance of the ODE solver and ϵ F 2 threshold was chosen through qualitative assessment of simulation data to define a practical threshold for what stable steady state simulations should look like. ϵ F 3 is set to ensure all populations have a minimum final OD of 0.001, chosen for what could be realistically measured using flow cytometry. The posterior distribution is made up of simulations where the distances for each strain population are less than the ϵ F thresholds (Eq. (2)). ABC SMC performs model selection on the model space for the objective defined by these distance functions and ϵ F . A particle is a sampled model and associated parameters. ABC SMC initially samples particles from the prior distributions with an unbounded distance threshold. Particles are propagated through intermediate distributions, gradually reducing the distance thresholds until they equal ϵ F (see the "Methods" section). ABC SMC provides an estimation of model and parameter space posterior probabilities for the given prior distributions and the objective behaviour. We can use the outputs of ABC SMC to help us design synthetic communities and chemostat settings in the lab.
Distance functions: Distance thresholds: Designing two-strain cocultures that achieve steady state. Here we apply AutoCD to the design of a stable steady state coculture containing two strains. In Fig. 2 we define a model space consisting of two strains (N 1 , N 2 ), two bacteriocins (B 1 , B 2 ) and two QS systems (A 1 , A 2 ). We set model space limits to enable feasible experimental implementation, allowing expression of up to one QS per strain and expression of up to one bacteriocin per strain. Each strain can be sensitive to up to one bacteriocin. Given these conditions, the model space generator yields 69 unique two-strain models (m 0 ,m 1 ...m 68 ). These 69 models serve as a uniform prior model space upon which we perform model selection using ABC SMC (see Supplementary Fig. 4 for visualisation of each candidate model). From the available genetic parts, there are 17 possible interaction options that could exist between state variables in each candidate model. We perform hierarchical clustering on the interactions present in each model, grouping models based on the similarity of their interactions. This clustering is visualised as a dendrogram in Fig. 2a. ABC SMC approximates the posterior probability of each model for the stable steady state objective, indicating how effective the candidate system is in producing a stable steady state. m 62 has the highest posterior probability, and is therefore the system which most robustly produces stable steady state (Fig. 2a). m 62 consists of two strains exhibiting a crossprotection mutualism relationship 35 . Each strain expresses an orthogonal QS molecule that represses the expression of a selflimiting (SL) bacteriocin in the opposing strain (Fig. 2b). In the Table 1 Prior distributions for both two and three strain systems. Constant parameters have the same min and max value. K A y B z , K ω , K A y and KB max z are sampled from log uniform distributions. The remaining parameters are sampled from uniform distributions. ARTICLE absence of the opposing strain, the SL bacteriocin is expressed freely. This creates an interdependence between the two strains where the extinction of one strain would result in the extinction of the other. This closed feedback loop is a feature of the topology of m 62 , overcoming the competitive exclusion principle. When designing new systems, minimising the number of genetic parts will reduce the number of experimental variables, improving the ease of construction and optimisation of a system. We subset the model space by the number of expressed parts in the system (maximum two QS and two bacteriocin), yielding subsets containing candidate models with two, three and four expressed parts (low complexity to high complexity). We identify the candidates with the highest posterior probability in each subset (Fig. 2b). The posterior probability increases despite the larger parameter spaces, which is important because ABC SMC will naturally favour models which yield stable steady state with the smallest possible number of parameters (Occam's razor) 36 . We see that all three models have SL motifs, where a strain is sensitive to the bacteriocin it produces. All three models are devoid of other-limiting (OL) motifs, where a strain is sensitive to a bacteriocin produced by another strain.
The Bayes factor (BF) is a ratio between the marginal likelihoods of two models, giving a quantification of support for one model compared with another. BF > 3.0 indicates evidence of a notable difference between the two models, while BF < 3.0 suggests insubstantial evidence 37 ( Table 2). The BF of m 66 compared with m 48 suggests substantial improvement in the posterior probability can be made by increasing complexity. However, the BF of m 48 compared with m 62 suggests insubstantial evidence behind this improvement in posterior probability (Fig. 2b). These diminishing returns when increasing system complexity hold important ramifications for system design. The introduction of an additional QS part to move from m 48 to m 62 may not be worthwhile for the minor improvement in steady state robustness.  Model selection has identified the best performing designs for producing stable communities. However, the parts used in the design may require specific characteristics or chemostat settings. ABC SMC also produces posterior parameter distributions for each model, giving us information about the parameter values necessary to yield stable steady state. Figure 2c shows the posterior distributions of several tunable parameters in m 66 and m 62 . The dilution rate of the chemostat (D) is a directly tunable parameter and the maximal expression rate of the bacteriocin (KB max ) can be tuned through choice of promoter and ribosomebinding site 38 . The growth rates (μ max ) can be tuned through choice of base strains or auxotrophic dependencies 39,40 .
For m 66 , the correlation coefficients between strain maximal growth rates (μ max1 and μ max2 ) shows the parameters are loosely correlated. Additionally, we see that N 1 requires a higher maximal growth rate (μ max1 ) than that of N 2 (μ max2 ). The faster maximal growth rate of N 1 is necessary to counteract self-limitation that is negatively regulated by the population of N 2 . Conversely, m 62 shows a wider distribution of strain growth rates at stable steady states and a low correlation coefficient. This indicates that this topology does not heavily depend on specific growth rates or related growth rates between the two strains in order to produce a stable steady state. KB max for all bacteriocins is tightly constrained to high maximal bacteriocin expression rates. The distributions of D in both systems show a lower dilution rate is important for stable steady state. The steady state compositions for m 66 frequently contain N 1 in high proportion compared with N 2 , whereas m 62 will commonly yield compositions with more even representation of N 1 and N 2 at steady state ( Supplementary Fig. 2).
Self-limiting motifs stabilise two strain systems. The dendrogram of Fig. 2a highlights a cluster of high performing models that are closely related. This suggests underlying interactions of the model space exist that are important for producing communities with stable steady state.
Non-negative matrix factorisation (NMF) is an unsupervised machine learning method we can use to reduce the dimensionality of the interaction space 41 . We can use NMF to help us understand the underlying motifs and how they affect community stability. We represent each model by the interactions present in the system (Fig. 2a). NMF takes these interactions and learns a number of clusters (K), models can be rebuilt by a weighted sum of these clusters. In our case, these clusters can be represented as interaction motifs. We set K = 4, in order to give us a digestible summary of the model space. Figure 3a shows the learned motifs that can be used to represent the entire model space.  shows the component weights for each model, defining the membership each model has for each motif. The models are shown in descending order of posterior probability, we can see that K1 is heavily weighted in the top performing models. The motif K1 refers to SL only interactions where the strain is sensitive to the bacteriocin it produces (Fig. 3a, b). The top models are consistently assigned low weights for K4 (Fig. 3a, b), a motif which refers to OL only interactions, where the strain is sensitive to a bacteriocin produced by the other strain (Fig. 3a).
We use the indications produced by NMF to curate our own discrete motifs, improving the ease of interpretation. K1 and K4 show us the direction of bacteriocin sensitivity is an important feature and we proceed to investigate this further. All models can be built by combining eight fundamental motifs which can be categorised as either SL or OL, based on the direction of bacteriocin sensitivity (Fig. 3c). Within each category, motifs are differentiated by the mode of bacteriocin regulation (Fig. 3c). For example, m 66 = SL 2 , m 48 = SL 4 + SL 2 and m 62 = SL 2 + SL 2 . In order to assess the importance of each motif for producing stable communities we perform a motif impact analysis. For each model we identify the nearest neighbours in the model space that can be built by adding each motif and calculate the change in posterior probability for each neighbour (Fig. 3d). By repeating this across the entire model space, we are able to quantify whether a motif is stabilising or destabilising (Fig. 3e). The lower quartiles of SL motifs all show lower negative change magnitudes compared with the lower quartiles of OL motifs. The upper quartiles of SL motifs show a higher positive change magnitude than that of OL motifs. Together these show the addition of SL motifs more often result in an improved posterior probability, whereas addition of OL motifs more often result in decreased posterior probability. The upper quartile of SL 2 shows the motif has the most stabilising effect, closely followed by SL 4 . We see these findings are reflected by top models identified in Fig. 2b, where all models are constructed with SL 2 and SL 4 motifs.
The total output of bacteriocin by a population is a function of the population's density. All SL motifs therefore possess a fundamental negative feedback relationship between growth rate and density, augmented by the mode of QS regulation. Conversely, the population density and growth rate of a strain in OL motifs are decoupled. This lack of feedback is a clear explanation as to why we see SL motifs as positive contributors to stability while OL motifs have a destabilising effect. By comparing the posterior probabilities of m 62 and Supplementary Fig. 4, we show that while self-limitation interactions are important for viability, interdependence between the strains is necessary to produce the most robust design (Supplementary Fig. 3).
Designing three strain communities that achieve steady state.
While several studies have demonstrated the ability to establish synthetic two-strain systems 19,22,[42][43][44][45][46][47][48][49][50][51] , efforts with three strains are sparser 23,52,53 . Having demonstrated the automated design of two-strain systems, we next tackle the far larger challenge of designing stable three-strain communities. The addition of a single strain significantly increases the parameter space, engineering options and possible interactions. We define our available parts consisting of three strains (N 1 , N 2 , N 3 ), three bacteriocins (B 1 , B 2 , B 3 ) and two orthogonal QS systems (A 1 , A 2 ). We maintain the same strain engineering restrictions, allowing up to one QS expression and up to one bacteriocin expression per strain. Each strain can be sensitive to up to one bacteriocin. Given the available parts and engineering limits, the model space generator yields 4182 unique models (see Supplementary Fig. 5 for visualisation of each candidate model). Due to the much greater number of models, we group models based upon the interactions in each model by hierarchical clustering for up to five levels. The average posterior probabilities of each cluster are shown (Fig. 4a). 3289 models have a posterior probability of zero, highlighting how much more difficult this design scenario is. ABC SMC identifies m 4119 as the system with the highest posterior probability for producing stable steady state. m 4119 consists of two QS molecules; A 1 is produced by N 2 , A 2 is produced by N 3 (Fig. 4b).
The QS molecules repress the expression of SL bacteriocins produced by each population. Using the minimal motifs defined in Fig. 3c, m 4119 can be summarised as m 4119 = 3 × SL 2 . We group the model space on the counts of heterologous expression in the system, yielding subsets containing candidate models with three, four, five and six expressed parts (Fig. 4b). Models with two heterologously expressed parts all had a posterior probability of 0.0 and are not shown. Again, we see a diminishing increase in posterior probability that comes with increasing complexity. m 3938 is the more complicated neighbour of m 4119 , where N 1 is also contributing with production of A 1 , resulting in a fall in the posterior probability. The increase in posterior probability that occurs when moving from m 4125 to m 4119 has BF < 3.0, indicating the difference between the posterior probability of the two models is not substantial. These system comparisons highlight the tradeoff between increasing complexity and improving system performance. In a similar fashion to the two-strain model space, the top performing models are dominated by SL only interactions ( Supplementary Fig. 1).
Multiple engineered bacteriocins are more important than multiple orthogonal QS systems. Our results have identified top performing models in the two-strain and three-strain model spaces. We have also highlighted the diminishing returns that occur with increasing model complexity in top performing models. Next we aim to summarise the importance of different parts and their contribution to the stable steady state objective behaviour, further enabling us to triage genetic parts for construction in the lab. Figure 5 shows a summary of the parts used to construct threestrain systems and the average posterior probabilities they yield. This gives us important information to form heuristic rules in the design of three-strain systems. Figure 5a shows a very similar posterior probability when comparing two QS systems rather than one. Figure 5b demonstrates the substantial advantage of repressive QS regulation of bacteriocin production over inducible systems. Figure 5c shows very strong evidence in favour of using three bacteriocins to produce stable steady state in three-strain systems. These three statistics suggest that on average there is little advantage to be gained in the use of two QS systems, and priority should be given to the use of a single repressive QS to regulate three bacteriocin systems, such as we see in m 4125 .
Defining stable steady state population ratios in three-strain systems. Natural microbial communities are observed to contain species in abundances differing over orders of magnitude 54,55 . Together the individual species can contribute to an aggregate community function 56,57 . Synthetic communities can take advantage of aggregate community output by their application to improving yields and efficiency of bioproduction pathways via the distribution of genetic processes between subpopulations 43,50 . Biosynthesis studies using cocultures have shown the importance of optimising inoculation ratios to maximise community outputs 58,59 . Therefore being able to define the steady state composition of a synthetic community is a valuable feature. Here we demonstrate that a form of post-processing can be applied to the output of ABC SMC by applying a secondary threshold, identifying key parameters that enable fine tuning of stable steady state population densities.
The ϵ F 3 threshold value ensures all simulations in the final population have an OD > 0.001. Figure 6a shows the community composition distribution of m 4119 . The majority of accepted particles show a final community composition that is dominated by a single strain. Using the final population distances from ABC SMC we can apply a secondary threshold and identify how the system can be tuned to produce a more evenly distributed community composition. We set a secondary threshold, stipulating that all strains must be of OD > 0.1 (pink) (Fig. 6b). Therefore strains that do not meet the secondary threshold have 0.001 < OD < 0.1 (blue) (Fig. 6b). From these two subsets we generate separate parameter distributions and calculate the divergence using Kolmogorov-Smirnov (KS). Parameter distributions that show the greatest divergence are important for changing the system behaviour from one that is dominated by a single strain, to one that has a more even distribution of strain densities. The distributions of four parameters that exhibit greatest divergence are shown in Fig. 6c. A higher dilution rate (D) and lower maximal bacteriocin expression rates (K B max 1 , K B max 2 , K B max 3 ) are associated with producing a more evenly distributed community composition. Importantly, all three parameters are realistically tunable. The dilution rate can be controlled directly through the chemostat device, while bacteriocin expression rates can be changed through the choice of promoters and ribosome-binding sites.

Discussion
Synthetic communities built to date have employed the use of QS, metabolic dependencies, intracellular lysis proteins, toxins and extracellular AMPs to engineer interactions that enable community formation 23,51,52 . When designing a synthetic community, the fundamental interactions in the system itself is often directed by mimicking ecological interactions found in nature, or by rational judgement. As the possible types of engineered interaction increases, so does the need for comprehensive assessment of the vast model spaces. The modelling and statistical framework demonstrated here addresses this design problem. With our examples we have highlighted important design features and heuristic rules for building synthetic steady state communities. As we move to increasingly complex multi-strain systems, bottom- c Comparing systems with one, two and three bacteriocin. NATURE COMMUNICATIONS | https://doi.org/10.1038/s41467-020-20756-2 ARTICLE up approaches have shown that understanding pair-wise interactions can be used to build up to larger stable communities 60 .
We have identified optimal system designs using bacteriocins and QS for stable steady state in two-strain and three-strain communities. m 62 , the top model of the two-strain model space uses a cross-protection mutualism, whereby the density of each subpopulation inhibits the self-limitation of the other. Similarly, in the three-strain model space m 4119 has pairwise cross-protection mutualism between two subpopulations and a dependent subpopulation (Fig. 4b). Cross-protection mutualism has previously been incorporated in synthetic microbial communities via the mutual degradation of externally supplied antibiotics 46 . Metabolic interdependencies can also be employed to engineer mutualism 47,48 . All top performing models used SL interactions to produce stable steady state dynamics. Self-limitation is observed in many natural biological communities, normally in a response to stress 61,62 . These processes, while detrimental to the individual, provide a net benefit to the community through release of a public good-they are altruistic processes 63 . Altruistic cell death is conserved throughout different species implying a competitive advantage in natural environments 64 . SL interactions have previously been used to overcome competitive exclusion by employing lysis proteins regulated by QS in a two-strain culture 51 . The inducible expression of SL bacteriocins under tightly controlled promoters has also been demonstrated 65 . Additionally, in our recent work we have demonstrated the use of bacteriocins to stabilise communities 66 . Random sampling or encapsulation of microbial networks has been demonstrated experimentally in both ecological and synthetic contexts 67,68 . These high throughput approaches could be used to validate our findings, combining differentially engineered strains with one another to give a view of strain combinations that form stable communities.
The robustness of SL interactions can be explained by the feedback loops involved. Total bacteriocin output by a subpopulation is heavily dependent upon its population density; low population density will naturally have a low output of bacteriocin 69 , making QS a secondary level of regulation. This is supported by both two-strain and three-strain scenarios where we observe the diminishing returns that come with increasing complexity. Figure 5 shows that increasing the number of bacteriocins in a system yields greater increases in stability than increasing the number of QS systems.
A closed feedback loop exists between the bacteriocin expression rate and the population density, an important reason why we see all SL motifs generally show positive contribution to stability. Conversely, in OL motifs the population expressing the bacteriocin will not be negatively affected and therefore a closed feedback loop does not exist.
Ecological studies using generalised Lotka-Volterra approaches frequently show that negative, intraspecific interactions are of central importance to the stability of ecological networks [70][71][72] . In our models, SL interactions, dilution rate and limited nutrients are all analogous to negative, intraspecific, density-dependent interactions described at a more detailed level; particularly regarding time delays and accumulation of bacteriocin or QS molecules that may occur. Our results align with previous findings and provide insight into the relative importance of different types of interactions in a synthetic biology context. Additionally, studies have previously shown that higher connectance in mutualistic ecological networks promotes persistence and resilience 73 . All our top performing models contain forms of mutualism; in these models we also see a trend of increasing robustness with complexity which is analagous to connectance (Figs. 2b and 4b).
Studies have traditionally used eigenvalue analysis to investigate the stability properties of random interaction ecological networks 71,73,74 . Similar approaches could be applied to the synthetic community model spaces shown here. The Bayesian approach and time series analysis used here allows us to select for defined temporal characteristics of transient behaviour that represent a definition of a stable system that is achievable experimentally. In principle, eigenvalues could also be included within a distance measure of asymptotic local stability. However, we found they did not improve the classification of behaviour in these models. Finally, we showed that the posterior parameter distribution from ABC SMC can be used to make decisions on part characteristics and experimental conditions (Figs. 2c and 6c). Our results show the dilution rate (D) is an important experimental parameter for producing stable steady state, and tuning the community composition. The rate of removal of molecules from the environment can produce very different population dynamics. This is supported by previous work where the dilution rate has been demonstrated to be important for determining the population dynamics 6,22,46 . We also show our methodology can identify systems that are robust to differences in growth rate, highlighted by the comparison of m 66 and m 62 in Fig. 2c. Together these draw attention to important part characteristics that should be considered when constructing a stable community. It should be emphasised that while the design rules we have identified hold true for a stable steady state objective, it may not be the case for other objective population dynamics, such as oscillations. New objectives can be investigated by changing the distance functions which describe the population dynamics.
The framework we have developed offers a natural entry point to the design-build-test cycle, providing a data informed roadmap for building a robust synthetic community with a desired behaviour. We have revealed stable steady state systems in a two-strain and three-strain model space, and generated impactful rules and heuristics for their construction. The flexibility of this framework enables us to quickly redefine population level behaviours depending on the required application.

Methods
Model space generator. Models are generated from a set of parts, which are expressed by different strains in the system. We represent an expression configuration through a set of options. We define the options for expression of A in each strain, where the options are not expressed, expression of A 1 , and expression of A 2 (0, 1 and 2). We define the options for expression of bacteriocin, which for the twostrain model space includes no expression, expression of B 1 or expression of B 2 (0, 1, and 2). For the three-strain model space, this includes includes no expression, expression of B 1 , expression of B 2 or expression of B 3 (0, 1, 2 and 3, respectively). Lastly we define the mode of regulation, R, for the bacteriocin, which can be either induced or repressed (0 and 1). This is redundant if a bacteriocin is not expressed. Two strain: Three strain: This enables us to build possible part combinations that can be expressed by a population. Let P C be a family of sets, where each set is a unique combination of parts: Each strain in a system can be sensitive to up to one bacteriocin. Let I represent the options for strain sensitivity. In the two-strain model space, the options are insensitive, sensitive to B 1 or sensitive to B 2 (0, 1 and 2, respectively). In the threestrain model space, the options are insensitive, sensitive to B 1 , sensitive to B 2 or sensitive to B 3 (0, 1, 2 and 3, respectively). Two strain: I ¼ f0; 1; 2g Three strain: Each strain is defined by its sensitivities and expression of parts. Let P E be all unique engineered strains: which can be combined to form a model yielding unique combinations in two strains and three strains: Two strain: Finally, we use a series of rules to remove redundant models. A system is removed if: 1. Two or more strains are identical, concerning bacteriocin sensitivity and combination of expressed parts. 2. The QS regulating a bacteriocin is not expressed by a strain. 3. A strain is sensitive to a bacteriocin that is not expressed by a strain. 4. A bacteriocin is expressed that no strain is sensitive to. This cleanup yields the options which are used to generate ODE equations for system. System equations. State variables in each system are rescaled to improve speed of obtaining numerical approximations: Each model is represented as sets defining the system: N ¼ f1; 2:::xg ð 6Þ B ¼ f1; 2:::zg ð 7Þ A ¼ f1; 2:::yg ð 8Þ The system is represented as differential equations: Growth is modelled by Monod's equation for nutrient limited growth: Killing by bacteriocin is modelled via a Hill function, where ω max ¼ 0 if strain is insensitive: Induction or repression of bacteriocin expression by QS, A y : Simulations were conducted for 1000 h, the final 100 h were used to calculate the summary statistics and were stopped early if the population of any strain fell below 1e−10 (extinction event). Simulations with an extinction event have distances set to maximum in order to prevent excessive time spent simulating collapsed populations.
Given an objective of x 0 , where x 0 exists in the solution space, x 0 2 D. We define the likelihood function for the objective behaviour as f(x 0 |θ). Bayes' theorem gives us the posterior distribution of θ that exists for the objective x 0 : We can rewrite π(x 0 ) where a and b represent the lower and upper bounds of the parameter value: The posterior distribution informs us of the parameter distribution that gives rise to the objective: Let m be a model from a vector of competing models, M, such that m ∈ M = {m 0 , m 2 ,...,m n }. Each model has its own parameter space, allowing us to define a joint space, (m, θ) ∈ M × Θ. We can write Bayes' theorem in the context of a model space:

ARTICLE
Since the M is discrete, we can rewrite this as: The marginal likelihood of the model, f(x 0 |m), is the expectation of the likelihood function taken over the model parameter prior distribution. It measures a modelʼs fit: Approximate Bayesian computation. Writing the likelihood function, f (x 0 |θ), in terms of summary statistics can be difficult. We bypass this and approximate the posterior by generating data from a model. We can sample a parameter vector from the prior, θ *~π (θ), which is simulated to yield a data vector, x * . This can be written as a conditional, x *~f (x|θ * ), which also gives the joint density, π(θ,x). In order to obtain the posterior distribution that satisfies our objective behaviour, x 0 , we apply a conditional to define whether a generated data vector, x * , belongs to the objective x 0 .
Else πðθjx; x 0 Þ ¼ 0 ð24Þ Let ρ(x, x 0 ) be a distance function that compares a simulation to the objective. Using distance threshold, ϵ, we can define values below which the distance is acceptably small. We can redefine π(θ|x, x 0 ) in the context of thresholds to obtain an approximation of the posterior. If ρ(x, x 0 ) < ϵ π ϵ ðθjx; x 0 Þ ¼ πðθÞf ðxjθÞ πðθÞf ðxjθÞdxdθ ð25Þ Else π ϵ ðθjx; x 0 Þ ¼ 0 ð26Þ The smaller ϵ is and the larger the number of simulations conducted, the more accurate the representation of the true posterior will be. We can write this marginal posterior distribution as: Model selection with ABC SMC. In this paper, we use a variant of ABC, ABC Sequential Monte Carlo (ABC SMC) 36 . Particles are sampled from the prior distributions. Each particle represents a sampled model and sampled parameters for that model. ABC SMC evolves particles sampled from the prior distribution through a series of intermediate distributions and perturbations. Importance weighting is used to define their sample probability for the next distribution. The distance threshold, ϵ, is decreased between distributions, moving the acceptance criteria closer to the objective. These features aim to improve the acceptance rate of particles while maintaining a good approximation of the posterior distribution (see Supplementary Algorithm 1 for more details).
Bayes factor. The BF can be used to help us interpret how much better (or worse) one model is than the other. Given two models, m 1 and m 2 , the BF is calculated as P(m i ) is the prior, and P(m i |x) is the posterior probability. Given uniform priors, P (m i ) = 1/M, where M is the number of models. Therefore we can simplify to: The BF is a measure of the support for m 1 relative to m 2 . It accounts for the number of parameters, or complexity of the two models. The BF allows us to directly compare the weight of evidence for and against the two models and has the advantage that it can be used to compare non-nested models. Two BFs can be compared directly, since they both represent evidence in favour of the hypothesis 36,37 . We therefore use BFs to directly compare the ability of two models to represent the objective population behaviour. Table 2 allows us to interpret BF.
Software packages and simulation settings. ABC SMC model selection algorithm was written in python using Numpy 75 , Pandas and Scipy 76 . ODE simulations were conducted in C++ with a Rosenbrock 4 stepper from the Boost library 77 . All simulations use an absolute error tolerance of 1e−9, and relative error tolerance of 1e−4. NMF was conducted using Scikit-learn 78 . Dendrograms were made from SciPy, using the unweighted pair group method with arithmetic mean (UPGMA) clustering algorithm 76 . Ternary diagrams were made using python package python-ternary 79 . Parameter distribution plots were made in R using ggplot2 80 .
Reporting summary. Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Data availability
The data generated and used to create figures can be found at https://doi.org/10.5281/ zenodo.4286040. Any other relevant data can be obtained from the authors upon reasonable request.

Code availability
AutoCD code repository can be found at https://github.com/ucl-cssb/AutoCD/ 88 . The repository includes configuration files for the two-and three-strain experiments conducted in this study. All code to recreate figures can be found at https://doi.org/ 10.5281/zenodo.4286040.