An engineering theory of evolution

Biological technologies are fundamentally unlike any other because biology evolves. Bioengineering therefore requires novel design methodologies with evolution at their core. Knowledge about evolution is currently applied to the design of biosystems ad hoc. Without a unified engineering theory of evolution, we will be able neither to realise evolution’s potential as a design tool, nor to understand or limit its unintended consequences for our designs. Our concept of the evotype offers a conceptual framework for engineering the evolutionary potential of biosystems. We show how a biosystem’s evolutionary properties might be rationally designed by engineering aspects of genetic variation, designed function, and natural selection. This idea could apply to all biosystems, from individual proteins to communities of whole cells or even entire ecosystems, whether the goal is to direct evolution in the design process or to limit its impacts during application. These principles could even be used beyond the realm of bioengineering to design entirely synthetic, evolving, auto-adaptive technologies.


Introduction
The past few decades have seen a revolution in our ability to engineer biology and create novel living systems 1 . Yet several hurdles still hinder our ability to harness biology's full potential 2 . These stem predominantly from the fact that engineering the stuff of life is not the same as engineering its properties, because life evolves. Evolution makes engineering living systems a radically different challenge from engineering other media. To be effective, we cannot simply apply traditional engineering design principles to biology and treat evolution as an afterthought. If nothing in biology makes sense except in the light of evolution 3 , then evolution must be a central part of an engineering theory of biology.
Evolution poses both a challenge and an opportunity when designing biosystems. On one hand, it is a detrimental force that can unpick the meticulous plans of an engineer through mutation 4 . Designed biosystems cannot escape evolution once in use, and loss of function is a particular concern for engineers, especially as there is often selection pressure favouring it 5,6 .
Thus, it is essential that we are able to build evolutionarily robust biosystems that can continue to operate under unavoidable evolutionary forces.
On the other hand, evolution is an extremely effective problem solver, and engineers have exploited this fact for decades [7][8][9][10] . For example, directed evolution can be used to optimise or even generate completely novel traits in single proteins 11 or entire cells 12 .
However, these methods rely on evolution finding solutions in a reasonable length of time. For most systems the search space is so vast that the starting point must have the potential to generate useful phenotypes relatively quickly. Evolution may even be employed as a feature of the system during operation: for example, to develop adaptive systems that evolve in response to environmental cues, or to create evolvable genetic circuits designed with specific classes of phenotype that are reached through evolutionary change as necessary. To create such systems, it is critical that the biological design is evolvable, that is, able to generate desired phenotypes from a single starting point.
Even more critical is our moral obligation to understand more deeply how synthetic biosystems will continue to evolve if deployed into our bodies or the wider environment 13 . The field has rightly made efforts to develop tools that reduce and mitigate evolution 14 , with failsafes such as kill switches 15 or metabolic dependencies 16 . However, without a good theoretical understanding of how synthetic biosystems might continue to evolve once deployed, we risk these technologies developing unexpected faults with dire, but avoidable, consequences.
Central to many of these issues is the view, in traditional engineering disciplines, that the engineered artefact is the final destination of the design process. This view breaks down for biology. Instead, we believe that truly effective engineering of biology needs a new perspective: one that sees a designed biosystem as a starting point in a lineage of possibilities.
Whilst much of evolutionary biology has concerned itself with looking backwards at an organism's history 17 , bioengineers must consider the future, and specifically how a biosystem will continue to evolve when used 18 . Here, we describe a framework to enable this transition, offering a way to specify, test and conceive the properties of biosystems in terms of their evolutionary potential, and not just their phenotype (Figure 1). This provides a means to reimagine biological engineering so that it works hand-in-hand with the ability of all life to evolve.

The design type and the evotype
To harness the capabilities of biology more fully, we need a way of thinking about the evolutionary properties of engineered biosystems. We must design not only the immediate functionalities of the system (i.e. its phenotypic traits), but also its potential for evolutionary change. Although these are properties of populations that do not yet exist, they can still be predicted for an individual biosystem. We define the design type as the engineered system, consisting of a single genotype. We introduce the concept of the evotype to capture the evolutionary properties of a system: the evotype is the set of evolutionary dispositions of the design type, analogous to the genotype and phenotype being sets of genes and traits, respectively. Unlike a trait, a disposition is not directly observable; rather, it is a potential property of the system. For example, a protein may have a disposition for instability, meaning that its phenotype may change dramatically when it is mutated. Designing the dispositions of the evotype is a challenge fundamental to engineering biology.
For all but the very simplest biosystems, it is impractical to enumerate every potential evolutionary disposition, just as it is impossible to consider every trait of the phenotype.
Instead, an appropriate sample of the evotype must be used for the purpose at hand, just as samples of traits are used when describing the phenotype. How we take this sample, and thus the scope of the evotype covered, should be determined by knowledge of the design type, its intended function, and the context in which it will be used. Relevant aspects of this context could include the population size, the environment, and the number of generations over which the system must operate reliably.
The key properties of the evotype can be understood by describing a landscape surrounding the design type in sequence space, one that captures the interwoven roles of genetic variation and natural selection, as well as the mapping between sequences (i.e. genotypes) and their associated functions. The bioengineer's goal is to sculpt this landscape so that its structure meets their specification.

Variation
Evolution does not explore sequence space in a uniformly random way. Instead, the paths it can take are determined by the variation operator set, which defines all the different point and algorithmic mutations that can occur in the system. Each variation operator in this set has an associated probability distribution that represents the likelihood of arriving at a given sequence from another (i.e. by this operator acting on the design type). The distributions of the variation operator set can be combined to produce the variation probability distribution, which describes the chance of arriving at a given sequence from the design type through all the processes of genetic variation present in the system (Figure 1B, right). As a design type evolves, the variation probability distribution can be recalculated for each lineage in a population to reveal the further dispositions available to the system.
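A minimal Python sketch of how the distributions of a variation operator set might be combined into a variation probability distribution. It assumes, purely for illustration, that at most one variation event occurs per replication, that each operator acts with a fixed per-replication rate, and that the combined distribution is the rate-weighted mixture of the operators' distributions, with the remaining probability mass left on the unmodified sequence. The operators `point_substitution` and `single_deletion` and all rate values are hypothetical toy examples, not characterised mechanisms.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

# A distribution over sequences: sequence -> probability.
Distribution = Dict[str, float]
# A variation operator maps a parent sequence to a distribution over offspring
# sequences, conditional on that operator acting once (toy assumption).
Operator = Callable[[str], Distribution]

BASES = "ACGT"


def point_substitution(seq: str) -> Distribution:
    """Toy operator: every position and every alternative base equally likely."""
    n = len(seq)
    out: Distribution = defaultdict(float)
    for i, base in enumerate(seq):
        for alt in BASES:
            if alt != base:
                out[seq[:i] + alt + seq[i + 1:]] += 1.0 / (3 * n)
    return dict(out)


def single_deletion(seq: str) -> Distribution:
    """Toy operator: delete one base, each position equally likely."""
    n = len(seq)
    out: Distribution = defaultdict(float)
    for i in range(n):
        out[seq[:i] + seq[i + 1:]] += 1.0 / n
    return dict(out)


def variation_distribution(
    seq: str, operator_set: List[Tuple[float, Operator]]
) -> Distribution:
    """Mix operator distributions, weighting each by its per-replication rate.

    Assumes at most one variation event per replication, so the rates must
    sum to at most 1; the leftover probability mass stays on `seq` itself.
    """
    total_rate = sum(rate for rate, _ in operator_set)
    if not 0.0 <= total_rate <= 1.0:
        raise ValueError("operator rates must sum to a value in [0, 1]")
    combined: Distribution = defaultdict(float)
    combined[seq] += 1.0 - total_rate  # no variation event this replication
    for rate, operator in operator_set:
        for offspring, p in operator(seq).items():
            combined[offspring] += rate * p
    return dict(combined)
```

For instance, `variation_distribution("ACGT", [(0.01, point_substitution), (0.001, single_deletion)])` leaves probability 0.989 on the design type and spreads the rest over its one-step neighbours; calling the function again on each offspring sequence mirrors recalculating the distribution for each lineage.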
The variation operator set depends on the specifics of the biosystem being engineered, and the set used in practice depends on the available knowledge of the system. For example, the variation operator set for a plasmid with many repeated parts might include transition mutations, transversion mutations, and homologous recombination. A sample population can be generated by applying the operator set to the design type. This population, with the design type at its centre, may be called a quasispecies, borrowing the term for the related concept in viral evolution 27 .
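The quasispecies idea can be illustrated by sampling. The sketch below applies a toy version of the operator set mentioned above (transitions, transversions, and homologous recombination between repeated parts) to copies of a design type. Each operator is applied independently with a fixed probability per individual; these rates, the repeat sequence, and the simplified model of recombination (excision of the sequence between two direct repeats) are all assumptions made for illustration.

```python
import random
from collections import Counter
from typing import Callable, List, Tuple

# Each operator takes a sequence and a random generator, returns a new sequence.
MutationOp = Callable[[str, random.Random], str]

TRANSITION = {"A": "G", "G": "A", "C": "T", "T": "C"}
TRANSVERSIONS = {"A": "CT", "G": "CT", "C": "AG", "T": "AG"}


def transition(seq: str, rng: random.Random) -> str:
    """Purine<->purine or pyrimidine<->pyrimidine swap at a random position."""
    i = rng.randrange(len(seq))
    return seq[:i] + TRANSITION[seq[i]] + seq[i + 1:]


def transversion(seq: str, rng: random.Random) -> str:
    """Purine<->pyrimidine swap at a random position."""
    i = rng.randrange(len(seq))
    return seq[:i] + rng.choice(TRANSVERSIONS[seq[i]]) + seq[i + 1:]


def repeat_recombination(seq: str, repeat: str) -> str:
    """Toy homologous recombination between two direct repeats of a part:
    excises the first copy and everything up to the second copy."""
    first = seq.find(repeat)
    if first == -1:
        return seq
    second = seq.find(repeat, first + len(repeat))
    if second == -1:
        return seq  # only one copy, nothing to recombine with
    return seq[:first] + seq[second:]


def sample_quasispecies(
    design: str,
    operator_set: List[Tuple[float, MutationOp]],
    size: int,
    rng: random.Random,
) -> List[str]:
    """Apply each operator, independently and with its own probability,
    to each copy of the design type."""
    population = []
    for _ in range(size):
        seq = design
        for rate, op in operator_set:
            if rng.random() < rate:
                seq = op(seq, rng)
        population.append(seq)
    return population


if __name__ == "__main__":
    # Hypothetical design with a repeated part ("GGATCC") and placeholder rates.
    design = "GGATCC" + "AAAACCCC" + "GGATCC" + "TTTT"
    ops = [
        (0.05, transition),
        (0.02, transversion),
        (0.01, lambda s, r: repeat_recombination(s, "GGATCC")),
    ]
    population = sample_quasispecies(design, ops, 1000, random.Random(0))
    print(Counter(population).most_common(3))
```

Counting genotypes in the sample (as in the `__main__` block) gives a rough picture of the quasispecies cloud around the design type; a real operator set would need empirically grounded rates and mechanisms.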
The variation probability distribution can be considered at every stage of the engineering process. Global and local mutation rates could be specified in a design, and standardised mutation rates could be listed in part datasheets; predictions of mutation probabilities are likely to improve with the increasing availability of sequence data and better computational methods. Design rules for influencing local genetic variability are already known (e.g. avoiding repeated parts to reduce homologous recombination, and avoiding repeat sequences to reduce mutation) 28 , and global mutation rates can also be rationally engineered and manipulated 12,29 . Design types could even be constructed with specific sets of variation operators, for example a system capable of recombination of certain genetic parts but not others. Highly specific recombination systems can also be built directly into the genome of a cell, such as the SCRaMbLE system found in the synthetic yeast Sc2.0 30 . In the future it may be possible to design and build systems capable of carefully specified algorithmic mutations by controlling combinations of biochemical mechanisms such as recombination, CRISPR-mediated modifications, and methylation, to name but a few. Finally, advances in the accessibility and throughput of quantitative sequencing 31 will enable a system's genetic variation to be characterised in detail.
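As a rough illustration of how datasheet mutation rates might be used at design time, the sketch below estimates the probability that a single lineage still carries an unmutated copy of every part after g generations, P = Π_i (1 − μ_i)^(L_i·g) · Π_j (1 − r_j)^g, where μ_i is a per-base rate for part i of length L_i and r_j are per-generation rates of whole-construct events such as recombination between repeated parts. `PartDatasheet` and its fields are hypothetical; the calculation assumes independent mutations at constant rates, treats every mutation as function-breaking, and ignores selection, so it is a conservative back-of-the-envelope figure rather than a prediction.

```python
import math
from dataclasses import dataclass
from typing import Iterable, Sequence


@dataclass(frozen=True)
class PartDatasheet:
    """Hypothetical datasheet entry for a genetic part."""
    name: str
    length_bp: int
    mutation_rate_per_bp: float  # per base pair per generation (assumed value)


def p_construct_intact(
    parts: Sequence[PartDatasheet],
    generations: int,
    extra_event_rates: Iterable[float] = (),
) -> float:
    """Probability that one lineage carries no mutation in any part after
    `generations`, treating every event as independent and function-breaking.

    `extra_event_rates` holds per-generation probabilities of whole-construct
    events, e.g. recombination between repeated parts.
    """
    # Work in log space so long constructs and many generations do not underflow.
    log_p = sum(
        part.length_bp * generations * math.log1p(-part.mutation_rate_per_bp)
        for part in parts
    )
    log_p += sum(generations * math.log1p(-rate) for rate in extra_event_rates)
    return math.exp(log_p)


if __name__ == "__main__":
    # Placeholder values, not measured rates.
    parts = [
        PartDatasheet("promoter", 100, 1e-9),
        PartDatasheet("cds", 900, 1e-9),
        PartDatasheet("terminator", 80, 1e-9),
    ]
    print(p_construct_intact(parts, generations=1000))
    print(p_construct_intact(parts, generations=1000, extra_event_rates=[1e-5]))
```

Comparing the two printed values, for a design with and without a repeat-driven recombination term, gives a quick sense of how much avoiding repeated parts might matter.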

Function
Each genotype maps to a phenotype. However, phenotypes are not necessarily distributed evenly throughout sequence space, and physical and biological constraints mean that not all conceivable phenotypes may be possible. The genotype-phenotype map therefore has an underlying topological structure, which influences evolvability 32 . When talking about phenotypes, we really mean traits or subsets of traits. As engineers, the trait we are interested in is the design type's function: the behaviour or properties specified by the designer. We are therefore interested in the genotype-function map of the region surrounding our design type. We may group functions into distinct classes, or work with a continuous function space (Figure 1C). A function could be a literal mathematical function such as a logic function, a physical characteristic such as colour or size, or a combination of several properties.
Any system has a degree of utility: the extent to which the system fulfils a specified or desired function. The sole goal of a traditional engineering design process is to maximise the utility of the design type. However, the topology of the function landscape surrounding the design type, and thus of the utility landscape, is also important. For instance, a design type may sit in a region of sequence space from which desired functions are inaccessible to directed evolution. For a robust design, we might engineer as much redundancy into the system as possible, so that mutations are less likely to cause dramatic changes of function (i.e. a flat or gently sloping landscape). Alternatively, we may want to maximise variability, although this may be constrained to favour certain classes of function. For example, it may be necessary to minimise irrelevant or harmful functions (e.g. in a diagnostic application, regions of function space that cause false negatives must be avoided, whilst false positives can be tolerated). Engineering this landscape is not trivial: it requires significant knowledge of the physical, biochemical, and organisational properties of the biosystem. However, awareness of this challenge will enable engineers to think beyond the utility of the individual and consider the performance of future lineages as a whole.
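To make the contrast between a steep and a gently sloping local landscape concrete, the sketch below scores the robustness of a design type as the fraction of its single point mutants whose utility stays at or above a threshold. The target sequence and both utility functions are hypothetical toys: `fragile_utility` models an isolated, steep peak, while `graded_utility` models a gently sloping landscape. Real genotype-function maps would need to be measured or modelled.

```python
from statistics import mean
from typing import Callable, List

BASES = "ACGT"
TARGET = "ACGTAC"  # hypothetical sequence with maximal utility


def single_mutants(seq: str) -> List[str]:
    """All sequences one point substitution away from `seq`."""
    return [
        seq[:i] + b + seq[i + 1:]
        for i, base in enumerate(seq)
        for b in BASES
        if b != base
    ]


def fragile_utility(seq: str) -> float:
    """Toy steep peak: full utility only for the exact target sequence."""
    return 1.0 if seq == TARGET else 0.0


def graded_utility(seq: str) -> float:
    """Toy gentle slope: utility is the fraction of positions matching the target."""
    return sum(a == b for a, b in zip(seq, TARGET)) / len(TARGET)


def robustness(design: str, utility: Callable[[str], float], threshold: float) -> float:
    """Fraction of single mutants whose utility stays at or above `threshold`."""
    mutants = single_mutants(design)
    return sum(utility(m) >= threshold for m in mutants) / len(mutants)


def mean_mutant_utility(design: str, utility: Callable[[str], float]) -> float:
    """Average utility across all single mutants of the design type."""
    return mean(utility(m) for m in single_mutants(design))
```

Under these toy maps, the same design type (`TARGET`) has a robustness of 0 on the steep peak and 1 on the graded landscape at a threshold of 0.5, even though its own utility is identical in both cases.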

Selection
Selection is the force that gives the otherwise random (but constrained) processes of genetic variation a 'direction', driving a population up the slopes of the adaptive landscape 33 . Uniquely, an engineered biosystem is a result of two forms of selection: natural selection and the design process. Natural selection acts on reproductive fitness of the biosystem, and the design process can be thought of as a sophisticated form of artificial selection acting on its utility.
Understanding the interplay between these two processes is critical for good evotype design, since there is often a tension between utility and fitness ( Figure 1D). If the two are uncorrelated, then natural selection will always strive to undo the work of the engineer.
However, if fitness and utility are highly correlated, then natural selection will also drive up utility. It should be noted that natural selection here means the process that acts on the reproductive ability of the biosystem. Neither the environment nor the biosystem needs to be natural (e.g. the organisms could be engineered to use non-canonical amino acids and grown within a bioreactor). The critical distinction is that natural selection acts on the survival of the biosystem without the input of the engineer.
The aim of a bioengineer is to maximise fitneity, defined as a function that combines utility and fitness (Figure 1D). Biosystems with either high utility but low fitness, or high fitness but low utility, will both have low fitneity, so both fitness and utility must be considered. How the two are weighted, and the exact function that defines fitneity, depends on the use case and the properties of the system to be engineered. In some cases utility may be prioritised; in others, the system may only require some minimal level of functionality, as long as it can survive under strong natural selection pressures.
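The text leaves the exact form of fitneity open. One possible choice, sketched below as an assumption rather than the authors' definition, is a weighted geometric mean of utility and fitness, both normalised to [0, 1]: it is low whenever either component is low, matching the requirement that high-utility/low-fitness and high-fitness/low-utility designs both score poorly, and the hypothetical weight `w` sets how strongly utility is prioritised. A second hypothetical variant covers the case where only a minimal level of functionality is required.

```python
def fitneity_geometric(utility: float, fitness: float, w: float = 0.5) -> float:
    """Hypothetical fitneity: weighted geometric mean of normalised utility
    and fitness. Larger `w` prioritises utility over fitness."""
    if not (0.0 <= utility <= 1.0 and 0.0 <= fitness <= 1.0):
        raise ValueError("utility and fitness must be normalised to [0, 1]")
    if not 0.0 <= w <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return utility ** w * fitness ** (1.0 - w)


def fitneity_min_utility(utility: float, fitness: float, u_min: float) -> float:
    """Hypothetical fitneity for designs that only need minimal function:
    any design meeting the utility threshold is ranked by fitness alone."""
    return fitness if utility >= u_min else 0.0
```

With equal weights, a balanced design (0.5, 0.5) scores 0.5, whereas a lopsided one (0.9, 0.1) scores only 0.3, which captures the idea that neither component can compensate for a near-zero value of the other.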
Careful design could ease the tension between utility and fitness by reducing the selection pressures acting on a design, for example by reducing metabolic load. The interplay between utility and natural selection can also be controlled, for instance by correlating the two in a directed evolution experiment. Evotype engineering, then, is the sculpting of the fitneity landscape of the design type. The design type itself should have maximal fitneity, but a robust design must also lie in a region of sequence space with high fitneity. Similarly, an evolvable design ideally has a fitneity landscape with smooth slopes and a single peak that can be climbed by natural selection.
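The effect of correlating fitness and utility can be illustrated with a steepest-ascent adaptive walk on bit-string genotypes, a deliberately crude stand-in for natural selection in which the lineage always moves to its fittest single-mutant neighbour. Utility is a toy count of functional features; the hypothetical `burdened_fitness` assumes each feature imposes a growth cost, while `coupled_fitness` assumes survival has been coupled to function. Both landscapes are smooth with a single peak, so the walk reaches the global fitness optimum either way; what differs is where that optimum lies in utility.

```python
from typing import Callable, List, Tuple

Genotype = Tuple[int, ...]


def neighbours(g: Genotype) -> List[Genotype]:
    """All genotypes one bit flip away from `g`."""
    return [g[:i] + (1 - g[i],) + g[i + 1:] for i in range(len(g))]


def adaptive_walk(
    start: Genotype, fitness: Callable[[Genotype], float], max_steps: int = 1000
) -> Genotype:
    """Steepest-ascent walk: move to the fittest neighbour until none is fitter."""
    current = start
    for _ in range(max_steps):
        best = max(neighbours(current), key=fitness)
        if fitness(best) <= fitness(current):
            break  # local (here also global) fitness peak reached
        current = best
    return current


def utility(g: Genotype) -> int:
    """Toy utility: each 1 is a functional feature of the design."""
    return sum(g)


def burdened_fitness(g: Genotype) -> int:
    """Toy fitness when every functional feature costs growth."""
    return len(g) - utility(g)


def coupled_fitness(g: Genotype) -> int:
    """Toy fitness when survival has been coupled to function."""
    return utility(g)
```

Starting from `(0, 1, 0, 1, 1, 0)`, the walk under `burdened_fitness` strips every functional feature (utility 0), whereas under `coupled_fitness` it climbs to maximal utility (6): the same selective process either undoes or completes the engineer's work depending on how fitness and utility are related.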

Conclusion
Like the genotype and phenotype, the evotype is a further way to think about the properties of engineered biosystems and how they relate to each other ( Table 1). It is a framework for thinking about an important but often overlooked property: the role the biosystem itself plays in its future evolution. As engineered biosystems are the result of both human thought and natural adaptation, a holistic consideration of both the roles of design and evolution is necessary. The evotype helps us do this by explicitly considering the effects that variation, function, and selection will have on a design (Figure 1).
We can now design and build genotypes with great precision, but we must account for the inevitable processes of genetic variation that will follow. The statistical structure of variation is unique to each biosystem and is something we have control over. Yet understanding the details of genetic variation is insufficient if we do not also understand how it will manifest as changes in the designed function of the biosystem. Even a system with low mutation rates can be evolutionarily unstable if function changes wildly with small sequence alterations. Directed evolution will not succeed, regardless of the mutation strategy, if desired functions are simply not accessible from the starting point. If the biosystem's utility (i.e. its success as a design) and its fitness (i.e. its success as a biological replicator) are at odds, well-designed dispositions for variation or function might not save the design from the pressure of natural selection. This must also be understood as a conflict between the utility and fitness landscapes across the sequence space surrounding the original design type. It is clear, then, that all three aspects of the evotype (variation, function, and selection) must be considered together, and all offer significant scope for engineering. For instance, imagine a large genetic circuit that places an unavoidably high metabolic burden on the host cell. If it is crucial that the function of the circuit is maintained over long periods of time, redundancy could be used to accommodate unavoidable mutations. However, if the cost to reproductive fitness is severe, this may still not be enough. Combining redundancy in the design with a hyper-stable host cell (e.g. one in which all mobile genetic elements have been deleted and efficient DNA repair mechanisms are present 34 ) might therefore be the only way to achieve the desired goal for the system.
Designing biosystems with evolution in mind is a step towards a more complete engineering theory of biology. However, to be practical, supporting tools must exist that can provide key information about the genetic variation, genotype-function mapping and selective pressures within a biosystem. Advances in sequencing offer a means to quantitatively measure millions of genotypes in parallel 35 and, when combined with techniques such as fluorescence-activated cell sorting (FACS), make it possible to infer simplified genotype-function maps 36 . Even so, the vastness of evotype landscapes, and the need for functions calculated from many outputs of a system, mean that new methods with greater throughput are also necessary, especially those able to measure many characteristics of each cell simultaneously (e.g. via automated high-content microscopy 37 or high-throughput Raman spectroscopy 38 ). In parallel with these experimental methods, a promising way to bypass the need to measure these properties directly is the development of sufficiently comprehensive computational models (e.g. encompassing whole cells 39 ) that allow a mechanistic understanding of the biases in processes related to variation and reproductive rate. If such models are sufficiently accurate, the evotype could be predicted and used within computer-aided design workflows 40 to reduce the need to physically implement every possible design.
In addition to characterising evotypes, bioengineers also need tools to sculpt evotype landscapes directly (Figure 2). Here, we have touched upon the numerous ways in which biomolecular components have been repurposed, and genome engineering performed, to alter the types of variation and selective pressures present in a system.
However, the spectacular diversity of molecular machines dedicated to manipulating genetic information in the natural world suggests that we will likely require a large library of such tools to make the precise modifications needed to shape evotype landscapes as we wish.
An engineering theory of evolution is both a new way of looking at evolutionary theory, and a new way of thinking about what it is that engineers do, and what the design process is.
The concept of the evotype, with some modifications, may also find use in evolutionary science, where it offers a framework for considering the mechanistic constraints of evolution and a way of talking about the evolutionary characteristics of organisms. It may also be applied beyond biological engineering fields to create new auto-adaptive technologies. Here the framework could be applied to ask how we design technologies to evolve, and not just how to engineer systems that already do.

Author Contributions
Whilst T.E.G. and C.S.G. framed questions and provided direction, the core concepts and new terminology here are mostly the work of S.D.C., who also wrote the first draft. S.D.C. and T.E.G. developed the figures with input from C.S.G. T.E.G. and C.S.G. edited the manuscript and supervised the work.

Figure legends
Figure 1 (legend, partial): … produce a colour), and the landscape may be smoothed (e.g. through removing crosstalk between features) and thus made amenable to evolutionary search. Selection (orange row): if, as in the naive design, reproductive fitness (red dotted line) and utility (blue dashed line) are highly uncorrelated, the design type may have a strong selection pressure acting against it, and regions where both fitness and utility are maximised may be rare or nonexistent, so high fitneity (grey solid line) may not be achievable. In a robust design, one might act to reduce the effects of natural selection through global increases in fitness (e.g. by reducing the metabolic load of a genetic circuit), by reducing the toxicity of gene products, or by smoothing either the fitness or the utility landscape (e.g. through orthogonal parts). A naive design can be made more evolvable by closely correlating fitness and utility (e.g. by coupling function to survival). Natural selection will then act to drive up the utility of the design: the precise goal of a directed evolution experiment.