In vivo continuous evolution of genes and pathways in yeast

Directed evolution remains a powerful, highly generalizable approach for improving the performance of biological systems. However, implementations in eukaryotes rely either on in vitro diversity generation or limited mutational capacities. Here we synthetically optimize the retrotransposon Ty1 to enable in vivo generation of mutant libraries up to 1.6 × 10^7 l^−1 per round, which is the highest of any in vivo mutational generation approach in yeast. We demonstrate this approach by using in vivo-generated libraries to evolve single enzymes, global transcriptional regulators and multi-gene pathways. When coupled to growth selection, this approach enables in vivo continuous evolution (ICE) of genes and pathways. Through a head-to-head comparison, we find that ICE libraries yield higher-performing variants faster than error-prone PCR-derived libraries. Finally, we demonstrate transferability of ICE to divergent yeasts, including Kluyveromyces lactis and alternative S. cerevisiae strains. Collectively, this work establishes a generic platform for rapid eukaryotic-directed evolution across an array of target cargo.

Supplementary Tables S1 to S14

[Table: comparison of diversity-generation methods. Recoverable entries: "EP-PCR"; "Cargo integrated into genome, followed by cell outgrowth."; "Two independent libraries generated through epPCR or oligo synthesis, followed by cloning into E. coli and transformation into yeast. These two in vivo libraries are then combined through mating."]

a) Fig. 2b.
b) This number was calculated using PEDEL (ref. 6) with a "Library size" of 10^10 × (number of total mutants generated) and a "mean number of point mutations per sequence" of 0.15 × (cargo length). It is important to note that the number of distinct amino acid variants will be much lower than these values for random mutant libraries, ranging from 20-50%, depending on the nucleotide sequence being mutated (ref. 7). For libraries incorporating synthetic oligonucleotides, such as the "site-directed" methods above, this efficiency can approach 100%.
c) Number of total mutants generated for a 5 kb cargo has been calculated using a library size of 9.49 × 10^-3 per cell (Supplementary Figure 2a).
d) Genome-wide mutagenesis occurs during continuous mutagenesis/selection as Ty1 integrates into alternate loci in the genome.
e) This is the maximum yeast plasmid transformation efficiency (ref. 8).
f) Consider M cells being transformed with a DNA library of size L. The probability that a particular member of L is delivered to a particular member of M is $1/L$. The probability that this member of L is not delivered to this member of M is $1 - 1/L$. The probability that this member of L is not delivered to any member of M is $(1 - 1/L)^{M}$. Thus, the probability that this member of L is delivered to at least one member of M is $1 - (1 - 1/L)^{M}$. Summing over all members of L, the expected value of the number of members of L which were delivered at least once to the cells in M is given by $L\,(1 - (1 - 1/L)^{M})$. This calculation assumes that the DNA molecules composing L are numerous enough that successful transformation events do not significantly change the proportion of each member in L. This calculation also assumes that each member of L is at the same proportion. This is not true for error-prone-PCR-derived libraries since sequences with fewer mutations likely outnumber those with larger numbers of mutations. This bias will reduce the number of distinct mutants generated beyond the simple formula presented here.
g) For error-prone PCR, a 10^-2 transformation efficiency implies 10^8 library-containing cells, so M for these cells is 10^8.
h) 1.2 × 10^7 is the maximum number of distinct variants of a 100 bp sequence which can be generated through error-prone PCR using a 16 kb^-1 mutation rate, and so for this cell L was assumed to be 1.2 × 10^7.
i) Most methods for diversity generation have been benchmarked for cell populations ~10^8 in size. Effort has been scaled accordingly to bring these methods into line with the scale of ICE.
j) For each template size we have assumed a mutation rate of 16 kb^-1 to generate an upper bound for the number of distinct mutants attainable by error-prone PCR. Although the required library size L is different for every experiment, and has been reported exceeding 10^12 for E. coli (ref. 9), we have chosen 10^9 here as this is the minimum value enabling ~95% of transformed yeast cells to contain unique library members. Percentages approaching 100% are attained with severely diminishing returns.
k) Authors note a requirement for the mutated gene to be expressed from cytoplasmic RNAP with low expression capacity.
l) Consider M cells which already contain a library of size L1 (such that the number of cells which contain each member of L1 is M/L1), and which are being transformed with a different library of size L2. The probability that a particular member of L2 is delivered to a particular member of M is $1/L_2$. The probability that this member of L2 is not delivered to this member of M is $1 - 1/L_2$. The probability that this member of L2 is not delivered to any member of M containing a particular member of L1 is $(1 - 1/L_2)^{M/L_1}$. Thus, the probability that this member of L2 is delivered to at least one member of M containing a particular member of L1 is $1 - (1 - 1/L_2)^{M/L_1}$. Summing over all members of L1 and L2, the expected value of the number of unique L1-L2 combinations is $L_1 L_2\,(1 - (1 - 1/L_2)^{M/L_1})$. This represents the best-case scenario of no biases in library representation among L1 or L2 libraries. The general form of this expression is $\sum_{i=1}^{L_1} L_2\,(1 - (1 - 1/L_2)^{M_i})$, where $M_i$ is the number of cells transformed with each member of L1. The number of distinct mutants generated will be lower if, for example, there is a large population of WT cells in L1.
m) In our opinion, the major benefit of these approaches over methods such as error-prone PCR for random mutagenesis of genes and pathways is the ability to explore combinations of two distinct libraries, so that is the mode highlighted here. YOGE and CRISPR-Cas9 can also be used for generation of mutations throughout the yeast genome, but as this is a different goal than that of ICE, it is not considered in detail here.
n) This is the effort required to generate one library from 10^10 total cells. Additional effort is required to combine multiple libraries.
o) "Number of total mutants generated" is a combination of 1% electroporation survival rate and 2% recombination frequency in survivors (ref. 3). We have chosen two libraries of size 4400, as these are the minimum values enabling ~95% of transformed yeast cells to contain unique library members. To accomplish this task using a single library, the library would need to contain 1.9 × 10^7 unique members in order for 1.9 × 10^6 to make it into yeast.
p) 90 bp dsoligo repair template
q) 1.4 kb repair template
r) Plasmid repair template, yeast plasmid transformation frequency
s) Library size for plasmid repair template assumes pre-transformation of Cas9 and repair plasmid, followed by transformation of sgRNA plasmid using the LiAc/ssDNA method, as this is the method which results in the largest theoretical library size. If instead one wishes to generate the same diversity using a single library (instead of combining multiple libraries), this library must be at least 7.2 × 10^6 (90 bp dsoligo repair template), 2.1 × 10^6 (1.4 kb repair template), or 9.6 × 10^8 (plasmid repair template).
t) This has been calculated assuming L2 is the number of distinct variants in donor cells.
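
The counting arguments in footnotes f and l can be checked numerically. The following is a minimal Python sketch, not the authors' code: the function names are hypothetical, and the value M = 2 × 10^6 used for the footnote o check is an assumption obtained by multiplying the 10^10 cells of footnote n by the 1% survival and 2% recombination rates quoted in footnote o.

```python
import math


def expected_distinct(L: float, M: float) -> float:
    """Footnote f: expected number of distinct members of a library of
    size L delivered at least once to M cells, L * (1 - (1 - 1/L)**M).
    Requires L > 1."""
    # log1p/expm1 preserve precision when 1/L is very small (e.g. L = 1e9).
    return -L * math.expm1(M * math.log1p(-1.0 / L))


def expected_combinations(L1: float, L2: float, M: float) -> float:
    """Footnote l, unbiased case: L1 * L2 * (1 - (1 - 1/L2)**(M / L1))."""
    return L1 * expected_distinct(L2, M / L1)


def expected_combinations_general(cells_per_member, L2: float) -> float:
    """Footnote l, general form: sum over members i of L1 of
    L2 * (1 - (1 - 1/L2)**M_i), where M_i is the number of cells
    carrying member i."""
    return sum(expected_distinct(L2, M_i) for M_i in cells_per_member)


if __name__ == "__main__":
    # Footnote j: L = 1e9 library members, M = 1e8 transformed cells.
    frac_single = expected_distinct(1e9, 1e8) / 1e8
    print(f"single library, unique fraction: {frac_single:.3f}")

    # Footnote o: two libraries of 4400 members each.
    # ASSUMPTION: M = 1e10 cells * 0.01 survival * 0.02 recombination.
    M_assumed = 1e10 * 0.01 * 0.02
    frac_combo = expected_combinations(4400, 4400, M_assumed) / M_assumed
    print(f"two libraries, unique fraction: {frac_combo:.3f}")
```

Under these assumptions both fractions come out at roughly 0.95, in line with the "~95%" figures quoted in footnotes j and o. The uniform-representation assumption stated in the footnotes carries over to the sketch, so real libraries with skewed representation would give lower values.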

Supplementary Note 1: Analysis of in vivo mutation rate through next-generation sequencing
In order to gain a detailed picture of the mutational rate and spectrum enabled by Ty1, yeast cells containing URA3-containing Ty1 retrotransposons were induced in galactose for three days, after which intron-less URA3 amplicons were generated via PCR of total DNA. As a negative control encompassing background genetic drift, PCR error rate, and sequencing error, a region of the Ampicillin resistance gene (Amp) of the same length (which is not reverse transcribed) was also amplified (see Methods). These amplicons were then sequenced (Supplementary Fig. 5d). Analysis of identified mutants showed a uniform error distribution across each amplicon, with URA3 consistently showing a higher mutation rate (0.28 kb^-1) than Amp (0.13 kb^-1) (Fig. 4b). This 0.15 kb^-1 increase in error rate above the combined effects of drift, PCR error, and sequencing error was thus attributed to Ty1 and was also reflected in an increased frequency of observing a given number of mutations per 200 bp read in URA3 versus Amp (Fig. 4c). Finally, Ty1 exhibited a mutational spectrum commensurate with other commonly used error-prone polymerases and displayed an error rate which is useful for directed evolution of genes and pathways (Table 1). Collectively, this analysis indicated that the Ty1 retrotransposon is a useful vehicle for introducing mutations to defined genes and pathways in vivo.
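
The background subtraction above, and its effect on the number of mutations seen per read, can be illustrated with a short Python sketch. This is an illustration only: the Poisson model (mutations arising independently at a constant per-base rate) is an assumption that is not stated in the text, and the function names are hypothetical.

```python
import math

URA3_RATE_PER_KB = 0.28   # observed rate in the Ty1 cargo (from the text)
AMP_RATE_PER_KB = 0.13    # background control rate (from the text)
READ_LENGTH_BP = 200      # read length (from the text)


def ty1_attributable_rate(target_rate: float, background_rate: float) -> float:
    """Rate remaining after subtracting drift, PCR and sequencing error."""
    return target_rate - background_rate


def poisson_pmf(k: int, mean: float) -> float:
    """Probability of exactly k events for a Poisson variable."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def mutations_per_read(rate_per_kb: float, read_length_bp: int, max_k: int = 3):
    """ASSUMED Poisson model: P(k mutations in one read) for k = 0..max_k."""
    mean = rate_per_kb * read_length_bp / 1000.0
    return [poisson_pmf(k, mean) for k in range(max_k + 1)]


if __name__ == "__main__":
    print("Ty1-attributable rate (kb^-1):",
          round(ty1_attributable_rate(URA3_RATE_PER_KB, AMP_RATE_PER_KB), 2))
    for name, rate in (("URA3", URA3_RATE_PER_KB), ("Amp", AMP_RATE_PER_KB)):
        dist = mutations_per_read(rate, READ_LENGTH_BP)
        print(name, [f"{p:.4f}" for p in dist])
```

Under this model the mean number of mutations per 200 bp read is the rate multiplied by 0.2 kb, so reads with one or more mutations are expected to be more frequent for URA3 than for Amp, which is the qualitative pattern the note describes.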

Supplementary Note 2: Analysis of in vivo mutation rate using dKanMX reversion assay
In order to directly compare the rates of Ty1-induced mutations and background genetic drift, a dKanMX marker containing two point mutations that together prevent functional activity was constructed for use in a mutation reversion assay. The first is T405A, which introduces an artificial stop codon at the 135th residue, in the middle of the proposed active site (ref. 18). To separate out the effect of background genetic drift in the absence of the optimized Ty1 retroelement, dKanMX was either integrated directly into the genome (g-dKanMX) or incorporated into a Ty1 mutagenesis cassette, which was also integrated into the genome (ICE-dKanMX). Both strains were grown to stationary phase and exposed to galactose, then plated on media containing G418.
The rate of reversion mutations could then be measured by counting G418-resistant colonies.
Genomic DNA from 39 colonies of both strains was extracted, a PCR designed to amplify position 135 of dKanMX was performed, and amplicons were sequenced via Sanger sequencing. In

Supplementary Note 3: Characterization of ura3 variants
The first-round mutant ura3 (3)(4)(5) contained a single coding mutation (Arg 145 Ile) that resides on an outer loop of the URA3p (β/α)8 barrel and which is distal to both the homodimer interface and catalytic site (Supplementary Fig. 2a). After isolation and sequencing of ura3 mutants enriched by the first round of screening, the capacity of each ura3 variant to convert 5-FOA to 5-fluorouracil, while maintaining activity in the uracil biosynthesis pathway, was assessed by integrating each variant into a low-copy vector and transforming this expression cassette into BY4741 ∆rrm3. Cells containing either a mutant ura3 or wild-type URA3 were then plated on solid media lacking uracil and containing 5-FOA, and relative growth rate was determined by comparing colony size. Strains expressing ura3 (3)(4)(5) showed over 3-fold increases in colony area relative to those expressing the wild-type gene, indicating a decreased propensity of these mutants to convert 5-FOA into 5-fluorouracil while retaining their function in the uracil biosynthesis pathway (Supplementary Fig. 3b). Importantly, cells containing this mutant did not exhibit decreased growth rate relative to those containing wild-type URA3 in uracil-deficient media without 5-FOA, indicating that their increased specificity came with no observable fitness tradeoff under these conditions.

Supplementary Note 4: Characterization of spt15 variants
The best-performing mutant isolated from the first round, spt15-B6, contains a single coding mutation (Arg 98 His) near the DNA-binding domain of this protein (Supplementary Fig. 2b), suggesting a putative mode of action. Spt15-B6-1, the best-performing mutant from the second round of ICE, contained a second coding mutation (Gly 192 Ser) along with two indels in the TEF1 promoter (Fig. 4b). This coding sequence mutation, like that of spt15-B6, resides in the DNA-binding domain (Supplementary Fig. 2b).
After isolation and sequencing of spt15 mutants enriched by our initial screening, viability analyses of spt15 mutants were performed by integrating each spt15 mutant into a low-copy vector and transforming this expression cassette into wild-type BY4741. These strains, along with controls expressing either wild-type SPT15 or an empty vector, were grown to stationary phase and then subjected to a killing concentration of 1-butanol (3.5% by volume).
After 0, 1, 2, and 3.5 hours, a small volume of each culture was plated to determine the number of viable cells remaining (see Methods). This analysis indicated that cells containing spt15-B6 exhibited a 1.7-fold higher viability in lethal 1-butanol concentrations compared to wild-type, while cells containing spt15-B6-1 exhibited up to 1.95-fold higher viability relative to wild-type under the same conditions (Fig. 4b).
These mutants were then further tested for any potential growth improvements in butanol-containing media. The same strains, along with controls expressing either wild-type SPT15 or an empty vector, were grown to stationary phase and then resuspended at a low OD in media containing between 1.2% and 1.5% butanol. These were then grown in anaerobic sealed culture tubes. Spt15-B6-1 conferred 32% and 44% increased growth over wild-type at 1.3% and 1.4% 1-butanol, respectively (p<0.05). As spt15-B6-1 also contained mutations to pTEF1, a qRT-PCR experiment was carried out to investigate potential changes to transcription levels in order to provide insights into the observed phenotype (see "Expression Analysis" above).
However, expression measurements indicated no difference between the rate of transcription enabled by pTEF1 and the promoter contained in spt15-B6-1 (Supplementary Fig. 3a) under these test-tube, exponential growth conditions.

Supplementary Note 5: Development of an optimal in vivo mutagenesis host for screening xylose pathway variants
We performed several modifications to our optimized Ty1-containing strain in order to enable selection for growth on xylose. In particular, GRE3, which encodes an aldose reductase, was knocked out in order to reduce competitive xylose utilization and allow any potential improvements in xylose isomerase activity to confer a greater phenotypic advantage, thus increasing the sensitivity of our growth-based screen (ref. 20). An additional copy of XKS1 was also integrated into the genome to boost downstream metabolic flux. Finally, since overexpression of transaldolase (TAL1) has been shown to improve xylose consumption rate (ref. 21), the TAL1 gene was expressed under the control of pTEF1 and cloned into the tRNA iMet overexpression vector. The resulting plasmid was introduced into BY4741 ∆rrm3 containing the GRE3 knockout and XKS1 overexpression, resulting in BY4741 Δr-g-x. The two-gene xylose catabolic pathways consisting of xylose isomerase and xylulokinase were driven by a strong hybrid pTDH3 promoter (UASTEF-UASCIT-UASCLB-PGPD) (ref. 22) and pathway genes were joined using ribosome-cleavable 2A sites (ref. 23).
These pathways were then inserted into the synthetic retroelement and integrated into the genome of BY4741 Δr-g-x, collectively forming the parent strains for ICE of each xylose pathway.

Supplementary Note 6: Characterization of xylose pathway variants
The I3K-66 and I3K-20 strains contain mutants with one (Ile 433 Val) and three (Ala 48 Ser, Ile 433 Val, Met 435 Ile) amino acid substitutions, respectively. I3K-66 also contains one silent mutation in xyla3* (A1029G). Interestingly, A48S lies inside the (β/α)8 barrel of xylA, which houses the dual Mg catalytic core and thus potentially influences the active site of this enzyme.
Ile 433 Val and Met 435 Ile both lie proximal to the homodimer interface and potentially influence the stability of the XylA3*p catalytic tetramer (ref. 24) (Supplementary Fig. 2c). Enzymatic assays of the isolated mutants indicated increased Vmax values (0.126 ± 0.008 and 0.134 ± 0.003 µmol min^-1 mg protein^-1 for I3K-66 and I3K-20, respectively) compared to wild-type (0.118 ± 0.007 µmol min^-1 mg protein^-1) (Supplementary Table 3). In comparing these values to prior work, it is important to note that XKS1 was not overexpressed during the characterization of xylose consumption kinetics in ref. 15, whereas it was in this work. However, in both works, Vmax and Km are computed using whole cell extracts, so the entire pathway will contribute to these values.
Since isomerization is reversible, high xylulose concentrations due to the absence of sufficient
After screening, the growth rates of strains containing xylose pathway variants were characterized through growth in 1 mL cultures containing 20 g L^-1 xylose (Supplementary Fig. 3d and 3e). It was observed that mutant IK-34 displayed a 1.7-fold increase in growth rate over the control as well as a significantly shorter lag phase. For the I3K multi-gene cassette, I3K-66 and I3K-20 conferred roughly 1.3-fold improvements to growth rate (Supplementary Fig. 3d and 3e, Supplementary improvements to growth rate in xylose-containing media over their respective wild-type, respectively (Fig. 4c). Mirroring results at the 1 mL scale, mutant IK-34 exhibited an 18-hour shorter lag, while mutants I3K-66 and I3K-20 exhibited a 6-hour shorter lag phase. These analyses produced similar results and indicated that variants of strains containing IK and I3K exhibited significantly increased growth rates and significantly reduced lag times compared with those of their parent strains.
These mutants were isolated at the following frequency at the end of the selection in our generated library using PEDEL (ref. 6). This analysis shows that each 1-mutant variant is at a copy number of 37.6, while each 3-mutant variant is unique. This indicates that, if no selection were occurring, the expected probability of finding IK-34 and I3K-66 is 37.6/(1.5 × 10^6) = 2.5 × 10^-5, and of finding I3K-20 is 1/(1.5 × 10^6) = 6.7 × 10^-7. Given the observed recovery frequencies of these variants, we concluded that IK-34 is enriched 1500-fold, I3K-66 610-fold, and I3K-20 57000-fold over the initial library.
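
The enrichment arithmetic in this paragraph can be sketched in Python as follows. This is a hedged illustration with hypothetical function names. The observed recovery frequencies are not given in the text above, so the sketch only back-calculates the frequencies implied by the reported fold enrichments; these implied values are not reported data.

```python
LIBRARY_SIZE = 1.5e6          # initial library size (from the text)
COPIES_SINGLE_MUTANT = 37.6   # copies of each 1-mutant variant (from the text)
COPIES_TRIPLE_MUTANT = 1.0    # each 3-mutant variant is unique (from the text)


def expected_frequency(copies: float, library_size: float) -> float:
    """Frequency of a variant in the starting library, absent selection."""
    return copies / library_size


def fold_enrichment(observed: float, expected: float) -> float:
    """Ratio of the post-selection frequency to the starting frequency."""
    return observed / expected


def implied_observed_frequency(fold: float, expected: float) -> float:
    """Back-calculated post-selection frequency for a reported enrichment."""
    return fold * expected


if __name__ == "__main__":
    single = expected_frequency(COPIES_SINGLE_MUTANT, LIBRARY_SIZE)
    triple = expected_frequency(COPIES_TRIPLE_MUTANT, LIBRARY_SIZE)
    print(f"expected frequency, 1-mutant variant: {single:.2e}")
    print(f"expected frequency, 3-mutant variant: {triple:.2e}")
    # Reported fold enrichments (from the text); implied frequencies are
    # derived here for illustration only.
    for name, fold, expected in (("IK-34", 1500, single),
                                 ("I3K-66", 610, single),
                                 ("I3K-20", 57000, triple)):
        implied = implied_observed_frequency(fold, expected)
        print(f"{name}: implied post-selection frequency {implied:.3g}")
```

Keeping the expected frequency and the enrichment ratio as separate functions makes it easy to substitute a different library size or copy number if the PEDEL inputs change.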
We were initially surprised that none of these variants contained mutations in all of the promoter and coding sequences. As one potential explanation, we posit that in multigene pathways, it is only really necessary to improve the rate-limiting step in order for pathway performance to be increased, which may only map to one enzyme or promoter region. For long evolutionary time courses, multiple genes and regulatory elements will each eventually undergo mutations as they become the rate-limiting step in further improvements.