Multiplexed tracking of combinatorial genomic mutations in engineered cell populations

Journal name:
Nature Biotechnology

Multiplexed genome engineering approaches can be used to generate targeted genetic diversity in cell populations on laboratory timescales, but methods to track mutations and link them to phenotypes have been lacking. We present an approach for tracking combinatorial engineered libraries (TRACE) through the simultaneous mapping of millions of combinatorially engineered genomes at single-cell resolution. Distal genomic sites are assembled into individual DNA constructs that are compatible with next-generation sequencing strategies. We used TRACE to map growth selection dynamics for Escherichia coli combinatorial libraries created by recursive multiplex recombineering at a depth 10^4-fold greater than before. TRACE was used to identify genotype-to-phenotype correlations and to map the evolutionary trajectory of two individual combinatorial mutants in E. coli. Combinatorial mutations in the human ES2 ovarian carcinoma cell line were also assessed with TRACE. TRACE completes the combinatorial engineering cycle and enables more sophisticated approaches to genome engineering in both bacteria and eukaryotic cells than are currently possible.

At a glance


  1. Figure 1: By condensing genotype information into a single construct compatible with high-throughput sequencing technology, we can track libraries containing >10^5 members.

    1. Primers and linkers are precisely engineered to be functional while minimizing undesired primer-primer interactions. 2. These primers are used with optimized PCR reaction kinetics to assemble mutation information into a single construct. 3. The assembled construct can be amplified and read by Sanger sequencing (for single colonies) or next-generation sequencing (at the population level).

  2. Figure 2: Mathematical modeling of the multiplexed assembly kinetics occurring when condensing information directly from the genome in TRACE.

    (a) Analysis of the number of assembly PCR cycles required to generate the final construct, from 1 pM starting template to 1 nM assembled product, versus the number of sites assembled. We also found that equal primer concentrations and increased anneal time (45 s) improve assembly. (b) Analysis of the nucleotide consumption during assembly. The nucleotide specificity threshold is plotted versus primer concentration for several numbers of sites assembled. The specificity threshold is defined as the number of cycles required for the ratio of the final construct concentration to the amount of nucleotides consumed during the reaction to surpass 10^−14.

  3. Figure 3: Generalizability and demonstration of TRACE assembly approach on single genotypes.

    (a) Genome-scale map of target locations along with potential interactions, which are assembled with the model ten-site construct. Target sequences are 3–6 nt and bolded. (b) Adjacent linkers are assembled, then subsequently amplified. A low-molecular-weight ladder is used. (c) Final ten-site construct amplified after assembly for 1–3, 1–6, 1–8 and 1–10 sites. A 1-kb ladder is used. Final construct density of the ten-site product is around 105 bp/site. (d) TRACE is performed for four different libraries under broad reaction conditions including the addition of DMSO and betaine. Constructs include (i) ten-site (previously demonstrated), (ii) nine-site, (iii) eight-site and (iv) four-site ES2 cancer constructs. A 100-bp ladder is used. (e) Sequencing of the ten-site construct. Errors (1%, ten errors) were randomly spread throughout the construct. Resequencing corrected for these errors.

  4. Figure 4: Tracking an artificial combinatorial population using next-generation sequencing.

    (a) Schematic of the approach. Six genotypes are generated by recombineering, each modified at three sites (galK-kan-bla) to be either on (1) or off (0). TRACE is performed in emulsions to generate a construct set representing the original population. (b) Graphical representation of observed genotypes versus varying η for an initial population of two unequal genotypes. In this case the middle genotype is not considered in the calculation because it does not vary. (c) "Constructed" is the population frequency created assuming pure strains and a constant cell count-to-OD600 ratio across strains. "Expected (η = ∞)" refers to the expected population assuming strains in the constructed population have background mutations. "Expected (η = 0)" is the expected result assuming η = 0. (d) Experimental TRACE results on a population constructed from Figure 3b. Filtered data assumes η = 4. (e) Four populations are assessed with TRACE, and measured genotype frequency versus corrected population is displayed. (f) The same measured genotype frequency data are displayed versus the full crossover prediction.

  5. Figure 5: Assessment of a hydrolysate tolerance library generated by recursive multiplexed recombineering.

    (a) All sites targeted during multiplexed recombineering are displayed on a genome map. The interactions of four genes (lpp-lpcA-ilvM-tonB) identified to be important to hydrolysate tolerance are highlighted and studied with TRACE. These genes encode proteins responsible for membrane composition (lpp, lpcA), nucleotide biosynthesis (ilvM) and energy transduction (tonB). (b) MAGE libraries were genotyped at 4 rounds of MAGE and 12 rounds of MAGE. The TRACE signal-to-noise ratio is plotted against the measured genotype frequency. (c) The evolutionary trajectory of (i) the most abundant single mutant (lpp, 4.5%) and (ii) the most abundant double mutant (ilvM-tonB, 0.19%). (d) Combinatorial RBS calculated gene expression for enriched and lost genotypes. Enriched genotypes are newly identified genotypes after selection with η > 2 and frequency >4 × 10^−5. Reduced genotypes are genotypes identified in the 12-round MAGE library with η > 2 and frequency >4 × 10^−5 that were not identified after selection.

  6. Figure 6: Assessment of a 6-site RBS library targeting membrane genes found to be influential to isobutanol tolerance.

    (a) Combinatorial RBS translation initiation rates for six genes identified after three rounds of isobutanol selection in minimal media. Plotted are enriched mutants (red), diluted mutants (blue) and wild-type sequence (black). (b) Measured landscape of all targeted mutations between slt and murF sorted by predicted translation initiation rate. Number of counts before and after selection and relative change in frequency are included for each genotype. Single-site mutants are outlined in yellow. P-values refer to confidence that this landscape is different from an η = 0 landscape generated from single-site mutant frequencies. (c) Analysis of growth rate in isobutanol for three strains enriched by colony TRACE sequencing and three strains reconstructed by recursive multiplexed recombineering. Relative growth is the OD600 of the mutant strain compared with the OD600 of the wild-type strain.
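
The captions for Figures 4f and 6b compare measured genotype frequencies against a "full crossover prediction" and against a landscape "generated from single-site mutant frequencies". One plausible reading, which is an assumption here and not a method confirmed by this page, is that such a baseline treats the sites as independent, so each genotype's expected frequency is the product of its per-site allele frequencies. The Python sketch below illustrates that reading only; the function names and the example population are hypothetical and the frequencies are invented for illustration.

```python
from itertools import product


def site_marginals(genotype_freqs):
    """Per-site frequency of the 'on' (1) allele, from genotype frequencies."""
    n_sites = len(next(iter(genotype_freqs)))
    total = sum(genotype_freqs.values())
    return [
        sum(f for g, f in genotype_freqs.items() if g[i] == 1) / total
        for i in range(n_sites)
    ]


def full_crossover_expectation(genotype_freqs):
    """Expected genotype frequencies if sites were combined independently.

    ASSUMPTION: 'full crossover' is modeled as complete independence between
    sites, so a genotype's frequency is the product of its allele frequencies.
    """
    p_on = site_marginals(genotype_freqs)
    expected = {}
    for genotype in product((0, 1), repeat=len(p_on)):
        freq = 1.0
        for allele, p in zip(genotype, p_on):
            freq *= p if allele == 1 else 1.0 - p
        expected[genotype] = freq
    return expected


# Hypothetical population of two unequal genotypes over three on/off sites.
population = {(1, 1, 0): 0.75, (0, 1, 1): 0.25}
expected = full_crossover_expectation(population)

for genotype, freq in sorted(expected.items()):
    print(genotype, round(freq, 4))
```

In this toy population the middle site is "on" in every cell, so it contributes a factor of 1 to every surviving genotype and does not affect the comparison. Under the independence assumption, the two recombinant genotypes that were absent from the input, (1, 1, 1) and (0, 1, 0), each receive a nonzero expected frequency.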


Author information


  1. Department of Chemical and Biological Engineering, University of Colorado Boulder, Boulder, Colorado, USA.

    • Ramsey I Zeitoun,
    • Andrew D Garst,
    • Gur Pines,
    • Thomas J Mansell,
    • Tirzah Y Glebes &
    • Ryan T Gill
  2. Department of Chemical Engineering, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA.

    • George D Degen
  3. Department of Chemical and Biological Engineering, Colorado School of Mines, Golden, Colorado, USA.

    • Nanette R Boyle


R.I.Z., A.D.G. and R.T.G. conceived this idea. R.I.Z., A.D.G., G.P., T.J.M. and R.T.G. designed experiments. G.D.D. and R.I.Z. performed kinetic modeling. R.I.Z. performed experiments with assistance from G.P., T.Y.G. and N.R.B. R.I.Z. and R.T.G. wrote the manuscript.

Competing financial interests

The authors declare no competing financial interests.

Supplementary information

PDF files

  1. Supplementary Text and Figures (2,275 KB)

    Supplementary Figures 1–22 and Supplementary Tables 1–16

Text files

  1. Supplementary Software (49 KB)
