Genome-wide mapping of mutations at single-nucleotide resolution for protein, metabolic and genome engineering


Improvements in DNA synthesis and sequencing have underpinned comprehensive assessment of gene function in bacteria and eukaryotes. Genome-wide analyses require high-throughput methods to generate mutations and analyze their phenotypes, but approaches to date have been unable to efficiently link the effects of mutations in coding regions or promoter elements in a highly parallel fashion. We report that CRISPR–Cas9 gene editing in combination with massively parallel oligomer synthesis can enable trackable editing on a genome-wide scale. Our method, CRISPR-enabled trackable genome engineering (CREATE), links each guide RNA to homologous repair cassettes that both edit loci and function as barcodes to track genotype–phenotype relationships. We apply CREATE to site saturation mutagenesis for protein engineering, reconstruction of adaptive laboratory evolution experiments, and identification of stress tolerance and antibiotic resistance genes in bacteria. We provide preliminary evidence that CREATE will work in yeast. We also provide a webtool to design multiplex CREATE libraries.

Figure 1: CREATE workflow.
Figure 2: CREATE validation.
Figure 3: Saturation mutagenesis of an essential bacterial gene.
Figure 4: Reconstruction of thermotolerant genotypes.
Figure 5: Genome-scale mapping of mutations that confer antibiotic resistance (a–c) and solvent tolerance (d–f).


  1. Findlay, G.M., Boyle, E.A., Hause, R.J., Klein, J.C. & Shendure, J. Saturation editing of genomic regions by multiplex homology-directed repair. Nature 513, 120–123 (2014).
  2. Shendure, J. Life after genetics. Genome Med. 6, 86 (2014).
  3. Smanski, M.J. et al. Functional optimization of gene clusters by combinatorial design and assembly. Nat. Biotechnol. 32, 1241–1249 (2014).
  4. Isaacs, F.J. et al. Precise manipulation of chromosomes in vivo enables genome-wide codon replacement. Science 333, 348–353 (2011).
  5. Wang, H.H. et al. Programming cells by multiplex genome engineering and accelerated evolution. Nature 460, 894–898 (2009).
  6. Sandoval, N.R. et al. Strategy for directing combinatorial genome engineering in Escherichia coli. Proc. Natl. Acad. Sci. USA 109, 10540–10545 (2012).
  7. Wang, H.H. et al. Multiplexed in vivo His-tagging of enzyme pathways for in vitro single-pot multienzyme catalysis. ACS Synth. Biol. 1, 43–52 (2012).
  8. Raman, S., Rogers, J.K., Taylor, N.D. & Church, G.M. Evolution-guided optimization of biosynthetic pathways. Proc. Natl. Acad. Sci. USA 111, 17803–17808 (2014).
  9. Ho, J.M. et al. Efficient reassignment of a frequent serine codon in wild-type Escherichia coli. ACS Synth. Biol. 5, 163–171 (2016).
  10. Warner, J.R., Reeder, P.J., Karimpour-Fard, A., Woodruff, L.B.A. & Gill, R.T. Rapid profiling of a microbial genome using mixtures of barcoded oligonucleotides. Nat. Biotechnol. 28, 856–862 (2010).
  11. Wetmore, K.M. et al. Rapid quantification of mutant fitness in diverse bacteria by sequencing randomly bar-coded transposons. MBio 6, e00306–e00315 (2015).
  12. Zeitoun, R.I. et al. Multiplexed tracking of combinatorial genomic mutations in engineered cell populations. Nat. Biotechnol. 33, 631–637 (2015).
  13. Kim, H. & Kim, J.-S. A guide to genome engineering with programmable nucleases. Nat. Rev. Genet. 15, 321–334 (2014).
  14. Cong, L. et al. Multiplex genome engineering using CRISPR/Cas systems. Science 339, 819–823 (2013).
  15. Jiang, Y. et al. Multigene editing in the Escherichia coli genome via the CRISPR-Cas9 system. Appl. Environ. Microbiol. 81, 2506–2514 (2015).
  16. Li, Y. et al. Metabolic engineering of Escherichia coli using CRISPR-Cas9 meditated genome editing. Metab. Eng. 31, 13–21 (2015).
  17. Shalem, O. et al. Genome-scale CRISPR-Cas9 knockout screening in human cells. Science 343, 84–87 (2014).
  18. Wang, T., Wei, J.J., Sabatini, D.M. & Lander, E.S. Genetic screens in human cells using the CRISPR-Cas9 system. Science 343, 80–84 (2014).
  19. Gilbert, L.A. et al. Genome-scale CRISPR-mediated control of gene repression and activation. Cell 159, 647–661 (2014).
  20. Peters, J.M. et al. A comprehensive, CRISPR-based functional analysis of essential genes in bacteria. Cell 165, 1493–1506 (2016).
  21. Li, K., Wang, G., Andersen, T., Zhou, P. & Pu, W.T. Optimization of genome engineering approaches with the CRISPR/Cas9 system. PLoS One 9, e105779 (2014).
  22. Zhou, Y. et al. High-throughput screening of a CRISPR/Cas9 library for functional genomics in human cells. Nature 509, 487–491 (2014).
  23. Chen, S. et al. Genome-wide CRISPR screen in a mouse model of tumor growth and metastasis. Cell 160, 1246–1260 (2015).
  24. Mali, P. et al. RNA-guided human genome engineering via Cas9. Science 339, 823–826 (2013).
  25. Koike-Yusa, H., Li, Y., Tan, E.-P., Velasco-Herrera, Mdel.C. & Yusa, K. Genome-wide recessive genetic screening in mammalian cells with a lentiviral CRISPR-guide RNA library. Nat. Biotechnol. 32, 267–273 (2014).
  26. Pines, G. et al. Codon compression algorithms for saturation mutagenesis. ACS Synth. Biol. 4, 604–614 (2015).
  27. Sawitzke, J.A. et al. Probing cellular processes with oligo-mediated recombination and using the knowledge gained to optimize recombineering. J. Mol. Biol. 407, 45–59 (2011).
  28. Jiang, W., Bikard, D., Cox, D., Zhang, F. & Marraffini, L.A. RNA-guided editing of bacterial genomes using CRISPR-Cas systems. Nat. Biotechnol. 31, 233–239 (2013).
  29. Oh, J.-H. & van Pijkeren, J.-P. CRISPR-Cas9-assisted recombineering in Lactobacillus reuteri. Nucleic Acids Res. 42, e131 (2014).
  30. Watson, M., Liu, J.-W. & Ollis, D. Directed evolution of trimethoprim resistance in Escherichia coli. FEBS J. 274, 2661–2671 (2007).
  31. Toprak, E. et al. Evolutionary paths to antibiotic resistance under dynamically sustained drug selection. Nat. Genet. 44, 101–105 (2011).
  32. Iwakura, M. et al. Evolutional design of a hyperactive cysteine- and methionine-free mutant of Escherichia coli dihydrofolate reductase. J. Biol. Chem. 281, 13234–13246 (2006).
  33. Boehr, D.D., McElheny, D., Dyson, H.J. & Wright, P.E. The dynamic energy landscape of dihydrofolate reductase catalysis. Science 313, 1638–1642 (2006).
  34. Bhabha, G. et al. Divergent evolution of protein conformational dynamics in dihydrofolate reductase. Nat. Struct. Mol. Biol. 20, 1243–1249 (2013).
  35. Fisher, M.A. et al. Enhancing tolerance to short-chain alcohols by engineering the Escherichia coli AcrB efflux pump to secrete the non-native substrate n-butanol. ACS Synth. Biol. 3, 30–40 (2014).
  36. Foo, J.L. & Leong, S.S.J. Directed evolution of an E. coli inner membrane transporter for improved efflux of biofuel molecules. Biotechnol. Biofuels 6, 81 (2013).
  37. Tenaillon, O. et al. The molecular diversity of adaptive convergence. Science 335, 457–461 (2012).
  38. Chang, R.L. et al. Structural systems biology evaluation of metabolic thermotolerance in Escherichia coli. Science 340, 1220–1223 (2013).
  39. Basak, S. & Jiang, R. Enhancing E. coli tolerance towards oxidative stress via engineering its global regulator cAMP receptor protein (CRP). PLoS One 7, e51179 (2012).
  40. Rodríguez-Verdugo, A., Gaut, B.S. & Tenaillon, O. Evolution of Escherichia coli rifampicin resistance in an antibiotic-free environment during thermal stress. BMC Evol. Biol. 13, 50 (2013).
  41. Campbell, E.A. et al. Structural mechanism for rifampicin inhibition of bacterial RNA polymerase. Cell 104, 901–912 (2001).
  42. White, D.G., Goldman, J.D., Demple, B. & Levy, S.B. Role of the acrAB locus in organic solvent tolerance mediated by expression of marA, soxS, or robA in Escherichia coli. J. Bacteriol. 179, 6122–6126 (1997).
  43. Nakashima, R., Sakurai, K., Yamasaki, S., Nishino, K. & Yamaguchi, A. Structures of the multidrug exporter AcrB reveal a proximal multisite drug-binding pocket. Nature 480, 565–569 (2011).
  44. Kohanski, M.A., Dwyer, D.J., Hayete, B., Lawrence, C.A. & Collins, J.J. A common mechanism of cellular death induced by bactericidal antibiotics. Cell 130, 797–810 (2007).
  45. Dwyer, D.J., Kohanski, M.A. & Collins, J.J. Role of reactive oxygen species in antibiotic action and resistance. Curr. Opin. Microbiol. 12, 482–489 (2009).
  46. Mills, T.Y., Sandoval, N.R. & Gill, R.T. Cellulosic hydrolysate toxicity and tolerance mechanisms in Escherichia coli. Biotechnol. Biofuels 2, 26 (2009).
  47. Glebes, T.Y., Sandoval, N.R., Gillis, J.H. & Gill, R.T. Comparison of genome-wide selection strategies to identify furfural tolerance genes in Escherichia coli. Biotechnol. Bioeng. 112, 129–140 (2015).
  48. Browning, D.F. et al. Modulation of CRP-dependent transcription at the Escherichia coli acsP2 promoter by nucleoprotein complexes: anti-activation by the nucleoid proteins FIS and IHF. Mol. Microbiol. 51, 241–254 (2004).
  49. Sandoval, N.R., Mills, T.Y., Zhang, M. & Gill, R.T. Elucidating acetate tolerance in E. coli using a genome-wide approach. Metab. Eng. 13, 214–224 (2011).
  50. Wolfe, A.J. The acetate switch. Microbiol. Mol. Biol. Rev. 69, 12–50 (2005).
  51. Chiang, S.M. & Schellhorn, H.E. Regulators of oxidative stress response genes in Escherichia coli and their functional conservation in bacteria. Arch. Biochem. Biophys. 525, 161–169 (2012).
  52. Wang, X. et al. Engineering furfural tolerance in Escherichia coli improves the fermentation of lignocellulosic sugars into renewable chemicals. Proc. Natl. Acad. Sci. USA 110, 4021–4026 (2013).
  53. Stoebel, D.M., Hokamp, K., Last, M.S. & Dorman, C.J. Compensatory evolution of gene regulation in response to stress by Escherichia coli lacking RpoS. PLoS Genet. 5, e1000671 (2009).
  54. Smith, A.M. et al. Highly-multiplexed barcode sequencing: an efficient method for parallel analysis of pooled samples. Nucleic Acids Res. 38, e142 (2010).
  55. van Opijnen, T., Bodi, K.L. & Camilli, A. Tn-seq: high-throughput parallel sequencing for fitness and genetic interaction studies in microorganisms. Nat. Methods 6, 767–772 (2009).
  56. Lajoie, M.J. et al. Genomically recoded organisms expand biological functions. Science 342, 357–360 (2013).
  57. Ronda, C., Pedersen, L.E., Sommer, M.O.A. & Nielsen, A.T. CRMAGE: CRISPR optimized MAGE recombineering. Sci. Rep. 6, 19452 (2016).
  58. Maruyama, T. et al. Increasing the efficiency of precise genome editing with CRISPR-Cas9 by inhibition of nonhomologous end joining. Nat. Biotechnol. 33, 538–542 (2015).
  59. Reisch, C.R. & Prather, K.L.J. The no-SCAR (Scarless Cas9 Assisted Recombineering) system for genome editing in Escherichia coli. Sci. Rep. 5, 15096 (2015).
  60. Bao, Z. et al. A homology integrated CRISPR-Cas (HI-CRISPR) system for one-step multi-gene disruptions in Saccharomyces cerevisiae. ACS Synth. Biol. 4, 585–594 (2015).
  61. Wong, A.S.L. et al. Multiplexed barcoded CRISPR-Cas9 screening enabled by CombiGEM. Proc. Natl. Acad. Sci. USA 113, 2544–2549 (2016).
  62. Li, X.-T. et al. Identification of factors influencing strand bias in oligonucleotide-mediated recombination in Escherichia coli. Nucleic Acids Res. 31, 6674–6687 (2003).
  63. Costantino, N. & Court, D.L. Enhanced levels of λ Red-mediated recombinants in mismatch repair mutants. Proc. Natl. Acad. Sci. USA 100, 15748–15753 (2003).
  64. Wang, H.H., Xu, G., Vonner, A.J. & Church, G. Modified bases enable high-efficiency oligonucleotide-mediated allelic replacement via mismatch repair evasion. Nucleic Acids Res. 39, 7336–7347 (2011).
  65. Mosberg, J.A., Gregg, C.J., Lajoie, M.J., Wang, H.H. & Church, G.M. Improving lambda red genome engineering in Escherichia coli via rational removal of endogenous nucleases. PLoS One 7, e44638 (2012).
  66. Nyerges, Á. et al. A highly precise and portable genome engineering method allows comparison of mutational effects across bacterial species. Proc. Natl. Acad. Sci. USA 113, 2502–2507 (2016).
  67. Alper, H., Moxley, J., Nevoigt, E., Fink, G.R. & Stephanopoulos, G. Engineering yeast transcription machinery for improved ethanol tolerance and production. Science 314, 1565–1568 (2006).
  68. Alper, H. & Stephanopoulos, G. Global transcription machinery engineering: a new approach for improving cellular phenotype. Metab. Eng. 9, 258–267 (2007).
  69. Gutiérrez-Ríos, R.M. et al. Regulatory network of Escherichia coli: consistency between literature knowledge and microarray profiles. Genome Res. 13, 2435–2443 (2003).
  70. Ross, W. et al. A third recognition element in bacterial promoters: DNA binding by the alpha subunit of RNA polymerase. Science 262, 1407–1413 (1993).
  71. Ebright, R.H., Ebright, Y.W. & Gunasekera, A. Consensus DNA site for the Escherichia coli catabolite gene activator protein (CAP): CAP exhibits a 450-fold higher affinity for the consensus DNA site than for the E. coli lac DNA site. Nucleic Acids Res. 17, 10295–10305 (1989).
  72. Kosuri, S. et al. Scalable gene synthesis by selective amplification of DNA pools from high-fidelity microchips. Nat. Biotechnol. 28, 1295–1299 (2010).
  73. Jinek, M. et al. A programmable dual-RNA-guided DNA endonuclease in adaptive bacterial immunity. Science 337, 816–821 (2012).
  74. Qi, L.S. et al. Repurposing CRISPR as an RNA-guided platform for sequence-specific control of gene expression. Cell 152, 1173–1183 (2013).
  75. Firth, A.E. & Patrick, W.M. GLUE-IT and PEDEL-AA: new programmes for analyzing protein diversity in randomized libraries. Nucleic Acids Res. 36, W281–5 (2008).
  76. Datta, S., Costantino, N. & Court, D.L. A set of recombineering plasmids for gram-negative bacteria. Gene 379, 109–115 (2006).
  77. Prior, J.E., Lynch, M.D. & Gill, R.T. Broad-host-range vectors for protein expression across gram negative hosts. Biotechnol. Bioeng. 106, 326–332 (2010).
  78. Hamady, M., Walker, J.J., Harris, J.K., Gold, N.J. & Knight, R. Error-correcting barcoded primers for pyrosequencing hundreds of samples in multiplex. Nat. Methods 5, 235–237 (2008).
  79. Edgar, R.C. Search and clustering orders of magnitude faster than BLAST. Bioinformatics 26, 2460–2461 (2010).
  80. Farasat, I. et al. Efficient search, mapping, and optimization of multi-protein genetic systems in diverse bacteria. Mol. Syst. Biol. 10, 731 (2014).
  81. Bakan, A., Meireles, L.M. & Bahar, I. ProDy: protein dynamics inferred from theory and experiments. Bioinformatics 27, 1575–1577 (2011).
  82. Bateman, A. et al. The Pfam protein families database. Nucleic Acids Res. 32, D138–D141 (2004).
  83. Schrödinger, LLC. The PyMOL Molecular Graphics System, Version 1.3r1 (2010).
  84. Nakashima, R. et al. Structural basis for the inhibition of bacterial multidrug exporters. Nature 500, 102–106 (2013).
  85. Hung, L.-W. et al. Crystal structure of AcrB complexed with linezolid at 3.5 Å resolution. J. Struct. Funct. Genomics 14, 71–75 (2013).
  86. Rice, P.A., Yang, S., Mizuuchi, K. & Nash, H.A. Crystal structure of an IHF-DNA complex: a protein-induced DNA U-turn. Cell 87, 1295–1306 (1996).
  87. Molodtsov, V. et al. X-ray crystal structures of the Escherichia coli RNA polymerase in complex with benzoxazinorifamycins. J. Med. Chem. 56, 4758–4763 (2013).
  88. Murakami, K.S., Masuda, S. & Darst, S.A. Structural basis of transcription initiation: RNA polymerase holoenzyme at 4 Å resolution. Science 296, 1280–1284 (2002).
  89. Rhee, S., Martin, R.G., Rosner, J.L. & Davies, D.R. A novel DNA-binding motif in MarA: the first structure for an AraC family transcriptional activator. Proc. Natl. Acad. Sci. USA 95, 10413–10418 (1998).
  90. Kwon, H.J., Bennik, M.H., Demple, B. & Ellenberger, T. Crystal structure of the Escherichia coli Rob transcription factor in complex with DNA. Nat. Struct. Biol. 7, 424–430 (2000).


This work would not have been possible without the insights and efforts of a number of talented individuals. We would like to thank T. Mansell, N. Boyle, K. Fujimori, and H. Chilton for their input, feedback, guidance and contributions to this work. This work was supported by the US Department of Energy (Grant DE-SC0008812) and CAPES foundation (grant #0315133).

Author information




R.T.G., M.C.B., G.P., S.A.L. and R.Z. all contributed to A.D.G.'s development of the concept. A.D.G., M.C.B., G.P., S.A.L. and R.T.G. all aided in the design of experiments. Scripts to automate CREATE cassette design were written by A.D.G. Library construction and recombineering were done by A.D.G. Selections, sample preparation, sequencing, clonal reconstructions and growth validations of selected variants were done by A.D.G., M.C.B., G.P., R.L. and Z.W. Sequencing data analysis was done by A.D.G. with contributions to the statistical analysis provided by R.Z. and R.T.G. The web interface was developed by A.L.H.-E. Yeast validation of CREATE methodology was performed by L.L., R.L. and W.G.A. The manuscript was written by A.D.G. and R.T.G.

Corresponding author

Correspondence to Ryan T Gill.

Ethics declarations

Competing interests

R.T.G. and A.D.G. have a patent application pending (WO/2015/123339) whose value may be affected by the publication of this paper. R.T.G., A.D.G. and A.L.H.-E. have financial interests in Muse Biotechnology Inc., which is commercializing the CREATE technology.

Integrated supplementary information

Supplementary Figure 1 Enabling flexible design strategies.

Illustration of designs compatible with the CREATE strategy. a) For protein engineering applications, a silent codon approach is taken (top, see also Fig. 1). This mutation strategy allows targeted mutagenesis of key protein regions to alter features such as DNA binding, protein-protein interactions, catalysis, or allosteric regulation. Above, the DNA-binding saturation mutagenesis library designed in this study for the global transcription factor Fis is illustrated. b) For promoter mutations, PAM sites in proximity to a specified transcription start site (TSS) can be disrupted through nucleotide replacement or integration cassettes. To simplify the design procedure used in this study, consensus CAP or UP elements were designed for integration at a fixed location relative to the TSS, without taking into account possible effects that these mutations may have on proximal genes. c) An example of cassette design for mutagenizing a ribosome binding site (RBS). Although easily accommodated by the design pipeline, RBS mutagenesis was not performed for this study. d) Example of a simple deletion design. Points a and b are included to illustrate the distance between two sites at the gene deletion locus. In all cases, cassette designs disrupt a targeted PAM to allow selective enrichment of the designed mutant.
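Every design above depends on locating a PAM close to the intended edit. The following is a minimal Python sketch of that search, not the authors' design scripts. It assumes an NGG PAM (as for Streptococcus pyogenes Cas9, which this page does not specify), and the function name `find_pams`, the 60-nt default window and the distance measure (PAM start index to edit index) are hypothetical choices made for illustration.

```python
def find_pams(seq, target, max_dist=60):
    """List candidate PAMs near an edit position.

    Assumption: the PAM is NGG (S. pyogenes Cas9). A CCN triplet on the
    given strand corresponds to an NGG PAM on the reverse strand.
    Returns (start_index, strand, distance) tuples, nearest first.
    """
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 2):
        triplet = seq[i:i + 3]
        if triplet[1:] == "GG":      # NGG on the forward strand
            hits.append((i, "+", abs(i - target)))
        if triplet[:2] == "CC":      # CCN here = NGG on the reverse strand
            hits.append((i, "-", abs(i - target)))
    nearby = [h for h in hits if h[2] <= max_dist]
    return sorted(nearby, key=lambda h: h[2])
```

A real design would additionally have to check that the chosen PAM can be disrupted by a synonymous change when it lies inside a coding region, which this sketch does not attempt.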

Supplementary Figure 2 Cas9 editing efficiency controls using galactokinase red/white screening.

a) The CREATE galK_120/17 off cassette (different cassettes tested shown below in b) was transformed into different backgrounds to assess the efficiency of homologous recombination between the CREATE plasmid and the target genome. Red colonies represent unedited (wt) genomic variants and white colonies represent edited variants. Transformation into cells containing only pSIM5 or pSIM5/X2 and dCas9 plasmids exhibited no detectable recombination, as indicated by the lack of white colonies. In the presence of active Cas9 (X2-Cas9, far right) we observe high-efficiency editing (>80%), indicating that dsDNA cleavage is required to achieve high-efficiency editing and library coverage. b) Different cassettes designed to test the requirements of efficient editing using the CREATE strategy. The stop codon edit is shown in blue for each cassette, and the synonymous PAM mutation (red) is positioned at increasing distances from the edit site (17, 44 and 59 bp).

Supplementary Figure 3 Toxicity of gRNA dsDNA cleavage in E. coli.

a) The toxicity of a single gRNA cut in E. coli as observed in control experiments with a gRNA targeting galK (spacer sequence TTAACTTTGCGTAACAACGC) or folA (spacer sequence GTAATTTTGTATAGAATTTA). In the absence of a repair template we observe strong killing from the gRNA. Rescue efficiencies of 10³–10⁴ are observed upon co-transformation of a single-stranded donor oligo, indicating the need for a homologous repair template to alleviate this toxicity. b) Toxicity of multiple CREATE edits. The targeted sites are illustrated graphically on the left and at the bottom of the bar graph. A non-targeting gRNA control was used to estimate transformation efficiency based on no edits (far left, no target sites). CREATE cassettes targeted either folA (red), galK (green), or a combination of the two. Note the multiplicative toxicity in E. coli of having additional gRNAs expressed from the same plasmid. In this scenario, there is homologous repair for each site, suggesting that off-target gRNA cleavage, for which no repair template is available, would be highly lethal. These data suggest that CREATE cassettes causing off-target cleavage would be selectively removed from the population early in the library construction phase.

Supplementary Figure 4 Test of CREATE strategy for gene deletions.

a) Cassette design for deleting 100 bp from the galK ORF. The HA is designed to recombine with regions of homology with the designated spacing, with each 50 bp side of the CREATE HA designed to recombine at the designated site (blue). The PAM/spacer location (red) is proximal to one of the homology arms and is deleted during recombination, allowing selectable enrichment of the deleted segment. b) Electrophoresis of chromosomal PCR amplicons from clones recombineered with this cassette. c) Design for 700 bp deletion as in a). d) Colony PCR of 700 bp deletion cassettes as in b). The asterisks in b) and d) indicate colonies that appear to have the designed deletion. Note that some clones appear to have bands corresponding to both wt and deletion sizes, indicating that chromosome segregation in some of the colonies is incomplete when plated 3 hours post recombineering (ref. 28).

Supplementary Figure 5 Editing efficiency controls by co-transformation of gRNA and linear dsDNA cassettes.

Effect of PAM distance on editing efficiency using linear dsDNA PCR amplicons and co-transformation with a gRNA. On the left is an illustration of the experiments: PCR amplicons were designed to contain a dual (TAATAA) stop codon on one side (asterisk) and a PAM mutation just downstream of the galK gene (gray box) on the other end. These PCR amplicons were co-transformed with a gRNA targeting the downstream galK PAM site. The primers were designed such that the mutations were 40 nt from the end of the amplicon to ensure enough homology for recombination. Data were obtained from these experiments by red/white colony screening. A linear fit to the data is shown at the bottom. Cassettes in which only the PAM mutation is present were included as assay controls and were observed to have very low rates of GalK inactivation. These experiments were performed in a BW25113 strain of E. coli in which the mutS gene was knocked out to allow high-efficiency editing with double-stranded DNA templates. This approach in MG1655 did not achieve high-efficiency editing due to the active mutS allele.

Supplementary Figure 6 Library cloning analysis and statistics.

a) Reads from the plasmid library following cloning are shown according to the number of total mismatches between the read and the target design sequence. The majority of plasmids match the correct design. However, a large number of 4 base pair indel/mismatch mutants were observed in this cloned population. b) Plot of the mutation profile for the plasmid pool as a function of cassette position. An increase in the mutation frequency is observed near the center of the homology arm (HA), indicating a small error bias in the sequencing or synthesis of this region. We suspect that this is due to the presence of sequences complementary to the spacer element in the gRNA. c) Histogram of the distances between the PAM and codon for the CREATE cassettes designed in this study. The large majority (>95%) were within the design constraints tested in Fig. 2. The small fraction beyond 60 bp were made in cases where there was no synonymous PAM mutation within closer proximity. d) Library coverage from multiplexed cloning of CREATE plasmids. Deep sequencing counts of each variant are shown with respect to their position on the genome. The inset shows a histogram of the number of variants having the indicated plasmid counts in the cloned libraries.

Supplementary Figure 7 Precision of CREATE cassette tracking of recombineered populations.

a) Correlation plot of CREATE cassette read frequencies in the plasmid population prior to Cas9 exposure (x-axis) and 3 hours after transformation into a Cas9 background. b) Correlation between replicate recombineering reactions following overnight recovery. The gray lines indicate perfect correlation for reference. R² and p values were calculated from a linear fit to the data using the Python SciPy statistics package. A counting threshold of 5 was applied to each replicate experiment to filter out noise from each data set.
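The thresholding and fit described in this caption can be sketched as follows. This is an illustrative, dependency-free version rather than the authors' analysis: they report using SciPy, whereas the sketch computes R² directly and omits the p value. The function names, the dictionary input format and the reading of the threshold as "at least 5 reads in both samples" are assumptions.

```python
def r_squared(x, y):
    """Coefficient of determination for a simple linear fit of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return (sxy * sxy) / (sxx * syy)


def compare_samples(counts_a, counts_b, threshold=5):
    """Compare cassette read frequencies between two samples.

    counts_a, counts_b: dicts mapping cassette ID -> read count.
    Assumption: a cassette is kept only if it has at least `threshold`
    reads in both samples. Frequencies are counts / total reads.
    Returns (number of cassettes kept, R^2).
    """
    total_a = sum(counts_a.values())
    total_b = sum(counts_b.values())
    kept = [k for k in counts_a
            if counts_a[k] >= threshold and counts_b.get(k, 0) >= threshold]
    freq_a = [counts_a[k] / total_a for k in kept]
    freq_b = [counts_b[k] / total_b for k in kept]
    return len(kept), r_squared(freq_a, freq_b)
```

The sketch expects at least two retained cassettes with non-identical frequencies; otherwise the variance terms are zero and the division fails.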

Supplementary Figure 8 Growth characteristics of folA mutations in M9 minimal media.

While F153R appears to maintain normal growth characteristics, the growth rate of the F153W mutant is significantly slower under these conditions, suggesting that these two amino acid substitutions at the same site have very different effects on organismal fitness, presumably due to different changes induced in the stability/dynamics of this protein.

Supplementary Figure 9 Enrichment profiles for folA CREATE cassettes in minimal media.

Cassettes that encode synonymous HA are shown in black and non-synonymous cassettes in gray. The dashed lines indicate enrichment scores with p<0.05 significance compared to the synonymous population mean, as estimated from a bootstrap analysis (Sup. Meth.). The enrichment score observed for each mutant cassette at each position in the protein sequence is shown to the left, and a histogram of these enrichment scores as a fraction of the total variants to the right. The two populations appear to be largely similar. Conserved residues that are highly deleterious are shown in blue for reference.
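The actual bootstrap procedure is given in the supplementary methods, which are not reproduced on this page, so the following Python sketch is only one plausible reading. It assumes that an enrichment score is the log2 ratio of a cassette's read frequency after selection to its frequency before selection, and that the dashed lines correspond to a percentile interval on the bootstrapped mean of the synonymous scores. Both functions and all parameter names are hypothetical.

```python
import math
import random


def enrichment_score(pre_count, post_count, pre_total, post_total):
    """Assumed definition: log2 of post-selection over pre-selection frequency."""
    return math.log2((post_count / post_total) / (pre_count / pre_total))


def bootstrap_mean_interval(scores, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the mean of `scores`.

    Resamples the synonymous-cassette scores with replacement n_boot times
    and returns the (alpha/2, 1 - alpha/2) percentiles of the resampled means.
    """
    rng = random.Random(seed)   # fixed seed so the sketch is reproducible
    n = len(scores)
    means = sorted(sum(rng.choices(scores, k=n)) / n for _ in range(n_boot))
    lower = means[int((alpha / 2) * n_boot)]
    upper = means[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper
```

Under this reading, a non-synonymous cassette whose score falls outside the returned interval would be flagged as differing from the synonymous population at the chosen alpha.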

Supplementary Figure 10 Validation of newly identified acrB mutations for improved solvent and antibiotic tolerance.

a) On the left is a global overview of the AcrB efflux pump. Substrates enter the pump through the openings in the periplasmic space and are extruded via the AcrB/AcrA/TolC complex across the outer membrane and into the extracellular space. Library-targeted residues are highlighted by blue spheres for reference and the red dot indicates the region where many of the enriched variants clustered. On the right is a close-up of the loop-helix motif abutting the central funnel where enriched mutations in isobutanol were identified (red and teal spheres), presumably affecting solute transport from the periplasmic space. Mutants targeting the T60 position (teal spheres) were also enriched in the presence of erythromycin. b) Confirmation of N70D and D73L mutations for tolerance to isobutanol. The N70D mutation in particular appears to improve the final OD to a significant degree. Reconstructed strains were measured for final OD in capped 1.5 mL Eppendorf tubes following 48 hours of incubation. Error bars are derived from N=3 trials and p-values from a one-tailed t-test. c) Improved growth of the AcrB T60N mutant was observed in inhibitory concentrations of erythromycin (200 μg/mL) and isobutanol (1.2%) in a shaking 96-well plate, indicating that this mutation may enhance the efflux activity of this pump towards many compounds. For these experiments, CREATE cassette designs were individually synthesized, cloned and sequence verified before recombineering into E. coli MG1655 to reconstruct the mutations, and the genomic modifications were sequence verified by colony PCR to confirm the genotype-phenotype association.

Supplementary Figure 11 Benefits of rational mutagenesis for sampling novel adaptive genotypes.

a) 500 μg/mL rifampicin b) 500 μg/mL erythromycin c) 10 g/L acetate and d) 2 g/L furfural. While naturally evolving systems or error-prone PCR are highly biased towards sampling single nucleotide polymorphisms (i.e. 1 nt mutations, red), these histograms illustrate the potential advantages of rational design approaches that can identify rare or inaccessible mutations (2 and 3 nt, green and blue respectively). For example, the highest fitness solutions appear to be biased toward these rare mutations in rifampicin, erythromycin and furfural selections to varying degrees. These results indicate that procedures such as CREATE should allow more rapid and thorough analysis of fitness-improving mutations, in much the same way that computational approaches are being used to improve directed evolution for protein engineering.
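The 1, 2 and 3 nt classes used in these histograms correspond to the number of nucleotide positions at which a designed codon differs from the wild-type codon. A minimal sketch of that classification is shown below; the function names and the example codon pairs are illustrative assumptions and are not drawn from the CREATE libraries.

```python
from collections import Counter


def nucleotide_changes(wt_codon, mut_codon):
    """Count the positions (0 to 3) at which two codons differ."""
    if len(wt_codon) != 3 or len(mut_codon) != 3:
        raise ValueError("codons must be exactly three nucleotides long")
    return sum(a != b for a, b in zip(wt_codon.upper(), mut_codon.upper()))


def tally_by_distance(codon_pairs):
    """Tally (wild-type, mutant) codon pairs by number of changed nucleotides."""
    return Counter(nucleotide_changes(wt, mut) for wt, mut in codon_pairs)


# Hypothetical codon substitutions, for illustration only.
pairs = [("TCT", "CCT"), ("TCT", "CCG"), ("AAA", "GGG"), ("GAT", "GAA")]
counts = tally_by_distance(pairs)
```

Here `counts[1]` is the number of single-nucleotide substitutions, the class most accessible to natural evolution or error-prone PCR, while `counts[2]` and `counts[3]` cover the rarer multi-nucleotide codon changes.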

Supplementary Figure 12 Reconstruction of mutations identified by erythromycin selection.

Reconstructed strains were grown in 0.5 mL cultures in capped 1.5 mL Eppendorf tubes in the presence of 200 μg/mL erythromycin, and final OD was measured after 48 hours of incubation. Error bars are derived from N=3 trials. A one-tailed t-test was performed on each set of measurements to determine the p-values indicated for significance of growth benefit.
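The caption does not state which form of t-test was used. As a sketch under the assumption of an unpaired Student t-test with pooled variance, the test statistic for a reconstructed strain against a control could be computed as follows. The function name and the OD values are hypothetical, not measurements from the paper.

```python
import math
import statistics


def pooled_t_statistic(treated, control):
    """Return (t, degrees_of_freedom) for a two-sample Student t-test
    with pooled variance. A positive t means mean(treated) > mean(control)."""
    n1, n2 = len(treated), len(control)
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least two measurements")
    df = n1 + n2 - 2
    pooled_var = (
        (n1 - 1) * statistics.variance(treated)
        + (n2 - 1) * statistics.variance(control)
    ) / df
    std_err = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    if std_err == 0:
        raise ValueError("zero variance in both groups; t is undefined")
    t = (statistics.mean(treated) - statistics.mean(control)) / std_err
    return t, df


# Hypothetical final OD readings for N=3 replicates (not real data).
mutant_od = [1.0, 1.1, 1.2]
control_od = [0.5, 0.6, 0.7]
t_stat, dof = pooled_t_statistic(mutant_od, control_od)
```

For a one-tailed test of a growth benefit, the p-value is the upper-tail probability of `t_stat` under a t distribution with `dof` degrees of freedom (4 when both groups have N=3).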

Supplementary Figure 13 Validation of Crp S28P mutation for furfural or thermal tolerance.

a) Crystal structure of the Crp regulatory protein with variants identified by furfural selection highlighted in red (PDB ID 3N4M). A number of the CREATE designs targeting residues near the cyclic-AMP binding site (aa. 28-30, 65) of this regulator were highly enriched in minimal media selections for furfural or thermal tolerance, suggesting that these mutations may enhance E. coli growth in minimal media under a variety of stress conditions. b) Validation of the Crp S28P mutant identified in 2 g/L furfural selections in M9 media. This mutant was reconstructed as described for AcrB T60S in Fig. S8.

Supplementary Figure 14 Precision editing with CREATE in laboratory and wild strains of S. cerevisiae.

A CREATE cassette that introduces tandem stop codons into ADE2 was designed and inserted into a modified pCRCT vector (ref. 29). a) The laboratory strain BY4709 was transformed, and 95% of the colonies were phenotypically confirmed as red after 3 days of liquid culture and plating, indicating successful deactivation of ADE2. Genotypic confirmation of 20 strains was performed by sequencing the ADE2 genomic locus and revealed that the disruption was due to the designed edit, and not a consequence of NHEJ-mediated indel formation, in 100% of the colonies. b) RM11-1a, a haploid derivative of a vineyard strain (ref. 30), was transformed using the same plasmid and CREATE cassette designed using the S288c reference sequence for ADE2. Despite three polymorphisms occurring in the region of interest in RM11-1 ADE2, 98% of colonies possessed the red ade2 knockout phenotype. We sequenced the ADE2 locus in 20 of these red colonies and found that 70% were mutated as intended at the target site (we note that in RM11-1a we saw many colonies where the sequencing traces contained mixtures of the wt and designed dual stop codon edit, suggesting incomplete chromosomal segregation).

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–14 (PDF 1788 kb)

Supplementary Table 1

Protein Engineering Libraries (XLSX 42 kb)

Supplementary Table 2

Cloning primers (XLSX 26 kb)

Supplementary Table 3

MiSeqPrimers (XLSX 32 kb)

Supplementary Table 4

Supplementary Table 4 (XLSX 83 kb)

Supplementary Table 5

Publication (XLSX 24 kb)

Supplementary Table 6

Supplementary Table 6 (TXT 8711 kb)

Cite this article

Garst, A., Bassalo, M., Pines, G. et al. Genome-wide mapping of mutations at single-nucleotide resolution for protein, metabolic and genome engineering. Nat Biotechnol 35, 48–55 (2017).
