Abstract

Nature uses 64 codons to encode the synthesis of proteins from the genome, and chooses 1 sense codon—out of up to 6 synonyms—to encode each amino acid. Synonymous codon choice has diverse and important roles, and many synonymous substitutions are detrimental. Here we demonstrate that the number of codons used to encode the canonical amino acids can be reduced, through the genome-wide substitution of target codons by defined synonyms. We create a variant of Escherichia coli with a four-megabase synthetic genome through a high-fidelity convergent total synthesis. Our synthetic genome implements a defined recoding and refactoring scheme—with simple corrections at just seven positions—to replace every known occurrence of two sense codons and a stop codon in the genome. Thus, we recode 18,214 codons to create an organism with a 61-codon genome; this organism uses 59 codons to encode the 20 amino acids, and enables the deletion of a previously essential transfer RNA.

Data availability

The sequences and genome design details used in this study are available in the Supplementary Data. Supplementary Data 1 provides the GenBank file of the E. coli MDS42 genome (NCBI accession number AP012306.1); Supplementary Data 2 provides the GenBank file of the designed synthetic E. coli genome with codon replacements and refactorings; Supplementary Data 3 provides the table of target codons; Supplementary Data 4 provides the table of overlaps and refactoring; Supplementary Data 5 provides the table of 10-kb stretches; Supplementary Data 6 provides the GenBank file of the BAC sacB-cat-rpsL; Supplementary Data 7 provides the GenBank file of BAC-rpsL-kanR-sacB; Supplementary Data 8 provides the GenBank file of the BAC rpsL-kanR-pheS-HygR; Supplementary Data 9 provides the table of BAC construction; Supplementary Data 10 provides the table of BAC assembly; Supplementary Data 11 provides the table of REXER experiments; Supplementary Data 12 provides the GenBank file of spacer plasmids without trans-activating CRISPR RNA (tracrRNA) and annotation for linear spacers; Supplementary Data 13 provides the GenBank file of spacer plasmids with tracrRNA and annotation for linear spacers; Supplementary Data 14 provides the table of oligonucleotides used for recoding fixing experiments; Supplementary Data 15 provides the GenBank file of the gentamycin-resistance oriT cassette; Supplementary Data 16 provides the oligonucleotide primers used for conjugation; Supplementary Data 17 provides the GenBank file of the pJF146 F′ plasmid that does not self-transfer; Supplementary Data 18 provides the GenBank file of the fully recoded genome of Syn61, verified by next-generation sequencing; Supplementary Data 19 provides the table of design optimizations and non-programmed mutations; Supplementary Data 20 provides a list of the proteins identified by tandem mass spectrometry; and Supplementary Data 21 provides a list of the primers used for deletion experiments. 
All other datasets generated and/or analysed in this study are available from the corresponding author upon reasonable request. All materials (Supplementary Data 9, 12, 13, 17, 18) from this study are available from the corresponding author upon reasonable request.

Code availability

Code used for genome design is available at https://github.com/TiongSun/genome_recoding; for sequencing at https://github.com/TiongSun/iSeq; and for generating recoding landscapes at https://github.com/TiongSun/recoding_landscape.
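
The recoding landscapes mentioned above plot, for each target codon, the fraction of sequenced clones in which that codon was recoded. The snippet below is a minimal sketch of that calculation, not the authors' implementation: the function names, the input layout (one dictionary per clone mapping genome position to a recoded flag) and the toy data are all hypothetical.

```python
from typing import Dict, List, Sequence


def recoding_landscape(clones: Sequence[Dict[int, bool]]) -> Dict[int, float]:
    """Return, per target-codon position, the fraction of clones recoded there."""
    if not clones:
        raise ValueError("at least one sequenced clone is required")
    positions = sorted({pos for clone in clones for pos in clone})
    return {
        pos: sum(clone.get(pos, False) for clone in clones) / len(clones)
        for pos in positions
    }


def never_recoded(landscape: Dict[int, float]) -> List[int]:
    """Positions with a recoding frequency of zero (the black dots in a landscape)."""
    return [pos for pos, freq in landscape.items() if freq == 0.0]


def fully_recoded_clones(clones: Sequence[Dict[int, bool]]) -> int:
    """Number of clones in which every target codon was recoded."""
    return sum(all(clone.values()) for clone in clones)


# Toy data (hypothetical): four clones, three target-codon positions.
toy_clones = [
    {100: True, 250: False, 400: True},
    {100: True, 250: False, 400: True},
    {100: True, 250: False, 400: False},
    {100: True, 250: False, 400: True},
]
toy_landscape = recoding_landscape(toy_clones)
```

In this toy input, position 250 is never recoded, so no clone counts as completely recoded; a real analysis would derive the per-clone flags from sequencing reads aligned to the designed genome.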

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

  1.

    Crick, F. H., Barnett, L., Brenner, S. & Watts-Tobin, R. J. General nature of the genetic code for proteins. Nature 192, 1227–1232 (1961).

  2.

    Kudla, G., Murray, A. W., Tollervey, D. & Plotkin, J. B. Coding-sequence determinants of gene expression in Escherichia coli. Science 324, 255–258 (2009).

  3.

    Cho, B. K. et al. The transcription unit architecture of the Escherichia coli genome. Nat. Biotechnol. 27, 1043–1049 (2009).

  4.

    Li, G. W., Oh, E. & Weissman, J. S. The anti-Shine–Dalgarno sequence drives translational pausing and codon choice in bacteria. Nature 484, 538–541 (2012).

  5.

    Sørensen, M. A. & Pedersen, S. Absolute in vivo translation rates of individual codons in Escherichia coli: the two glutamic acid codons GAA and GAG are translated with a threefold difference in rate. J. Mol. Biol. 222, 265–280 (1991).

  6.

    Curran, J. F. & Yarus, M. Rates of aminoacyl-tRNA selection at 29 sense codons in vivo. J. Mol. Biol. 209, 65–77 (1989).

  7.

    Kimchi-Sarfaty, C. et al. A “silent” polymorphism in the MDR1 gene changes substrate specificity. Science 315, 525–528 (2007).

  8.

    Zhang, G., Hubalewska, M. & Ignatova, Z. Transient ribosomal attenuation coordinates protein synthesis and co-translational folding. Nat. Struct. Mol. Biol. 16, 274–280 (2009).

  9.

    Mittal, P., Brindle, J., Stephen, J., Plotkin, J. B. & Kudla, G. Codon usage influences fitness through RNA toxicity. Proc. Natl Acad. Sci. USA 115, 8639–8644 (2018).

  10.

    Cambray, G., Guimaraes, J. C. & Arkin, A. P. Evaluation of 244,000 synthetic sequences reveals design principles to optimize translation in Escherichia coli. Nat. Biotechnol. 36, 1005–1015 (2018).

  11.

    Quax, T. E., Claassens, N. J., Söll, D. & van der Oost, J. Codon bias as a means to fine-tune gene expression. Mol. Cell 59, 149–161 (2015).

  12.

    Chin, J. W. Expanding and reprogramming the genetic code. Nature 550, 53–60 (2017).

  13.

    Mukai, T. et al. Codon reassignment in the Escherichia coli genetic code. Nucleic Acids Res. 38, 8188–8195 (2010).

  14.

    Lajoie, M. J. et al. Genomically recoded organisms expand biological functions. Science 342, 357–360 (2013).

  15.

    Mukai, T. et al. Highly reproductive Escherichia coli cells with no specific assignment to the UAG codon. Sci. Rep. 5, 9699 (2015).

  16.

    Napolitano, M. G. et al. Emergent rules for codon choice elucidated by editing rare arginine codons in Escherichia coli. Proc. Natl Acad. Sci. USA 113, E5588–E5597 (2016).

  17.

    Wang, K. et al. Defining synonymous codon compression schemes by genome recoding. Nature 539, 59–64 (2016).

  18.

    Lau, Y. H. et al. Large-scale recoding of a bacterial genome by iterative recombineering of synthetic DNA. Nucleic Acids Res. 45, 6971–6980 (2017).

  19.

    Ostrov, N. et al. Design, synthesis, and testing toward a 57-codon genome. Science 353, 819–822 (2016).

  20.

    Mukai, T. et al. Reassignment of a rare sense codon to a non-canonical amino acid in Escherichia coli. Nucleic Acids Res. 43, 8111–8122 (2015).

  21.

    Hutchison, C. A. III et al. Design and synthesis of a minimal bacterial genome. Science 351, aad6253 (2016).

  22.

    Gibson, D. G. et al. Complete chemical synthesis, assembly, and cloning of a Mycoplasma genitalium genome. Science 319, 1215–1220 (2008).

  23.

    Gibson, D. G. et al. Creation of a bacterial cell controlled by a chemically synthesized genome. Science 329, 52–56 (2010).

  24.

    Shen, Y. et al. Deep functional analysis of synII, a 770-kilobase synthetic yeast chromosome. Science 355, eaaf4791 (2017).

  25.

    Annaluru, N. et al. Total synthesis of a functional designer eukaryotic chromosome. Science 344, 55–58 (2014).

  26.

    Xie, Z. X. et al. “Perfect” designer chromosome V and behavior of a ring derivative. Science 355, eaaf4704 (2017).

  27.

    Mitchell, L. A. et al. Synthesis, debugging, and effects of synthetic chromosome consolidation: synVI and beyond. Science 355, eaaf4831 (2017).

  28.

    Dymond, J. S. et al. Synthetic chromosome arms function in yeast and generate phenotypic diversity by design. Nature 477, 471–476 (2011).

  29.

    Wu, Y. et al. Bug mapping and fitness testing of chemically synthesized chromosome X. Science 355, eaaf4706 (2017).

  30.

    Zhang, W. et al. Engineering the ribosomal DNA in a megabase synthetic chromosome. Science 355, eaaf3981 (2017).

  31.

    Richardson, S. M. et al. Design of a synthetic yeast genome. Science 355, 1040–1044 (2017).

  32.

    Pósfai, G. et al. Emergent properties of reduced-genome Escherichia coli. Science 312, 1044–1046 (2006).

  33.

    Chan, L. Y., Kosuri, S. & Endy, D. Refactoring bacteriophage T7. Mol. Syst. Biol. 1, 2005.0018 (2005).

  34.

    Corey, E. J. & Cheng, X.-M. The Logic of Chemical Synthesis (John Wiley, Chichester, 1989).

  35.

    Kouprina, N., Noskov, V. N., Koriabine, M., Leem, S. H. & Larionov, V. Exploring transformation-associated recombination cloning for selective isolation of genomic regions. Methods Mol. Biol. 255, 69–89 (2004).

  36.

    Goodall, E. C. A. et al. The essential genome of Escherichia coli K-12. MBio 9, e02096-17 (2018).

  37.

    Pundir, S., Martin, M. J. & O’Donovan, C. UniProt Protein Knowledgebase. Methods Mol. Biol. 1558, 41–55 (2017).

  38.

    Claverie-Martin, F., Diaz-Torres, M. R., Yancey, S. D. & Kushner, S. R. Analysis of the altered mRNA stability (ams) gene from Escherichia coli. Nucleotide sequence, transcriptional analysis, and homology of its product to MRP3, a mitochondrial ribosomal protein from Neurospora crassa. J. Biol. Chem. 266, 2843–2851 (1991).

  39.

    Jain, C. & Belasco, J. G. RNase E autoregulates its synthesis by controlling the degradation rate of its own mRNA in Escherichia coli: unusual sensitivity of the rne transcript to RNase E activity. Genes Dev. 9, 84–96 (1995).

  40.

    Diwa, A., Bricker, A. L., Jain, C. & Belasco, J. G. An evolutionarily conserved RNA stem-loop functions as a sensor that directs feedback regulation of RNase E gene expression. Genes Dev. 14, 1249–1260 (2000).

  41.

    Schuck, A., Diwa, A. & Belasco, J. G. RNase E autoregulates its synthesis in Escherichia coli by binding directly to a stem-loop in the rne 5′ untranslated region. Mol. Microbiol. 72, 470–478 (2009).

  42.

    Isaacs, F. J. et al. Precise manipulation of chromosomes in vivo enables genome-wide codon replacement. Science 333, 348–353 (2011).

  43.

    Ma, N. J., Moonan, D. W. & Isaacs, F. J. Precise manipulation of bacterial chromosomes by conjugative assembly genome engineering. Nat. Protocols 9, 2285–2300 (2014).

  44.

    Lederberg, J. & Tatum, E. L. Gene recombination in Escherichia coli. Nature 158, 558 (1946).

  45.

    Elliott, T. S., Bianco, A., Townsley, F. M., Fried, S. D. & Chin, J. W. Tagging and enriching proteins enables cell-specific proteomics. Cell Chem. Biol. 23, 805–815 (2016).

  46.

    Elliott, T. S. et al. Proteome labeling and protein identification in specific tissues and at specific developmental stages in an animal. Nat. Biotechnol. 32, 465–472 (2014).

  47.

    Krogager, T. P. et al. Labeling and identifying cell-specific proteomes in the mouse brain. Nat. Biotechnol. 36, 156–159 (2018).

  48.

    Neidhardt, F. C. Escherichia coli and Salmonella typhimurium: Cellular and Molecular Biology (American Society for Microbiology, Washington, 1987).

Acknowledgements

This work was supported by the Medical Research Council (MRC), UK (MC_U105181009 and MC_UP_A024_1008), the Medical Research Foundation (MRF-109-0003-RG-CHIN/C0741) and an ERC Advanced Grant SGCR, all to J.W.C., and by the Lundbeck Foundation (R232-2016-3474) to J.F. J.W.C. thanks H. Pelham for supporting this project. We thank M. Skehel and the MRC-LMB mass spectrometry service for label-free-quantification-based proteomics; N. Barry for microscopy; A. Crisp for helping with Python scripts; and C. J. K. Wan, S. H. Kim, L. Dunsmore, N. Huguenin-Dezot and S. D. Fried for their support in experimental work.

Reviewer information

Nature thanks Abhishek Chatterjee, Tom Ellis and the other anonymous reviewer(s) for their contribution to the peer review of this work.

Author information

Author notes

  1. These authors contributed equally: Julius Fredens, Kaihang Wang, Daniel de la Torre, Louise F. H. Funke, Wesley E. Robertson

Affiliations

  1. Medical Research Council Laboratory of Molecular Biology, Cambridge, UK

    • Julius Fredens
    • Kaihang Wang
    • Daniel de la Torre
    • Louise F. H. Funke
    • Wesley E. Robertson
    • Yonka Christova
    • Tiongsun Chia
    • Wolfgang H. Schmied
    • Daniel L. Dunkelmann
    • Václav Beránek
    • Chayasith Uttamapinant
    • Andres Gonzalez Llamazares
    • Thomas S. Elliott
    • Jason W. Chin
  2. Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, USA

    • Kaihang Wang
  3. School of Biomolecular Science and Engineering, Vidyasirimedhi Institute of Science and Technology (VISTEC), Rayong, Thailand

    • Chayasith Uttamapinant

Contributions

K.W. and T.C. designed the target genome sequence. T.C. generated scripts for data analysis. All authors, except T.S.E., contributed to assembly of sections. J.F., L.F.H.F., K.W. and A.G.L. led the fixing of deleterious synthetic sequences. J.F., D.d.l.T., L.F.H.F., W.E.R. and Y.C. led the assembly of sections into Syn61 and characterized the strain with the assistance of T.S.E. J.W.C. supervised the project and wrote the paper with the other authors.

Competing interests

The authors declare no competing interests.

Corresponding author

Correspondence to Jason W. Chin.

Extended data figures and tables

  1. Extended Data Fig. 1 Using 100-kb fragments of synthetic DNA to replace the corresponding regions in the genome through REXER, and using GENESIS for the stepwise replacement of genomic DNA by synthetic DNA to generate recoded sections.

    a, REXER uses CRISPR–Cas9- and lambda-red-mediated recombination to replace genomic DNA with synthetic DNA provided from an episome (BAC). This enables large regions of the genome (>100 kb) to be replaced by synthetic DNA (ref. 17). The black triangles denote the location of CRISPR protospacers, which are cleaved by Cas9 to liberate the synthetic DNA (pink) cassette from the BAC flanked by homology regions. Homology regions 1 and 2 program the location of recombination into the E. coli genome. The double-selection cassette (−1, +1) ensures the integration of the synthetic DNA, and the double-selection cassette (−2, +2) on the genome ensures the removal of the corresponding wild-type DNA. In the example shown in the figure, +1 is kanR, −1 is rpsL, +2 is cat and −2 is sacB. b, Iterative cycles of REXER, with alternating choices of positive- and negative-selection cassettes, enable GENESIS (ref. 17). This enables large sections of the synthetic genome to be assembled through the iterative addition of fragments, which replace the corresponding genomic sequences, in a clockwise manner. The first REXER of a 100-kb synthetic fragment of DNA leaves a −1, +1 double-selection cassette on the genome, which acts as a landing site for the downstream integration of a second fragment of synthetic DNA that contains a −2, +2 double-selection cassette. In the example shown, +1 is kanR, −1 is rpsL, +2 is cat and −2 is sacB, but the same logic can be used with different permutations of positive and negative selection markers on the genome and the BAC.

  2. Extended Data Fig. 2 Recoding ftsI-murE and map in fragment 1.

    a, Recoding landscape of fragment 1. We sequenced six clones after REXER. Each dot represents the frequency of recoding within the sequenced clones (y axis) for a target codon at the indicated position in the genome (x axis). Black dots indicate positions at which we did not observe recoding. Four codons and a refactoring of ftsI-murE, and one codon in map, were rejected. b, Refactoring the 14-bp overlap of ftsI and murE. The codons and overlaps are colour-coded by their post-REXER replacement frequency in the clones sequenced. Using our initial refactoring scheme (refactoring 1) (in which the overlap plus 20 bp of upstream sequence was duplicated), we did not observe replacement of the overlap by synthetic DNA (in the six clones sequenced after REXER). Refactoring scheme 2 (refactoring 2) (which duplicates the overlap plus 182 bp of upstream sequence) resulted in complete recoding of this region in 12 of the 16 post-REXER clones that we sequenced. c, Testing alternative codons at Ser4 in map. A double-selection cassette, pheS-HygR, on a constitutive EM7 promoter was introduced upstream of map, followed by a ribosome-binding site. We replaced the cassette using linear double-stranded DNA that introduces alternative codons (purple bar) at position four, via lambda-red recombination and negative selection for loss of pheS. DNA with AGC and AGT did not integrate (0/16 clones); we recovered one clone for AGC but sequencing revealed that it contained a mutant AAC (Asn) codon. TCT (6/8), TCC (6/16), ACA (6/8) and TTA (4/8) were allowed. d, Recoding landscape (purple) over the genomic region shown in a, following REXER with a BAC that contained refactoring scheme 2 for the ftsI-murE overlap and TCT at position 4 in map. In total, 2/7 post-REXER clones were completely refactored and recoded, and each target codon was replaced in at least 5/7 clones. The data from a are shown in red for comparison.

  3. Extended Data Fig. 3 Recoding rne and yceQ in fragment 9.

    a, Recoding landscape of fragment 9. Our designed synthetic sequence of fragment 9 was integrated into the genome by REXER, and 19 clones were completely sequenced by next-generation sequencing. The recoding landscape graph shows the frequency at which each target codon was recoded across the 19 clones. Although most codon replacements were accepted, recoding of a 26-kb region was consistently rejected; codon positions with a recoding frequency of zero in all the sequenced clones are indicated by black dots. To pinpoint the problematic sequence, 10-kb stretches of the genome (labelled G2 to G7) were deleted in the presence of the episomal copy of synthetic fragment 9. The synthetic sequence was sufficient to support deletion of all stretches except G4 (dark grey box), which suggests that an underlying problem is within this stretch. None of the nineteen clones was completely recoded. b, Recoding landscape of stretch G4. After REXER across the 10-kb G4 stretch, and sequencing of 10 clones, the recoding landscape shown was generated. This revealed a clear recoding minimum at yceQ, a ‘gene’ that encodes a predicted protein for which there is little evidence of transcription, protein synthesis or homologues (ref. 37). All target codons in yceQ were recoded at least once in individual clones, but never simultaneously; thus, the minimum of the recoding landscape does not reach zero, and 0/10 clones were completely recoded. This is consistent with epistasis between the targeted positions. In the map below the recoding landscape, sequences annotated as essential are shown in dark grey and target codons are shown in red. The sequence position (x axis) is with reference to a. c, Altered design of the region surrounding rne in fragment 9. Top, original design of yceQ recoding and rne (which encodes RNase E) regulatory sequences. Target codons are shown in red. 
P1rne, P2rne and P3rne are the promoters (blue arrows) for the essential gene rne; these are found in and around the hypothetical gene yceQ. The −10 sequence of the major promoter P1rne is mutated by our initial design. The sequences that contain hairpin 2 (hp2) and hairpin 3 (hp3), which bind to RNase E to mediate transcript degradation, are shown as blue bars; these sequences encompass the remaining target codons and are also mutated by our initial design. Bottom, the second codon in yceQ was replaced with a stop codon (purple) and the remaining target codons retained their original sequence. The sequence position (x axis) is with reference to a. d, The modified fragment 9 (from c) was integrated on the genome, which resulted in complete recoding in 4/5 clones that we sequenced. The axes of the graph are the same as in a. The recoding landscape for the modified fragment 9, derived from sequencing five clones, is shown in purple. The data from a are reproduced for comparison.

  4. Extended Data Fig. 4 Recoding yaaY in fragment 37a.

    a, Recoding landscape of fragment 37a. Our designed synthetic sequence of fragment 37a was integrated into the genome by REXER, and six clones were completely sequenced by next-generation sequencing. Although most codon replacements were accepted, recoding of a 6.5-kb region was consistently rejected. Target-codon positions that were never recoded in the six clones sequenced are indicated by black dots. b, Identification of the problematic target codon. Within the identified 6.5-kb problematic region, we first focused on codons in essential genes (dark grey arrows) rather than non-essential genes (light grey arrows). Sanger sequencing (black bar) of 24 clones showed that 2 clones were recoded in all 6 target codons within a sub-section of the essential genes. Further Sanger sequencing of the remaining target codons in essential genes in these two clones revealed that 1 clone was recoded at all 17 target codons. This clone was completely sequenced by next-generation sequencing and used to generate a recoding landscape, in which each target codon is either recoded (red) or not recoded (black). In combination with the recoding landscape in a, this enabled us to identify a problematic region 1.8 kb upstream of ribF. Here we focused on the four target codons in the genes rpsT and yaaY as the nearest codons to the essential ribF gene. Sanger sequencing of 33 clones across this sequence revealed only 1 codon that was never recoded: the codon for Ser70 in the hypothetical gene yaaY (sequencing results are shown as colour-coded on the gene map of rpsT and yaaY). We therefore investigated alternative codon replacements in yaaY. c, Alternative codon replacement in the hypothetical gene yaaY. At position Ser70 in this gene, replacement of TCA with AGT was not successful. 
To investigate alternative codon replacement schemes, a double-selection marker (pheS-HygR) on a constitutive EM7 promoter, followed by a ribosome-binding site, was introduced into yaaY, 12 bp upstream of the codon for Ser70. The negative-selection marker was then used to select for clones that had replaced the cassette using linear double-stranded DNA that introduces alternative codons (purple bar) at position 70, via lambda-red recombination. Although linear double-stranded DNA with AGT did not integrate (0/16 clones), integration of double-stranded DNA with TCC (2/16), TCG (2/16), TCT (6/16) and AGC (9/16) proved viable. d, Recoding landscape following REXER with a BAC that contains a corrected version of fragment 37a, bearing AGC at position Ser70 in the hypothetical gene yaaY (purple). After integration by REXER, we identified 1/7 completely recoded clones. AGC at position Ser70 in yaaY was introduced in 4/7 clones.

  5. Extended Data Fig. 5 Substitutions in the hypothetical gene yceQ overlap with regulatory elements in rne.

    a, In our original design, a programmed substitution of a TCA (blue) to AGT (red) in the hypothetical gene yceQ leads to mutation of the −10 region of the P1rne promoter (boxed). The transcriptional start site (tss) of this promoter for rne transcription is indicated by an arrow; this is the major promoter for rne transcription. b, Target-codon substitutions overlap with and may disrupt the key regulatory hairpins (hp2 and hp3) in the long 5′ untranslated region of the rne transcript. hp2 and hp3 mediate a regulatory feedback loop, in which RNase E is recruited to the mRNA to promote degradation of its own transcript. A schematic of the wild-type secondary structure of the rne 5′ untranslated region is shown (ref. 40). The target codons for synonymous replacement are highlighted in blue.

  6. Extended Data Fig. 6 Completing sections A, B and H.

    a, GENESIS was initiated with fragment 4 and proceeded smoothly until fragment 9, in which we were unable to recode yceQ. Identifying and fixing the problems with our initial design of fragment 9 was carried out as described in Extended Data Fig. 3, by introducing a stop codon (yellow line) at the start of the predicted yceQ ORF. Following a swap of the sacB-cat (sC) double-selection cassette at the end of fragment 9 for a pheS-HygR (pH) double selection cassette, this strain was ready to act as the recipient for conjugation to assemble a strain in which fragments 4–13 (section A plus section B) are fully recoded. In parallel, we continued to recode the strain that contains the recoded fragment 4 to incomplete fragment 9 by GENESIS; this generated a second strain for assembly in which fragments 4–8 and 10–13 were completely recoded, and fragment 9 was partially recoded. We then integrated oriT (white triangle) 3 kb upstream of the start of fragment 10 in the second strain to generate a donor for conjugation, to assemble a strain in which fragments 4–13 (section A plus section B) are fully recoded. Conjugation of the donor and recipient strains resulted in a strain in which sections A and B are fully recoded. rK, rpsL-kanR double-selection cassette. b, Individual REXER of fragments 37a and 1 led to incomplete recoding. We carried out troubleshooting of both fragments independently (Extended Data Figs. 2, 4). The repairs are indicated with yellow and purple lines in fragment 37a and fragment 1, respectively. Each strain then served as a starting point for two independent sets of GENESIS; one generated 37a–37b (on the left) and ended in an rpsL-kanR double-selection cassette, and one generated 1–3 (on the right) and ended in a sacB-cat double-selection cassette. We integrated an oriT (white triangle) 3 kb upstream of the start of fragment 1, and this strain served as a donor for the directed conjugation of 1–3 into 37a–37b. 
The correct product was selected for by the gain of cat and the loss of rpsL. This resulted in the completion of section H in a single strain.

  7. Extended Data Fig. 7 Assembly of an organism with a fully synthetic genome through conjugation of recoded genome sections.

    a, Schematic assembly of partially synthetic donor and recipient genomes into a more-synthetic genome, through conjugation. In the recipient cell, the recoded genome section (pink) is extended with recoded DNA (dark pink)—commonly, 3–4 kb—by a lambda-red-mediated recombination and positive and negative selection; this step takes advantage of the genomic markers at the end of the recoded sequence that are introduced by GENESIS, and provides a homology region with the end of the recoded fragment in the donor strain. The donor strain is prepared by integration of an oriT at the end of the recoded DNA. The indicated positive and negative selection ensures the survival of recipient strains, and selects for recipients that have successfully integrated the synthetic DNA from the donor. An F′ plasmid that contains a mutation in the oriT sequence that makes it non-transferrable was used to facilitate conjugation of the donor genome to the recipient. +2, cat; −2, sacB; +3, HygR; −3, pheS; +4, aacC1 (a gene conferring gentamycin resistance); +5, tetA (a gene conferring tetracycline resistance). The homologous regions in the donor and recipient are both shown in dark pink. b, Synthetic genomic sections (pink) from multiple individual partially recoded genomes were assembled into a single fully recoded genome using conjugative assembly. The donor (d) and recipient (r) strains contain unique recoded genomic sections labelled in pink; recoded overlapping homology regions (3 kb to 400 kb in size) were used to seamlessly recombine the strains, and are shown in dark pink. Small homology regions ranging from 3 to 5 kb in size are denoted with an asterisk. Conjugations for which we used greater than 5-kb homology (HR) are indicated. For assembly, the recoded genomic content from the donor was conjugated in a clockwise manner to replace the corresponding wild-type genomic section (grey) in the recipient. 
The origin of strain AB and strain H is described in detail in Extended Data Fig. 6; all other individual synthetic genomes were generated by GENESIS (Extended Data Fig. 1). Conjugation followed by recombination proceeded until the final fully recoded A–H strain was assembled and sequence-verified by next-generation sequencing.

  8. Extended Data Fig. 8 Characterization of an organism with a fully synthetic genome.

    a, Doubling times for Syn61 and MDS42. Our fully synthetic recoded E. coli Syn61 has a doubling time that is 1.6× longer than that of MDS42 (ref. 32), when grown in standard medium conditions (90.1 min versus 57.6 min in lysogeny broth (LB) + 2% glucose). The ratio of doubling times (Syn61 to MDS42) in LB (decreased carbon catabolite repression) at 37 °C is 1.7, in M9 minimal medium is 1.7, in richer medium (2XTY) is 1.4, in LB at 25 °C is 2.5 and in LB at 42 °C is 1.3. The doubling times in different medium conditions are: LB at 37 °C, 58.3 min and 100.6 min; LB + 2% glucose, 57.6 min and 90.1 min; M9 minimal medium, 130.5 min and 221.1 min; 2XTY, 68.2 min and 92.6 min; LB at 25 °C, 86.3 min and 218.4 min; LB at 42 °C, 77.4 min and 99.7 min, for MDS42 and Syn61, respectively. Syn61 containing a plasmid without (−) or with (+) serV exhibited a growth-rate ratio of 0.99 (138.3 min versus 136.2 min). Doubling times represent the average of ten independently grown biological replicates of each strain, and are shown as mean ± s.d. (see Supplementary Methods). The data for individual experiments are represented by dots. b, Representative microscopy images of E. coli strain MDS42 and Syn61. Samples were imaged on an upright Zeiss Axiophot phase-contrast microscope using a 63× 1.25 NA Plan Neofluar phase objective (see Supplementary Methods). The experiment was performed twice with similar results. c, Histogram of cell lengths quantified from microscopy images of strains MDS42 and Syn61. The mean cell length (±s.d.) for MDS42 was 1.97 ± 0.57 μm and for Syn61 was 2.3 ± 0.74 μm. Images of n = 500 cells were taken during exponential growth phase for both strains. Cell-length measurements were made using Nikon NIS Elements software (see Supplementary Methods). A 1-μm lower size limit was imposed to remove background particulates and dust from quantification; this also precludes quantification of extracellular vesicles. 
d, Label-free quantification of the MDS42 and Syn61 proteomes. Each strain was grown in three biological replicates. Each biological replicate was analysed by tandem mass spectrometry in technical duplicate. Technical duplicates of biological replicates were merged. A total of 1,084 proteins was quantified across the samples. No protein quantified in both MDS42 and Syn61 differed in abundance—as judged by label-free quantification values—by more than 1.16-fold.

  9. Extended Data Fig. 9 Consequences of synonymous codon compression in Syn61.

    a, Synonymous codon compression and deletion of prfA, serU and serT in E. coli. The grey boxes show the E. coli serine codons and stop codons, together with the tRNAs and release factors that decode them in wild-type E. coli (WT genome). tRNA anticodons and release factors are connected to the codons that they are predicted to read by black lines. The tRNA and release factor genes are shown in the black boxes. Synonymous codon compression (syn. codon. comp.) leads to Syn61 cells with a recoded genome (pink boxes), in which TCG and TCA codons are removed. The abundance of each codon is listed in its box. b, As in Fig. 4b, but with the M. mazei PylRS/tRNAPylUGA pair (anticodon UGA). There are fewer cognate codons to this anticodon in Syn61 than in MDS42; CYPK addition might therefore be expected to be less toxic in Syn61, as observed. c, As in Fig. 4b, but with the M. mazei PylRS/tRNAPylGCU pair (anticodon GCU). There are a greater number of cognate codons to this anticodon in Syn61 than in MDS42; CYPK addition might therefore be expected to be more toxic in Syn61, as observed. d, serT (dark grey) is deleted by insertion of a pheS-HygR double-selection cassette (black) via lambda-red-mediated recombination. Recombination yields new junctions 1 and 2, indicated by green and blue bars. For each recombination, both junctions were sequence-verified by Sanger sequencing. Above the Sanger chromatograms, the arrows indicate the precise location of the junction, the blue bar indicates the sequence that corresponds to the selection cassette and the green bar corresponds to the genomic sequence that flanks the selection cassette. The primers used to generate selection cassettes with suitable homologies to serU, serT and prfA for recombination are provided in Supplementary Data 21. The experiment was performed once. e, prfA (dark grey) is deleted by the insertion of an rpsL-kanR double-selection cassette (in black) via lambda-red-mediated homologous recombination. 
The agarose gels are annotated as described in Fig. 4c, and the rest of the data are annotated as described in d. The experiment was performed once. f, serU (dark grey) is deleted by insertion of a PheS-HygR double-selection cassette (in black) via lambda-red-mediated recombination. The agarose gels are annotated as described in Fig. 4c, and the rest of the data are annotated as described in d. The experiment was performed once. The full gels are available in Supplementary Fig. 1.

  10. Extended Data Fig. 10 The scale of genome synthesis, and scale and fidelity of recoding.

    a, Genome and chromosome synthesis. The size (in Mb) of synthetic genomes that have been produced for M. genitalium and M. mycoides (refs. 22,23), and several S. cerevisiae chromosomes (refs. 24–31) (light grey). The size of the synthetic E. coli genome presented here is shown in dark grey. b, Genome recoding efforts. Attempts to recode target codons TTA and TTG in Salmonella enterica serovar Typhimurium LT2 (ref. 18); AGC, AGT, TTG, TTA, AGA, AGG and TAG in E. coli (ref. 19); AGA and AGG in E. coli (ref. 16), as well as recoding of all TAG in E. coli (ref. 14) (light grey), compared to the removal of all TCA, TCG and TAG in E. coli presented here (dark grey). The total number of codons recoded in a single strain is shown on the graph, and the maximum percentage of target codons recoded in a single strain in each effort is indicated. c, Number of reported non-programmed mutations and indels as a function of the number of target codons recoded for the experiments shown in b.
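
    The quantities plotted in panel b (target codons recoded, and the percentage of target codons removed) can be illustrated with a minimal sketch on a toy in-frame coding sequence. The function names and the toy sequence are hypothetical, and the synonym chosen for each target codon (TCG to AGC, TCA to AGT, TAG to TAA) is an assumption made for illustration; the captions above state only which codons are removed. The sketch also ignores overlapping genes and refactoring, which a real genome design has to handle.

```python
from collections import Counter

# Target codons removed in this work, each mapped to an ASSUMED synonym
# (illustrative only; not stated in the captions above).
TARGETS = {"TCG": "AGC", "TCA": "AGT", "TAG": "TAA"}


def split_codons(cds):
    """Split an in-frame coding sequence into a list of codons."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    return [cds[i:i + 3] for i in range(0, len(cds), 3)]


def count_targets(cds):
    """Count in-frame occurrences of each target codon."""
    counts = Counter(c for c in split_codons(cds) if c in TARGETS)
    return {codon: counts[codon] for codon in TARGETS}


def recode(cds):
    """Replace every in-frame target codon with its assumed synonym."""
    return "".join(TARGETS.get(c, c) for c in split_codons(cds))


def percent_recoded(original, recoded):
    """Percentage of the original target codons absent from the recoded sequence."""
    before = sum(count_targets(original).values())
    after = sum(count_targets(recoded).values())
    if before == 0:
        return 100.0  # nothing to recode
    return 100.0 * (before - after) / before


if __name__ == "__main__":
    toy = "ATGTCGTCAAGCTAG"  # hypothetical sequence: ATG TCG TCA AGC TAG
    new = recode(toy)
    print(count_targets(toy), new, percent_recoded(toy, new))
```

    Counting is done codon by codon in the reading frame, so a target triplet that only appears out of frame (spanning two adjacent codons) is neither counted nor changed.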

Supplementary information

  1. Supplementary Information

    This file contains the Supplementary Methods, Supplementary References and Supplementary Figure 1. Supplementary Figure 1 shows the full gels with the corresponding Figure panel. The molecular size standards are annotated and the area shown in the relevant Figure is indicated by a white outline.

  2. Reporting Summary

  3. Supplementary Data

    This zipped file contains Supplementary Datasets 1–21 and a Supplementary Data guide.

About this article

DOI

https://doi.org/10.1038/s41586-019-1192-5
