Continuous synthesis of E. coli genome sections and Mb-scale human DNA assembly

Abstract

Whole-genome synthesis provides a powerful approach for understanding and expanding organism function1,2,3. To build large genomes rapidly, scalably and in parallel, we need (1) methods for assembling megabases of DNA from shorter precursors and (2) strategies for rapidly and scalably replacing the genomic DNA of organisms with synthetic DNA. Here we develop bacterial artificial chromosome (BAC) stepwise insertion synthesis (BASIS), a method for megabase-scale assembly of DNA in Escherichia coli episomes. We used BASIS to assemble 1.1 Mb of human DNA containing numerous exons, introns, repetitive sequences, G-quadruplexes, and long and short interspersed nuclear elements (LINEs and SINEs). BASIS provides a powerful platform for building synthetic genomes for diverse organisms. We also developed continuous genome synthesis (CGS), a method for continuously replacing sequential 100 kb stretches of the E. coli genome with synthetic DNA; CGS minimizes crossovers1,4 between the synthetic DNA and the genome such that the output for each 100 kb replacement provides, without sequencing, the input for the next 100 kb replacement. Using CGS, we synthesized a 0.5 Mb section of the E. coli genome (a key intermediate in its total synthesis1) from five episomes in 10 days. By parallelizing CGS and combining it with rapid oligonucleotide synthesis and episome assembly5,6, along with rapid methods for compiling a single genome from strains bearing distinct synthetic genome sections1,7,8, we anticipate that it will be possible to synthesize entire E. coli genomes from functional designs in less than 2 months.
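The throughput figures in the abstract can be worked through as simple arithmetic. The sketch below is only an illustration: the 10 days for five sequential 100 kb steps comes from the abstract, whereas the 4 Mb genome size and the choice of eight parallel lineages are assumptions made for the example, and the estimate ignores DNA synthesis, episome assembly and the final compilation of sections into one genome.

```python
import math

# From the abstract: five sequential 100 kb replacements took 10 days.
STEPS_DONE = 5
DAYS_TAKEN = 10
STEP_SIZE_KB = 100


def cgs_days(genome_kb, parallel_sections):
    """Estimate CGS days when a genome is split into sections built in parallel.

    genome_kb and parallel_sections are hypothetical inputs for illustration.
    """
    days_per_step = DAYS_TAKEN / STEPS_DONE            # 2.0 days per 100 kb step
    total_steps = math.ceil(genome_kb / STEP_SIZE_KB)   # number of 100 kb replacements
    steps_per_section = math.ceil(total_steps / parallel_sections)
    return steps_per_section * days_per_step


print(cgs_days(500, 1))    # 10.0, the reported 0.5 Mb section
print(cgs_days(4000, 8))   # 10.0, assumed 4 Mb genome as eight parallel 0.5 Mb sections
```

Under these assumptions the replacement steps themselves are not the bottleneck, which is consistent with the abstract attributing the remaining time to synthesis, assembly and compilation.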

Fig. 1: Universal spacers enable scarless replacement of genomic DNA with synthetic DNA.
Fig. 2: CONEXER is a rapid, simplified and standardized method for genome synthesis from synthetic DNA in episomes.
Fig. 3: Megabase-scale assembly of DNA using BASIS.
Fig. 4: Deletion of recA from the host genome increases the fraction of fully recoded clones in CONEXER-mediated genomic replacement.
Fig. 5: CGS from episomes.

Data availability

The sequences and design details used in this study are available in the Supplementary Data. Supplementary Data 1 provides all of the spacer sequences used in CONEXER and BASIS experiments. Supplementary Data 2 lists all of the nucleotide sequences of oligonucleotides, plasmids and BACs used in this study. Supplementary Data 3 provides the GenBank file of spacer plasmid pKW3 to express universal spacer set 1. Supplementary Data 4 provides the GenBank file of spacer plasmid pKW3 to express universal spacer set 2. Supplementary Data 5 provides the GenBank file of a general CONEXER BAC design with spacer sequences of universal spacer set 1. Supplementary Data 6 provides the GenBank file of a general CONEXER BAC design with spacer sequences of universal spacer set 2. Supplementary Data 7 provides the GenBank file of plasmid pLF118 for λ-red recombineering. Supplementary Data 8 provides the GenBank file of CFTR BAC01. Supplementary Data 9 provides the GenBank file of CFTR BAC02. Supplementary Data 10 provides the GenBank file of CFTR BAC03. Supplementary Data 11 provides detailed results for sequence verification, listing all variants and their classification as called for the final CFTR assembly. Supplementary Data 12 lists all raw sequencing data, including the NCBI SRA accession numbers, as deposited at BioProject PRJNA962525. Supplementary Data 13 provides the GenBank file of plasmid pFR015 used for retron-editing. Supplementary Data 14 provides the GenBank file of helper plasmid pFR156 for retron-editing. Supplementary Data 15 provides the GenBank file of plasmid pHBA008, which was used as a template to amplify BASIS components for human BAC adaptation. Supplementary Data 16 provides the GenBank file of plasmid pHBA010, which was used as a template to amplify BASIS components for human BAC adaptation. Supplementary Data 17 provides the GenBank file of initial assembly acceptor plasmid pHBA031 for the 1.1 Mb BASIS assembly. 
Supplementary Data 18 provides detailed results for sequence verification, listing all true positive variants as called for the final 1.1 Mb BASIS assembly. Supplementary Data 19 provides detailed results for sequence verification, listing all variants as called for the final 1.1 Mb BASIS assembly and categorized as either likely false-positive or false-positive. Supplementary Data 20 provides the GenBank file of plasmid pSP43 with spacer sequences for gene knockout by CRISPR–Cas9-mediated cleavage and λ-red recombineering. All other datasets generated and/or analysed in this study are available from the corresponding author on reasonable request. All materials (Supplementary Data 3, 4, 7–10, 13–17 and 20) from this study are available from the corresponding author on reasonable request.

Code availability

Code used for generating the recoding landscape (https://github.com/JWChin-Lab/recoding-landscape-generator); for sequence data alignment and analysis (https://github.com/JWChin-Lab/NGS-analysis); for analysis of BASIS constructs (https://github.com/JWChin-Lab/BASIS-sequence-analysis); and for automated liquid handling for next-generation sequencing library preparation (https://github.com/JWChin-Lab/NGS-sample-prep) is available at GitHub.

References

  1. Fredens, J. et al. Total synthesis of Escherichia coli with a recoded genome. Nature 569, 514–518 (2019).

  2. Gibson, D. G. et al. Complete chemical synthesis, assembly, and cloning of a Mycoplasma genitalium genome. Science 319, 1215–1220 (2008).

  3. Gibson, D. G. et al. Creation of a bacterial cell controlled by a chemically synthesized genome. Science 329, 52–56 (2010).

  4. Wang, K. H. et al. Defining synonymous codon compression schemes by genome recoding. Nature 539, 59–64 (2016).

  5. Gibson, D. G. et al. Enzymatic assembly of DNA molecules up to several hundred kilobases. Nat. Methods 6, 343–345 (2009).

  6. Kouprina, N. & Larionov, V. TAR cloning: insights into gene function, long-range haplotypes and genome structure and evolution. Nat. Rev. Genet. 7, 805–812 (2006).

  7. Wang, K., de la Torre, D., Robertson, W. E. & Chin, J. W. Programmed chromosome fission and fusion enable precise large-scale genome rearrangement and assembly. Science 365, 922–926 (2019).

  8. Ma, N. J., Moonan, D. W. & Isaacs, F. J. Precise manipulation of bacterial chromosomes by conjugative assembly genome engineering. Nat. Protoc. 9, 2285–2300 (2014).

  9. Robertson, W. E. et al. Sense codon reassignment enables viral resistance and encoded polymer synthesis. Science 372, 1057–1062 (2021).

  10. Zurcher, J. F. et al. Refactored genetic codes enable bidirectional genetic isolation. Science 378, 516–523 (2022).

  11. Nyerges, A. et al. A swapped genetic code prevents viral infections and gene transfer. Nature 615, 720–727 (2023).

  12. Spinck, M. et al. Genetically programmed cell-based synthesis of non-natural peptide and depsipeptide macrocycles. Nat. Chem. 15, 61–69 (2023).

  13. Richardson, S. M. et al. Design of a synthetic yeast genome. Science 355, 1040–1044 (2017).

  14. Lajoie, M. J. et al. Genomically recoded organisms expand biological functions. Science 342, 357–360 (2013).

  15. Ostrov, N. et al. Design, synthesis, and testing toward a 57-codon genome. Science 353, 819–822 (2016).

  16. Lau, Y. H. et al. Large-scale recoding of a bacterial genome by iterative recombineering of synthetic DNA. Nucleic Acids Res. 45, 6971–6980 (2017).

  17. Hutchison, C. A. 3rd et al. Design and synthesis of a minimal bacterial genome. Science 351, aad6253 (2016).

  18. Shao, Y. et al. Creating a functional single-chromosome yeast. Nature 560, 331–335 (2018).

  19. Giani, A. M., Gallo, G. R., Gianfranceschi, L. & Formenti, G. Long walk to genomics: history and current approaches to genome sequencing and assembly. Comput. Struct. Biotechnol. J. 18, 9–19 (2020).

  20. Lander, E. S. et al. Initial sequencing and analysis of the human genome. Nature 409, 860–921 (2001).

  21. Neil, D. L. et al. Structural instability of human tandemly repeated DNA sequences cloned in yeast artificial chromosome vectors. Nucleic Acids Res. 18, 1421–1428 (1990).

  22. Haubold, B. & Wiehe, T. How repetitive are genomes? BMC Bioinform. https://doi.org/10.1186/1471-2105-7-541 (2006).

  23. Yoneji, T., Fujita, H., Mukai, T. & Su’etsugu, M. Grand scale genome manipulation via chromosome swapping in Escherichia coli programmed by three one megabase chromosomes. Nucleic Acids Res. 49, 8407–8418 (2021).

  24. Yu, D. et al. An efficient recombination system for chromosome engineering in Escherichia coli. Proc. Natl Acad. Sci. USA 97, 5978–5983 (2000).

  25. Mejia, J. E. & Larin, Z. The assembly of large BACs by in vivo recombination. Genomics 70, 165–170 (2000).

  26. Mukai, T. et al. Overcoming the challenges of megabase-sized plasmid construction in Escherichia coli. ACS Synth. Biol. 9, 1315–1327 (2020).

  27. Kotzamanis, G. & Huxley, C. Recombining overlapping BACs into a single larger BAC. BMC Biotechnol. 4, 1 (2004).

  28. Sopher, B. L. & La Spada, A. R. Efficient recombination-based methods for bacterial artificial chromosome fusion and mutagenesis. Gene 371, 136–143 (2006).

  29. Lovett, S. T. in Bacterial Stress Responses 2nd edn (eds Storz, G. & Hengge, R.) 205–228 (2011); https://doi.org/10.1128/9781555816841.ch13.

  30. Anstey-Gilbert, C. S. et al. The structure of Escherichia coli ExoIX-implications for DNA binding and catalysis in flap endonucleases. Nucleic Acids Res. 41, 8357–8367 (2013).

  31. Liu, Y., Kao, H. I. & Bambara, R. A. Flap endonuclease 1: a central component of DNA metabolism. Annu. Rev. Biochem. 73, 589–615 (2004).

  32. Ellsworth, R. E. et al. Comparative genomic sequence analysis of the human and mouse cystic fibrosis transmembrane conductance regulator genes. Proc. Natl Acad. Sci. USA 97, 1172–1177 (2000).

  33. Krzywinski, M. et al. A set of BAC clones spanning the human genome. Nucleic Acids Res. 32, 3651–3660 (2004).

  34. Sherry, S. T. et al. dbSNP: the NCBI database of genetic variation. Nucleic Acids Res. 29, 308–311 (2001).

  35. Sun, J. X. et al. A direct characterization of human mutation based on microsatellites. Nat. Genet. 44, 1161–1165 (2012).

  36. Szklarczyk, D. et al. The STRING database in 2021: customizable protein-protein networks, and functional characterization of user-uploaded gene/measurement sets. Nucleic Acids Res. 49, 10800–10800 (2021).

  37. van der Oost, J. & Patinios, C. The genome editing revolution. Trends Biotechnol. 41, 396–409 (2023).

  38. Jiang, W., Bikard, D., Cox, D., Zhang, F. & Marraffini, L. A. RNA-guided editing of bacterial genomes using CRISPR-Cas systems. Nat. Biotechnol. 31, 233–239 (2013).

  39. Tong, Y., Jorgensen, T. S., Whitford, C. M., Weber, T. & Lee, S. Y. A versatile genetic engineering toolkit for E. coli based on CRISPR-prime editing. Nat. Commun. 12, 5206 (2021).

  40. Datsenko, K. A. & Wanner, B. L. One-step inactivation of chromosomal genes in Escherichia coli K-12 using PCR products. Proc. Natl Acad. Sci. USA 97, 6640–6645 (2000).

  41. Waters, V. L. Conjugation between bacterial and mammalian cells. Nat. Genet. 29, 375–376 (2001).

  42. Lee, E. C. et al. Complete humanization of the mouse immunoglobulin loci enables efficient therapeutic antibody discovery. Nat. Biotechnol. 32, 356–363 (2014).

  43. Macdonald, L. E. et al. Precise and in situ genetic humanization of 6 Mb of mouse immunoglobulin genes. Proc. Natl Acad. Sci. USA 111, 5147–5152 (2014).

  44. Pansegrau, W. et al. Complete nucleotide-sequence of Birmingham IncPα plasmids—compilation and comparative-analysis. J. Mol. Biol. 239, 623–663 (1994).

  45. Robertson, W. E. et al. Creating custom synthetic genomes in Escherichia coli with REXER and GENESIS. Nat. Protoc. https://doi.org/10.1038/s41596-020-00464-3 (2021).

  46. Li, H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. Preprint at https://doi.org/10.48550/arXiv.1303.3997 (2013).

  47. Danecek, P. et al. Twelve years of SAMtools and BCFtools. Gigascience https://doi.org/10.1093/gigascience/giab008 (2021).

  48. Ramirez, F. et al. deepTools2: a next generation web server for deep-sequencing data analysis. Nucleic Acids Res. 44, W160–W165 (2016).

  49. McKenna, A. et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20, 1297–1303 (2010).

  50. Smolka, M. et al. Comprehensive structural variant detection: from mosaic to population-level. Preprint at bioRxiv https://doi.org/10.1101/2022.04.04.487055 (2022).

  51. Cer, R. Z. et al. Non-B DB v2.0: a database of predicted non-B DNA-forming motifs and its associated tools. Nucleic Acids Res. 41, D94–D100 (2013).

  52. Schubert, M. G. et al. High-throughput functional variant screens via in vivo production of single-stranded DNA. Proc. Natl Acad. Sci. USA https://doi.org/10.1073/pnas.2018181118 (2021).

Acknowledgements

We thank B. Porebski for support with next-generation sequencing on the Illumina HiSeq system; and Y. Gu and D. Czernecki for support with sequencing data analysis. This work was supported by the UK Medical Research Council (MRC; MC_U105181009, MC_UP_A024_1008 to J.W.C. and MC_U105178808 to J.E.S.), a Wellcome Investigator Award to J.W.C. (220808/Z/20/Z) and a Wellcome Discretionary Award (221267/Z/20/Z) to J.W.C. and J.E.S. A.A.K. and S.G. were supported by the Boehringer Ingelheim Fonds and the Cambridge Commonwealth, European and International Trust. J.W.C. and J.E.S. thank the members of the Wellcome Sanger Institute for support through the associate faculty program.

Author information

Contributions

J.F.Z., A.A.K., L.F.H.F., J.B. and J.F. contributed equally to this work. J.F. and L.F.H.F. developed universal spacers. L.F.H.F. developed CONEXER. J.B., A.A.K., J.F.Z., M.S. and L.F.H.F. created host factor knockouts and investigated their consequences for CONEXER. A.A.K. and J.F.Z. established BASIS. S.G. and J.B. contributed to the construction of BASIS BACs. A.A.K., P.M., S.G. and J.F.Z. performed bioinformatic analysis for BASIS constructs. J.E.S. supervised P.M. A.A.K., J.F.Z. and F.B.H.R. modified BASIS BACs. S.G. and G.P. demonstrated CFTR expression in human cells. L.F.H.F. and J.F.Z. developed CGS. J.F.Z. implemented CGS in ∆recA cells. K.C.L. contributed to automated NGS, phenotyping and data analysis. J.W.C. set the direction of research and supervised the project. J.W.C., A.A.K., J.F.Z., J.F. and L.F.H.F. wrote the paper with input from all of the authors.

Corresponding author

Correspondence to Jason W. Chin.

Ethics declarations

Competing interests

J.W.C. is the founder of Constructive Bio. The MRC has filed a patent application covering the work described.

Peer review

Peer review information

Nature thanks Benjamin Blount and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data figures and tables

Extended Data Fig. 1 Steps in REXER-mediated integration of ~100 kb of synthetic DNA into the E. coli genome using homology region (HR)-specific spacers.

REXER allows integration of more than 100 kb of synthetic DNA (pink) into the genome, through replacement of the corresponding genomic DNA. A bacterial artificial chromosome (BAC) containing the synthetic DNA of interest is electroporated into competent cells with a suitably marked genome; the cells also contain a helper plasmid encoding the Cas9 protein and the lambda red recombination components. Selection for the helper plasmid (+5) and the BAC (+2) is applied. A clonal cell is then expanded, induced with arabinose to express the helper plasmid genes, and made electrocompetent again. HR-specific spacer arrays (either plasmid-based as shown, or as linear DNA) are then electroporated into the cell; this leads to CRISPR/Cas9-mediated in vivo excision of the synthetic DNA, flanked by a double selection cassette (+2/−2) and HRs to the genome, from the BAC. The lambda red recombination machinery then uses the HRs to direct the integration of the excised DNA into the genome. Triangles denote the Cas9 cleavage sites at the HRs (grey boxes) flanking the synthetic DNA. Selection on tetracycline (maintenance of +5), ampicillin (maintenance of +6), chloramphenicol (maintenance of +2), and streptomycin (loss of −1) ensures that only cells in which recombination took place over the whole section survive. The selectable markers are +1, blue, kanR (selected for with kanamycin); −1, yellow, rpsL (selected against with streptomycin); +2, green, cat (selected for with chloramphenicol); −2, pink, sacB (selected against with sucrose); +5, dark blue, tetR (selected for with tetracycline); +6, red, ampR (selected for with ampicillin).

Extended Data Fig. 2 Step-wise depiction of CONEXER procedure.

a, In an odd step of CONEXER, recipient cells with a +1/−1 double selection cassette in their genome (+1, kanR (confers growth on kanamycin); −1, rpsL (confers sensitivity to streptomycin)) and a tetracycline-resistance-conferring plasmid (+5, tetR (confers resistance to tetracycline)) encoding arabinose-inducible lambda red components and Cas9 are mixed with donor cells and spotted on an agar plate. Donor cells contain an odd BAC and a non-transferable F’ plasmid. During incubation on the plate (1 h, 37 °C) the BAC is conjugated from donor to recipient cells. Subsequently, cells are washed off the plate and inoculated in selective media containing tetracycline (selection for maintenance of +5) and chloramphenicol (selection for gain of +2); this selects for recipient cells that have received the BAC. Arabinose is also added to induce lambda red components and Cas9; excision of the linear DNA and recombination with the genome is induced at this step. Cells are recovered in selective media containing tetracycline (selection for maintenance of +5) and chloramphenicol (selection for maintenance of +2) but no arabinose; this selects for recipient cells that have received the BAC. Finally, cells are plated on selective agar plates containing tetracycline (selection for maintenance of +5), to select for recipient cells; chloramphenicol (selection for maintenance of +2), to select for genomic integration of the +2/−2 double selection cassette from the BAC; and streptomycin (selection for loss of −1), to select for loss of the +1/−1 double selection cassette from the genome and loss of the BAC backbone. 
b, In an even step of CONEXER, recipient cells with a +2/−2 double selection cassette in their genome (+2, cat (confers growth on chloramphenicol); −2, sacB (confers sensitivity to sucrose)) and a tetracycline-resistance-conferring plasmid (+5, tetR (confers resistance to tetracycline)) encoding arabinose-inducible lambda red components and Cas9 are mixed with donor cells and spotted on an agar plate. Donor cells contain an even BAC and a non-transferable F’ plasmid. During incubation on the plate (1 h, 37 °C) the BAC is conjugated from donor to recipient cells. Subsequently, cells are washed off the plate and inoculated in selective media containing tetracycline (selection for maintenance of +5) and kanamycin (selection for gain of +1); this selects for recipient cells that have received the BAC. Arabinose is also added to induce lambda red components and Cas9; excision of the linear DNA and recombination with the genome is induced at this step. Cells are recovered in selective media containing tetracycline (selection for maintenance of +5) and kanamycin (selection for maintenance of +1) but no arabinose; this selects for recipient cells that have received the BAC. Finally, cells are plated on selective agar plates containing tetracycline (selection for maintenance of +5), to select for recipient cells; kanamycin (selection for maintenance of +1), to select for genomic integration of the +1/−1 double selection cassette from the BAC; sucrose (selection for loss of −2), to select for loss of the +2/−2 double selection cassette from the genome; and 4-CP (selection for loss of −3), to select for loss of the BAC backbone. c, Clones from CONEXER experiments are picked from the selection plate. They are grown up individually in a 96-well plate and phenotyped on agar plates for the functionality of selection markers. 
Subsequently, clones that show the correct growth phenotype (even steps: growth on +1, −2; no growth on −1, +2; odd steps: growth on −1, +2; no growth on +1, −2) are sequenced by NGS. The selectable markers are +1, blue, kanR (selected for with kanamycin); −1, yellow, rpsL (selected against with streptomycin); +2, green, cat (selected for with chloramphenicol); −2, pink, sacB (selected against with sucrose); −3, orange, pheS* (selected against with 4-chlorophenylalanine); +5, dark blue, tetR (selected for with tetracycline).
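The phenotypic screen in panel c amounts to comparing each clone's growth pattern with a fixed table. A minimal sketch of that comparison follows; the expected growth patterns are transcribed from the caption, while the function names, the well identifiers and the example readout are hypothetical.

```python
# Expected growth on each phenotyping plate, transcribed from the caption.
# Plates are keyed by the marker they test (ASCII "-" stands in for the minus sign):
# "+1" kanamycin, "-1" streptomycin, "+2" chloramphenicol, "-2" sucrose.
EXPECTED_GROWTH = {
    "even": {"+1": True, "-2": True, "-1": False, "+2": False},
    "odd": {"-1": True, "+2": True, "+1": False, "-2": False},
}


def passes_phenotype(step, observed):
    """Return True if a clone's growth matches every plate expected for this step."""
    expected = EXPECTED_GROWTH[step]
    return all(observed.get(plate) == grows for plate, grows in expected.items())


def clones_for_sequencing(step, plate_readout):
    """Keep only the clones (hypothetical well IDs) that pass the screen."""
    return [well for well, observed in plate_readout.items()
            if passes_phenotype(step, observed)]


# Hypothetical readout from an even step: well A2 still grows on chloramphenicol.
readout = {
    "A1": {"+1": True, "-2": True, "-1": False, "+2": False},
    "A2": {"+1": True, "-2": True, "-1": False, "+2": True},
}
print(clones_for_sequencing("even", readout))  # ['A1']
```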

Extended Data Fig. 3 CFTR assembly via BASIS.

a, Recipient cells containing a BASIS assembly BAC with the first section of the CFTR gene and a plasmid encoding Cas9 and lambda red components were mixed with donor cells. Donor cells contained a BASIS BAC encoding the second section of the CFTR gene and a non-transferable F’ plasmid. The donor BAC was conjugated to the recipient cell (A) and recipient cells were selected on tetracycline (+5, tetR (confers resistance to tetracycline)). Upon induction of protein expression from the helper plasmid, linear dsDNA was excised from the donor BAC (B). The excised DNA inserts into the assembly BAC between HR1 and uHR. Selection on hygromycin (selection for gain of +3), to select for gain of the selection cassette from the donor BAC; sucrose (selection for loss of −2), to select for loss of the selection cassette from the assembly BAC and loss of the donor BAC backbone; and tetracycline (selection for maintenance of +5), to select for maintenance of the helper plasmid, ensured that only cells with the correctly assembled BAC survive. In step 2, recipient cells containing a BAC with the first and second section of the CFTR gene and a plasmid encoding Cas9 and lambda red components were mixed with donor cells. Donor cells contained a BASIS BAC encoding the third section of the CFTR gene and the non-transferable F’ plasmid. The donor BAC was conjugated to the recipient cell (A) and recipient cells were selected on tetracycline (+5, tetR (confers resistance to tetracycline)). Upon induction of protein expression from the helper plasmid, linear dsDNA was excised from the donor BAC (B). The excised DNA inserts into the assembly BAC between HR2 and uHR. 
Selection on chloramphenicol (selection for gain of +2), to select for gain of the selection cassette from the donor BAC; 4-CP (loss of −3), to select for loss of the selection cassette from the assembly BAC; streptomycin (loss of −1), to select for loss of the donor BAC backbone; and tetracycline (maintenance of +5), to select for maintenance of the helper plasmid, ensured that only cells with the correctly assembled BAC survive. b, Clones from BASIS experiments were picked from the selection plate. They were grown up individually in a 96-well plate and phenotyped for the functionality of selection markers on agar plates. Subsequently, clones that showed the correct growth phenotype and, in some cases, the correct genotype at the assembly junctions by PCR (step 1: growth on hygromycin, growth on sucrose; no growth on 4-CP, no growth on chloramphenicol, and genotyping for insertion of the second section of CFTR (for primers see Supplementary Data 2); step 2: growth on 4-CP, growth on chloramphenicol; no growth on hygromycin, no growth on sucrose) were sequenced by NGS. The selectable markers are +3, purple, hygR (selected for with hygromycin); −3, orange, pheS* (selected against with 4-chlorophenylalanine); +2, green, cat (selected for with chloramphenicol); −2, pink, sacB (selected against with sucrose); −1, yellow, rpsL (selected against with streptomycin); +5, dark blue, tetR (selected for with tetracycline).
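The selection plates for each BASIS step follow directly from which markers must be gained, lost or maintained. The sketch below derives the plate composition from the marker legend in this caption; the marker-to-agent table is taken from the captions, whereas the function name and its argument layout are illustrative assumptions.

```python
# Agent used to select for (+) or against (-) each marker, per the marker legends.
# ASCII "-" stands in for the minus sign used in the captions.
AGENT = {
    "+1": "kanamycin", "-1": "streptomycin",
    "+2": "chloramphenicol", "-2": "sucrose",
    "+3": "hygromycin", "-3": "4-chlorophenylalanine",
    "+5": "tetracycline", "+6": "ampicillin",
}


def selection_plate(gain=(), lose=(), maintain=()):
    """List the agents for a plate, given markers to gain, lose and maintain."""
    for marker in (*gain, *maintain):
        if not marker.startswith("+"):
            raise ValueError(f"{marker} is not a positive marker")
    for marker in lose:
        if not marker.startswith("-"):
            raise ValueError(f"{marker} is not a negative marker")
    return [AGENT[m] for m in (*gain, *lose, *maintain)]


# BASIS step 1: gain +3, lose -2, maintain the helper plasmid (+5).
print(selection_plate(gain=["+3"], lose=["-2"], maintain=["+5"]))
# BASIS step 2: gain +2, lose -3 and -1, maintain the helper plasmid (+5).
print(selection_plate(gain=["+2"], lose=["-3", "-1"], maintain=["+5"]))
```

The two calls give the same agents as the caption lists for steps 1 and 2, which is a quick consistency check on the marker bookkeeping.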

Extended Data Fig. 4 Assembly of BACs to be used for CONEXER and BASIS, and adaptation of existing BACs derived from human BAC libraries.

a, BAC assembly from DNA fragments of 3–10 kb; these fragments may be derived from chemically synthesized oligonucleotides and/or may be amplified from natural sequences. 10 kb DNA fragments are assembled with established methods, either in vitro or in vivo by yeast assembly1,2,3. All fragments are assembled with the BAC backbone, which contains components required for subsequent CONEXER or BASIS steps (universal spacers, origin of transfer, marker cassettes). b, BACs used for assembly of a megabase-scale human genomic section are derived from human BAC libraries. The human DNA sequences in these BACs overlap with each other, and these overlaps constitute the homology regions exploited for assembly. The universal spacer cassette, the origin of transfer, the universal homology region, and appropriate selection markers were introduced into BACs from the human BAC library, by one-step λ-red recombineering, to generate BASIS BACs.

Extended Data Fig. 5 BACs produced by BASIS can be extensively modified by lambda red recombineering and retron-mediated editing – to generate insertions, replacements, and edits.

a, For expression in human tissue culture, the endogenous CFTR promoter was replaced with an EF1alpha constitutive promoter using λ-red recombineering. To this end, the EF1alpha promoter was coupled to an ampicillin resistance gene (+6, red, ampR (confers resistance to ampicillin)). Following recombineering, cells were selected on ampicillin (selection for gain of +6) to select for replacement of the promoter. Sequence coverage of the EF1alpha promoter is shown (maximum coverage indicated in brackets). b, BACs produced by BASIS can be precisely edited using retron-mediated editing. A single strand binding protein and a retron containing the desired base pair substitutions were expressed in target cells containing the BAC. During replication, annealing of the retron to the lagging strand led to the desired edits. We corrected two point mutations in exon 15 of the CFTR gene (Methods). Sanger sequencing traces of the region containing the point mutations are shown before (top - red) and after (bottom - green) editing. c, To distinguish BAC-encoded CFTR from the endogenous gene, an HA-tag was inserted into exon 17 of the CFTR gene on the BAC. This tag is known to be tolerated in the cDNA of CFTR4. First, a double selection cassette (+3, orange hygR (confers resistance to hygromycin); −3, purple pheS* (confers sensitivity to 4-CP)) was inserted into the locus of interest. Following recombineering, cells were selected on hygromycin (selection for gain of +3) to select for insertion of the double selection cassette. Subsequently, λ-red recombineering was used to replace the double selection cassette with an HA-tag. Following recombineering, cells were selected on 4-CP (selection for loss of −3) to select for replacement of the double selection cassette. Sequence coverage of exon 17 containing the HA-tag is shown (maximum coverage indicated in brackets).

Extended Data Fig. 6 Expression of CFTR from a modified BASIS BAC in mammalian cells.

a, Schematic of a 208 kb CFTR BAC: the entire human CFTR gene with an HA-tag (yellow bar) was expressed under a constitutive EF1alpha promoter. The native transcript, containing exons and introns, is spliced to form the mature transcript of approximately 4.5 kb in length. Additionally, EGFP and a neomycin (Neo) resistance gene are expressed from a constitutive CAG promoter. b, FACS plots from human embryonic kidney (HEK293) cells transfected with the CFTR BAC (CFTR-HA_GFP) and a control (WT). The x-axis displays the fluorescence intensity from GFP protein and the y-axis displays side scattering. Approximately 1.19% of cells transfected with the CFTR BAC are positive for GFP. c, PCR was performed on cDNA from cells with and without the transfected CFTR BAC. The band (around 4.5 kb) corresponding to an amplicon of CFTR transcript is specific to cells transfected with the CFTR BAC. Band sizes of the ladder are indicated to the left (grey arrows). The experiment in c was performed in one biological replicate. d, NGS of the PCR product shown in c demonstrates that CFTR has been transcribed and processed properly in cells after transfection. The presence of an HA-tag in the transcript shows that the transcript is derived from the BASIS BAC-encoded CFTR. Increased coverage at the flanks likely stems from sequencing of the primers.

Extended Data Fig. 7 Analysis of genomic features in the 1.1 Mb target region on chromosome 21.

Analysis of distribution of genomic features in the 1.1 Mb target region of the human genome assembled by BASIS. The genomic features in this region are compared to the distribution of the genomic feature throughout the whole genome, computed in 1 Mb windows. The red line indicates the fraction of each genomic feature for the 1.1 Mb target region. The blue line indicates the median for each genomic feature for the whole genome. The dotted lines represent the 5th and 95th percentile of the distribution.
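The windowed comparison described in this legend can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the function names, the assumption that feature intervals are already merged (non-overlapping), and the choice of the "inclusive" percentile method are all assumptions made here.

```python
from statistics import median, quantiles


def feature_fraction(window_start, window_end, intervals):
    """Fraction of the half-open window [window_start, window_end) covered by a feature.

    `intervals` is a list of (start, end) pairs, assumed non-overlapping.
    """
    covered = 0
    for start, end in intervals:
        overlap = min(end, window_end) - max(start, window_start)
        if overlap > 0:
            covered += overlap
    return covered / (window_end - window_start)


def summarize_windows(fractions):
    """Median plus 5th and 95th percentiles of per-window feature fractions."""
    # n=20 yields 19 cut points at 5% steps; the first and last are p5 and p95.
    cuts = quantiles(fractions, n=20, method="inclusive")
    return {"p5": cuts[0], "median": median(fractions), "p95": cuts[-1]}


def within_band(target_fraction, summary):
    """True if the target region lies between the 5th and 95th percentiles."""
    return summary["p5"] <= target_fraction <= summary["p95"]
```

Here `summarize_windows` would take one fraction per 1 Mb genome window, and `within_band` would indicate whether the 1.1 Mb target region falls inside the dotted-line band for a given feature.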

Extended Data Fig. 8 CONEXER experiments on sequences that cannot be replaced by recoded synthetic DNA.

a, Codons in yceQ at positions 37’213, 37’227, 37’251 (indicated by green arrow), within 100k09, cannot be replaced by recoded synthetic DNA in which TCG codons are replaced with AGC, TCA codons are replaced with AGT, and TAG codons are replaced with TAA. Each graph shows a compiled recoding landscape in the WT genetic background (blue) and the ∆recA background (red). Landscapes were generated by sequencing 13–16 clones from an independent CONEXER experiment with a CONEXER BAC bearing recoded sequence for 100k09. The resolution of the diagnostic landscapes (distance between the positions of the first and last event with recoding frequency 0) is indicated in parentheses. Three independent replicates of each experiment are shown. b, Bar graph of the diagnostic resolution of CONEXER experiments in section 100k09 in WT and ∆recA conditions; data are from n = 3 independent biological replicates shown in panel a. Data are represented as mean ± standard deviation. The ∆recA condition is significantly better than the WT at localizing disallowed synthetic sequences (two-sided unpaired t test: p-value = 0.021). We note that previous experiments by REXER in the WT background yielded a resolution of >20,000 bp1.
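The diagnostic resolution defined in this legend (distance between the first and last recoding event with frequency 0) can be computed from per-clone calls roughly as below. This is a hedged sketch: the data layout (one dict per clone mapping codon position to a recoded/not-recoded flag), the function names, and the use of the sample standard deviation are assumptions, and the positions in the usage example are invented.

```python
from statistics import mean, stdev


def recoding_frequencies(clones):
    """Compile a recoding landscape from sequenced clones.

    `clones` is a list of dicts mapping codon position -> True if the
    clone carries the recoded (synthetic) codon at that position.
    """
    positions = sorted(clones[0])
    return {p: sum(c[p] for c in clones) / len(clones) for p in positions}


def diagnostic_resolution(landscape):
    """Distance in bp between the first and last position with recoding frequency 0.

    Returns None if every position was recoded in at least one clone.
    """
    zero_positions = [p for p, f in sorted(landscape.items()) if f == 0]
    if not zero_positions:
        return None
    return zero_positions[-1] - zero_positions[0]


def summarize(resolutions):
    """Mean and sample standard deviation across replicate experiments."""
    return mean(resolutions), stdev(resolutions)


# Toy example with three clones and five hypothetical codon positions.
toy_clones = [
    {100: True, 200: True, 300: False, 400: False, 500: True},
    {100: True, 200: False, 300: False, 400: False, 500: True},
    {100: True, 200: True, 300: False, 400: False, 500: True},
]
toy_landscape = recoding_frequencies(toy_clones)
toy_resolution = diagnostic_resolution(toy_landscape)  # positions 300 and 400 are never recoded
```

A smaller resolution means the disallowed sequence is localized to a narrower stretch of the 100 kb section.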

Extended Data Fig. 9 Continuous genome synthesis method.

Alternating even and odd rounds of CONEXER for continuous replacement of genomic DNA with synthetic DNA. a, Following an even round of CONEXER, clones are picked from an appropriate selection plate containing kanamycin (selection for gain of +1), sucrose (selection for loss of −2), 4-chlorophenylalanine (selection for loss of −3), and tetracycline (selection for maintenance of +5). Clones are amplified overnight and in parallel undergo a phenotypic screen. Amplified clones with all the correct phenotypes are pooled. This pool serves directly as the recipient for an odd round of CONEXER. After an odd round of CONEXER, clones are picked from an appropriate selection plate containing chloramphenicol (selection for gain of +2), streptomycin (selection for loss of −1), and tetracycline (selection for maintenance of +5). Clones are amplified overnight and in parallel undergo a phenotypic screen. Amplified clones with all the correct phenotypes are pooled. This pool serves directly as the recipient for an even round of CONEXER. Cycling through even and odd rounds of CONEXER leads to continuous synthesis of a synthetic genome from the corresponding BACs. b, Colonies obtained from a step of CONEXER are picked from the selection plate. They are grown up individually in a 96-well plate and phenotyped for the functionality of selection markers on agar plates. Subsequently, clones that show the correct growth phenotype (even steps: growth on kanamycin (selection for +1), growth on sucrose (selection against −2); no growth on streptomycin (selection against −1), no growth on chloramphenicol (selection for +2); odd steps: growth on streptomycin (selection against −1), growth on chloramphenicol (selection for +2); no growth on kanamycin (selection for +1), no growth on sucrose (selection against −2)) are pooled into one culture. This culture immediately serves as the recipient strain for the next step of CONEXER.
The selectable markers are +1, blue, kanR (selected for with kanamycin); −1, yellow, rpsL (selected against with streptomycin); +2, green, cat (selected for with chloramphenicol); −2, pink, sacB (selected against with sucrose); −3, orange, pheS (selected against with 4-chlorophenylalanine); +5, dark blue, tetR (selected for with tetracycline).
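The phenotypic screen in panel b amounts to matching each clone's growth pattern against the pattern expected for an even or odd step. A minimal sketch of that bookkeeping follows; the expected patterns are taken from the legend, while the data layout (well name mapped to a growth readout per plate) and all identifiers are hypothetical.

```python
# Expected growth (True) or no growth (False) on each screening plate,
# as listed in the legend for even and odd CONEXER steps.
EXPECTED_GROWTH = {
    "even": {
        "kanamycin": True,
        "sucrose": True,
        "streptomycin": False,
        "chloramphenicol": False,
    },
    "odd": {
        "kanamycin": False,
        "sucrose": False,
        "streptomycin": True,
        "chloramphenicol": True,
    },
}


def passes_screen(step, observed):
    """True if a clone's observed growth matches every expected phenotype."""
    expected = EXPECTED_GROWTH[step]
    return all(observed.get(plate) == grows for plate, grows in expected.items())


def pool_clones(step, plate_readout):
    """Wells whose clones show all correct phenotypes and can be pooled."""
    return [well for well, observed in plate_readout.items()
            if passes_screen(step, observed)]


# Hypothetical readout from two wells after an even step.
readout = {
    "A1": {"kanamycin": True, "sucrose": True,
           "streptomycin": False, "chloramphenicol": False},
    "A2": {"kanamycin": True, "sucrose": True,
           "streptomycin": False, "chloramphenicol": True},  # retained +2 marker
}
pooled = pool_clones("even", readout)
```

In this toy readout only well A1 would enter the pool that serves as the recipient for the next step.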

Extended Data Fig. 10 Continuous synthesis of 500 kb of a recoded E. coli genome from CONEXER BACs.

Continuous genome synthesis via rounds of CONEXER in ∆recA recipients. Genomic DNA is depicted in grey and synthetic, recoded DNA in pink. The selectable markers are +1, blue, kanR (selected for with kanamycin); −1, yellow, rpsL (selected against with streptomycin); +2, green, cat (selected for with chloramphenicol); −2, pink, sacB (selected against with sucrose). Each round of CONEXER replaces approximately 100 kb of the E. coli genome with synthetic DNA and takes two days. Continuous synthesis of a 500 kb synthetic section in the E. coli genome was achieved in 10 days.
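The timing in this legend follows from simple arithmetic: five rounds of roughly 100 kb at two days each. The sketch below makes that explicit; it assumes rounds run strictly back to back with no failed or repeated steps, and the function name and defaults are illustrative only.

```python
import math


def cgs_schedule(target_kb, kb_per_round=100, days_per_round=2):
    """Number of sequential CONEXER rounds and total days for a target section.

    Assumes each round replaces `kb_per_round` kb and that rounds follow
    one another without gaps.
    """
    rounds = math.ceil(target_kb / kb_per_round)
    return rounds, rounds * days_per_round


rounds, days = cgs_schedule(500)  # the 500 kb section described in this legend
```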

Supplementary information

Supplementary Information

Supplementary Notes 1 and 2 and Supplementary Figs. 1–6.

Reporting Summary

Supplementary Data

Supplementary Data 1–20 and a guide to the Supplementary Data.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Zürcher, J.F., Kleefeldt, A.A., Funke, L.F.H. et al. Continuous synthesis of E. coli genome sections and Mb-scale human DNA assembly. Nature 619, 555–562 (2023). https://doi.org/10.1038/s41586-023-06268-1


  • DOI: https://doi.org/10.1038/s41586-023-06268-1
