High-fidelity gene synthesis by retrieval of sequence-verified DNA identified using high-throughput pyrosequencing

Journal name:
Nature Biotechnology
Year published:
Published online

The construction of synthetic biological systems involving millions of nucleotides is limited by the lack of high-quality synthetic DNA. Consequently, the field requires advances in the accuracy and scale of chemical DNA synthesis and in the processing of longer DNA assembled from short fragments. Here we describe a highly parallel and miniaturized method, called megacloning, for obtaining high-quality DNA by using next-generation sequencing (NGS) technology as a preparative tool. We demonstrate our method by processing both chemically synthesized and microarray-derived DNA oligonucleotides with a robotic system for imaging and picking beads directly off a high-throughput pyrosequencing platform. The method can reduce error rates by a factor of 500 compared with the starting oligonucleotide pool generated by microarray synthesis. We use DNA obtained by megacloning to assemble synthetic genes. In principle, millions of DNA fragments can be sequenced, characterized and sorted in a single megacloner run, enabling constructive biology up to the megabase scale.
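A simple model shows why lowering the per-base error rate matters more as fragments get longer. Assume (a simplification not made in the text) that errors arise independently at each position with probability p. The expected fraction of error-free molecules of length L, before and after a 500-fold reduction in p, is then:

```latex
P_{\mathrm{correct}}(L) = (1 - p)^{L} \approx e^{-pL},
\qquad
P'_{\mathrm{correct}}(L) = \left(1 - \frac{p}{500}\right)^{L} \approx e^{-pL/500}
\qquad \text{(for small } p\text{)}
```

Under this model, keeping the error-free fraction fixed means keeping the product pL fixed. A 500-fold lower error rate therefore allows roughly 500-fold longer stretches with the same expected fraction of correct molecules.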

At a glance


    Figure 1: Coalescence of DNA reading and writing.

    The general approach begins with DNA from a variety of sources. Here we used oligonucleotides synthesized on microarrays as well as conventionally synthesized oligonucleotides. Next-generation sequencing is then used to read the DNA and identify oligonucleotides with the desired sequences. Here we used the GS FLX platform (454/Roche). Finally, the DNA is sorted and retrieved selectively, in this case with a microactuator-controlled micropipette guided by two microscope cameras. The technologies used for retrieval depend on the sequencing platform.
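In its simplest form, the read-identify-retrieve loop described above reduces to building a pick list. For each desired sequence, choose one bead whose read matches it exactly, and record that bead's position for the retrieval robot. The Python sketch below is a hypothetical illustration, not the megacloner software. The `BeadRead` record, the (x, y) well coordinates and the first-match-wins rule are all assumptions. The sketch also assumes that reads are already trimmed of adapters, are in the same orientation as the targets, and that target sequences are unique.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BeadRead:
    """One sequencing read tied to the plate position of its bead (hypothetical layout)."""
    well_x: int
    well_y: int
    sequence: str


def build_pick_list(reads, targets):
    """Return {target name: (x, y)} with one perfect-match bead per target.

    reads:   iterable of BeadRead
    targets: dict mapping target name -> desired sequence (assumed unique)
    Targets with no perfectly matching read are simply absent from the result.
    """
    # Invert the target table so each read can be looked up by its sequence.
    name_by_sequence = {seq: name for name, seq in targets.items()}
    picks = {}
    for read in reads:
        name = name_by_sequence.get(read.sequence)
        # Keep the first perfect-match bead for each target; ignore later duplicates.
        if name is not None and name not in picks:
            picks[name] = (read.well_x, read.well_y)
    return picks
```

For example, with targets `{"oligo_A": "ACGTACGT", "oligo_B": "TTGGCCAA"}`, a read `ACGTACGA` at (10, 4) is skipped because it carries an error. Perfect reads at (11, 4) and (3, 9) are picked, and a second perfect `oligo_A` read at (5, 5) is ignored. A real system would also have to reject beads carrying mixed templates, which this sketch does not attempt.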

    Figure 2: NGS-based comparison of untreated and megacloned oligonucleotide pools from microarray.

    (a) Comparison of the initial microarray oligonucleotide pool (blue) and the pool enriched with the megacloner technology (red), based on the results of the Illumina GAII runs. The bars in set 1 represent the fraction of reads that could be mapped allowing up to three errors. The bars in set 2 show the fraction of reads perfectly matching the sequence set of the initial pool (3,918 sequences); the difference between the blue and red bars in set 2 represents the enrichment of correct sequences by megacloning. The bars in sets 3 and 4 show the fractions of reads mapping to sequences from the selected pool of 319 sequences. The difference between the blue and red bars in set 3 shows the enrichment of the 319 selected sequences after megacloning relative to before. The blue and red bars in set 4 represent the enrichment of reads that both belong to the set of 319 selected sequences and are correct. (b) Histogram of read counts in the Illumina GAII data for the initial pool (blue) and the enriched megacloned sample (red). Only reads mapping without errors to one of the 319 selected target sequences were taken into account. To compare the two NGS runs on the basis of read counts, we converted the counts to parts per million (p.p.m.) of the total number of filtered reads. (c) Composition of reads from the Illumina GAII data for the 319 selected sequences in the initial pool (top) and the enriched pool (bottom). The oligonucleotides are sorted by the fraction of correct reads. Green, correct reads; red, error-prone reads (compartments in the red bars represent individual sequences with a read count of 0.1% or more of the total reads for that target); light blue, sum of nonunique error-prone reads whose individual sequences each account for less than 0.1% of the total reads for that target; blue, unique reads. In the Illumina GAII data set from the enriched sample, only 315 of the 319 selected sequences could be detected.
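Panels (a) and (b) rest on two simple computations. The first classifies reads by how many errors separate them from the nearest reference sequence. The second normalizes raw read counts to parts per million (p.p.m.) of all filtered reads, so that runs of different sequencing depth can be compared. The Python sketch below illustrates both under a simplifying assumption: errors are counted as substitutions only (Hamming distance against equal-length references). The read mapper actually used, and its handling of insertions and deletions, are not described here. All function names are hypothetical.

```python
from collections import Counter


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def best_mismatches(read, references):
    """Smallest Hamming distance from read to any equal-length reference, or None."""
    dists = [hamming(read, ref) for ref in references if len(ref) == len(read)]
    return min(dists) if dists else None


def mapping_fractions(reads, references, max_errors=3):
    """Return (fraction mapped with <= max_errors mismatches, fraction with 0 mismatches)."""
    if not reads:
        return 0.0, 0.0
    mapped = perfect = 0
    for read in reads:
        d = best_mismatches(read, references)
        if d is not None and d <= max_errors:
            mapped += 1
            if d == 0:
                perfect += 1
    return mapped / len(reads), perfect / len(reads)


def perfect_read_counts(reads, targets):
    """Count reads that match a target sequence exactly, keyed by that sequence."""
    target_set = set(targets)
    return Counter(r for r in reads if r in target_set)


def to_ppm(counts, total_filtered_reads):
    """Convert raw read counts to parts per million of all filtered reads."""
    return {key: c * 1_000_000 / total_filtered_reads for key, c in counts.items()}
```

Under this sketch, the enrichment shown in panel (a) would be the difference between the fractions computed for the megacloned pool and for the initial pool. The p.p.m. values from `to_ppm` correspond to the normalized read counts histogrammed in panel (b).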



Author information

  These authors contributed equally to this work.

    • Mark Matzas &
    • Peer F Stähler


  1. febit group, Heidelberg, Germany.

    • Mark Matzas,
    • Peer F Stähler,
    • Nathalie Kefer,
    • Nicole Siebelt,
    • Valesca Boisguérin,
    • Jack T Leonard,
    • Andreas Keller,
    • Cord F Stähler &
    • Pamela Häberle
  2. Stanford Genome Technology Center, Stanford University, Palo Alto, California, USA.

    • Baback Gharizadeh &
    • Farbod Babrzadeh
  3. Harvard Medical School, Boston, Massachusetts, USA.

    • George M Church
  4. Wyss Institute for Biologically Inspired Engineering, Boston, Massachusetts, USA.

    • George M Church


M.M., P.F.S. and G.M.C. conceptualized the megacloning method and wrote the manuscript; M.M. designed and led the study and wrote all algorithms for sequence design, data analysis, image conversion, image processing and microactuator control; M.M., N.K. and N.S. acquired the technology used and set up the microactuator device and optical systems; N.S. designed the uidA genetic model; M.M., N.K., N.S., V.B. and P.H. designed and optimized molecular biological methods; C.F.S. and J.T.L. contributed to bead picking and engineering concepts; A.K. set up the statistical models and calculations; J.T.L. contributed to the design of molecular biological steps and the acquisition of sequencing samples; B.G. and F.B. evaluated and implemented the necessary changes to sample preparation and the sequencing process on the 454/Roche platform.

Competing financial interests

M.M., N.K., N.S., V.B., J.T.L., P.F.S., A.K., C.F.S. and P.H. have potentially competing financial interests in companies affiliated with the febit group.

Corresponding author

Correspondence to:


Supplementary information

  1. Supplementary Text and Figures (PDF, 3 MB)

    Supplementary Data