Complex oligonucleotide (oligo) libraries are essential materials for diverse applications in synthetic biology, pharmaceutical production, nanotechnology and DNA-based data storage. However, the error rates in synthesizing complex oligo libraries can be substantial, increasing the cost and labor of these applications. As most synthesis errors arise from faulty insertions and deletions, we developed a length-based method with single-base resolution for purification of complex libraries containing oligos of identical or different lengths. Our method, purification of multiplex oligonucleotide libraries by synthesis and selection, can be performed either manually, step by step, or using a next-generation sequencer. When applied to a digital data-encoded library containing oligos of identical length, the method increased the purity of full-length oligos from 83% to 97%. We also show that libraries encoding the complementarity-determining region H3 with three different lengths (with an empirically achieved diversity >10⁶) can be simultaneously purified in one pot, increasing the in-frame oligo fraction from 49.6% to 83.5%.
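The idea of length-based purity can be illustrated in silico. The following is a minimal sketch, not the authors' protocol or analysis pipeline: it assumes reads are available as plain strings and treats a read as correct when its length matches one of the designed lengths. The function names `length_purity` and `select_by_length` and the toy pool are hypothetical, introduced only for illustration.

```python
def length_purity(reads, target_lengths):
    """Fraction of reads whose length equals one of the designed lengths."""
    if not reads:
        raise ValueError("reads must be non-empty")
    targets = set(target_lengths)
    return sum(len(r) in targets for r in reads) / len(reads)


def select_by_length(reads, target_lengths):
    """Idealized selection: keep only reads of a designed length.

    A real purification is imperfect; this models the perfect case only.
    """
    targets = set(target_lengths)
    return [r for r in reads if len(r) in targets]


# Toy pool (hypothetical data): designed length 8,
# plus one read with a 1-nt deletion and one with a 1-nt insertion.
pool = ["ACGTACGT", "ACGTACG", "ACGTACGTA", "TTGCAAGC"]

purity_before = length_purity(pool, [8])
purified = select_by_length(pool, [8])
purity_after = length_purity(purified, [8])
print(purity_before, purity_after)
```

Passing several lengths, for example `[7, 8, 9]`, models a library with oligos of different designed lengths. A length check alone cannot detect substitution errors, which is consistent with the abstract's focus on insertions and deletions.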
All sequencing data are available in the Sequence Read Archive under accession number PRJNA698654.
This research was supported by the Global Research Development Center Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Science and ICT (MSIT) (2015K1A4A3047345 to S.K.); the Brain Korea 21 Plus Project in 2020 to S.K.; the MSIT and the NRF (NRF-2020R1A3B3079653 to S.K. and NRF-2021R1C1C2010079 to H.Y.); the Bio & Medical Technology Development Program of the NRF, funded by the Korean government (MSIT) (no. 2018M3A9D7079488 to T.R.); and K-BIO KIURI Center through the MSIT (2020M3H1A1073304 to A.C.L.). J.C. is grateful for financial support from Hyundai Motor Chung Mong-Koo Foundation.
H.C., Y.C., J.C., A.C.L., T.R. and S.K. are inventors on a patent application for the method described in this article. The remaining authors declare no financial conflicts of interest.
Peer review information Nature Biotechnology thanks Hyunbo Shim, David Zhang and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Choi, H., Choi, Y., Choi, J. et al. Purification of multiplex oligonucleotide libraries by synthesis and selection. Nat Biotechnol (2021). https://doi.org/10.1038/s41587-021-00988-3