Purification of multiplex oligonucleotide libraries by synthesis and selection

Abstract

Complex oligonucleotide (oligo) libraries are essential materials for diverse applications in synthetic biology, pharmaceutical production, nanotechnology and DNA-based data storage. However, the error rates in synthesizing complex oligo libraries can be substantial, which increases the cost and labor of these applications. As most synthesis errors arise from faulty insertions and deletions, we developed a length-based method with single-base resolution for purification of complex libraries containing oligos of identical or different lengths. Our method (purification of multiplex oligonucleotide libraries by synthesis and selection) can be performed either manually, step by step, or using a next-generation sequencer. When applied to a digital data-encoded library containing oligos of identical length, the method increased the purity of full-length oligos from 83% to 97%. We also show that libraries encoding the complementarity-determining region H3 with three different lengths (with an empirically achieved diversity >10^6) can be simultaneously purified in one pot, increasing the in-frame oligo fraction from 49.6% to 83.5%.
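The idea of length-based selection can be illustrated with a small in-silico analogy. The sketch below is not the authors' protocol, which is a wet-lab procedure; it only shows, on a hypothetical toy pool, why keeping oligos of the designed length(s) removes insertions and deletions while leaving substitutions untouched. The function names `length_filter` and `purity` and all sequences are assumptions made for illustration.

```python
def length_filter(oligos, allowed_lengths):
    """Keep only oligos whose length equals one of the designed lengths."""
    allowed = set(allowed_lengths)
    return [seq for seq in oligos if len(seq) in allowed]


def purity(oligos, designs):
    """Fraction of oligos that exactly match a designed sequence."""
    if not oligos:
        return 0.0
    designs = set(designs)
    return sum(seq in designs for seq in oligos) / len(oligos)


# Hypothetical toy pool: one 10-nt design plus three erroneous variants.
design = "ACGTACGTAC"
pool = [
    design, design, design,  # error-free copies
    "ACGTCGTAC",             # 1-nt deletion  (length 9, removed by the filter)
    "ACGTAACGTAC",           # 1-nt insertion (length 11, removed by the filter)
    "ACGTACGAAC",            # substitution   (length 10, passes the filter)
]

purified = length_filter(pool, [len(design)])
before = purity(pool, [design])
after = purity(purified, [design])
print(f"purity before: {before:.2f}, after: {after:.2f}")
```

For a library with several designed lengths, such as the three-length CDR H3 library mentioned above, `allowed_lengths` would simply list all of them. The substitution that survives in this toy pool is consistent with the abstract's framing: a length-based method targets indels, so purity improves without necessarily reaching 100%.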

Fig. 1: The method identifies error-free oligos by detecting oligos in the complex library whose lengths are altered by indels.
Fig. 2: Column-synthesized oligos are purified on the glass.
Fig. 3: Microarray-synthesized oligo libraries with high complexity were purified using an NGS instrument.
Fig. 4: The purification method was applied to a digital data-encoding oligo library.
Fig. 5: The purification method was applied to the CDR H3 combinatorial libraries.

Data availability

All sequencing data are available in the Sequence Read Archive under accession number PRJNA698654.

Acknowledgements

This research was supported by the Global Research Development Center Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Science and ICT (MSIT) (2015K1A4A3047345 to S.K.); the Brain Korea 21 Plus Project in 2020 to S.K.; the MSIT and the NRF (NRF-2020R1A3B3079653 to S.K. and NRF-2021R1C1C2010079 to H.Y.); the Bio & Medical Technology Development Program of the NRF, funded by the Korean government (MSIT) (no. 2018M3A9D7079488 to T.R.); and K-BIO KIURI Center through the MSIT (2020M3H1A1073304 to A.C.L.). J.C. is grateful for financial support from Hyundai Motor Chung Mong-Koo Foundation.

Author information

Contributions

H.C., Y.C., T.R. and S.K. initiated and designed the experiments. H.C., Y.C., J.C., A.C.L., T.R. and S.K. wrote the manuscript. H.C., Y.C., A.C.L., H.Y., J.H. and T.R. conducted the research, including DNA synthesis and analysis.

Corresponding authors

Correspondence to Taehoon Ryu or Sunghoon Kwon.

Ethics declarations

Competing interests

H.C., Y.C., J.C., A.C.L., T.R. and S.K. are inventors of a patent application for the method described in this article. The remaining authors declare no financial conflicts of interest.

Additional information

Peer review information Nature Biotechnology thanks Hyunbo Shim, David Zhang and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Supplementary Notes 1–5, Figs. 1–18 and Tables 1–9.

Reporting Summary

Supplementary Data

Oligo sequences used for the purification

Cite this article

Choi, H., Choi, Y., Choi, J. et al. Purification of multiplex oligonucleotide libraries by synthesis and selection. Nat Biotechnol 40, 47–53 (2022). https://doi.org/10.1038/s41587-021-00988-3

  • DOI: https://doi.org/10.1038/s41587-021-00988-3
