High-accuracy long-read amplicon sequences using unique molecular identifiers with Nanopore or PacBio sequencing

Abstract

High-throughput amplicon sequencing of large genomic regions remains challenging for short-read technologies. Here, we report a high-throughput amplicon sequencing approach combining unique molecular identifiers (UMIs) with Oxford Nanopore Technologies (ONT) or Pacific Biosciences circular consensus sequencing, yielding high-accuracy single-molecule consensus sequences of large genomic regions. We applied our approach to sequence ribosomal RNA operon amplicons (~4,500 bp) and genomic sequences (>10,000 bp) of reference microbial communities in which we observed a chimera rate <0.02%. To reach a mean UMI consensus error rate <0.01%, a UMI read coverage of 15× (ONT R10.3), 25× (ONT R9.4.1) and 3× (Pacific Biosciences circular consensus sequencing) is needed, which provides a mean error rate of 0.0042%, 0.0041% and 0.0007%, respectively.
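To put the reported consensus error rates in more familiar terms, the following minimal sketch converts them to Phred-style quality scores and to the expected number of errors in one ~4,500 bp rRNA operon amplicon. It uses only the figures quoted in the abstract. The function names, the dictionary layout and the assumption that the mean error rate can be read as a uniform per-base error probability are illustrative choices, not part of the authors' pipeline.

```python
import math

# Figures quoted in the abstract:
# platform -> (minimum UMI read coverage, mean consensus error rate in percent)
REPORTED = {
    "ONT R10.3": (15, 0.0042),
    "ONT R9.4.1": (25, 0.0041),
    "PacBio CCS": (3, 0.0007),
}

# Approximate rRNA operon amplicon length given in the abstract (~4,500 bp).
AMPLICON_LENGTH_BP = 4500


def phred_quality(error_percent: float) -> float:
    """Convert an error rate in percent to a Phred score, Q = -10 * log10(p).

    Assumption: the mean error rate is treated as a per-base probability.
    """
    return -10.0 * math.log10(error_percent / 100.0)


def expected_errors(error_percent: float, length_bp: int) -> float:
    """Expected number of erroneous bases in a sequence of length_bp bases."""
    return error_percent / 100.0 * length_bp


def summarize(reported=REPORTED, length_bp=AMPLICON_LENGTH_BP):
    """Return (platform, coverage, Phred score, expected errors) per platform."""
    return [
        (name, coverage, phred_quality(err), expected_errors(err, length_bp))
        for name, (coverage, err) in reported.items()
    ]


if __name__ == "__main__":
    for name, coverage, q, n_err in summarize():
        print(f"{name}: {coverage}x coverage, Q{q:.1f}, "
              f"{n_err:.3f} expected errors per {AMPLICON_LENGTH_BP} bp")
```

Under these assumptions, the 0.01% target corresponds to Q40, and all three reported mean error rates sit above that threshold, which amounts to well under one expected error per operon-length consensus sequence on average.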

Fig. 1: Dual UMI-tagging approach for long-read amplicon sequencing.
Fig. 2: Error profiling of long-read amplicon sequencing strategies.
Fig. 3: BLAST-based consensus taxonomic assignment against the Web of Life database for whole rRNA operons, using the combination of 16S and 23S rRNAs and individual rRNA genes.

Data availability

Raw and assembled sequencing data are available at the European Nucleotide Archive under the project number PRJEB32674. A complete data overview is in Supplementary Table 11 and data yield is in Supplementary Table 12.

Public data used in this study include the SILVA 138.1 SSURef Nr99 database (https://www.arb-silva.de/), gene-specific databases from the Web of Life (https://github.com/biocore/wol), the Greengenes 13_5 database (https://greengenes.secondgenome.com/), EMP 16S V4 Deblur sOTU profiles (https://github.com/biocore/emp and https://github.com/biocore/redbiom), reference sequences for the ZymoBIOMICS Microbial Community Standard D6300 (https://s3.amazonaws.com/zymo-files/BioPool/ZymoBIOMICS.STD.refseq.v2.zip), raw ONT sequencing data from the ZymoBIOMICS Microbial Community Standard D6300 (ENA accession ERR2887847), Illumina sequencing data from the ZymoBIOMICS Microbial Community Standard D6300 (ENA accessions ERR2935851, ERR2935850, ERR2935852, ERR2935857, ERR2935854, ERR2935853, ERR2935848 and ERR2935849) and the E. coli str_K12_MG1655 genome (NCBI: U00096.3).

Code availability

Source code and analysis scripts are freely available at https://github.com/SorenKarst/longread_umi. The repository release version used to generate the data in this article was v.0.4.2.

Acknowledgements

The study was funded by research grants from VILLUM FONDEN (15510) and the Poul Due Jensen Foundation (Microflora Danica). R.M.Z. was funded by grants from the Natural Sciences and Engineering Research Council of Canada (Discovery Grant) and Genome British Columbia (SIP011).

Author information

Contributions

S.M.K. and R.M.Z. conceived the method and developed the bioinformatics pipeline. S.M.K. performed the wet laboratory method development and experiments. E.A.S. performed Nanopore UMI sequencing of E. coli. R.H.K. assembled reference genomes. S.M.K., R.M.Z. and M.A. performed data analysis on method performance. D.M., Q.Z. and R.K. analyzed American Gut Project samples. S.M.K., R.M.Z. and M.A. wrote the first draft of the manuscript. All authors contributed to the content and revision of the manuscript.

Corresponding author

Correspondence to Mads Albertsen.

Ethics declarations

Competing interests

M.A., S.M.K. and R.H.K. are co-owners of DNASense ApS. The other authors declare no competing interests.

Additional information

Peer review information Lei Tang was the primary editor on this article and managed its editorial process and peer review in collaboration with the rest of the editorial team.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Human research participants The American Gut Project relies primarily on crowd-sourced samples without active recruitment. This research was performed in accordance with the University of Colorado Boulder’s Institutional Review Board protocol number 12-0582 and the University of California San Diego’s Human Research Protection Program, protocol no. 141853.

Supplementary information

Supplementary Information

Supplementary Note, Figs. 1–18, Tables 1–13 and references.

Reporting Summary

About this article

Cite this article

Karst, S.M., Ziels, R.M., Kirkegaard, R.H. et al. High-accuracy long-read amplicon sequences using unique molecular identifiers with Nanopore or PacBio sequencing. Nat Methods 18, 165–169 (2021). https://doi.org/10.1038/s41592-020-01041-y

  • DOI: https://doi.org/10.1038/s41592-020-01041-y
