High-accuracy long-read amplicon sequences using unique molecular identifiers with Nanopore or PacBio sequencing

Abstract

High-throughput amplicon sequencing of large genomic regions remains challenging for short-read technologies. Here, we report a high-throughput amplicon sequencing approach that combines unique molecular identifiers (UMIs) with Oxford Nanopore Technologies (ONT) or Pacific Biosciences circular consensus sequencing, yielding high-accuracy single-molecule consensus sequences of large genomic regions. We applied our approach to sequence ribosomal RNA operon amplicons (~4,500 bp) and genomic sequences (>10,000 bp) of reference microbial communities, in which we observed a chimera rate <0.02%. Reaching a mean UMI consensus error rate <0.01% requires a UMI read coverage of 15× (ONT R10.3), 25× (ONT R9.4.1) or 3× (Pacific Biosciences circular consensus sequencing), which yields mean error rates of 0.0042%, 0.0041% and 0.0007%, respectively.
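To put the error rates quoted in the abstract on a more familiar scale, the short sketch below converts them to Phred-scaled quality scores and to the expected number of errors in a ~4,500 bp rRNA operon amplicon. It is only an illustrative calculation: the function names are hypothetical, and it assumes errors are spread uniformly along the sequence, which the abstract does not state.

```python
import math

# Mean UMI consensus error rates quoted in the abstract, in percent.
ERROR_RATES_PERCENT = {
    "ONT R10.3 (15x coverage)": 0.0042,
    "ONT R9.4.1 (25x coverage)": 0.0041,
    "PacBio CCS (3x coverage)": 0.0007,
}

# Approximate rRNA operon amplicon length given in the abstract.
AMPLICON_LENGTH_BP = 4500


def phred_quality(error_rate_percent: float) -> float:
    """Convert a per-base error rate in percent to a Phred-scaled score (Q = -10 log10 p)."""
    error_probability = error_rate_percent / 100.0
    return -10.0 * math.log10(error_probability)


def expected_errors(error_rate_percent: float, length_bp: int) -> float:
    """Expected number of errors in a sequence, assuming a uniform per-base error rate."""
    return error_rate_percent / 100.0 * length_bp


if __name__ == "__main__":
    for platform, rate in ERROR_RATES_PERCENT.items():
        quality = phred_quality(rate)
        errors = expected_errors(rate, AMPLICON_LENGTH_BP)
        print(f"{platform}: Q{quality:.1f}, ~{errors:.3f} expected errors per amplicon")
```

Under these assumptions, the 0.01% target corresponds to Q40, and all three reported mean error rates fall above that threshold.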

Fig. 1: Dual UMI-tagging approach for long-read amplicon sequencing.
Fig. 2: Error profiling of long-read amplicon sequencing strategies.
Fig. 3: BLAST-based consensus taxonomic assignment against the Web of Life database for whole rRNA operons, using the combination of 16S and 23S rRNAs and individual rRNA genes.

Data availability

Raw and assembled sequencing data are available at the European Nucleotide Archive under the project number PRJEB32674. A complete data overview is in Supplementary Table 11 and data yield is in Supplementary Table 12.
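As a rough sketch of how the run list for this project might be retrieved programmatically, the example below builds a query for the ENA Portal API file report endpoint and parses the tab-separated response. The helper names are hypothetical, the choice of fields (`run_accession`, `fastq_ftp`) is an assumption about what a reader would want, and the endpoint's behaviour should be checked against the current ENA documentation before relying on it.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# ENA Portal API file report endpoint (verify against the current ENA docs).
ENA_FILEREPORT = "https://www.ebi.ac.uk/ena/portal/api/filereport"


def build_filereport_url(accession, fields=("run_accession", "fastq_ftp")):
    """Build a file report URL listing the sequencing runs under an accession."""
    query = urlencode({
        "accession": accession,
        "result": "read_run",
        "fields": ",".join(fields),
        "format": "tsv",
    })
    return f"{ENA_FILEREPORT}?{query}"


def fetch_run_table(accession):
    """Download the report and return one dict per run (requires network access)."""
    with urlopen(build_filereport_url(accession)) as response:
        lines = response.read().decode("utf-8").strip().splitlines()
    header = lines[0].split("\t")
    return [dict(zip(header, line.split("\t"))) for line in lines[1:]]


if __name__ == "__main__":
    # Project number from the Data availability statement.
    print(build_filereport_url("PRJEB32674"))
```

Calling `fetch_run_table("PRJEB32674")` would then return the run accessions and FASTQ locations, provided the service is reachable.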

Public data used in this study include the SILVA 138.1 SSURef Nr99 database (https://www.arb-silva.de/), gene-specific databases from the Web of Life (https://github.com/biocore/wol), the Greengenes 13_5 database (https://greengenes.secondgenome.com/), EMP 16S V4 Deblur sOTU profiles (https://github.com/biocore/emp and https://github.com/biocore/redbiom), reference sequences for the ZymoBIOMICS Microbial Community Standard D6300 (https://s3.amazonaws.com/zymo-files/BioPool/ZymoBIOMICS.STD.refseq.v2.zip), raw ONT sequencing data from the ZymoBIOMICS Microbial Community Standard D6300 (ENA accession ERR2887847), Illumina sequencing data from the ZymoBIOMICS Microbial Community Standard D6300 (ENA accessions ERR2935851, ERR2935850, ERR2935852, ERR2935857, ERR2935854, ERR2935853, ERR2935848 and ERR2935849) and the E. coli str_K12_MG1655 genome (NCBI: U00096.3).

Code availability

Source code and analysis scripts are freely available at https://github.com/SorenKarst/longread_umi. The repository release version used to generate the data in this article was v.0.4.2.

Acknowledgements

The study was funded by research grants from VILLUM FONDEN (15510) and the Poul Due Jensen Foundation (Microflora Danica). R.M.Z. was funded by grants from the Natural Sciences and Engineering Research Council of Canada (Discovery Grant) and Genome British Columbia (SIP011).

Author information

Contributions

S.M.K. and R.M.Z. conceived the method and developed the bioinformatics pipeline. S.M.K. performed the wet laboratory method development and experiments. E.A.S. performed Nanopore UMI sequencing of E. coli. R.H.K. assembled reference genomes. S.M.K., R.M.Z. and M.A. performed data analysis on method performance. D.M., Q.Z. and R.K. analyzed American Gut Project samples. S.M.K., R.M.Z. and M.A. wrote the first draft of the manuscript. All authors contributed to the content and revision of the manuscript.

Corresponding author

Correspondence to Mads Albertsen.

Ethics declarations

Competing interests

M.A., S.M.K. and R.H.K. are co-owners of DNASense ApS. The other authors declare no competing interests.

Additional information

Peer review information: Lei Tang was the primary editor on this article and managed its editorial process and peer review in collaboration with the rest of the editorial team.

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Human research participants: The American Gut Project relies primarily on crowd-sourced samples without active recruitment. This research was performed in accordance with the University of Colorado Boulder’s Institutional Review Board protocol number 12-0582 and the University of California San Diego’s Human Research Protection Program, protocol no. 141853.

Supplementary information

Supplementary Information

Supplementary Note, Figs. 1–18, Tables 1–13 and references.

About this article

Cite this article

Karst, S.M., Ziels, R.M., Kirkegaard, R.H. et al. High-accuracy long-read amplicon sequences using unique molecular identifiers with Nanopore or PacBio sequencing. Nat Methods (2021). https://doi.org/10.1038/s41592-020-01041-y
