High-accuracy long-read amplicon sequences using unique molecular identifiers with Nanopore or PacBio sequencing

Karst, Søren M.; Ziels, Ryan M.; Kirkegaard, Rasmus H.; Sørensen, Emil A.; McDonald, Daniel; Zhu, Qiyun; Knight, Rob; Albertsen, Mads

doi:10.1038/s41592-020-01041-y

Article
Published: 11 January 2021

Nature Methods volume 18, pages 165–169 (2021)

Abstract

High-throughput amplicon sequencing of large genomic regions remains challenging for short-read technologies. Here, we report a high-throughput amplicon sequencing approach combining unique molecular identifiers (UMIs) with Oxford Nanopore Technologies (ONT) or Pacific Biosciences circular consensus sequencing, yielding high-accuracy single-molecule consensus sequences of large genomic regions. We applied our approach to sequence ribosomal RNA operon amplicons (~4,500 bp) and genomic sequences (>10,000 bp) of reference microbial communities in which we observed a chimera rate <0.02%. To reach a mean UMI consensus error rate <0.01%, a UMI read coverage of 15× (ONT R10.3), 25× (ONT R9.4.1) and 3× (Pacific Biosciences circular consensus sequencing) is needed, which provides a mean error rate of 0.0042%, 0.0041% and 0.0007%, respectively.
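To put the quoted error rates in more familiar terms, the following is a minimal sketch that converts them to Phred-scaled quality values and to an expected number of errors per ~4,500 bp rRNA operon amplicon. The function names are illustrative and not part of the authors' pipeline, and the error-free fraction assumes errors are independent and uniformly distributed along the sequence, which is a simplification rather than a result from the article.

```python
import math

# Mean UMI consensus error rates quoted in the abstract, in percent.
ERROR_RATE_PERCENT = {
    "ONT R10.3 (15x)": 0.0042,
    "ONT R9.4.1 (25x)": 0.0041,
    "PacBio CCS (3x)": 0.0007,
}

# Approximate rRNA operon amplicon length from the abstract.
AMPLICON_LENGTH_BP = 4500


def phred_quality(error_rate_percent):
    """Convert a per-base error rate in percent to a Phred score, Q = -10*log10(p)."""
    p = error_rate_percent / 100.0
    return -10.0 * math.log10(p)


def expected_errors(error_rate_percent, length_bp):
    """Expected number of erroneous bases in a sequence of the given length."""
    return error_rate_percent / 100.0 * length_bp


def error_free_fraction(error_rate_percent, length_bp):
    """Fraction of sequences with zero errors.

    Assumption (ours, not the article's): errors are independent and
    equally likely at every position.
    """
    p = error_rate_percent / 100.0
    return (1.0 - p) ** length_bp


if __name__ == "__main__":
    for label, rate in ERROR_RATE_PERCENT.items():
        q = phred_quality(rate)
        n = expected_errors(rate, AMPLICON_LENGTH_BP)
        f = error_free_fraction(rate, AMPLICON_LENGTH_BP)
        print(f"{label}: Q{q:.1f}, {n:.3f} expected errors, "
              f"{f:.1%} error-free (under the independence assumption)")
```

Under these assumptions, a mean error rate of 0.0042% corresponds to roughly Q44 and about 0.19 expected errors per 4,500 bp amplicon.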

**Fig. 1: Dual UMI-tagging approach for long-read amplicon sequencing.**

**Fig. 2: Error profiling of long-read amplicon sequencing strategies.**

**Fig. 3: BLAST-based consensus taxonomic assignment against the Web of Life database for whole rRNA operons, using the combination of 16S and 23S rRNAs and individual rRNA genes.**

Data availability

Raw and assembled sequencing data are available at the European Nucleotide Archive under the project number PRJEB32674. A complete data overview is in Supplementary Table 11 and data yield is in Supplementary Table 12.

Public data used in this study include SILVA 138.1 SSURef Nr99 database (https://www.arb-silva.de/), gene-specific databases from the Web of Life (https://github.com/biocore/wol), Greengenes 13_5 database (https://greengenes.secondgenome.com/), EMP 16S V4 Deblur sOTU profiles (https://github.com/biocore/emp and https://github.com/biocore/redbiom), reference sequences for ZymoBIOMICS Microbial Community Standard D6300 (https://s3.amazonaws.com/zymo-files/BioPool/ZymoBIOMICS.STD.refseq.v2.zip), raw ONT sequencing data from ZymoBIOMICS Microbial Community Standard D6300 (ENA accession ERR2887847), Illumina sequencing data from ZymoBIOMICS Microbial Community Standard D6300 (ENA accessions: ERR2935851, ERR2935850, ERR2935852, ERR2935857, ERR2935854, ERR2935853, ERR2935848 and ERR2935849) and E. coli str_K12_MG1655 genome (NCBI: U00096.3).

Code availability

Source code and analysis scripts are freely available at https://github.com/SorenKarst/longread_umi. The repository release version used to generate the data in this article was v.0.4.2.

Acknowledgements

The study was funded by research grants from VILLUM FONDEN (15510) and the Poul Due Jensen Foundation (Microflora Danica). R.M.Z. was funded by grants from the Natural Sciences and Engineering Research Council of Canada (Discovery Grant) and Genome British Columbia (SIP011).

Author information

These authors contributed equally: Søren M. Karst, Ryan M. Ziels.

Authors and Affiliations

Center for Microbial Communities, Department of Chemistry and Bioscience, Aalborg University, Aalborg, Denmark
Søren M. Karst, Rasmus H. Kirkegaard, Emil A. Sørensen & Mads Albertsen
Department of Civil Engineering, The University of British Columbia, Vancouver, British Columbia, Canada
Ryan M. Ziels
Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
Daniel McDonald, Qiyun Zhu & Rob Knight
Department of Bioengineering, University of California San Diego, La Jolla, CA, USA
Rob Knight
Center for Microbiome Innovation, University of California San Diego, La Jolla, CA, USA
Rob Knight
Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA
Rob Knight

Contributions

S.M.K. and R.M.Z. conceived the method and developed the bioinformatics pipeline. S.M.K. performed the wet laboratory method development and experiments. E.A.S. performed Nanopore UMI sequencing of E. coli. R.H.K. assembled reference genomes. S.M.K., R.M.Z. and M.A. performed data analysis on method performance. D.M., Q.Z. and R.K. analyzed American Gut Project samples. S.M.K., R.M.Z. and M.A. wrote the first draft of the manuscript. All authors contributed to the content and revision of the manuscript.

Corresponding author

Correspondence to Mads Albertsen.

Ethics declarations

Competing interests

M.A., S.M.K. and R.H.K. are co-owners of DNASense ApS. The other authors declare no competing interests.

Additional information

**Peer review information** Lei Tang was the primary editor on this article and managed its editorial process and peer review in collaboration with the rest of the editorial team.

**Publisher’s note** Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Human research participants** The American Gut Project relies primarily on crowd-sourced samples without active recruitment. This research was performed in accordance with the University of Colorado Boulder’s Institutional Review Board protocol number 12-0582 and the University of California San Diego’s Human Research Protection Program, protocol no. 141853.

Supplementary information

Supplementary Information

Supplementary Note, Figs. 1–18, Tables 1–13 and references.

Reporting Summary

About this article

Cite this article

Karst, S.M., Ziels, R.M., Kirkegaard, R.H. et al. High-accuracy long-read amplicon sequences using unique molecular identifiers with Nanopore or PacBio sequencing. Nat Methods 18, 165–169 (2021). https://doi.org/10.1038/s41592-020-01041-y

Received: 10 January 2020
Accepted: 03 December 2020
Published: 11 January 2021
Issue Date: February 2021
DOI: https://doi.org/10.1038/s41592-020-01041-y
