Retrieval of a million high-quality, full-length microbial 16S and 18S rRNA gene sequences without primer bias

Karst, Søren M; Dueholm, Morten S; McIlroy, Simon J; Kirkegaard, Rasmus H; Nielsen, Per H; Albertsen, Mads

doi:10.1038/nbt.4045

Letter
Published: 01 January 2018

Nature Biotechnology volume 36, pages 190–195 (2018)

Abstract

Small subunit ribosomal RNA (SSU rRNA) genes, 16S in bacteria and 18S in eukaryotes, have been the standard phylogenetic markers used to characterize microbial diversity and evolution for decades. However, the reference databases of full-length SSU rRNA gene sequences are skewed to well-studied ecosystems and subject to primer bias and chimerism, which results in an incomplete view of the diversity present in a sample. We combine poly(A)-tailing and reverse transcription of SSU rRNA molecules with synthetic long-read sequencing to generate high-quality, full-length SSU rRNA sequences, without primer bias, at high throughput. We apply our approach to samples from seven different ecosystems and obtain more than a million SSU rRNA sequences from all domains of life, with an estimated raw error rate of 0.17%. We observe a large proportion of novel diversity, including several deeply branching phylum-level lineages putatively related to the Asgard Archaea. Our approach will enable expansion of the SSU rRNA reference databases by orders of magnitude, and contribute to a comprehensive census of the tree of life.

**Figure 1: Full-length SSU rRNA sequencing.**

**Figure 2: Coverage of the tree of life.**

**Figure 3: Coverage of the domain Archaea.**

Accession codes

Primary accessions

European Nucleotide Archive

PRJEB22259

References

Giovannoni, S.J., Britschgi, T.B., Moyer, C.L. & Field, K.G. Genetic diversity in Sargasso Sea bacterioplankton. Nature 345, 60–63 (1990).
Ward, D.M., Weller, R. & Bateson, M.M. 16S rRNA sequences reveal numerous uncultured microorganisms in a natural community. Nature 345, 63–65 (1990).
Quast, C. et al. The SILVA ribosomal RNA gene database project: improved data processing and web-based tools. Nucleic Acids Res. 41, D590–D596 (2013).
Locey, K.J. & Lennon, J.T. Scaling laws predict global microbial diversity. Proc. Natl. Acad. Sci. USA 113, 5970–5975 (2016).
Goodwin, S., McPherson, J.D. & McCombie, W.R. Coming of age: ten years of next-generation sequencing technologies. Nat. Rev. Genet. 17, 333–351 (2016).
Singer, E. et al. High-resolution phylogenetic microbial community profiling. ISME J. 10, 2020–2032 (2016).
Travers, K.J., Chin, C.-S., Rank, D.R., Eid, J.S. & Turner, S.W. A flexible and efficient template format for circular consensus sequencing and SNP detection. Nucleic Acids Res. 38, e159 (2010).
Schloss, P.D., Westcott, S.L., Jenior, M.L. & Highlander, S.K. Sequencing 16S rRNA gene fragments using the PacBio SMRT DNA sequencing system. PeerJ Prepr. 3, e778v1 (2015).
Li, C. et al. INC-Seq: accurate single molecule reads using nanopore sequencing. Gigascience 5, 34 (2016).
Burke, C.M. & Darling, A.E. A method for high precision sequencing of near full-length 16S rRNA genes on an Illumina MiSeq. PeerJ 4, e2492 (2016).
Eloe-Fadrosh, E.A., Ivanova, N.N., Woyke, T. & Kyrpides, N.C. Metagenomics uncovers gaps in amplicon-based detection of microbial diversity. Nat. Microbiol. 1, 15032 (2016).
Klindworth, A. et al. Evaluation of general 16S ribosomal RNA gene PCR primers for classical and next-generation sequencing-based diversity studies. Nucleic Acids Res. 41, e1 (2013).
Botero, L.M. et al. Poly(A) polymerase modification and reverse transcriptase PCR amplification of environmental RNA. Appl. Environ. Microbiol. 71, 1267–1275 (2005).
Hoshino, T. & Inagaki, F. A comparative study of microbial diversity and community structure in marine sediments using poly(A) tailing and reverse transcription-PCR. Front. Microbiol. 4, 160 (2013).
Hiatt, J.B., Patwardhan, R.P., Turner, E.H., Lee, C. & Shendure, J. Parallel, tag-directed assembly of locally derived short sequence reads. Nat. Methods 7, 119–122 (2010).
Hong, L.Z. et al. BAsE-Seq: a method for obtaining long viral haplotypes from short sequence reads. Genome Biol. 15, 517 (2014).
Stapleton, J.A. et al. Haplotype-phased synthetic long reads from short-read sequencing. PLoS One 11, e0147229 (2016).
Keohavong, P. & Thilly, W.G. Fidelity of DNA polymerases in DNA amplification. Proc. Natl. Acad. Sci. USA 86, 9253–9257 (1989).
Haas, B.J. et al. Chimeric 16S rRNA sequence formation and detection in Sanger and 454-pyrosequenced PCR amplicons. Genome Res. 21, 494–504 (2011).
Rosenberg, A., Sinai, L., Smith, Y. & Ben-Yehuda, S. Dynamic expression of the translational machinery during Bacillus subtilis life cycle at a single cell level. PLoS One 7, e41921 (2012).
Deutscher, M.P. Degradation of stable RNA in bacteria. J. Biol. Chem. 278, 45041–45044 (2003).
Schuch, W. & Loening, U.E. The ribosomal ribonucleic acid of Agrobacterium tumefaciens. Biochem. J. 149, 17–22 (1975).
Springer, N. et al. Occurrence of fragmented 16S rRNA in an obligate bacterial endosymbiont of Paramecium caudatum. Proc. Natl. Acad. Sci. USA 90, 9892–9895 (1993).
Quail, M.A., Swerdlow, H. & Turner, D.J. Improved protocols for the illumina genome analyzer sequencing system. Curr. Protoc. Hum. Genet. 62, 18.2.1–18.2.27 (2009).
Walters, W.A. et al. PrimerProspector: de novo design and taxonomic analysis of barcoded polymerase chain reaction primers. Bioinformatics 27, 1159–1161 (2011).
Yarza, P. et al. Uniting the classification of cultured and uncultured bacteria and archaea using 16S rRNA gene sequences. Nat. Rev. Microbiol. 12, 635–645 (2014).
Rinke, C. et al. Insights into the phylogeny and coding potential of microbial dark matter. Nature 499, 431–437 (2013).
Brown, C.T. et al. Unusual biology across a group comprising more than 15% of domain Bacteria. Nature 523, 208–211 (2015).
Zaremba-Niedzwiedzka, K. et al. Asgard archaea illuminate the origin of eukaryotic cellular complexity. Nature 541, 353–358 (2017).
Albertsen, M. et al. Genome sequences of rare, uncultured bacteria obtained by differential coverage binning of multiple metagenomes. Nat. Biotechnol. 31, 533–538 (2013).
Yilmaz, P. et al. The SILVA and “All-species Living Tree Project (LTP)” taxonomic frameworks. Nucleic Acids Res. 42, D643–D648 (2014).
Geisen, S., Laros, I., Vizcaíno, A., Bonkowski, M. & de Groot, G.A. Not all are free-living: high-throughput DNA metabarcoding reveals a diverse community of protists parasitizing soil metazoa. Mol. Ecol. 24, 4556–4569 (2015).
Fiore-Donno, A.M., Weinert, J., Wubet, T. & Bonkowski, M. Metacommunity analysis of amoeboid protists in grassland soils. Sci. Rep. 6, 19068 (2016).
Geisen, S. et al. Metatranscriptomic census of active protists in soils. ISME J. 9, 2178–2190 (2015).
Rosenberg, K. et al. Soil amoebae rapidly change bacterial community composition in the rhizosphere of Arabidopsis thaliana. ISME J. 3, 675–684 (2009).
Schloss, P.D., Girard, R.A., Martin, T., Edwards, J. & Thrash, J.C. Status of the archaeal and bacterial census: an update. MBio 7, e00201–e00216 (2016).
Chen, T. et al. The Human Oral Microbiome Database: a web accessible resource for investigating oral microbe taxonomic and genomic information. Database 2010, baq013 (2010).
McIlroy, S.J. et al. MiDAS: the field guide to the microbes of activated sludge. Database 2015, bav062 (2015).
Ritari, J., Salojärvi, J., Lahti, L. & de Vos, W.M. Improved taxonomic assignment of human intestinal 16S rRNA sequences by a dedicated reference database. BMC Genomics 16, 1056 (2015).
Gruber-Dorninger, C. et al. Functionally relevant diversity of closely related Nitrospira in activated sludge. ISME J. 9, 643–655 (2015).
Hug, L.A. et al. A new view of the tree of life. Nat. Microbiol. 1, 16048 (2016).
Price, M.N., Dehal, P.S. & Arkin, A.P. FastTree 2—approximately maximum-likelihood trees for large alignments. PLoS One 5, e9490 (2010).
Stamatakis, A. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics 30, 1312–1313 (2014).
Jacobs, M.A. et al. Comprehensive transposon mutant library of Pseudomonas aeruginosa. Proc. Natl. Acad. Sci. USA 100, 14339–14344 (2003).
Albertsen, M., Karst, S.M., Ziegler, A.S., Kirkegaard, R.H. & Nielsen, P.H. Back to basics—the influence of DNA extraction and primer choice on phylogenetic analysis of activated sludge communities. PLoS One 10, e0132783 (2015).
Tessier, D.C., Brousseau, R. & Vernet, T. Ligation of single-stranded oligodeoxyribonucleotides by T4 RNA ligase. Anal. Biochem. 158, 171–178 (1986).
Schloss, P.D. et al. Introducing mothur: open-source, platform-independent, community-supported software for describing and comparing microbial communities. Appl. Environ. Microbiol. 75, 7537–7541 (2009).
Edgar, R.C. Search and clustering orders of magnitude faster than BLAST. Bioinformatics 26, 2460–2461 (2010).
Pruesse, E., Peplies, J. & Glöckner, F.O. SINA: accurate high-throughput multiple sequence alignment of ribosomal RNA genes. Bioinformatics 28, 1823–1829 (2012).
R Core Team. R: A language and environment for statistical computing (2016).
RStudio Team. RStudio: Integrated Development Environment for R. (2015).
Wickham, H. tidyverse: Easily Install and Load 'Tidyverse' Packages. (2016).
Oksanen, J. et al. vegan: Community Ecology Package. R package version 2.3-0 (2015).
Martin, M. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet.journal 17, 10 (2011).
Ludwig, W. et al. ARB: a software environment for sequence data. Nucleic Acids Res. 32, 1363–1371 (2004).
Miller, M.A., Pfeiffer, W. & Schwartz, T. Creating the CIPRES Science Gateway for inference of large phylogenetic trees. Proceedings of the Gateway Computing Environments Workshop, 14 November 2010, New Orleans, 1–8 (2010).

Acknowledgements

The study was funded by the Danish Research Council for Independent Research (FTP, grant 6111-00617B), the Innovation Fund Denmark (1305-00018B, NomiGas), the Villum Foundation (grant VKR 022796 and 13351), and the Poul Due Jensen (Grundfos) Foundation. S.J.M. was supported by a Danish Council for Independent Research grant (no. 4093-00127A). M.A. was supported by a research grant (15510) from VILLUM FONDEN. We thank H. Daims and M. Wagner for insightful discussions of the manuscript.

Author information

Søren M Karst and Morten S Dueholm: These authors contributed equally to this work.

Authors and Affiliations

Department of Chemistry and Bioscience, Center for Microbial Communities, Aalborg University, Denmark
Søren M Karst, Morten S Dueholm, Simon J McIlroy, Rasmus H Kirkegaard, Per H Nielsen & Mads Albertsen

Contributions

S.M.K., M.S.D. and M.A. conceived the method. S.M.K. and M.S.D. performed wet lab method development and experiments. R.H.K. performed Nanopore sequencing and data analysis. S.M.K. and M.A. developed the bioinformatics pipeline and performed data analysis. S.J.M. performed the phylogenetic analysis. S.M.K., M.S.D., S.J.M., R.H.K., P.H.N. and M.A. wrote the manuscript.

Corresponding author

Correspondence to Mads Albertsen.

Ethics declarations

Competing interests

M.A., S.M.K., R.H.K., and P.H.N. are co-owners of DNASense ApS. The other authors declare no competing financial interests.

Integrated supplementary information

Supplementary Figure 1 Detailed overview of the primer-free full-length SSU rRNA library preparation.

Supplementary Figure 2 Detailed overview of the primer-based full-length SSU rRNA library preparation.

Supplementary Figure 3 rRNA length distribution after adapter trimming.

Supplementary Figure 4 Maximum-likelihood phylogenetic tree showing coverage of the domain Bacteria.

Maximum-likelihood phylogenetic tree showing coverage of the domain Bacteria. The tree includes all bacterial OTUs clustered at 97% generated in this study, their closest match in the Silva SSU NR99 v. 128 database and the reference set from the recent Tree of Life article (Hug et al., 2016). Hypervariable regions were masked with a 40% positional conservation filter, giving 1392 alignment positions, and the tree calculated using FastTree v. 2.1.3 SSE3 (Price et al., 2010). Clade names and clustering are based on the position of reference sequences. *Indicates clades that do not include a genome or pure culture reference sequence – being based on classification of reference sequences in the Silva v. 128 taxonomy. Reference sequences appear black whilst those generated in the current study are color coded based on their similarity to existing database sequences.
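
As an illustration of what a positional conservation filter does, the following is a minimal Python sketch, not the authors' implementation. It assumes that conservation at a column is the fraction of all sequences that share the most common non-gap character, and that columns at or above the threshold are kept; the function names `conservation_mask` and `apply_mask` and the toy alignment are hypothetical.

```python
from collections import Counter


def conservation_mask(alignment, min_conservation=0.40, gap_chars="-."):
    """Return one boolean per alignment column; True means the column is kept.

    Assumption: conservation = share of all sequences carrying the most
    common non-gap character in that column.
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    n_seqs = len(alignment)
    mask = []
    for col in range(length):
        counts = Counter(
            seq[col].upper() for seq in alignment if seq[col] not in gap_chars
        )
        # Count of the most common residue; 0 if the column is all gaps.
        top = counts.most_common(1)[0][1] if counts else 0
        mask.append(top / n_seqs >= min_conservation)
    return mask


def apply_mask(alignment, mask):
    """Drop every column whose mask entry is False."""
    return ["".join(ch for ch, keep in zip(seq, mask) if keep) for seq in alignment]


# Toy alignment: columns 0 and 1 pass a 40% threshold, the rest do not.
toy = ["ACGT-", "ACTA-", "AGCC-", "ATAG-"]
toy_mask = conservation_mask(toy)
toy_filtered = apply_mask(toy, toy_mask)
```

Under this assumed definition, highly variable columns and gap-only columns are excluded before tree building, which is why the caption reports a reduced number of alignment positions.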

Supplementary Figure 5 Rarefaction curves for the different samples split based on kingdom.

Supplementary Figure 6 Maximum-likelihood phylogenetic tree showing coverage of the domain Archaea.

Maximum-likelihood phylogenetic tree showing coverage of the domain Archaea. The tree includes all archaeal OTUs clustered at 97% generated in this study, their closest match in the Silva SSU NR99 v. 128 database and the reference set from the recent Tree of Life article (Hug et al., 2016). Hypervariable regions were masked with a 40% positional conservation filter, giving 1257 alignment positions, and the tree calculated using FastTree v. 2.1.3 SSE3 (Price et al., 2010). Clade names and clustering are based on the position of reference sequences. *Indicates clades that do not include a genome or pure culture reference sequence – being based on classification of reference sequences in the Silva v. 128 taxonomy. Reference sequences appear black whilst those generated in the current study are color coded based on their similarity to existing database sequences.

Supplementary Figure 7 Maximum-likelihood phylogenetic tree showing coverage of the domain Eukarya.

Maximum-likelihood phylogenetic tree showing coverage of the domain Eukarya. The tree includes all eukaryotic OTUs clustered at 97% generated in this study, their closest match in the Silva SSU NR99 v. 128 database and the reference set from the recent Tree of Life article (Hug et al., 2016). Hypervariable regions were masked with a 40% positional conservation filter, giving 1548 alignment positions, and the tree calculated using FastTree v. 2.1.3 SSE3 (Price et al., 2010). Clade names and clustering are based on the position of reference sequences. Reference sequences appear black whilst those generated in the current study are color coded based on their similarity to existing database sequences.

Supplementary Figure 8 Error-correction of Oxford Nanopore MinION data using molecular tagging.

Error-correction of Oxford Nanopore MinION data using molecular tagging. Error-rate of the individual error-corrected consensus sequences as a function of the number of reads used to generate the consensus sequence. “1” represents the raw 2D reads.
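
The idea behind tag-based error correction can be sketched in a few lines of Python. This is an illustrative simplification, not the pipeline used in the study: it assumes reads sharing a molecular tag have already been aligned to equal length (with `-` for gaps) and builds the consensus by per-column majority vote. The helper names `group_by_tag`, `majority_consensus` and `error_rate`, and the toy reads, are hypothetical.

```python
from collections import Counter, defaultdict


def group_by_tag(tagged_reads):
    """Group (tag, read) pairs so each molecular tag maps to its reads."""
    groups = defaultdict(list)
    for tag, read in tagged_reads:
        groups[tag].append(read)
    return dict(groups)


def majority_consensus(aligned_reads):
    """Per-column majority vote over pre-aligned, equal-length reads."""
    if not aligned_reads:
        raise ValueError("need at least one read")
    length = len(aligned_reads[0])
    if any(len(read) != length for read in aligned_reads):
        raise ValueError("reads must be aligned to the same length")
    consensus = []
    for column in zip(*aligned_reads):
        base, _count = Counter(column).most_common(1)[0]
        if base != "-":  # a majority gap means the position is dropped
            consensus.append(base)
    return "".join(consensus)


def error_rate(sequence, reference):
    """Fraction of mismatching positions between two equal-length sequences."""
    if len(sequence) != len(reference):
        raise ValueError("sequences must have the same length")
    mismatches = sum(a != b for a, b in zip(sequence, reference))
    return mismatches / len(reference)


# Toy data: tag T1 has three reads, one of which carries a single error.
toy_reads = [("T1", "ACGTAC"), ("T1", "ACGTAC"), ("T1", "ACCTAC"), ("T2", "GGTTAA")]
toy_groups = group_by_tag(toy_reads)
toy_consensus = {tag: majority_consensus(reads) for tag, reads in toy_groups.items()}
```

In this toy case the single erroneous read for tag T1 is outvoted, while tag T2, with only one read, keeps whatever errors that read contains. This mirrors the caption's point that the error rate of a consensus depends on the number of reads behind it.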

Supplementary Figure 9 Data processing overview.

Data processing overview. An overview of the data processing steps, important data outputs and data types used for different analyses.

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–9 (PDF 1954 kb)

Life Sciences Reporting Summary (PDF 128 kb)

Supplementary Tables 1–13 (PDF 1352 kb)

Supplementary Notes (PDF 127 kb)

Supplementary Code

fSSu pipeline (TAR 8373 kb)

About this article

Cite this article

Karst, S., Dueholm, M., McIlroy, S. et al. Retrieval of a million high-quality, full-length microbial 16S and 18S rRNA gene sequences without primer bias. Nat Biotechnol 36, 190–195 (2018). https://doi.org/10.1038/nbt.4045

Received: 08 December 2016
Accepted: 22 November 2017
Published: 01 January 2018
Issue Date: February 2018
DOI: https://doi.org/10.1038/nbt.4045
