Detecting contamination in viromes using ViromeQC

Zolfo, Moreno; Pinto, Federica; Asnicar, Francesco; Manghi, Paolo; Tett, Adrian; Bushman, Frederic D.; Segata, Nicola

doi:10.1038/s41587-019-0334-5

Correspondence
Published: 20 November 2019

Nature Biotechnology volume 37, pages 1408–1412 (2019)

Subjects

Metagenomics

To the Editor — Eukaryotic viruses and bacteriophages have important roles in microbiomes, but characterization of viruses in metagenomics data is difficult. Viral-like particle (VLP) purification enables enrichment for viruses from microbiome samples before sequencing, but contamination can result in misleading conclusions. We present a software tool named ViromeQC for analyzing virome data. Here, we demonstrate the utility of ViromeQC by applying it to 2,050 human, animal and environmental samples from 35 metagenomic virome sequencing studies that used one of the available VLP enrichment techniques. The resulting analysis reveals these viromes to be rife with bacterial, archaeal and fungal contamination. Most samples show only modest virus enrichment, and such enrichment is very variable between viromes in the same study. To address these issues, we present a validated contamination quality-control pipeline to enable more robust virome metagenomic analyses.

Viruses affect the ecology and composition of microbial communities^1,2. Bacteriophages (viruses of bacteria and archaea) are extremely abundant and diverse, and they affect microbiomes in several ways, including transduction, an important mechanism of lateral gene transfer^3. Metagenomics can be used to characterize phage populations, but phages are so diverse, and evolve so rapidly, that they are poorly represented in sequence databases. In addition, there are no universal viral genetic markers, and the overall biomass of viruses is low compared with that of other microorganisms in a sample. For these reasons, phage sequences are difficult to identify in metagenomes, although specific methods that are partly based on sequence characteristics of known phages have been reported^4,5.

**Fig. 1: Survey of viral enrichment rates on 1,977 samples from 35 studies estimated as percentage of reads aligning to the small subunit rRNA gene.**

**Fig. 2: Combined quantification of ribosomal genes and genes encoding universal proteins identifies the cross-study set of 101 samples with >100× VLP enrichment.**
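The two figure captions describe the quantities involved: the percentage of reads aligning to a marker (such as the small subunit rRNA gene) and a fold enrichment derived from several markers. The following Python sketch illustrates one way such numbers could be computed. It is not ViromeQC's implementation: the baseline percentages, the `baseline / sample` definition of enrichment, the choice of the minimum across markers, and all function names are assumptions made for illustration only.

```python
from typing import Dict


def percent_aligned(marker_reads: int, total_reads: int) -> float:
    """Percentage of a sample's reads that align to a marker gene set."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if not 0 <= marker_reads <= total_reads:
        raise ValueError("marker_reads must lie between 0 and total_reads")
    return 100.0 * marker_reads / total_reads


def fold_enrichment(sample_pct: float, baseline_pct: float,
                    floor_pct: float = 1e-6) -> float:
    """Assumed definition: baseline marker abundance divided by the sample's.

    `baseline_pct` stands for the marker abundance expected in an unenriched
    metagenome (a hypothetical value supplied by the caller). `floor_pct`
    avoids division by zero when no marker reads are found.
    """
    if baseline_pct <= 0:
        raise ValueError("baseline_pct must be positive")
    return baseline_pct / max(sample_pct, floor_pct)


def combined_enrichment(sample_pcts: Dict[str, float],
                        baseline_pcts: Dict[str, float]) -> float:
    """Combine per-marker enrichments by taking the minimum.

    Taking the minimum is a conservative choice assumed here: a sample only
    scores as highly enriched if every marker class is depleted.
    """
    if not sample_pcts:
        raise ValueError("at least one marker is required")
    return min(
        fold_enrichment(pct, baseline_pcts[name])
        for name, pct in sample_pcts.items()
    )


if __name__ == "__main__":
    # Illustrative numbers only; they are not taken from the article.
    ssu_pct = percent_aligned(marker_reads=20, total_reads=1_000_000)
    sample = {"ssu_rrna": ssu_pct, "single_copy_markers": 0.01}
    baseline = {"ssu_rrna": 0.25, "single_copy_markers": 0.5}
    print(f"SSU rRNA reads: {ssu_pct:.4f}%")
    print(f"Combined enrichment: {combined_enrichment(sample, baseline):.1f}x")
```

With these placeholder inputs the SSU rRNA marker alone would suggest roughly 125-fold enrichment, but the combined score is limited by the less depleted marker class, which is the behavior the minimum is meant to capture.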

Data availability

The raw reads analyzed in this study are available using accession numbers provided in Supplementary Tables 1 and 2.

Code availability

Code and documentation are available at http://segatalab.cibio.unitn.it/tools/viromeqc.

References

1. Shkoporov, A. N. & Hill, C. Cell Host Microbe 25, 195–209 (2019).
2. Suttle, C. A. Nat. Rev. Microbiol. 5, 801–812 (2007).
3. Wang, X. et al. Nat. Commun. 1, 147 (2010).
4. Roux, S., Enault, F., Hurwitz, B. L. & Sullivan, M. B. PeerJ 3, e985 (2015).
5. Ren, J., Ahlgren, N. A., Lu, Y. Y., Fuhrman, J. A. & Sun, F. Microbiome 5, 69 (2017).
6. Thurber, R. V., Haynes, M., Breitbart, M., Wegley, L. & Rohwer, F. Nat. Protoc. 4, 470–483 (2009).
7. Quince, C., Walker, A. W., Simpson, J. T., Loman, N. J. & Segata, N. Nat. Biotechnol. 35, 833–844 (2017).
8. Reyes, A. et al. Nature 466, 334–338 (2010).
9. McCann, A. et al. PeerJ 6, e4694 (2018).
10. Roux, S. et al. Nature 537, 689–693 (2016).
11. Watkins, S. C. et al. Mar. Freshw. Res. 67, 1700–1708 (2016).
12. Rosario, K., Fierer, N., Miller, S., Luongo, J. & Breitbart, M. Environ. Sci. Technol. 52, 1014–1027 (2018).
13. Roux, S., Krupovic, M., Debroas, D., Forterre, P. & Enault, F. Open Biol. 3, 130160 (2013).
14. Minot, S. et al. Genome Res. 21, 1616–1625 (2011).
15. Emerson, J. B. et al. Appl. Environ. Microbiol. 78, 6309–6320 (2012).
16. Minot, S. et al. Proc. Natl. Acad. Sci. USA 110, 12450–12455 (2013).
17. Kim, Y., Aw, T. G., Teal, T. K. & Rose, J. B. Environ. Sci. Technol. 49, 8396–8407 (2015).
18. Ly, M. et al. Microbiome 4, 64 (2016).
19. Reyes, A. et al. Proc. Natl. Acad. Sci. USA 112, 11941–11946 (2015).
20. Roux, S. et al. PLoS One 7, e33641 (2012).
21. Weynberg, K. D., Wood-Charlson, E. M., Suttle, C. A. & van Oppen, M. J. H. Front. Microbiol. 5, 206 (2014).
22. Hannigan, G. D. et al. mBio 6, e01578–15 (2015).
23. Aguirre de Cárcer, D., López-Bueno, A., Alonso-Lobo, J. M., Quesada, A. & Alcamí, A. FEMS Microbiol. Ecol. 92, fiw074 (2016).
24. Shkoporov, A. N. et al. Microbiome 6, 68 (2018).
25. Pasolli, E. et al. Nat. Methods 14, 1023–1024 (2017).
26. Leinonen, R., Sugawara, H. & Shumway, M. & International Nucleotide Sequence Database Collaboration. Nucleic Acids Res. 39, D19–D21 (2011).
27. Zolfo, M., Tett, A., Jousson, O., Donati, C. & Segata, N. Nucleic Acids Res. 45, e7 gkw837 (2016).
28. Quince, C. et al. Genome Biol. 18, 181 (2017).
29. Wu, M. & Scott, A. J. Bioinformatics 28, 1033–1034 (2012).
30. Mizuno, C. M. et al. Nat. Commun. 10, 752 (2019).

Acknowledgements

This project has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No 716575) to N.S. The work was also supported by MIUR ‘Futuro in Ricerca’ RBFR13EWWI_001 and by the European Union (H2020-SFS-2018-1 project MASTER-818368 and H2020-SC1-BHC project ONCOBIOME-825410) to N.S.

Author information

Authors and Affiliations

Department CIBIO, University of Trento, Trento, Italy
Moreno Zolfo, Federica Pinto, Francesco Asnicar, Paolo Manghi, Adrian Tett & Nicola Segata
Department of Microbiology, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
Frederic D. Bushman

Contributions

Study conception and design: M.Z. and N.S. Methodology and analysis: M.Z., F.P., F.A., A.T., F.B. and N.S. Public datasets collection and curation: M.Z. and P.M. All authors contributed to the writing of the final manuscript.

Corresponding author

Correspondence to Nicola Segata.

Ethics declarations

Competing interests

The authors declare no competing interests.

Supplementary information

Supplementary Materials

Supplementary Methods, Supplementary Note 1 and Supplementary Figures 1–7

Supplementary Table 1

Summary of the 2,050 virome datasets considered in the analysis. Dataset sample sizes are related to the actual number of samples that could be classified as DNA VLP viromes according to the available metadata. The reference number refers to Fig. 1, Fig. 2d and Supplementary Fig. 1.

Supplementary Table 2

Summary of the 2,189 metagenomes and 109 synthetic metagenomes and mock communities considered in the analysis. Dataset sample sizes are related to the actual number of samples that could be classified as DNA metagenomes according to the available metadata. The reference number refers to Fig. 1, Fig. 2d and Supplementary Fig. 1.

Supplementary Table 3

Full dataset of metagenomes and viromes. Contaminant abundances and enrichment data for all the 1,871 metagenomes, 1,670 viromes and 109 synthetic and mock communities that passed all quality controls. Sample type and number of starting reads are provided, as well as the percentage of SSU and LSU rRNAs stratified by life domain.

Supplementary Table 4

Validation of the rRNA mapping approach. Expected abundances of 16S rRNA genes are reported for the 108 synthetic and mock communities (tab 1) and 917 16S amplicon sequencing samples (tab 2). Control metagenomes and 16S samples were mapped against the SSU rRNA genes and filtered at different stringency thresholds (see Supplementary Methods). For the 16S amplicon samples, the expected value was set to 100%. The selected threshold is highlighted in blue. The composition of each synthetic metagenome is reported in tab 3. The rRNA abundances in RNA viromes are reported in tab 4.

Supplementary Table 5

Detection of single-copy bacterial markers in viral genomes. Number of genomes in each database in which the 31 single-copy markers are detected. The IMG/VR database was split into isolate viruses and uncultivated viruses (tab 1). Number of distinct single-copy markers detected in each database (tab 2).

About this article

Cite this article

Zolfo, M., Pinto, F., Asnicar, F. et al. Detecting contamination in viromes using ViromeQC. Nat Biotechnol 37, 1408–1412 (2019). https://doi.org/10.1038/s41587-019-0334-5

Issue Date: December 2019
DOI: https://doi.org/10.1038/s41587-019-0334-5
