Detecting contamination in viromes using ViromeQC

Date: 2019-11-20

Fig. 1: Survey of viral enrichment rates on 1,977 samples from 35 studies estimated as percentage of reads aligning to the small subunit rRNA gene.
Fig. 2: Combined quantification of ribosomal genes and genes encoding universal proteins identifies the cross-study set of 101 samples with >100× VLP enrichment.

Data availability

The raw reads analyzed in this study are available using accession numbers provided in Supplementary Tables 1 and 2.

Code availability

Code and documentation are available at http://segatalab.cibio.unitn.it/tools/viromeqc.

Acknowledgements

This project has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No 716575) to N.S. The work was also supported by MIUR ‘Futuro in Ricerca’ RBFR13EWWI_001 and by the European Union (H2020-SFS-2018-1 project MASTER-818368 and H2020-SC1-BHC project ONCOBIOME-825410) to N.S.

Author information

Affiliations

  1. Department CIBIO, University of Trento, Trento, Italy

    • Moreno Zolfo, Federica Pinto, Francesco Asnicar, Paolo Manghi, Adrian Tett & Nicola Segata

  2. Department of Microbiology, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA

    • Frederic D. Bushman

Contributions

Study conception and design: M.Z. and N.S. Methodology and analysis: M.Z., F.P., F.A., A.T., F.B. and N.S. Public datasets collection and curation: M.Z. and P.M. All authors contributed to the writing of the final manuscript.

Corresponding author

Correspondence to Nicola Segata.

Ethics declarations

Competing interests

The authors declare no competing interests.

Supplementary information

Supplementary Materials

Supplementary Methods, Supplementary Note 1 and Supplementary Figures 1–7

Supplementary Table 1

Summary of the 2,050 virome datasets considered in the analysis. Dataset sample sizes refer to the number of samples that could be classified as DNA VLP viromes according to the available metadata. The reference number refers to Fig. 1, Fig. 2d and Supplementary Fig. 1.

Supplementary Table 2

Summary of the 2,189 metagenomes and 109 synthetic metagenomes and mock communities considered in the analysis. Dataset sample sizes refer to the number of samples that could be classified as DNA metagenomes according to the available metadata. The reference number refers to Fig. 1, Fig. 2d and Supplementary Fig. 1.

Supplementary Table 3

Full dataset of metagenomes and viromes. Contaminant abundances and enrichment data for all the 1,871 metagenomes, 1,670 viromes and 109 synthetic and mock communities that passed all quality controls. Sample type and number of starting reads are provided, as well as the percentage of SSU and LSU rRNAs stratified by life domain.

Supplementary Table 4

Validation of the rRNA mapping approach. Expected abundances of 16S rRNA genes are reported for the 108 synthetic and mock communities (tab 1) and 917 16S amplicon sequencing samples (tab 2). Control metagenomes and 16S samples were mapped against the SSU rRNA genes and filtered at different stringency thresholds (see Supplementary Methods). For the 16S amplicon samples, the expected value was set to 100%. The selected threshold is highlighted in blue. The composition of each synthetic metagenome is reported in tab 3. The rRNA abundances in RNA viromes are reported in tab 4.

Supplementary Table 5

Detection of single-copy bacterial markers in viral genomes. Number of genomes in each database in which the 31 single-copy markers are detected. The IMG/VR database was split into isolate viruses and uncultivated viruses (tab 1). Number of distinct single-copy markers detected in each database (tab 2).

