We present an extension of the Minimum Information about any (x) Sequence (MIxS) standard for reporting sequences of uncultivated virus genomes. Minimum Information about an Uncultivated Virus Genome (MIUViG) standards were developed within the Genomic Standards Consortium framework and include virus origin, genome quality, genome annotation, taxonomic classification, biogeographic distribution and in silico host prediction. Community-wide adoption of MIUViG standards, which complement the Minimum Information about a Single Amplified Genome (MISAG) and Metagenome-Assembled Genome (MIMAG) standards for uncultivated bacteria and archaea, will improve the reporting of uncultivated virus genomes in public databases. In turn, this should enable more robust comparative studies and a systematic exploration of the global virosphere.
Current estimates are that virus particles massively outnumber live cells in most habitats1,2, but only a tiny fraction of viruses have been cultivated in the laboratory. An unprecedented diversity of viruses is being discovered through culture-independent sequencing3. Progress has been made in reconstructing genomes of uncultivated viruses de novo, from biotic and abiotic environments, without laboratory isolation of the virus–host system. For example, in the past 2 years, more than 750,000 uncultivated virus genomes (UViGs) have been identified in metagenome and metatranscriptome datasets4,5,6,7,8,9, five times the total number of genomes sequenced from virus isolates (Fig. 1), and UViGs already represent ≥95% of the taxonomic diversity in publicly available virus sequences10,11. Although double-stranded DNA (dsDNA) genomes are over-represented in UViGs because most metagenomic protocols exclusively target dsDNA, UViGs nonetheless enable an assessment of global virus diversity and an evaluation of structure and drivers of viral communities. UViGs also contribute to improving our understanding of the evolutionary history of viruses and virus–host interactions.
Analysis and interpretation of standalone genomes present substantial challenges, whether the genomes are eukaryotic, bacterial, archaeal or viral. To address these challenges, MISAG and MIMAG standards were drafted to improve the quality of reporting of microbial genomes derived from single cell or metagenome sequences, which are often incomplete12. Although some aspects of MISAG and MIMAG can be applied to UViGs, the extraordinary diversity of viral genome composition and content, replication strategies, and hosts means that the completeness, quality, taxonomy and ecology of UViGs need to be evaluated via virus-specific metrics.
The Genomic Standards Consortium (http://gensc.org) maintains metadata checklists for MIxS, encompassing genome and metagenome sequences13, marker gene sequences14 and single amplified and metagenome-assembled bacterial and archaeal genomes12. Here we present a set of standards that extend the MIxS checklists to include identification, quality assessment, analysis and reporting of UViGs (Table 1 and Supplementary Tables 1 and 2), together with recommendations on how to perform these analyses. We provide a metadata checklist for database submission and publication of UViGs designed to be flexible enough to accommodate technological and methodological changes over time (Table 1 and Supplementary Table 1). The information gathered through the MIUViG checklist can be directly submitted with new UViG sequences to International Nucleotide Sequence Database Collaboration (INSDC) member databases—the DNA Database of Japan (DDBJ), the European Molecular Biology Laboratory–European Bioinformatics Institute (EMBL-EBI) and US National Center for Biotechnology Information (NCBI)—which will host and display checklist metadata alongside the UViG sequence. These MIUViG standards should also be used along with existing guidelines for virus genome analysis, including those issued by the International Committee on Taxonomy of Viruses (ICTV), which recently endorsed the incorporation of UViGs into the official virus classification scheme15 (https://talk.ictvonline.org). Although MIUViG standards and best practices were designed for genomes of viruses infecting microorganisms, they can also be applied to viruses infecting animals, fungi and plants, and are compatible with standards that are already in place for epidemiological analysis of these viruses16 (Supplementary Table 3).
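As an illustration of how a submitter might pre-check a UViG record against such a checklist, the following minimal Python sketch flags reporting areas that are missing or empty. The field names are hypothetical: they simply mirror the six reporting areas listed in the abstract and are not the official MIxS or MIUViG term names, which are defined in Table 1 and Supplementary Table 1. Treating all six areas as required is likewise an assumption made for the example.

```python
# Hypothetical field names mirroring the six reporting areas named in the
# abstract. These are NOT the official MIxS/MIUViG terms (see Table 1).
REPORTING_AREAS = (
    "virus_origin",
    "genome_quality",
    "genome_annotation",
    "taxonomic_classification",
    "biogeographic_distribution",
    "host_prediction",
)


def missing_areas(record):
    """Return the reporting areas that are absent, None or blank in a record."""
    missing = []
    for area in REPORTING_AREAS:
        value = record.get(area)
        if value is None or not str(value).strip():
            missing.append(area)
    return missing


# Illustrative record with made-up values, not taken from any dataset.
example_record = {
    "virus_origin": "metagenome",
    "genome_quality": "high-quality draft",
    "taxonomic_classification": "unclassified",
}

print(missing_areas(example_record))
```

For the example record, the function reports the three areas that were left out (genome annotation, biogeographic distribution and host prediction); a real submission would instead be validated against the mandatory and optional fields of the checklist itself.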
Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
This work was supported by the Laboratory Directed Research and Development Program of Lawrence Berkeley National Laboratory under US Department of Energy Contract No. DE-AC02-05CH11231 for S.R.; the Netherlands Organization for Scientific Research (NWO) Vidi grant 864.14.004 for B.E.D.; the Intramural Research Program of the National Library of Medicine, National Institutes of Health for E.V.K., I.K.M., J.R.B. and N.Y.; the Virus-X project (EU Horizon 2020, No. 685778) for F.E. and M.K.; Battelle Memorial Institute's prime contract with the US National Institute of Allergy and Infectious Diseases (NIAID) under Contract No. HHSN272200700016I for J.H.K.; the GOA grant “Bacteriophage Biosystems” from KU Leuven for R.L.; the European Molecular Biology Laboratory for C.A. and G.R.C.; Cairo University Grant 2016-57 for R.K.A.; National Science Foundation award 1456778, National Institutes of Health awards R01 AI132581 and R21 HD086833, and The Vanderbilt Microbiome Initiative award for S.R.B.; National Science Foundation awards DEB-1239976 for M.B. and K.R. and DEB-1555854 for M.B.; the NSF Early Career award DEB-1555854 and NSF Dimensions of Biodiversity #1342701 for K.C.W. and R.A.D.; the Agence Nationale de la Recherche JCJC grant ANR-13-JSV6-0004 and Investissements d'Avenir Méditerranée Infection 10-IAHU-03 for C.D.; the Gordon and Betty Moore Foundation Marine Microbiology Initiative No. 3779 and the Simons Foundation for J.A.F.; the French government “Investissements d'Avenir” program OCEANOMICS ANR-11-BTBR-0008 and European FEDER Fund 1166-39417 for P. Hingamp; Australian Research Council Laureate Fellowship FL150100038 to P. Hugenholtz; the National Science Foundation award 1801367 and C-DEBI Research Grant for J.M.L.; the Gordon and Betty Moore Foundation grant 5334 and Ministry of Economy and Competitivity refs. CGL2013-40564-R and SAF2013-49267-EXP for M.M.-G.; the Grant-in-Aid for Scientific Research on Innovative Areas from the Ministry of Education, Culture, Science, Sports, and Technology (MEXT) of Japan No. 16H06429, 16K21723, and 16H06437 for H.O. and T.Y.; National Science Foundation award DBI-1661357 to C.P.; the Ministry of Economy and Competitivity ref CGL2016-76273-P (cofunded with FEDER funds) for F.R.-V.; the Gordon and Betty Moore Foundation awards 3305 and 3790 and NSF Biological Oceanography OCE 1536989 for M.B.S.; the ETH Zurich and Helmut Horten Foundation and the Novartis Foundation for Medical-Biological Research (17B077) for S.S.; a BIOS-SCOPE award from Simons Foundation International and NERC award NE/P008534/1 to B.T.; NSF Biological Oceanography Grant 1635913 for R.V.T.; the Australian Research Council Future Fellowship FT120100480 for N.S.W.; a Gilead Sciences Cystic Fibrosis Research Scholarship for K.L.W.; Gordon and Betty Moore Foundation Grant 4971 for S.W.W.; the NSF EPSCoR grant 1736030 for K.E.W.; the National Science Foundation award DEB-4W4596 and National Institutes of Health award R01 GM117361 for M.J.Y.; the Gordon and Betty Moore Foundation No. 7000 and the National Oceanic and Atmospheric Administration (NOAA) under award NA15OAR4320071 for L.Z.A. DDBJ is supported by ROIS and MEXT. The work conducted by the US Department of Energy Joint Genome Institute is supported by the Office of Science of the US Department of Energy under contract no. DE-AC02-05CH11231. The views and conclusions contained in this document are those of the authors and should not be interpreted as necessarily representing the official policies, either expressed or implied, of the US Department of Health and Human Services or of the institutions and companies affiliated with the authors. B.E.D., A.K., M.K., J.H.K., R.L. and A.V. are members of the ICTV Executive Committee, but the views and opinions expressed are those of the authors and not those of the ICTV.
Integrated supplementary information
List of mandatory and optional metadata for UViGs
List of metadata from previous standards relevant for UViGs21
Comparison between UViGs categories and the quality categories proposed for small DNA/RNA virus whole-genome sequencing for epidemiology and surveillance by Ladner et al.22
List and characteristics of tools used to identify virus sequences in mixed datasets published or updated since 2012 (refs. 23–31)
Variation in genome length for virus families and genera with two or more genomes, from NCBI RefSeq v83.
List of potential marker genes for virus orders, families or genera, based on the VOGdb v83 (http://vogdb.org/)
List of UViGs from the GOV dataset4 considered as high-quality drafts or finished genomes
List of databases providing collections of HMM profiles for virus protein families32–35
Current species demarcation criteria from ICTV ninth and tenth reports.
Approaches available for in silico host prediction18,37–42