Current knowledge of RNA virus biodiversity is both biased and fragmentary, reflecting a focus on culturable or disease-causing agents. Here we profile the transcriptomes of over 220 invertebrate species sampled across nine animal phyla and report the discovery of 1,445 RNA viruses, including some that are sufficiently divergent to comprise new families. The identified viruses fill major gaps in the RNA virus phylogeny and reveal an evolutionary history that is characterized by both host switching and co-divergence. The invertebrate virome also reveals remarkable genomic flexibility that includes frequent recombination, lateral gene transfer among viruses and hosts, gene gain and loss, and complex genomic rearrangements. Together, these data present a view of the RNA virosphere that is more phylogenetically and genomically diverse than that depicted in current classification schemes and provide a more solid foundation for studies in virus ecology and evolution.
Extended data figures and tables
Extended Data Figures
- Extended Data Figure 1: The contribution of major viral clades to the total virome of each host phylum/order.
a, b, Analyses based on viruses at all frequency levels (a) and on viruses whose frequency exceeds 0.1% of the total number of non-rRNA reads (b).
- Extended Data Figure 2: Phylogenetic incongruence between the RdRp and structural proteins.
a, Match between the phylogenies of the RdRp and coat proteins (S-domain like) for non-segmented members of the Tombus–Noda clade. The relationship between the two phylogenies is displayed to maximize topological congruence. b, The degree of phylogenetic incongruence for different pairs of structural and non-structural phylogenies. The comparisons were based on patristic distance matrices derived from the phylogenies.
- Extended Data Figure 3: The gain and loss of RNA virus structural proteins.
a, The parallel acquisition of multiple copies of structural proteins by viruses within the Hepe–Virga clade. Left panel shows an outline of the structural part of their genomes, with homologous structural genes marked in yellow and multiple copies of these proteins within the same genome labelled as ‘I’, ‘II’, and ‘III’. Right panel shows a maximum-likelihood phylogeny depicting the evolutionary history of the corresponding structural proteins of these viruses. b, Acquisition of a glycoprotein in the genome of Hubei Lepidoptera virus 2 from the Mono–Chu clade. Its genome is compared against that of a closely related virus (Hubei dimarhabdovirus-like virus 2). Homologous proteins are connected with dotted lines, and the target glycoprotein is shown in red. c, Three examples of glycoprotein loss in the Mono–Chu clade. Homologous proteins are connected with dotted lines, and the target glycoproteins are shown in blue.
- Extended Data Figure 4: Lateral gene transfer between RNA viruses and cellular organisms.
a, Evolutionary origin of two exoribonucleases (cd06133) in two sea-slater-associated viruses (Beihai hepe-like virus 2 and Beihai sea slater virus 4). Top, alignment of viral and (human) cellular exoribonucleases. The solid triangles indicate the key catalytic sites. Lower left panel shows the phylogenetic positions of the two viruses (marked with solid red circles) whose genomes contain these exoribonucleases. The host information for each virus is shown in parentheses. Lower right panel shows the phylogenetic position of the virus exoribonucleases (solid red circle) in the context of cellular exoribonucleases. b, Evolutionary origin of viral serine proteases (cd00190). The phylogeny contains serine proteases from RNA viruses (solid red circles), DNA viruses (solid blue circles) and cellular organisms. Serine proteases from RNA viruses are either highly divergent or group within the diversity of cellular proteins. c, Relative positions of different protein domains in the replicase of selected Hepe–Virga viruses. The domains are shown as ovals and marked with different colours, and comprise: RdRp (cd01699), Helicase (pfam01443), FstJ (pfam01728), OTU (OTU-like cysteine protease, pfam02338), Macro (cl00019), NADAR (cd15457), and viral methyltransferase (pfam01660). More detailed depictions of lateral gene transfer can be found in Supplementary Data 22–36.
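The patristic-distance comparison mentioned for Extended Data Figure 2b can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the caption does not say which statistic was used, so the Pearson correlation between the two distance matrices is an assumption, and the toy trees `TREE_RDRP` and `TREE_COAT`, the tip names, and all function names are hypothetical.

```python
from itertools import combinations
from math import sqrt


def patristic_distances(edges, tips):
    """Return {(tip_a, tip_b): summed branch length} for an unrooted tree.

    edges: list of (node_u, node_v, branch_length) tuples.
    tips:  list of leaf names to compare.
    """
    adjacency = {}
    for u, v, length in edges:
        adjacency.setdefault(u, []).append((v, length))
        adjacency.setdefault(v, []).append((u, length))

    def distances_from(start):
        # Iterative depth-first walk; a tree has exactly one path between nodes.
        seen = {start: 0.0}
        stack = [start]
        while stack:
            node = stack.pop()
            for neighbour, length in adjacency[node]:
                if neighbour not in seen:
                    seen[neighbour] = seen[node] + length
                    stack.append(neighbour)
        return seen

    from_tip = {tip: distances_from(tip) for tip in tips}
    return {(a, b): from_tip[a][b] for a, b in combinations(sorted(tips), 2)}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def congruence(edges_a, edges_b, tips):
    """Correlate the pairwise tip distances of two trees (assumed statistic)."""
    dist_a = patristic_distances(edges_a, tips)
    dist_b = patristic_distances(edges_b, tips)
    pairs = sorted(dist_a)
    return pearson([dist_a[p] for p in pairs], [dist_b[p] for p in pairs])


# Hypothetical four-virus example: the two genes support different groupings.
TIPS = ["A", "B", "C", "D"]
TREE_RDRP = [("A", "n1", 1.0), ("B", "n1", 1.0),
             ("C", "n2", 1.0), ("D", "n2", 1.0), ("n1", "n2", 1.0)]  # ((A,B),(C,D))
TREE_COAT = [("A", "n1", 1.0), ("C", "n1", 1.0),
             ("B", "n2", 1.0), ("D", "n2", 1.0), ("n1", "n2", 1.0)]  # ((A,C),(B,D))

score = congruence(TREE_RDRP, TREE_COAT, TIPS)
```

Under this assumed measure, a score near 1 indicates that two genes share a similar evolutionary history, while lower or negative values point to incongruence of the kind expected after recombination or gene replacement.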
Supplementary Information
- Supplementary Data
This file contains Supplementary Data 1–36: phylogenies and genome structures of each major virus clade. The phylogenies (Supplementary Data 1–21) contain detailed information on evolutionary relationships, the names of the viruses, the frequency of viral RNA, and the presence and location of endogenous virus elements (EVEs). The genome structures (Supplementary Data 22–36) contain information on the genome organization and the structural domains of representative viruses.
- Supplementary Table 1
This table contains detailed information on each pool/library.
- Supplementary Table 2
This table contains detailed information on each virus discovered in this study.
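The 0.1% abundance cut-off used in Extended Data Figure 1b can be sketched as a simple filter over per-library read counts. This is an illustrative sketch only: the function names, the input layout, and the example numbers are hypothetical and are not taken from the supplementary tables.

```python
def viral_frequency(viral_reads, total_reads, rrna_reads):
    """Fraction of non-rRNA reads in a library that map to one virus."""
    non_rrna_reads = total_reads - rrna_reads
    if non_rrna_reads <= 0:
        raise ValueError("library has no non-rRNA reads")
    return viral_reads / non_rrna_reads


def abundant_viruses(read_counts, total_reads, rrna_reads, threshold=0.001):
    """Keep viruses whose frequency exceeds the threshold (0.001 = 0.1%).

    read_counts: {virus_name: number of reads mapped to that virus}.
    Returns {virus_name: frequency} for the viruses that pass.
    """
    frequencies = {
        name: viral_frequency(reads, total_reads, rrna_reads)
        for name, reads in read_counts.items()
    }
    return {name: f for name, f in frequencies.items() if f > threshold}


# Hypothetical library: 1,000,000 reads in total, 200,000 of them rRNA.
example_counts = {"virus_A": 4000, "virus_B": 400}
kept = abundant_viruses(example_counts, total_reads=1_000_000, rrna_reads=200_000)
```

In this made-up library, `virus_A` accounts for 0.5% of non-rRNA reads and is retained, whereas `virus_B` at 0.05% falls below the cut-off.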