

Redefining the invertebrate RNA virosphere


Current knowledge of RNA virus biodiversity is both biased and fragmentary, reflecting a focus on culturable or disease-causing agents. Here we profile the transcriptomes of over 220 invertebrate species sampled across nine animal phyla and report the discovery of 1,445 RNA viruses, including some that are sufficiently divergent to comprise new families. The identified viruses fill major gaps in the RNA virus phylogeny and reveal an evolutionary history that is characterized by both host switching and co-divergence. The invertebrate virome also reveals remarkable genomic flexibility that includes frequent recombination, lateral gene transfer among viruses and hosts, gene gain and loss, and complex genomic rearrangements. Together, these data present a view of the RNA virosphere that is more phylogenetically and genomically diverse than that depicted in current classification schemes and provide a more solid foundation for studies in virus ecology and evolution.


Figure 1: The frequency and diversity of viral RNA transcripts in invertebrate transcriptomes.
Figure 2: Phylogenetic diversity of RNA viruses.
Figure 3: Genetic exchange among RNA viruses.
Figure 4: Evolution of genome organization in RNA viruses.

Accession codes

Primary accessions: NCBI Reference Sequence



Acknowledgements

This study was supported by the National Natural Science Foundation of China (Grants 81290343, 81273014, 81672057), the Special National Project on Research and Development of Key Biosafety Technologies (Grants 2016YFC1201900, 2016YFC1200101), the 12th Five-Year Major National Science and Technology Projects of China (2014ZX10004001-005), and an NHMRC Australia Fellowship (GNT1037231).

Author information




Conceptualization: M.S. and Y.-Z.Z.
Methodology: M.S., L.-J.C., C.-X.L., J.L., J.-S.E., J.B., E.C.H. and Y.-Z.Z.
Investigation: M.S., X.-D.L., J.-H.T., L.-J.C., X.C., C.-X.L. and X.-C.Q.
Writing (original draft): M.S., E.C.H. and Y.-Z.Z.
Writing (review and editing): M.S., X.-D.L., J.-H.T., L.-J.C., X.C., C.-X.L., J.-S.E., J.X., E.C.H. and Y.-Z.Z.
Funding acquisition: J.X., E.C.H. and Y.-Z.Z.
Resources (sampling): M.S., X.-D.L., J.-H.T., L.-J.C., X.C., C.-X.L., J.-P.C., W.W. and Y.-Z.Z.
Resources (computational): M.S., J.L., J.B. and E.C.H.
Supervision: E.C.H. and Y.-Z.Z.

Corresponding author

Correspondence to Yong-Zhen Zhang.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Additional information

Reviewer information: Nature thanks E. Ghedin, D. Obbard and the other anonymous reviewer(s) for their contribution to the peer review of this work.

Extended data figures and tables

Extended Data Figure 1: The contribution of major viral clades to the total virome of each host phylum or order.

a, b, These analyses are based on viruses at all frequency levels (a) and on viruses whose frequency exceeds 0.1% of the total number of non-rRNA reads (b).
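The two panels differ only in a frequency filter: a virus's frequency is its share of the library's non-rRNA reads, and panel b keeps only viruses above 0.1%. The following is a minimal sketch of that computation, assuming per-virus mapped read counts are already available; the virus names, read counts, library size and the helper `clade_contributions` are hypothetical and are not data or code from the study.

```python
from collections import defaultdict

# Toy read counts for one hypothetical library. Virus names, clade
# assignments and numbers are illustrative only, not data from this study.
NON_RRNA_READS = 2_000_000  # assumed total number of non-rRNA reads
THRESHOLD = 0.001           # 0.1% of non-rRNA reads (panel b cut-off)

viruses = [
    # (virus name, clade, reads mapped to the virus)
    ("virus_A", "Hepe-Virga", 12_000),
    ("virus_B", "Hepe-Virga", 500),
    ("virus_C", "Mono-Chu", 4_000),
    ("virus_D", "Tombus-Noda", 1_000),
]


def clade_contributions(records, total_reads, min_freq=0.0):
    """Return each clade's share of the viral reads passing a frequency filter.

    A virus's frequency is its mapped reads divided by the library's total
    non-rRNA reads; only viruses with frequency strictly above min_freq count.
    """
    per_clade = defaultdict(int)
    for _name, clade, reads in records:
        if reads / total_reads > min_freq:
            per_clade[clade] += reads
    viral_total = sum(per_clade.values())
    if viral_total == 0:
        return {}
    return {clade: reads / viral_total for clade, reads in per_clade.items()}


all_viruses = clade_contributions(viruses, NON_RRNA_READS)           # panel a
abundant = clade_contributions(viruses, NON_RRNA_READS, THRESHOLD)  # panel b
print(all_viruses)
print(abundant)  # virus_B and virus_D fall below 0.1% and are dropped
```

Normalising by viral reads that pass the filter (rather than by all non-rRNA reads) makes each panel's clade shares sum to one, which matches a "contribution to the total virome" view; whether the figure normalises this way is an assumption.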

Extended Data Figure 2: Phylogenetic incongruence between the RdRp and structural proteins.

a, Match between the phylogenies of the RdRp and coat proteins (S-domain-like) for non-segmented members of the Tombus–Noda clade. The two phylogenies are arranged to maximize topological congruence. b, The degree of phylogenetic incongruence for different pairs of structural and non-structural protein phylogenies. The comparisons were based on patristic distance matrices derived from the phylogenies.
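A patristic distance is the sum of branch lengths on the path between two tips, so each phylogeny yields a tip-by-tip distance matrix that can be compared with another tree's matrix. The caption does not state which statistic summarises the comparison; the sketch below uses the Pearson correlation between the upper triangles of two matrices (the statistic behind a Mantel test, without its permutation step) as one plausible choice. The trees, tip names and branch lengths are invented, and the tree encoding is an assumption made to keep the example dependency-free.

```python
import itertools
import math

# Toy rooted trees over the same four (hypothetical) viruses. Each node maps
# to (parent, length of the branch to that parent); the root has parent None.
RDRP_TREE = {
    "root": (None, 0.0), "n1": ("root", 1.0), "n2": ("root", 1.0),
    "A": ("n1", 1.0), "B": ("n1", 1.0), "C": ("n2", 1.0), "D": ("n2", 1.0),
}
# Same topology as RDRP_TREE, all branches twice as long: congruent.
COAT_TREE_SAME = {node: (parent, 2 * length)
                  for node, (parent, length) in RDRP_TREE.items()}
# Different topology, ((A,C),(B,D)): incongruent with RDRP_TREE.
COAT_TREE_SWAPPED = {
    "root": (None, 0.0), "n1": ("root", 1.0), "n2": ("root", 1.0),
    "A": ("n1", 1.0), "C": ("n1", 1.0), "B": ("n2", 1.0), "D": ("n2", 1.0),
}
LEAVES = ["A", "B", "C", "D"]


def ancestors(tree, node):
    """Yield (ancestor, distance from node) from node itself up to the root."""
    dist = 0.0
    while node is not None:
        yield node, dist
        parent, length = tree[node]
        dist += length
        node = parent


def patristic(tree, a, b):
    """Sum of branch lengths on the path between a and b (via their MRCA)."""
    from_a = dict(ancestors(tree, a))
    for node, dist_b in ancestors(tree, b):
        if node in from_a:  # first shared ancestor is the MRCA
            return from_a[node] + dist_b
    raise ValueError("nodes are not in the same tree")


def distance_vector(tree, leaves):
    """Upper triangle of the patristic distance matrix, in a fixed pair order."""
    return [patristic(tree, x, y) for x, y in itertools.combinations(leaves, 2)]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


rdrp = distance_vector(RDRP_TREE, LEAVES)
r_same = pearson(rdrp, distance_vector(COAT_TREE_SAME, LEAVES))
r_swapped = pearson(rdrp, distance_vector(COAT_TREE_SWAPPED, LEAVES))
print(f"congruent pair r = {r_same:.2f}, incongruent pair r = {r_swapped:.2f}")
```

A correlation near 1 indicates that the two proteins tell the same evolutionary story; lower values flag incongruence of the kind expected when structural genes have been exchanged among lineages.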

Extended Data Figure 3: The gain and loss of RNA virus structural proteins.

a, Parallel acquisition of multiple copies of structural proteins by viruses within the Hepe–Virga clade. Left, outline of the structural part of their genomes, with homologous structural genes marked in yellow and multiple copies of these proteins within the same genome labelled ‘I’, ‘II’ and ‘III’. Right, maximum-likelihood phylogeny depicting the evolutionary history of the corresponding structural proteins. b, Acquisition of a glycoprotein in the genome of Hubei Lepidoptera virus 2 from the Mono–Chu clade. Its genome is compared with that of a closely related virus (Hubei dimarhabdovirus-like virus 2). Homologous proteins are connected with dotted lines, and the acquired glycoprotein is shown in red. c, Three examples of glycoprotein loss in the Mono–Chu clade. Homologous proteins are connected with dotted lines, and the glycoproteins concerned are shown in blue.
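The caption infers gains and losses by comparing genomes of closely related viruses. One standard way to make such an inference explicit is to map gene presence/absence onto a phylogeny and count the minimum number of state changes with Fitch parsimony; this is a generic technique, not necessarily the procedure used in the study. The tree, virus names and glycoprotein states below are invented for illustration.

```python
# Toy tree of five hypothetical viruses as nested 2-tuples; leaf names and
# presence/absence states are invented for illustration.
TREE = ((("virus_1", "virus_2"), "virus_3"), ("virus_4", "virus_5"))
GLYCOPROTEIN = {  # 1 = glycoprotein gene present, 0 = absent
    "virus_1": 1, "virus_2": 0, "virus_3": 1, "virus_4": 1, "virus_5": 0,
}


def fitch(tree, states):
    """Fitch parsimony for one character on a rooted binary tree.

    Returns (candidate state set at this node, minimum number of changes
    within the subtree). A leaf is a str; an internal node is a 2-tuple.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    shared = left_set & right_set
    if shared:
        return shared, left_cost + right_cost
    # No shared state: take the union and count one extra change.
    return left_set | right_set, left_cost + right_cost + 1


root_states, changes = fitch(TREE, GLYCOPROTEIN)
print(root_states, changes)  # {1} 2: presence at the root, two independent losses
```

Here the gene is inferred to be present at the root, so the two absences are best explained as two independent losses, the same kind of pattern panel c illustrates with real genomes.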

Extended Data Figure 4: Lateral gene transfer between RNA viruses and cellular organisms.

a, Evolutionary origin of two exoribonucleases (cd06133) in two sea-slater-associated viruses (Beihai hepe-like virus 2 and Beihai sea slater virus 4). Top, alignment of viral and cellular (human) exoribonucleases; solid triangles indicate the key catalytic sites. Lower left, phylogenetic positions of the two viruses (solid red circles) whose genomes contain these exoribonucleases, with the host of each virus shown in parentheses. Lower right, phylogenetic position of the viral exoribonucleases (solid red circle) among cellular exoribonucleases. b, Evolutionary origin of viral serine proteases (cd00190). The phylogeny contains serine proteases from RNA viruses (solid red circles), DNA viruses (solid blue circles) and cellular organisms. Serine proteases from RNA viruses are either highly divergent or fall within the diversity of cellular proteins. c, Relative positions of protein domains in the replicase of selected Hepe–Virga viruses. Domains are shown as coloured ovals: RdRp (cd01699), helicase (pfam01443), FtsJ (pfam01728), OTU (OTU-like cysteine protease, pfam02338), Macro (cl00019), NADAR (cd15457) and viral methyltransferase (pfam01660). More detailed depictions of lateral gene transfer can be found in Supplementary Data 22–36.
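Panel a marks key catalytic sites on an alignment of viral and cellular exoribonucleases. Checking whether a viral homologue retains the catalytic residues amounts to reading the residues in the relevant alignment columns and comparing them with a reference. The sketch below does exactly that; the sequences, the column positions and the helper names are hypothetical and do not reproduce the alignment in the figure.

```python
# Toy alignment; sequences and catalytic column positions are hypothetical
# and do not correspond to the real exoribonuclease alignment.
ALIGNMENT = {
    "cellular_ref": "AVDCEMTGLD",
    "virus_1":      "AIDVETTGID",
    "virus_2":      "GIDAQ-TGLN",
}
CATALYTIC_COLUMNS = [2, 4, 9]  # 0-based alignment columns (assumed)


def residues_at(alignment, columns):
    """Return, per sequence, the residues found at the given alignment columns."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("aligned sequences must all have the same length")
    return {name: "".join(seq[c] for c in columns) for name, seq in alignment.items()}


def catalytic_sites_conserved(alignment, columns, reference):
    """True for each sequence whose catalytic residues match the reference's."""
    sites = residues_at(alignment, columns)
    return {name: res == sites[reference] for name, res in sites.items()}


print(residues_at(ALIGNMENT, CATALYTIC_COLUMNS))
print(catalytic_sites_conserved(ALIGNMENT, CATALYTIC_COLUMNS, "cellular_ref"))
```

Conservation of these residues suggests that a transferred gene may still encode an active enzyme, whereas substitutions (as in the toy `virus_2`) would point to a diverged or possibly inactive copy.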

Extended Data Table 1: Distribution of homologous protein clusters across divergent taxonomic groups (RNA viruses, DNA viruses and cellular organisms).

Supplementary information

Supplementary Data

This file contains Supplementary Data 1–36: phylogenies and genome structures of each major virus clade. The phylogenies (Supplementary Data 1–21) give detailed information on evolutionary relationships, virus names, viral RNA frequencies, and the presence and location of endogenous viral elements (EVEs). The genome structures (Supplementary Data 22–36) show the genome organization and structural domains of representative viruses. (PDF 1870 kb)

Supplementary Table 1

This table contains detailed information on each pool/library. (PDF 217 kb)

Supplementary Table 2

This table contains detailed information on each virus discovered in this study. (XLSX 231 kb)


Cite this article

Shi, M., Lin, XD., Tian, JH. et al. Redefining the invertebrate RNA virosphere. Nature 540, 539–543 (2016).
