Redefining the invertebrate RNA virosphere

Journal name:
Nature
Volume:
540
Pages:
539–543
DOI:
doi:10.1038/nature20167

Abstract

Current knowledge of RNA virus biodiversity is both biased and fragmentary, reflecting a focus on culturable or disease-causing agents. Here we profile the transcriptomes of over 220 invertebrate species sampled across nine animal phyla and report the discovery of 1,445 RNA viruses, including some that are sufficiently divergent to comprise new families. The identified viruses fill major gaps in the RNA virus phylogeny and reveal an evolutionary history that is characterized by both host switching and co-divergence. The invertebrate virome also reveals remarkable genomic flexibility that includes frequent recombination, lateral gene transfer among viruses and hosts, gene gain and loss, and complex genomic rearrangements. Together, these data present a view of the RNA virosphere that is more phylogenetically and genomically diverse than that depicted in current classification schemes and provide a more solid foundation for studies in virus ecology and evolution.

Figures

  1. Figure 1: The frequency and diversity of viral RNA transcripts in invertebrate transcriptomes.

    The top graph shows the percentage of non-rRNA reads mapped to viral RNA (RdRp; orange bars) and retrotransposons (green bars) in each library. The short name of each library is shown on top of each bar, while major host classifications are shown above the bar graph. The bottom graph shows a summary of the normalized number of virus species within each library. The total number is further subdivided to identify those RNA viruses at either high (blue) or low (magenta) frequency.

  2. Figure 2: Phylogenetic diversity of RNA viruses.

    Thirteen phylogenetic trees representing the major clades of RNA virus RdRp domains (see main text for definitions). Within each tree, the viruses discovered here are shaded red, while those described previously are shaded grey. The name of each clade is shown to the top left of each phylogeny and the names of the families or genera within each clade are shown below the tree. Each scale bar indicates 0.5 amino acid substitutions per site. More detailed trees for each clade are shown in Supplementary Data 1–21, and their genome structures are shown in Supplementary Data 22–36.

  3. Figure 3: Genetic exchange among RNA viruses.

    Comparison of the phylogenetic trees of 76 representative viral genomes with different types of structural protein (that is, four major types of capsid protein) and the equivalent phylogenies obtained for their RdRp amino acid sequences (eight clades as defined in the text, shown in different colours). Line colours correspond to those of the RdRp clade as shown to the left of the figure. Widespread recombination can be inferred when RdRp clades are associated with different types of structural protein, and vice versa.

  4. Figure 4: Evolution of genome organization in RNA viruses.

    a, Genome evolution in ten representative clades of positive-sense and dsRNA viruses. Genome order follows the RdRp phylogeny. The genomes are drawn to a unified length scale shown at the top. The panel shows the pattern of segmentation (asterisks), the locations of major nonstructural and structural domains (coloured diamonds) and the arrangement of open reading frames. b, Genome evolution of negative-sense RNA viruses. N, G, and L indicate homologues of nucleoproteins, glycoproteins, and the polymerase, respectively. More detailed depictions of genome evolution are shown in Supplementary Data 22–36.

  5. Extended Data Fig. 1: The contribution of major viral clades to the total virome of each host phylum/order.

    a, b, These analyses are based on viruses at all frequency levels (a), and viruses in which the frequency exceeds 0.1% of the total number of non-rRNA reads (b).

  6. Extended Data Fig. 2: Phylogenetic incongruence between the RdRp and structural proteins.

    a, Match between the phylogenies of the RdRp and coat proteins (S-domain like) for non-segmented members of the Tombus–Noda clade. The relationship between the two phylogenies is displayed to maximize topological congruence. b, The degree of phylogenetic incongruence for different pairs of structural and non-structural phylogenies. The comparisons were based on patristic distance matrices derived from the phylogenies.

  7. Extended Data Fig. 3: The gain and loss of RNA virus structural proteins.

    a, The parallel acquisition of multiple copies of structural proteins by viruses within the Hepe–Virga clade. Left panel shows an outline of the structural part of their genomes, with homologous structural genes marked in yellow and multiple copies of these proteins within the same genome labelled as ‘I’, ‘II’, and ‘III’. Right panel shows a maximum-likelihood phylogeny depicting the evolutionary history of the corresponding structural proteins of these viruses. b, Acquisition of a glycoprotein in the genome of Hubei Lepidoptera virus 2 from the Mono–Chu clade. Its genome is compared against that of a closely related virus (Hubei dimarhabdovirus-like virus 2). Homologous proteins are connected with dotted lines, and the target glycoprotein is shown in red. c, Three examples of glycoprotein loss in the Mono–Chu clade. Homologous proteins are connected with dotted lines, and the target glycoproteins are shown in blue.

  8. Extended Data Fig. 4: Lateral gene transfer between RNA viruses and cellular organisms.

    a, Evolutionary origin of two exoribonucleases (cd06133) in two sea-slater-associated viruses (Beihai hepe-like virus 2 and Beihai sea slater virus 4). Top, alignment of viral and (human) cellular exoribonucleases. The solid triangles indicate the key catalytic sites. Lower left panel shows the phylogenetic positions of the two viruses (marked with solid red circles) whose genomes contain these exoribonucleases. The host information for each virus is shown in parentheses. Lower right panel shows the phylogenetic position of the virus exoribonucleases (solid red circle) in the context of cellular exoribonucleases. b, Evolutionary origin of viral serine proteases (cd00190). The phylogeny contains serine proteases from RNA viruses (solid red circles), DNA viruses (solid blue circles) and cellular organisms. Serine proteases from RNA viruses are either highly divergent or group within the diversity of cellular proteins. c, Relative positions of different protein domains in the replicase of selected Hepe–Virga viruses. The domains are shown as ovals and marked with different colours, and comprise: RdRp (cd01699), Helicase (pfam01443), FtsJ (pfam01728), OTU (OTU-like cysteine protease, pfam02338), Macro (cl00019), NADAR (cd15457), and viral methyltransferase (pfam01660). More detailed depictions of lateral gene transfer can be found in Supplementary Data 22–36.
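
The read-frequency measures named in the captions of Figure 1 and Extended Data Fig. 1 (percentage of non-rRNA reads mapped to viral RNA, and a 0.1% threshold) can be illustrated with a minimal sketch. It assumes that per-library read counts are already available. The library name, virus labels and counts are hypothetical, and using the 0.1% threshold of Extended Data Fig. 1b as the boundary between 'high' and 'low' frequency is an assumption made for illustration rather than a description of the authors' pipeline.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

# Threshold quoted in the caption of Extended Data Fig. 1b. Treating it as the
# high/low frequency boundary is an assumption of this sketch.
HIGH_FREQUENCY_CUTOFF_PERCENT = 0.1


@dataclass
class LibraryCounts:
    """Read counts for one sequencing library (all values hypothetical)."""
    name: str
    total_reads: int
    rrna_reads: int
    viral_reads: Dict[str, int]  # virus label -> reads mapped to that virus


def percent_of_non_rrna(mapped_reads: int, total_reads: int, rrna_reads: int) -> float:
    """Return mapped reads as a percentage of the non-rRNA reads in a library."""
    non_rrna = total_reads - rrna_reads
    if non_rrna <= 0:
        raise ValueError("library has no non-rRNA reads")
    return 100.0 * mapped_reads / non_rrna


def classify_viruses(library: LibraryCounts) -> Dict[str, Tuple[float, str]]:
    """Map each virus to (percentage of non-rRNA reads, 'high' or 'low')."""
    result = {}
    for virus, reads in library.viral_reads.items():
        pct = percent_of_non_rrna(reads, library.total_reads, library.rrna_reads)
        label = "high" if pct > HIGH_FREQUENCY_CUTOFF_PERCENT else "low"
        result[virus] = (pct, label)
    return result


# Hypothetical library: 1,000,000 reads, of which 200,000 are rRNA.
example = LibraryCounts(
    name="LIB1",
    total_reads=1_000_000,
    rrna_reads=200_000,
    viral_reads={"virus_A": 4_000, "virus_B": 400},
)

if __name__ == "__main__":
    for virus, (pct, label) in classify_viruses(example).items():
        print(f"{example.name}\t{virus}\t{pct:.3f}%\t{label}")
```

In this hypothetical library, virus_A accounts for 0.5% of the 800,000 non-rRNA reads and is labelled high, whereas virus_B accounts for 0.05% and is labelled low.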

Tables

  1. Extended Data Table 1: Distribution of homologous protein clusters across divergent taxonomic groups (RNA viruses, DNA viruses and cellular organisms)


Author information

  1. These authors contributed equally to this work.

    • Mang Shi,
    • Xian-Dan Lin,
    • Jun-Hua Tian,
    • Liang-Jun Chen,
    • Xiao Chen &
    • Ci-Xiu Li

Affiliations

  1. State Key Laboratory for Infectious Disease Prevention and Control, Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, National Institute for Communicable Disease Control and Prevention, Chinese Center for Disease Control and Prevention, Changping, 100206 Beijing, China

    • Mang Shi,
    • Liang-Jun Chen,
    • Ci-Xiu Li,
    • Xin-Cheng Qin,
    • Wen Wang,
    • Jianguo Xu,
    • Edward C. Holmes &
    • Yong-Zhen Zhang
  2. Marie Bashir Institute for Infectious Diseases and Biosecurity, Charles Perkins Centre, School of Life and Environmental Sciences and Sydney Medical School, the University of Sydney, Sydney, New South Wales 2006, Australia

    • Mang Shi,
    • John-Sebastian Eden,
    • Jan Buchmann &
    • Edward C. Holmes
  3. Wenzhou Center for Disease Control and Prevention, Wenzhou, 325001 Zhejiang, China

    • Xian-Dan Lin
  4. Wuhan Center for Disease Control and Prevention, Wuhan, 430015 Hubei, China

    • Jun-Hua Tian
  5. Guangxi Mangrove Research Center, Beihai, 536000 Guangxi, China

    • Xiao Chen
  6. Systems Biology and Bioinformatics Group, School of Biological Sciences, Faculty of Sciences, University of Hong Kong, Hong Kong, China

    • Jun Li
  7. National Institute of Parasitic Diseases, Chinese Center for Disease Control and Prevention, Shanghai, China

    • Jian-Ping Cao

Contributions

Conceptualization: M.S. and Y.-Z.Z. Methodology: M.S., L.-J.C., C.-X.L., J.L., J.-S.E., J.B., E.C.H. and Y.-Z.Z. Investigation: M.S., X.-D.L., J.-H.T., L.-J.C., X.C., C.-X.L. and X.-C.Q. Writing (original draft): M.S., E.C.H. and Y.-Z.Z. Writing (review and editing): M.S., X.-D.L., J.-H.T., L.-J.C., X.C., C.-X.L., J.-S.E., J.X., E.C.H. and Y.-Z.Z. Funding acquisition: J.X., E.C.H. and Y.-Z.Z. Resources (sampling): M.S., X.-D.L., J.-H.T., L.-J.C., X.C., C.-X.L., J.-P.C., W.W. and Y.-Z.Z. Resources (computational): M.S., J.L., J.B. and E.C.H. Supervision: E.C.H. and Y.-Z.Z.

Competing financial interests

The authors declare no competing financial interests.

Reviewer Information: Nature thanks E. Ghedin, D. Obbard and the other anonymous reviewer(s) for their contribution to the peer review of this work.

Supplementary information

PDF files

  1. Supplementary Data (1.8 MB)

    This file contains Supplementary Data 1–36: phylogenies and genome structures of each major virus clade. The phylogenies (Supplementary Data 1–21) contain detailed information on evolutionary relationships, the names of the viruses, the frequency of viral RNA, and the presence and location of endogenous virus elements (EVEs). The genome structures (Supplementary Data 22–36) contain information on the genome organization and the structural domains of representative viruses.

  2. Supplementary Table 1 (217 KB)

    This table contains detailed information on each pool/library.

Excel files

  1. Supplementary Table 2 (231 KB)

    This table contains the detailed information on each virus discovered in this study.
