The evolutionary history of vertebrate RNA viruses



Our understanding of the diversity and evolution of vertebrate RNA viruses is largely limited to those found in mammalian and avian hosts and associated with overt disease. Here, using a large-scale meta-transcriptomic approach, we discover 214 vertebrate-associated viruses in reptiles, amphibians, lungfish, ray-finned fish, cartilaginous fish and jawless fish. The newly discovered viruses appear in every family or genus of RNA virus associated with vertebrate infection, including those that contain human pathogens, such as the influenza viruses and the Arenaviridae and Filoviridae. Their branching orders broadly reflect the phylogenetic history of their hosts. We establish a long evolutionary history for most groups of vertebrate RNA virus, and support this by evaluating evolutionary timescales using dated orthologous endogenous virus elements. We also identify new vertebrate-specific RNA viruses and genome architectures, and re-evaluate the evolution of vector-borne RNA viruses. In summary, this study reveals diverse virus–host associations across the entire evolutionary history of the vertebrates.
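The dating logic behind orthologous endogenous virus elements (EVEs) can be stated compactly. Suppose the same viral insertion sits at the orthologous genomic locus in several host species. It was then most likely inherited from their common ancestor, so the integration, and hence the donor virus lineage, is at least as old as the deepest split among those hosts. Below is a minimal sketch of that lower bound. It assumes a hypothetical input: a list of carrier species and a table of pairwise host divergence times in millions of years (Mya). The toy values are for illustration only and are not data from this study.

```python
from itertools import combinations


def minimum_eve_age(carriers, divergence_mya):
    """Lower bound (in Mya) on the age of an orthologous EVE.

    carriers: host species sharing the EVE at the orthologous genomic locus.
    divergence_mya: dict mapping frozenset({species_1, species_2}) to their
        divergence time in millions of years (assumed to come from one
        consistent host timetree).
    Assumes the insertion was inherited vertically from the carriers' common
    ancestor, so it must predate the deepest split among them; on a timetree
    that split is the largest pairwise divergence time.
    """
    if len(carriers) < 2:
        raise ValueError("need at least two carrier species to date an ortholog")
    return max(divergence_mya[frozenset(pair)] for pair in combinations(carriers, 2))


# Toy divergence times for illustration only (not data from this study).
toy_divergences = {
    frozenset({"host_A", "host_B"}): 30.0,
    frozenset({"host_A", "host_C"}): 90.0,
    frozenset({"host_B", "host_C"}): 90.0,
}
print(minimum_eve_age(["host_A", "host_B", "host_C"], toy_divergences))  # 90.0
```

The result is only a minimum. The virus lineage may be much older than the integration event. Independent insertions at the same locus would also break the vertical-inheritance assumption the bound relies on.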


Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.




Acknowledgements

This study was supported by the Special National Project on Research and Development of Key Biosafety Technologies (2016YFC1201900, 2016YFC1200101) and the National Natural Science Foundation of China (grants 81672057 and 81611130073). E.C.H. and M.S. are funded by an ARC Australian Laureate Fellowship to E.C.H. (FL170100022). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. We thank students at the Zoonosis branch of the China CDC, especially W.-C. Wu, J.-W. Shao, C.-X. Li, J.-J. Guo and K.-L. Song, for assistance with virus and host sequence confirmation, and we thank B. Yu for help with the collection of animal samples. We acknowledge the high-performance computing (HPC) service at The University of Sydney for providing resources that have contributed to the research results reported within this paper.

Reviewer information

Nature thanks A. Rambaut and M. Worobey for their contribution to the peer review of this work.

Author information

Author notes

  1. These authors contributed equally: Mang Shi, Xian-Dan Lin, Xiao Chen, Jun-Hua Tian.


Affiliations

  1. State Key Laboratory for Infectious Disease Prevention and Control, Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, National Institute for Communicable Disease Control and Prevention, Chinese Center for Disease Control and Prevention, Beijing, China

    Mang Shi, Liang-Jun Chen, Kun Li, Wen Wang, Edward C. Holmes & Yong-Zhen Zhang

  2. Shanghai Public Health Clinical Center & Institute of Biomedical Sciences, Fudan University, Shanghai, China

    Mang Shi, Edward C. Holmes & Yong-Zhen Zhang

  3. Marie Bashir Institute for Infectious Diseases and Biosecurity, Charles Perkins Centre, School of Life and Environmental Sciences and Sydney Medical School, The University of Sydney, Sydney, New South Wales, Australia

    Mang Shi, John-Sebastian Eden & Edward C. Holmes

  4. Wenzhou Center for Disease Control and Prevention, Wenzhou, China

    Xian-Dan Lin & Li Liu

  5. College of Marine Sciences, South China Agricultural University, Guangzhou, China

    Xiao Chen

  6. Wuhan Center for Disease Control and Prevention, Wuhan, China

    Jun-Hua Tian

  7. Yancheng Center for Disease Control and Prevention, Yancheng, China

    Jin-Jin Shen




Contributions

M.S. and Y.-Z.Z. conceived and designed the study. M.S., X.-D.L., X.C., J.-H.T., K.L., L.-J.C., J.-J.S., L.L. and Y.-Z.Z. organized fieldwork and collected samples. M.S., X.-D.L., X.C., J.-H.T., K.L., L.-J.C., W.W., J.-J.S., L.L. and Y.-Z.Z. performed experiments. M.S., J.-S.E., E.C.H. and Y.-Z.Z. analysed data. M.S., E.C.H. and Y.-Z.Z. wrote the paper with input from all authors. Y.-Z.Z. led the study.

Competing interests

The authors declare no competing interests.

Corresponding author

Correspondence to Yong-Zhen Zhang.

Extended data figures and tables

  1. Extended Data Fig. 1 Phylogenetic positions of vertebrate-associated positive-sense and double-stranded RNA viruses within the broader diversity of RNA viruses.

    Phylogenies were estimated using a maximum likelihood method and midpoint-rooted for clarity only. Viruses discovered here are labelled with solid black circles. The name of the major clade (phylogeny) is shown at the top of each tree, and taxonomic names are shown to the right. The vertebrate-associated virus diversity is shaded in grey. All horizontal branch lengths are scaled to the number of amino acid substitutions per site.

  2. Extended Data Fig. 2 Phylogenetic positions of vertebrate-associated negative-sense RNA viruses within the broader diversity of RNA viruses.

    Phylogenies were estimated using a maximum likelihood method and midpoint-rooted for clarity only. Viruses discovered here are labelled with solid black circles. The name of the major clade (phylogeny) is shown at the top of each tree, and taxonomic names are shown to the right. The vertebrate-associated virus diversity is shaded in grey. All horizontal branch lengths are scaled to the number of amino acid substitutions per site.

  3. Extended Data Fig. 3 The phylogenies of potentially new families of vertebrate-associated viruses.

    Viruses identified from vertebrate hosts are shaded with different colours. Sequences recovered from the Transcriptome Shotgun Assembly (TSA) database are marked with solid black diamonds, while those recovered from the Whole-Genome Shotgun (WGS) contigs database (that is, endogenous virus elements) are marked with open triangles. For vertebrate viruses, the relevant taxonomic and tissue information is provided in the sequence names.

  4. Extended Data Fig. 4 Evolutionary history of four groups of vector-borne RNA virus.

    Each phylogenetic tree was estimated using a maximum likelihood method. Within each phylogeny, the viruses newly identified here are marked with solid black circles, the vertebrate host groups are indicated by different colours, and the vector symbol is shown next to viruses known to be transmitted by vectors. The name of the virus family or genus is shown at the top of each phylogeny, and the lower-level virus taxonomic names are shown to the right.

  5. Extended Data Fig. 5 Evolution of vertebrate-associated negative-sense RNA virus genomes.

    Representative genomes from negative-sense RNA virus families or genera are shown. The regions that encode major functional proteins or protein domains are labelled on each of the genomes. Homologous regions within each family are connected with orange dotted lines. Schematic phylogenetic relationships are shown next to the genome diagrams. Coverage plots are shown underneath novel genome structures. Reverse-complementary sequences are shown for negative-sense RNA viruses with complete termini. A Sanger sequencing chromatogram is shown at a GC-rich hairpin-forming region of the Wenling frogfish arenavirus 2 genome, in which the coverage drops substantially. Host associations are labelled to the right of the tree using solid circles with different colours. Host associations and abbreviations of functional domains are described at the bottom of the figure.

  6. Extended Data Fig. 6 Evolution of vertebrate-associated positive-sense RNA virus genomes.

    Representative genomes from positive-sense RNA virus families or genera are shown. The regions that encode major functional proteins or protein domains are labelled on each of the genomes. Homologous regions within or between viral families are connected by orange dotted lines. Host associations are reflected in the colour of the virus names. Host association colour schemes and the abbreviations of functional domains are described at the bottom of the figure.
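The trees in Extended Data Figs 1 and 2 are midpoint-rooted for clarity only. The root is placed halfway along the longest path between any two tips, which orients the figure without claiming where the true root lies. Below is a minimal sketch of that rule. It assumes a hypothetical adjacency-list representation of an unrooted tree with toy branch lengths. It illustrates the rooting rule and is not the software used to draw these figures; real analyses would normally rely on an established phylogenetics library.

```python
def _distances_from(tree, start):
    """Iterative depth-first search from `start`.

    Returns (dist, parent): path length from `start` to every node, and the
    predecessor of each node on that path (None for `start` itself).
    """
    dist = {start: 0.0}
    parent = {start: None}
    stack = [start]
    while stack:
        node = stack.pop()
        for neighbour, length in tree[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + length
                parent[neighbour] = node
                stack.append(neighbour)
    return dist, parent


def midpoint_root(tree):
    """Locate the midpoint root of an unrooted tree.

    tree: symmetric adjacency dict {node: [(neighbour, branch_length), ...]}.
    Returns (u, v, offset): the root lies on edge u-v, `offset` from u.
    """
    tips = [n for n, nbrs in tree.items() if len(nbrs) == 1]
    if len(tips) < 2:
        raise ValueError("need at least two tips")
    # 1. Find the two tips separated by the longest path (the tree's diameter).
    diameter, tip_a, tip_b = -1.0, None, None
    for a in tips:
        dist, _ = _distances_from(tree, a)
        for b in tips:
            if dist[b] > diameter:
                diameter, tip_a, tip_b = dist[b], a, b
    if diameter <= 0:
        raise ValueError("tree has zero total length")
    # 2. Walk back from tip_b toward tip_a while the next node is still at or
    #    beyond the halfway point; the root then lies on the edge just above.
    dist, parent = _distances_from(tree, tip_a)
    half = diameter / 2.0
    node = tip_b
    while dist[parent[node]] >= half:
        node = parent[node]
    u = parent[node]
    return u, node, half - dist[u]


# Toy unrooted tree: tips A-D, internal nodes X and Y (hypothetical lengths).
toy_tree = {
    "A": [("X", 1.0)],
    "B": [("X", 2.0)],
    "C": [("Y", 2.0)],
    "D": [("Y", 7.0)],
    "X": [("A", 1.0), ("B", 2.0), ("Y", 1.0)],
    "Y": [("X", 1.0), ("C", 2.0), ("D", 7.0)],
}
print(midpoint_root(toy_tree))  # ('Y', 'D', 2.0)
```

Here the longest tip-to-tip path is B to D, with a length of 10, so the root falls 5 units from B, which is 2 units along the Y–D edge. Midpoint rooting implicitly assumes roughly equal rates along the two deepest lineages. That is why it serves as a display convenience rather than as evidence about where the root lies.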

Supplementary information

  1. Supplementary Table 1

    Detailed information on each pool/library.

  2. Reporting Summary

  3. Supplementary Table 2

    Detailed information on each virus discovered in this study.

