Zoonotic infectious diseases such as influenza continue to pose a grave threat to human health [1]. However, the factors that mediate the emergence of RNA viruses such as influenza A virus (IAV) are still incompletely understood [2,3]. Phylogenetic inference is crucial to reconstructing the origins and tracing the flow of IAV within and between hosts [3–8]. Here we show that explicitly allowing IAV host lineages to have independent rates of molecular evolution is necessary for reliable phylogenetic inference of IAV, and that methods that do not do so, including ‘relaxed’ molecular clock models [9], can be positively misleading. A phylogenomic analysis using a host-specific local clock (HSLC) model recovers extremely consistent evolutionary histories across all genomic segments and demonstrates that the equine H7N7 lineage is a sister clade to strains from birds (as well as those from humans, swine and the equine H3N8 lineage), sharing an ancestor with them in the mid to late 1800s. Moreover, the major western and eastern hemisphere avian influenza lineages inferred for each gene coalesce in the late 1800s. On the basis of these phylogenies and the synchrony of these key nodes, we infer that the internal genes of avian influenza virus (AIV) underwent a global selective sweep beginning in the late 1800s, a process that continued throughout the twentieth century and up to the present. The resulting western hemisphere AIV lineage subsequently contributed most of the genomic segments to the 1918 pandemic virus and, independently, to the 1963 equine H3N8 panzootic lineage. This approach provides a clear resolution of evolutionary patterns and processes in IAV, including the flow of viral genes and genomes within and between host lineages.
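The central methodological claim above, that letting a single clock rate stand in for host lineages evolving at different speeds can mislead dating, can be illustrated with a deliberately simplified root-to-tip regression. This is not the Bayesian host-specific local clock analysis used in the paper: the sketch assumes noiseless divergences, a true root in 1850, and made-up rates for a hypothetical ‘fast’ avian lineage and ‘slow’ equine lineage. All names and numbers are illustrative.

```python
# Toy illustration (NOT the paper's BEAST host-specific local clock model):
# root-to-tip regression with one shared clock versus one clock per host lineage.
# All rates, sampling years and names below are hypothetical.

TRUE_ROOT = 1850.0
HOST_RATES = {"avian": 3e-3, "equine": 1e-3}  # substitutions/site/year (made up)
SAMPLE_YEARS = {
    "avian": [1950, 1960, 1970, 1980, 1990, 2000, 2010],
    "equine": [1956, 1964, 1972, 1980],
}


def simulate_divergences():
    """Noiseless root-to-tip divergence: host rate * (sampling year - root date)."""
    return {
        host: [(y, HOST_RATES[host] * (y - TRUE_ROOT)) for y in years]
        for host, years in SAMPLE_YEARS.items()
    }


def root_date(points):
    """Least-squares line through (year, divergence); its x-intercept is the root date."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return -intercept / slope  # year at which fitted divergence is zero


data = simulate_divergences()
pooled = root_date([p for pts in data.values() for p in pts])   # one shared clock
per_host = {host: root_date(pts) for host, pts in data.items()}  # one clock per host

print(f"pooled single-clock root: {pooled:.0f}")
for host, est in per_host.items():
    print(f"{host} root: {est:.0f}")
```

With these inputs the pooled fit puts the root at about 1916, some 66 years too recent, while each per-host fit returns 1850. The direction of the error matches the bias toward recent root dates described in the Extended Data legends, but the toy says nothing about its size in real data.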
References
1. Morens, D. M., Folkers, G. K. & Fauci, A. S. The challenge of emerging and re-emerging infectious diseases. Nature 430, 242–249 (2004)
2. Parrish, C. R. et al. Cross-species virus transmission and the emergence of new epidemic diseases. Microbiol. Mol. Biol. Rev. 72, 457–470 (2008)
3. Holmes, E. C. The Evolution and Emergence of RNA Viruses (Oxford Univ. Press, 2009)
4. Webster, R. G., Bean, W. J., Gorman, O. T., Chambers, T. M. & Kawaoka, Y. Evolution and ecology of influenza A viruses. Microbiol. Rev. 56, 152–179 (1992)
5. Fitch, W. M., Bush, R. M., Bender, C. A. & Cox, N. J. Long term trends in the evolution of H(3) HA1 human influenza type A. Proc. Natl Acad. Sci. USA 94, 7712–7718 (1997)
6. Rambaut, A. et al. The genomic and epidemiological dynamics of human influenza A virus. Nature 453, 615–619 (2008)
7. Dugan, V. G. et al. The evolutionary genetics and emergence of avian influenza viruses in wild birds. PLoS Pathog. 4, e1000076 (2008)
8. Chen, R. & Holmes, E. C. Hitchhiking and the population genetic structure of avian influenza virus. J. Mol. Evol. 70, 98–105 (2010)
9. Drummond, A. J., Ho, S. Y. W., Phillips, M. J. & Rambaut, A. Relaxed phylogenetics and dating with confidence. PLoS Biol. 4, e88 (2006)
10. Sovinova, O., Tumova, B., Poutska, F. & Nemec, J. Isolation of a virus causing respiratory disease in horses. Acta Virol. 2, 52–61 (1958)
11. Morens, D. M. & Taubenberger, J. K. Historical thoughts on influenza viral ecosystems, or behold a pale horse, dead dogs, failing fowl, and sick swine. Influenza Other Respir. Viruses 4, 327–337 (2010)
12. Judson, A. B. History and course of the epizoötic among horses upon the North American continent in 1872–73. Publ. Health Pap. Rep. 1, 88–109 (1873)
13. Morens, D. M. & Taubenberger, J. K. An avian outbreak associated with panzootic equine influenza in 1872: an early example of highly pathogenic avian influenza? Influenza Other Respir. Viruses 4, 373–377 (2010)
14. Rabadan, R., Levine, A. J. & Robins, H. Comparison of avian and human influenza A viruses reveals a mutational bias on the viral genomes. J. Virol. 80, 11887–11891 (2006)
15. Perroncito, E. Epizoozia tifoide nei gallinacei. Annli Reale Accad. Agric. Torino 21, 87–126 (1878)
16. Kaleta, E. F. & Rülke, C. P. A. in Avian Influenza (ed. Swayne, D. E.) 145–189 (Blackwell, 2008)
17. Smith, G. J. D. et al. Dating the emergence of pandemic influenza viruses. Proc. Natl Acad. Sci. USA 106, 11709–11712 (2009)
18. Scholtens, R. G. & Steele, J. H. U.S. epizootic of equine influenza, 1963: Epizootiology. Public Health Rep. 79, 393–398 (1964)
19. Treanor, J. J., Snyder, M. H., London, W. T. & Murphy, B. R. The B allele of the NS gene of avian influenza viruses, but not the A allele, attenuates a human influenza A virus for squirrel monkeys. Virology 171, 1–9 (1989)
20. Barton, N. H. Genetic hitchhiking. Phil. Trans. R. Soc. Lond. B 355, 1553–1562 (2000)
21. Tong, S. et al. A distinct lineage of influenza A virus from bats. Proc. Natl Acad. Sci. USA 109, 4269–4274 (2012)
22. Tong, S. et al. New World bats harbor diverse influenza A viruses. PLoS Pathog. 9, e1003657 (2013)
23. Drummond, A. J., Suchard, M. A., Xie, D. & Rambaut, A. Bayesian phylogenetics with BEAUti and the BEAST 1.7. Mol. Biol. Evol. 29, 1969–1973 (2012)
24. Hasegawa, M., Kishino, H. & Yano, T. Estimation of branching dates among primates by molecular clocks of nuclear DNA which slowed down in Hominoidea. J. Hum. Evol. 18, 461–476 (1989)
25. Rambaut, A. & Bromham, L. Estimating divergence dates from molecular sequences. Mol. Biol. Evol. 15, 442–448 (1998)
26. Yoder, A. D. & Yang, Z. Estimation of primate speciation dates using local molecular clocks. Mol. Biol. Evol. 17, 1081–1090 (2000)
27. Rambaut, A. & Grassly, N. C. Seq-Gen: an application for the Monte Carlo simulation of DNA sequence evolution along phylogenetic trees. Comput. Appl. Biosci. 13, 235–238 (1997)
28. Tavaré, S. Some probabilistic and statistical problems in the analysis of DNA sequences. Lect. Math Life Sci. 17, 57–86 (1986)
29. Yang, Z. Maximum-likelihood estimation of phylogeny from DNA sequences when substitution rates differ over sites. Mol. Biol. Evol. 10, 1396–1401 (1993)
30. Minin, V. N., Bloomquist, E. W. & Suchard, M. A. Smooth skyride through a rough skyline: Bayesian coalescent-based inference of population dynamics. Mol. Biol. Evol. 25, 1459–1471 (2008)
31. Bao, Y. et al. The influenza virus resource at the National Center for Biotechnology Information. J. Virol. 82, 596–601 (2008)
32. Keawcharoen, J. et al. Avian influenza H5N1 in tigers and leopards. Emerg. Infect. Dis. 10, 2189–2191 (2004)
33. Edgar, R. C. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 32, 1792–1797 (2004)
34. Tamura, K. et al. MEGA5: molecular evolutionary genetics analysis using maximum likelihood, evolutionary distance, and maximum parsimony methods. Mol. Biol. Evol. 28, 2731–2739 (2011)
35. Han, G. Z. & Worobey, M. Homologous recombination in negative sense RNA viruses. Viruses 3, 1358–1373 (2011)
36. Martin, D. P. et al. RDP3: a flexible and fast computer program for analyzing recombination. Bioinformatics 26, 2462–2463 (2010)
37. Smith, G. J. D. et al. Origins and evolutionary genomics of the 2009 swine-origin H1N1 influenza A epidemic. Nature 459, 1122–1126 (2009)
38. Crawford, P. C. et al. Transmission of equine influenza virus to dogs. Science 310, 482–485 (2005)
39. Palese, P., Nakajima, K. & Desselberger, U. Recent human influenza A (H1N1) viruses are closely related genetically to strains isolated in 1950. Nature 274, 334–339 (1978)
40. Shapiro, B., Rambaut, A. & Drummond, A. J. Choosing appropriate substitution models for the phylogenetic analysis of protein-coding sequences. Mol. Biol. Evol. 23, 7–9 (2006)
41. Swofford, D. L. PAUP*. Phylogenetic Analysis Using Parsimony (*and Other Methods). Version 4 (Sinauer Associates, 2003)
42. Pond, S. L. K. et al. A random effects branch-site model for detecting episodic diversifying selection. Mol. Biol. Evol. 28, 3033–3043 (2011)
43. Fries, A. C. et al. Evidence for the circulation and inter-hemispheric movement of the H14 subtype influenza A virus. PLoS ONE 8, e59216 (2013)
44. Zhou, B. et al. Single-reaction genomic amplification accelerates sequencing and vaccine production for classical and Swine origin human influenza a viruses. J. Virol. 83, 10309–10313 (2009)
Acknowledgements
We thank J. Barnes, S. Meno, M. Shaw, R. Donis, S. Krauss, K. Friedman, R. Webster, Y. Muramoto and Y. Kawaoka for assistance in locating and sequencing A/equine/Detroit/3/1964(H7N7), A/chicken/Japan/1925(H7N7) and A/duck/Manitoba/1953(H10N7); M. Sanderson for comments on the HSLC model; S. Zohari for discussions of the NS1/2 A and B lineages; and M. Nachman, Y. Kawaoka, T. Watts, J. Cox and D. Gill for comments. This work was supported by grants from the David and Lucile Packard Foundation to M.W. and from the Wellcome Trust (grant no. 092807) to A.R. The research leading to these results has received funding from the European Union Seventh Framework Programme (FP7/2007-2013) under grant agreement no. 278433-PREDEMICS and European Research Council grant agreement no. 260864. The methodological approach was developed in part with support from a grant from the National Institutes of Health/National Institute of Allergy and Infectious Diseases (R01AI084691).
The authors declare no competing financial interests.
Extended data figures and tables
a, Summary of the 100 replicates corresponding to Fig. 1 (IAV-like substitution model). The box plots show the median, first quartile, third quartile, minimum and maximum of the 100 per-replicate median TMRCA estimates. The HSLC model recovered the ‘correct’ (model) tree topology in 100% of the simulations; the other models did so in 0%. Under the relaxed clock, the 95% credible interval for the TMRCA never included the true root node date, whereas under the HSLC model it did in 91% of the simulations. b, Summary of ten otherwise similar replicates, but simulated under a JC69 substitution model. c, Simulation with unequal sampling across clades, with ‘fast’ clade (‘avian’) sequences over-represented. (The model tree was identical to that in Fig. 1a except for the unequal number of sequences from the different clades, as shown.) d, Simulation with ‘slow’ clade (‘equine’) sequences over-represented. Unlike estimates from the HSLC model, root date estimates under both strict and relaxed clock models are systematically biased and strongly influenced by the balance of ‘fast-clade’ and ‘slow-clade’ sequences sampled.
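The two summaries in panel a reduce to simple arithmetic over replicate outputs: a five-number summary of the per-replicate median TMRCAs, and the fraction of replicates whose 95% credible interval contains the true root date (for example 91 of 100 for the HSLC model). A sketch using Python's `statistics` module, with hypothetical replicate values; the quartile convention (`method="inclusive"`) is an assumption, and plotting software may compute quartiles differently.

```python
from statistics import quantiles


def five_number_summary(values):
    """Minimum, Q1, median, Q3 and maximum, as drawn by a basic box plot."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)


def interval_coverage(intervals, true_value):
    """Fraction of (lower, upper) credible intervals that contain true_value."""
    hits = sum(1 for lower, upper in intervals if lower <= true_value <= upper)
    return hits / len(intervals)


# Hypothetical replicate output: (median TMRCA, 95% lower, 95% upper) per replicate.
replicates = [
    (1852.0, 1838.0, 1866.0),
    (1861.0, 1851.5, 1873.0),
    (1847.5, 1830.0, 1860.0),
    (1858.0, 1849.0, 1870.0),
    (1849.0, 1836.0, 1863.0),
]
true_root = 1850.0

medians = [m for m, _, _ in replicates]
print(five_number_summary(medians))
print(interval_coverage([(lo, hi) for _, lo, hi in replicates], true_root))
```

Here four of the five hypothetical intervals contain 1850, giving a coverage of 0.8; the second replicate's interval starts just after the true date.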
Maximum clade credibility (MCC) trees (median node heights) inferred under an uncorrelated lognormal (UCLD) relaxed molecular clock model for PB2 (a), PB1 (b), PA (c), HA (d), NP (e), NA (f), M1/2 (g) and NS1/2 (h). Host-specific rate distributions in substitutions per site per year are shown in insets at the top left. Trees are drawn to the same time scale, with branch lengths in years. Eastern (‘e’) and western (‘w’) hemisphere AIV lineages are highlighted with black and grey vertical bars, respectively. Colouring of branches and clades follows the pattern in Fig. 2. The median dates of node 1 and node 2 from the HSLC analyses depicted in Fig. 2 are shown here for comparison. As with the synthetic data sets (Fig. 1 and Extended Data Fig. 1), the topologies and timing estimated under a relaxed clock model seem to be compromised by a failure to account for host-specific rates. It is not readily apparent from these trees, for example, that the equine H7N7 lineage is basal to the AIV diversity or that the 1918 pandemic virus is nested within a western hemisphere AIV lineage. The root node in each tree is also severely biased towards more recent dates, similar to the results with simulated sequences. Data, input and full MCC tree files are available from http://dx.doi.org/10.5061/dryad.m04j9.
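The structural difference between the two clock models can be sketched as two ways of assigning a rate to each branch. Under the UCLD prior, branch rates are independent draws from one lognormal distribution, so nothing in the model ties a branch's rate to its host; under a host-specific local clock, every branch in a host lineage shares that host's rate. This is a simplified sketch with hypothetical host labels, durations and rates: in BEAST both sets of rates are estimated jointly with the tree, and how the paper handles branches spanning a host jump is not reproduced here.

```python
import math
import random


def hslc_branch_rates(branch_hosts, host_rates):
    """Host-specific local clock (simplified): each branch takes its host lineage's rate."""
    return [host_rates[host] for host in branch_hosts]


def ucld_branch_rates(branch_hosts, mean_rate, sigma, rng):
    """Uncorrelated lognormal clock: every branch draws an independent rate from a
    single lognormal distribution, regardless of which host lineage it belongs to."""
    mu = math.log(mean_rate) - sigma ** 2 / 2  # log-scale location so that E[rate] == mean_rate
    return [rng.lognormvariate(mu, sigma) for _ in branch_hosts]


def expected_substitutions(rates, durations):
    """Expected substitutions per site on each branch: rate * branch duration (years)."""
    return [r * t for r, t in zip(rates, durations)]


# Hypothetical branches, host labels, durations (years) and rates (subs/site/year).
hosts = ["avian", "avian", "equine", "equine"]
durations = [40.0, 25.0, 40.0, 25.0]
host_rates = {"avian": 3e-3, "equine": 1e-3}

hslc = hslc_branch_rates(hosts, host_rates)
ucld = ucld_branch_rates(hosts, mean_rate=2e-3, sigma=0.5, rng=random.Random(1))
print(expected_substitutions(hslc, durations))
print(expected_substitutions(ucld, durations))
```

Because the UCLD draws are exchangeable across branches, any host-level rate pattern has to be recovered branch by branch from the data, which is the failure mode the legend describes.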
The branches are coloured to depict the proportion of substitutions along each branch that are under purifying selection (with dN/dS < 1: blue), the proportion evolving neutrally (with dN/dS = 1: grey) or under diversifying selection (with dN/dS > 1: red). In every gene, almost every site in every branch evidently evolved under purifying selection. In a few branches, a small proportion of sites show evidence of positive selection (for example, the branch between AIV and equine H7N7 in NS1/2). However, the proportion is so small that there seems to be no conceivable way that episodic diversifying selection occasioned by host jumps could be driving the overall dating estimates. Even for HA and NA, purifying selection dominates overwhelmingly.
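The colour code is a direct mapping from each branch's estimated dN/dS (ω) rate classes, of the kind produced by a random effects branch-site model such as the one by Pond et al. in the reference list, onto three categories. A sketch with hypothetical rate-class estimates for a single branch; the tolerance used to call a class ‘neutral’ (ω = 1) is an assumption.

```python
import math


def selection_profile(rate_classes, tol=1e-9):
    """Split one branch's sites into purifying / neutral / diversifying fractions.

    rate_classes: (omega, weight) pairs for the branch, where omega is dN/dS for a
    rate class and weight is the proportion of sites in that class (sums to 1).
    """
    profile = {"purifying": 0.0, "neutral": 0.0, "diversifying": 0.0}
    for omega, weight in rate_classes:
        if math.isclose(omega, 1.0, abs_tol=tol):
            profile["neutral"] += weight       # grey: dN/dS = 1
        elif omega < 1.0:
            profile["purifying"] += weight     # blue: dN/dS < 1
        else:
            profile["diversifying"] += weight  # red: dN/dS > 1
    return profile


# Hypothetical estimates for one branch: mostly purifying, a sliver of positive selection.
branch = [(0.05, 0.97), (1.0, 0.02), (4.0, 0.01)]
print(selection_profile(branch))
```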
U content patterns for PB2 (a), PB1 (b), PA (c), HA (d), NP (e), NA (f), M1/2 (g) and NS1/2 (h). The 95% confidence interval of avian U content is shown for each segment with a grey rectangle. U content versus year of sampling is shown by black diamond symbols for human H1N1 and bat H17N10, magenta diamonds for equine H7N7, and solid green circles for equine H3N8. The curves fitted to the H3N8 data are shown. The equine panzootic of 1872–73 is depicted with a vertical red line. The left dashed line corresponds to node 1 from Fig. 2; the right dashed line, node 2. P values beside the red lines reflect the tests of whether the equine H7N7 age estimates predate 1872 (see Methods); for HA, NA and NS1/2 the grey rectangle depicts the 95% confidence interval for the ingroup avian data (H7, N7 and NS1/2 A lineage, respectively). Avian H3, N8 and NS1/2 lineage B U content distributions are indicated with separate arrow lines. The estimated origin dates of the equine H7N7 genes based on U content values were: PB2, 1548 [1533–1574]; PB1, 1842 [1816–1877]; PA, 1819 [1795–1842]; H7, 1880 [1878–1884]; NP, 1785 [1747–1823]; N7, 1387 [1373–1413]; M1/2, 1801 [1724–1879]; NS1/2, 1835 [1810–1861].
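The general idea behind dating from U content is to measure the uracil fraction of each gene, fit a trend of U content against sampling year in a mammalian lineage, and extrapolate back to the year at which the trend meets the avian baseline. A minimal sketch under a linear-trend assumption; the function names and the synthetic trend are hypothetical, and the curves and statistical procedure actually used in the paper's Methods may differ.

```python
def u_content(seq):
    """Fraction of nucleotides that are U (T in cDNA); gaps and ambiguity codes ignored."""
    bases = [b for b in seq.upper() if b in "ACGTU"]
    return sum(1 for b in bases if b in "TU") / len(bases)


def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def origin_year(years, u_values, avian_u):
    """Year at which the fitted (assumed linear) U-content trend equals the avian baseline."""
    slope, intercept = fit_line(years, u_values)
    return (avian_u - intercept) / slope


# Synthetic mammalian lineage whose U content rises by 1e-4 per year from 0.33 in 1900.
years = [1960, 1980, 2000]
u_values = [0.33 + 1e-4 * (y - 1900) for y in years]
print(round(origin_year(years, u_values, avian_u=0.33)))
```

A linear extrapolation is the simplest choice; if the fitted H3N8 curves saturate, a linear fit would date origins too recently, which is one reason to treat this only as an illustration.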
a–f, Human PB2 (a), PB1 (b), PA (c), NP (d), M1/2 (e) and NS1/2 (f). g–l, Swine PB2 (g), PB1 (h), PA (i), NP (j), M1/2 (k) and NS1/2 (l). After nearly a century of steadily increasing U content in each of these mammalian hosts, these genes still show considerably lower U content than the corresponding equine H7N7 genes.
MCC trees (median node heights) inferred under the HSLC model and host-specific rate distributions (in substitutions per site per year, to the right of each tree) for H1 (a), N1 (b), H3 (c) and N8 (d). Trees are drawn to the same scale, with branch lengths in years. Eastern and western hemisphere AIV lineages are highlighted with black and grey vertical bars, respectively. Fully resolved trees including posterior probabilities for each node and 95% credible intervals on node dates are depicted in Supplementary Fig. 1i–l. These results suggest an avian origin of the H1 HA and N1 NA of the 1918 human pandemic virus, some time after the human/avian MRCA in about 1893 for HA and the human/avian MRCA in about 1914 for NA. For H1, the available sample of AIV sequences coalesces in about 1952. Hence, the H1 western and eastern hemisphere lineages were established very recently compared with the internal genes (Fig. 2). This means that current sampling can provide no information about the geographic origin of the HA gene of the 1918 virus. Similarly, for N1, a deep western hemisphere lineage shares an MRCA with the eastern hemisphere lineage in about 1919 (with a subsequent east-to-west dispersal in the early 1960s, indicated by a vertical arrow). Again, these data offer no insights into the geographical origin of the 1918 pandemic virus’s NA gene, because the 1918 sequence is not nested within either a western or an eastern hemisphere AIV clade as with the internal genes. If archival AIV sequences from closer to 1918 could be recovered, they might resolve these geographical questions. For H3 and N8, distinct equine lineages are apparent; however, when and where they crossed from the AIV reservoir remains unclear (see Supplementary Information for additional discussion).
a, Posterior density of substitution rates of HA and NA. b, Posterior density of TMRCA of all HA genetic diversity and all NA genetic diversity. c, Within-subtype TMRCAs for each HA and NA subtype.
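Posterior densities such as these are summarised from MCMC samples, and the 95% credible intervals quoted in these legends are, in BEAST-based work, usually reported as highest posterior density (HPD) intervals. A sketch of one standard way to compute an HPD interval and the median from samples; the samples are hypothetical, and this is not necessarily the exact routine used by the authors' software.

```python
import math
from statistics import median


def hpd_interval(samples, mass=0.95):
    """Shortest interval containing at least a `mass` fraction of the samples."""
    ordered = sorted(samples)
    n = len(ordered)
    span = int(math.floor(mass * n))  # index offset between the two interval ends
    if span < 1 or span >= n:
        raise ValueError("need more samples for this credible mass")
    widths = [ordered[i + span] - ordered[i] for i in range(n - span)]
    best = widths.index(min(widths))  # left end of the narrowest window
    return ordered[best], ordered[best + span]


# Hypothetical posterior samples of a TMRCA (calendar years), with one outlier.
tmrca_samples = [1841, 1846, 1846, 1850, 1850, 1850, 1854, 1854, 1859, 1920]
print(median(tmrca_samples), hpd_interval(tmrca_samples, mass=0.8))
```

Unlike an equal-tailed interval, the HPD window slides to the densest region, so a single outlying sample (1920 here) does not stretch it.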
These results are subtrees for PB2 (a), PB1 (b), PA (c), HA (d), NP (e), NA (f), M1/2 (g) and NS1/2 (h) taken from an analysis of the data sets in Fig. 2, but with the addition of the three newly sequenced complete genomes (A/chicken/Japan/1925, A/duck/Manitoba/1953 and A/equine/Detroit/3/1964), as well as several additional South American PB1 sequences, using an SRD06 substitution model (full trees are available from http://dx.doi.org/10.5061/dryad.m04j9). The main eastern hemisphere avian clades are collapsed for clarity and depicted as purple triangles. Each brown circle depicts the MRCA of the 1920s/1930s sequences from domestic birds. Each blue circle represents the MRCA of the major eastern hemisphere AIV clade and the closest 1920s/1930s virus for each gene. The A/chicken/Japan/1925 highly pathogenic avian influenza (HPAI) strain is highlighted in red. In each case it is clear that most of the post-1940s genetic diversity within eastern hemisphere AIV (as well as the various West-2 and West-3 western hemisphere lineages that emerged relatively recently from the eastern hemisphere) descends from the clade of 1920s/1930s ‘fowl plague’ (HPAI) and 1940s low-pathogenicity avian influenza (LPAI) viruses of Eurasian domestic birds.
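The SRD06 model named above (Shapiro, Rambaut & Drummond in the reference list) splits a protein-coding alignment into two partitions, codon positions 1+2 together and codon position 3 on its own, each with its own HKY substitution model and gamma-distributed rate heterogeneity. A sketch of the data-partitioning step only, applied to one in-frame sequence (each row of a codon-aligned alignment would be split the same way); the function name is hypothetical, and model fitting itself happens in BEAST.

```python
def srd06_partitions(coding_seq):
    """Split an in-frame coding sequence into the two SRD06 data partitions:
    codon positions 1+2 together, and codon position 3 on its own."""
    if len(coding_seq) % 3:
        raise ValueError("sequence length must be a multiple of 3")
    pos12 = "".join(coding_seq[i:i + 2] for i in range(0, len(coding_seq), 3))
    pos3 = coding_seq[2::3]
    return pos12, pos3


print(srd06_partitions("ATGGCCTAA"))  # codons ATG, GCC, TAA
```

Grouping positions 1 and 2 reflects that they are mostly non-synonymous and evolve slowly, whereas third positions are largely synonymous and evolve faster.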
About this article
Cite this article
Worobey, M., Han, GZ. & Rambaut, A. A synchronized global sweep of the internal genes of modern avian influenza virus. Nature 508, 254–257 (2014). https://doi.org/10.1038/nature13016