A synchronized global sweep of the internal genes of modern avian influenza virus


Zoonotic infectious diseases such as influenza continue to pose a grave threat to human health[1]. However, the factors that mediate the emergence of RNA viruses such as influenza A virus (IAV) are still incompletely understood[2,3]. Phylogenetic inference is crucial to reconstructing the origins and tracing the flow of IAV within and between hosts[3–8]. Here we show that explicitly allowing IAV host lineages to have independent rates of molecular evolution is necessary for reliable phylogenetic inference of IAV, and that methods that do not do so, including ‘relaxed’ molecular clock models[9], can be positively misleading. A phylogenomic analysis using a host-specific local clock (HSLC) model recovers extremely consistent evolutionary histories across all genomic segments and demonstrates that the equine H7N7 lineage is a sister clade to strains from birds (as well as those from humans, swine and the equine H3N8 lineage), sharing an ancestor with them in the mid-to-late 1800s. Moreover, the major western and eastern hemisphere avian influenza lineages inferred for each gene coalesce in the late 1800s. On the basis of these phylogenies and the synchrony of these key nodes, we infer that the internal genes of avian influenza virus (AIV) underwent a global selective sweep beginning in the late 1800s, a process that continued throughout the twentieth century and up to the present. The resulting western hemisphere AIV lineage subsequently contributed most of the genomic segments to the 1918 pandemic virus and, independently, to the 1963 equine H3N8 panzootic lineage. This approach provides a clear resolution of evolutionary patterns and processes in IAV, including the flow of viral genes and genomes within and between host lineages.
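To make the central claim concrete, here is a deliberately simplified toy. It is not the HSLC model or the Bayesian analysis used in the paper; it substitutes root-to-tip regression, a much cruder clock-dating technique, applied to noise-free hypothetical data. Two host clades share a root assumed to be in 1850. The slow "equine-like" clade (assumed rate 0.001 substitutions/site/year) is sampled in 1956–1976, and the fast "avian-like" clade (assumed 0.003) in 1990–2010. Fitting a single rate to all tips, as a strict clock does, drags the root date far towards the present, whereas fitting each host separately recovers 1850.

```python
"""Toy illustration: pooling lineages with different clock rates biases root dates.

All numbers are hypothetical. This uses root-to-tip regression, not the
paper's Bayesian host-specific local clock (HSLC) analysis.
"""


def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def root_date(years, distances):
    """x-intercept of the root-to-tip regression: the date at which the
    fitted root-to-tip distance falls to zero."""
    slope, intercept = fit_line(years, distances)
    return -intercept / slope


ROOT = 1850.0                                   # assumed true root date
RATES = {"equine": 0.001, "avian": 0.003}       # assumed subs/site/year
SAMPLING = {"equine": [1956, 1966, 1976], "avian": [1990, 2000, 2010]}

# Noise-free root-to-tip distances: each tip's path is assumed to lie
# entirely within its own host lineage, so distance = host rate * time.
tips = [(host, year, RATES[host] * (year - ROOT))
        for host, years in SAMPLING.items() for year in years]

# One regression per host lineage (rates allowed to differ by host).
per_host = {
    host: root_date([y for h, y, _ in tips if h == host],
                    [d for h, _, d in tips if h == host])
    for host in RATES
}
# One regression over all tips (a single shared rate).
pooled = root_date([y for _, y, _ in tips], [d for _, _, d in tips])

print({h: round(r, 1) for h, r in per_host.items()})  # both 1850.0
print(round(pooled, 1))                                # about 1949
```

Because the slow clade happens to be sampled earlier, the pooled fit reads the rate difference as extra elapsed time and overestimates the rate, pulling the root almost a century too recent. Editing `SAMPLING` changes the size of the bias, loosely echoing the sensitivity to sampling balance described for Extended Data Fig. 1, although the paper's simulations used full Bayesian clock models rather than this regression.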

Figure 1: Performance of different clock models on simulated data.
Figure 2: Host-specific local clock model results.
Figure 3: HA, NA and internal gene diversity.

Accession codes



Data deposits

Sequences for A/equine/Detroit/3/1964(H7N7), A/chicken/Japan/1925(H7N7) and A/duck/Manitoba/1953(H10N7) have been deposited in the GenBank database under accession numbers KF435047–KF435062 and KF619244–KF619250.


  1. Morens, D. M., Folkers, G. K. & Fauci, A. S. The challenge of emerging and re-emerging infectious diseases. Nature 430, 242–249 (2004)

  2. Parrish, C. R. et al. Cross-species virus transmission and the emergence of new epidemic diseases. Microbiol. Mol. Biol. Rev. 72, 457–470 (2008)

  3. Holmes, E. C. The Evolution and Emergence of RNA Viruses (Oxford Univ. Press, 2009)

  4. Webster, R. G., Bean, W. J., Gorman, O. T., Chambers, T. M. & Kawaoka, Y. Evolution and ecology of influenza A viruses. Microbiol. Rev. 56, 152–179 (1992)

  5. Fitch, W. M., Bush, R. M., Bender, C. A. & Cox, N. J. Long term trends in the evolution of H(3) HA1 human influenza type A. Proc. Natl Acad. Sci. USA 94, 7712–7718 (1997)

  6. Rambaut, A. et al. The genomic and epidemiological dynamics of human influenza A virus. Nature 453, 615–619 (2008)

  7. Dugan, V. G. et al. The evolutionary genetics and emergence of avian influenza viruses in wild birds. PLoS Pathog. 4, e1000076 (2008)

  8. Chen, R. & Holmes, E. C. Hitchhiking and the population genetic structure of avian influenza virus. J. Mol. Evol. 70, 98–105 (2010)

  9. Drummond, A. J., Ho, S. Y. W., Phillips, M. J. & Rambaut, A. Relaxed phylogenetics and dating with confidence. PLoS Biol. 4, e88 (2006)

  10. Sovinova, O., Tumova, B., Poutska, F. & Nemec, J. Isolation of a virus causing respiratory disease in horses. Acta Virol. 2, 52–61 (1958)

  11. Morens, D. M. & Taubenberger, J. K. Historical thoughts on influenza viral ecosystems, or behold a pale horse, dead dogs, failing fowl, and sick swine. Influenza Other Respir. Viruses 4, 327–337 (2010)

  12. Judson, A. B. History and course of the epizoötic among horses upon the North American continent in 1872–73. Publ. Health Pap. Rep. 1, 88–109 (1873)

  13. Morens, D. M. & Taubenberger, J. K. An avian outbreak associated with panzootic equine influenza in 1872: an early example of highly pathogenic avian influenza? Influenza Other Respir. Viruses 4, 373–377 (2010)

  14. Rabadan, R., Levine, A. J. & Robins, H. Comparison of avian and human influenza A viruses reveals a mutational bias on the viral genomes. J. Virol. 80, 11887–11891 (2006)

  15. Perroncito, E. Epizoozia tifoide nei gallinacei. Annli Reale Accad. Agric. Torino 21, 87–126 (1878)

  16. Kaleta, E. F. & Rülke, C. P. A. in Avian Influenza (ed. Swayne, D. E.) 145–189 (Blackwell, 2008)

  17. Smith, G. J. D. et al. Dating the emergence of pandemic influenza viruses. Proc. Natl Acad. Sci. USA 106, 11709–11712 (2009)

  18. Scholtens, R. G. & Steele, J. H. U. S. epizootic of equine influenza, 1963: Epizootiology. Public Health Rep. 79, 393–398 (1964)

  19. Treanor, J. J., Snyder, M. H., London, W. T. & Murphy, B. R. The B allele of the NS gene of avian influenza viruses, but not the A allele, attenuates a human influenza A virus for squirrel monkeys. Virology 171, 1–9 (1989)

  20. Barton, N. H. Genetic hitchhiking. Phil. Trans. R. Soc. Lond. B 355, 1553–1562 (2000)

  21. Tong, S. et al. A distinct lineage of influenza A virus from bats. Proc. Natl Acad. Sci. USA 109, 4269–4274 (2012)

  22. Tong, S. et al. New World bats harbor diverse influenza A viruses. PLoS Pathog. 9, e1003657 (2013)

  23. Drummond, A. J., Suchard, M. A., Xie, D. & Rambaut, A. Bayesian phylogenetics with BEAUti and the BEAST 1.7. Mol. Biol. Evol. 29, 1969–1973 (2012)

  24. Hasegawa, M., Kishino, H. & Yano, T. Estimation of branching dates among primates by molecular clocks of nuclear DNA which slowed down in Hominoidea. J. Hum. Evol. 18, 461–476 (1989)

  25. Rambaut, A. & Bromham, L. Estimating divergence dates from molecular sequences. Mol. Biol. Evol. 15, 442–448 (1998)

  26. Yoder, A. D. & Yang, Z. Estimation of primate speciation dates using local molecular clocks. Mol. Biol. Evol. 17, 1081–1090 (2000)

  27. Rambaut, A. & Grassly, N. C. Seq-Gen: an application for the Monte Carlo simulation of DNA sequence evolution along phylogenetic trees. Comput. Appl. Biosci. 13, 235–238 (1997)

  28. Tavaré, S. Some probabilistic and statistical problems in the analysis of DNA sequences. Lect. Math Life Sci. 17, 57–86 (1986)

  29. Yang, Z. Maximum-likelihood estimation of phylogeny from DNA sequences when substitution rates differ over sites. Mol. Biol. Evol. 10, 1396–1401 (1993)

  30. Minin, V. N., Bloomquist, E. W. & Suchard, M. A. Smooth skyride through a rough skyline: Bayesian coalescent-based inference of population dynamics. Mol. Biol. Evol. 25, 1459–1471 (2008)

  31. Bao, Y. et al. The influenza virus resource at the National Center for Biotechnology Information. J. Virol. 82, 596–601 (2008)

  32. Keawcharoen, J. et al. Avian influenza H5N1 in tigers and leopards. Emerg. Infect. Dis. 10, 2189–2191 (2004)

  33. Edgar, R. C. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 32, 1792–1797 (2004)

  34. Tamura, K. et al. MEGA5: molecular evolutionary genetics analysis using maximum likelihood, evolutionary distance, and maximum parsimony methods. Mol. Biol. Evol. 28, 2731–2739 (2011)

  35. Han, G. Z. & Worobey, M. Homologous recombination in negative sense RNA viruses. Viruses 3, 1358–1373 (2011)

  36. Martin, D. P. et al. RDP3: a flexible and fast computer program for analyzing recombination. Bioinformatics 26, 2462–2463 (2010)

  37. Smith, G. J. D. et al. Origins and evolutionary genomics of the 2009 swine-origin H1N1 influenza A epidemic. Nature 459, 1122–1126 (2009)

  38. Crawford, P. C. et al. Transmission of equine influenza virus to dogs. Science 310, 482–485 (2005)

  39. Palese, P., Nakajima, K. & Desselberger, U. Recent human influenza A (H1N1) viruses are closely related genetically to strains isolated in 1950. Nature 274, 334–339 (1978)

  40. Shapiro, B., Rambaut, A. & Drummond, A. J. Choosing appropriate substitution models for the phylogenetic analysis of protein-coding sequences. Mol. Biol. Evol. 23, 7–9 (2006)

  41. Swofford, D. L. PAUP*. Phylogenetic Analysis Using Parsimony (*and Other Methods). Version 4 (Sinauer Associates, 2003)

  42. Pond, S. L. K. et al. A random effects branch-site model for detecting episodic diversifying selection. Mol. Biol. Evol. 28, 3033–3043 (2011)

  43. Fries, A. C. et al. Evidence for the circulation and inter-hemispheric movement of the H14 subtype influenza A virus. PLoS ONE 8, e59216 (2013)

  44. Zhou, B. et al. Single-reaction genomic amplification accelerates sequencing and vaccine production for classical and Swine origin human influenza a viruses. J. Virol. 83, 10309–10313 (2009)

We thank J. Barnes, S. Meno, M. Shaw, R. Donis, S. Krauss, K. Friedman, R. Webster, Y. Muramoto and Y. Kawaoka for assistance in locating and sequencing A/equine/Detroit/3/1964(H7N7), A/chicken/Japan/1925(H7N7) and A/duck/Manitoba/1953(H10N7); M. Sanderson for comments on the HSLC model; S. Zohari for discussions of the NS1/2 A and B lineages; and M. Nachman, Y. Kawaoka, T. Watts, J. Cox and D. Gill for comments. This work was supported by grants from the David and Lucile Packard Foundation to M.W., and the Wellcome Trust (grant no. 092807) to A.R. The research leading to these results has received funding from the European Union Seventh Framework Programme (FP7/2007-2013) under grant agreement no. 278433-PREDEMICS and European Research Council grant agreement no. 260864. The methodological approach was developed in part with support from a grant from the National Institutes of Health/National Institute of Allergy and Infectious Diseases (R01AI084691).

Author information

M.W., G.-Z.H. and A.R. designed the study. M.W. and A.R. conceived the analytical approach, and A.R. developed the software. G.-Z.H., M.W. and A.R. prepared the data sets. M.W., G.-Z.H. and A.R. performed the phylogenetic analyses. M.W. conducted the U content analyses. M.W. and A.R. wrote the paper. All authors discussed all the results and commented on the manuscript.

Corresponding authors

Correspondence to Michael Worobey or Andrew Rambaut.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Extended data figures and tables

Extended Data Figure 1 Performance of different clock models on simulated data.

a, Summary of the 100 replicates corresponding to Fig. 1 (IAV-like substitution model). The box plots represent the median, quartile 1, quartile 3, minimum and maximum of the 100 median TMRCA estimates. The HSLC model recovered the ‘correct’ (model) tree topology in 100% of the simulations; the other models did so in 0%. With the relaxed clock, the 95% credible interval for the TMRCA never included the real root node date, whereas with the HSLC model it did in 91% of the simulations. b, Summary of ten otherwise similar replicates, but simulated under a JC69 substitution model. c, Simulation with unequal sampling across clades, with ‘fast’ clade (‘avian’) sequences over-represented. (The model tree was identical to that in Fig. 1a except for the unequal number of sequences from the different clades, as shown.) d, Simulation with ‘slow’ clade (‘equine’) sequences over-represented. Unlike estimates under the HSLC model, root date estimates under both strict and relaxed clock models are systematically biased and strongly influenced by the balance of ‘fast-clade’ and ‘slow-clade’ sequences sampled.
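The two statistics this legend reports for each clock model can be computed from per-replicate output as sketched below: a five-number summary (minimum, quartile 1, median, quartile 3, maximum) of the replicates' median TMRCA estimates, and the fraction of replicates whose 95% credible interval contains the true root date. This is a minimal sketch. The input values are hypothetical, and the inclusive quartile method is an assumption, since box-plot conventions vary between plotting tools.

```python
"""Summarise clock-model simulation replicates (hypothetical inputs)."""
import statistics


def five_number_summary(values):
    """Minimum, Q1, median, Q3 and maximum (inclusive quartile method)."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)


def interval_coverage(intervals, true_value):
    """Fraction of (lower, upper) intervals that contain true_value."""
    hits = sum(1 for lo, hi in intervals if lo <= true_value <= hi)
    return hits / len(intervals)


# Hypothetical per-replicate results: median TMRCA and 95% interval.
medians = [1846.0, 1851.5, 1849.0, 1853.2, 1848.7]
intervals = [(1830, 1862), (1838, 1866), (1855, 1870), (1840, 1864), (1835, 1860)]

print(five_number_summary(medians))
print(interval_coverage(intervals, true_value=1850))  # 0.8 (4 of 5 contain 1850)
```

With real output, the same two functions would be applied to each clock model's 100 replicates separately, so that the models can be compared side by side.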

Extended Data Figure 2 Relaxed molecular clock results.

MCC trees (median node heights) inferred under a UCLD relaxed molecular clock model for PB2 (a), PB1 (b), PA (c), HA (d), NP (e), NA (f), M1/2 (g) and NS1/2 (h). Host-specific rate distributions in substitutions per site per year are shown in insets at the top left. Trees are drawn to the same time scale, with branch lengths in years. Eastern (‘e’) and western (‘w’) hemisphere AIV lineages are highlighted with black and grey vertical bars, respectively. Colouring of branches and clades follows the pattern in Fig. 2. The median dates of node 1 and node 2 from the HSLC analyses depicted in Fig. 2 are shown here for comparison. As with the synthetic data sets (Fig. 1 and Extended Data Fig. 1), the topologies and timing estimated under a relaxed clock model seem to be compromised by a failure to account for host-specific rates. It is not readily apparent from these trees, for example, that the equine H7N7 lineage is basal to the AIV diversity or that the 1918 pandemic virus is nested within a western hemisphere AIV lineage. The root node in each tree is also severely biased towards more recent dates, similar to the results with simulated sequences. Data, input and full MCC tree files are available from
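The difference between the UCLD relaxed clock and the HSLC model comes down to how each branch gets its rate. In the minimal sketch below, each toy branch carries a host label. Under UCLD, every branch draws its rate independently from a single lognormal shared by all hosts. In a simplified reading of a host-specific local clock, every branch inherits the rate of its host lineage. The host labels, rates, durations and lognormal spread are all hypothetical. The real models estimate these quantities from the data rather than fixing them, and the paper's HSLC implementation may differ in detail from this simplification.

```python
import math
import random


def ucld_rates(n_branches, mean_rate, sd_log, rng):
    """Uncorrelated lognormal clock: i.i.d. branch rates from one lognormal.

    mu is chosen so that the lognormal mean equals mean_rate, using
    E[X] = exp(mu + sd_log**2 / 2).
    """
    mu = math.log(mean_rate) - sd_log ** 2 / 2
    return [rng.lognormvariate(mu, sd_log) for _ in range(n_branches)]


def host_local_rates(branch_hosts, host_rates):
    """Toy host-specific local clock: each branch takes its host's rate."""
    return [host_rates[host] for host in branch_hosts]


def branch_lengths(rates, durations):
    """Expected substitutions per site on each branch = rate * years."""
    return [r * t for r, t in zip(rates, durations)]


# Hypothetical branches: host label and duration in years.
hosts = ["avian", "avian", "equine", "equine", "human"]
durations = [40.0, 25.0, 60.0, 30.0, 50.0]
host_rates = {"avian": 3e-3, "equine": 1e-3, "human": 2e-3}  # assumed values

rng = random.Random(1)
relaxed = ucld_rates(len(hosts), mean_rate=2e-3, sd_log=0.5, rng=rng)
local = host_local_rates(hosts, host_rates)

print(branch_lengths(local, durations))    # rate fixed by host lineage
print(branch_lengths(relaxed, durations))  # rate drawn without regard to host
```

Under UCLD, the prior treats every branch rate as an independent draw, so nothing ties the branches of one host lineage to a shared rate. The host-specific model builds that grouping in directly.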

Extended Data Figure 3 Branch-site REL analyses to test for episodic diversifying selection.

The branches are coloured to depict the proportion of substitutions along each branch that are under purifying selection (with dN/dS < 1: blue), the proportion evolving neutrally (with dN/dS = 1: grey) or under diversifying selection (with dN/dS > 1: red). In every gene, almost every site in every branch evidently evolved under purifying selection. In a few branches, a small proportion of sites show evidence of positive selection (for example, the branch between AIV and equine H7N7 in NS1/2). However, the proportion is so small that there seems to be no conceivable way that episodic diversifying selection occasioned by host jumps could be driving the overall dating estimates. Even for HA and NA, purifying selection dominates overwhelmingly.
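In a branch-site random effects model, each branch has a small set of ω (dN/dS) rate classes whose weights sum to one. The colouring described above summarises how that weight splits among ω < 1, ω = 1 and ω > 1. The minimal sketch below shows only that bookkeeping. It assumes the per-branch class weights and ω values have already been estimated by a separate model-fitting program (not shown), and the example branch is hypothetical.

```python
def selection_proportions(omega_classes, tol=1e-9):
    """Split one branch's rate-class weight by selective regime.

    omega_classes: list of (omega, weight) pairs whose weights sum to 1.
    Returns the weight under purifying (omega < 1), neutral (omega == 1,
    within tol) and diversifying (omega > 1) selection.
    """
    total = sum(w for _, w in omega_classes)
    if abs(total - 1.0) > 1e-6:
        raise ValueError("class weights must sum to 1")
    out = {"purifying": 0.0, "neutral": 0.0, "diversifying": 0.0}
    for omega, weight in omega_classes:
        if abs(omega - 1.0) <= tol:
            out["neutral"] += weight
        elif omega < 1.0:
            out["purifying"] += weight
        else:
            out["diversifying"] += weight
    return out


# Hypothetical branch: mostly strong purifying selection, a sliver of omega > 1.
branch = [(0.05, 0.93), (1.0, 0.05), (4.2, 0.02)]
print(selection_proportions(branch))
```

A branch like this one would be drawn almost entirely blue, with a thin red band. This matches the pattern the legend describes for the few branches that show any evidence of positive selection.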

Extended Data Figure 4 Uracil content patterns.

U content patterns for PB2 (a), PB1 (b), PA (c), HA (d), NP (e), NA (f), M1/2 (g) and NS1/2 (h). The 95% confidence interval of avian U content is shown for each segment with a grey rectangle. U content versus year of sampling is shown by black diamond symbols for human H1N1 and bat H17N10, magenta diamonds for equine H7N7, and solid green circles for equine H3N8. The curves fitted to the H3N8 data are shown. The equine panzootic of 1872–73 is depicted with a vertical red line. The left dashed line corresponds to node 1 from Fig. 2; the right dashed line, node 2. P values beside the red lines reflect the tests of whether the equine H7N7 age estimates predate 1872 (see Methods); for HA, NA and NS1/2 the grey rectangle depicts the 95% confidence interval for the ingroup avian data (H7, N7 and NS1/2 A lineage, respectively). Avian H3, N8 and NS1/2 lineage B U content distributions are indicated with separate arrow lines. The estimated origin dates of the equine H7N7 genes based on U content values were: PB2 1548 [1533–1574]; PB1 1842 [1816–1877]; PA 1819 [1795–1842]; H7 1880 [1878–1884]; NP 1785 [1747–1823]; N7 1387 [1373–1413]; M1/2 1801 [1724–1879]; NS1/2 1835 [1810–1861].
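The U-content dating in this legend rests on two steps. First, measure the uracil fraction of each gene. Second, read an origin date off a curve fitted to equine H3N8, whose lineage in horses dates to the 1963 panzootic. The actual curve form and test are given in the paper's Methods and are not reproduced here. The minimal sketch below assumes, purely for illustration, that U content rises linearly with time since the host jump, and all of its numbers are hypothetical.

```python
def u_content(seq):
    """Fraction of U in an RNA sequence (T is counted as U for cDNA).

    Gap and ambiguity characters are excluded from the denominator.
    """
    seq = seq.upper().replace("T", "U")
    bases = [b for b in seq if b in "ACGU"]
    return bases.count("U") / len(bases)


def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs)
    return slope, my - slope * mx


def origin_year(u_obs, year_obs, u_avian, u_gain_per_year):
    """Back-extrapolate when a lineage last had avian-like U content,
    assuming (hypothetically) a linear rise since the host jump."""
    years_in_new_host = (u_obs - u_avian) / u_gain_per_year
    return year_obs - years_in_new_host


# Hypothetical H3N8-like calibration series: sampling year and U content.
calib_years = [1963, 1973, 1983, 1993, 2003]
calib_u = [0.2600, 0.2610, 0.2620, 0.2630, 0.2640]
gain, _ = fit_line(calib_years, calib_u)   # about 1e-4 per year here

# Hypothetical H7N7-like gene sampled in 1964 with elevated U content.
print(round(origin_year(0.2700, 1964, u_avian=0.2600, u_gain_per_year=gain)))
```

With these made-up inputs, the gene would have spent about a century in horses, giving an origin date of about 1864. A non-linear curve, like those fitted in the figure, would replace `origin_year` with an inversion of that curve.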

Extended Data Figure 5 Uracil content patterns for human and swine IAV internal genes.

a–f, Human PB2 (a), PB1 (b), PA (c), NP (d), M1/2 (e) and NS1/2 (f). g–l, Swine PB2 (g), PB1 (h), PA (i), NP (j), M1/2 (k) and NS1/2 (l). After nearly a century of steadily increasing U content in each of these mammalian hosts, these genes still show considerably lower U content than the corresponding equine H7N7 genes.

Extended Data Figure 6 HSLC results for H1, N1, H3 and N8.

MCC trees (median node heights) inferred under the HSLC model and host-specific rate distributions (in substitutions per site per year, to the right of each tree) for H1 (a), N1 (b), H3 (c) and N8 (d). Trees are drawn to the same scale, with branch lengths in years. Eastern and western hemisphere AIV lineages are highlighted with black and grey vertical bars, respectively. Fully resolved trees including posterior probabilities for each node and 95% credible intervals on node dates are depicted in Supplementary Fig. 1i–l. These results suggest an avian origin of the H1 HA and N1 NA of the 1918 human pandemic virus, some time after the human/avian MRCA in about 1893 for HA and the human/avian MRCA in about 1914 for NA. For H1, the available sample of AIV sequences coalesces in about 1952. Hence, the H1 western and eastern hemisphere lineages were established very recently compared with the internal genes (Fig. 2). This means that current sampling can provide no information about the geographic origin of the HA gene of the 1918 virus. Similarly, for N1, a deep western hemisphere lineage shares an MRCA with the eastern hemisphere lineage in about 1919 (with a subsequent east-to-west dispersal in the early 1960s, indicated by a vertical arrow). Again, these data offer no insights into the geographical origin of the 1918 pandemic virus’s NA gene, because the 1918 sequence is not nested within either a western or an eastern hemisphere AIV clade as with the internal genes. If archival AIV sequences from closer to 1918 could be recovered, they might resolve these geographical questions. For H3 and N8, distinct equine lineages are apparent; however, when and where they crossed from the AIV reservoir remains unclear (see Supplementary Information for additional discussion).
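BEAST-style tree summaries typically report node-date credible intervals as highest posterior density (HPD) intervals computed from the MCMC sample of each node's age, and plot the median as the node height. This legend does not say whether HPD or equal-tailed intervals were used, so the code below is a generic sketch. It finds the shortest window that contains 95% of the sorted posterior samples and applies it to a hypothetical, skewed posterior for one node date.

```python
import math
import random
import statistics


def hpd_interval(samples, mass=0.95):
    """Shortest interval containing `mass` of the posterior samples."""
    xs = sorted(samples)
    n = len(xs)
    k = math.ceil(mass * n)          # number of samples the window must hold
    if k < 1 or k > n:
        raise ValueError("mass too small or too few samples")
    # Slide a window of k consecutive sorted samples and keep the narrowest.
    best = min(range(n - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    return xs[best], xs[best + k - 1]


# Hypothetical posterior for a node date: ages beyond 100 years before 2010
# follow an exponential tail, so the dates are skewed towards the past.
rng = random.Random(42)
node_dates = [2010 - 100 - rng.expovariate(1 / 15) for _ in range(5000)]

print(round(statistics.median(node_dates), 1))   # median node date
print(hpd_interval(node_dates))                  # 95% HPD (lower, upper)
```

For a skewed posterior like this one, the HPD interval hugs the dense end of the distribution, whereas an equal-tailed interval would be shifted towards the long tail. This is why it matters which convention a tree summary uses.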

Extended Data Figure 7 HA and NA genetic diversity analysis rates and dates (from Fig. 3).

a, Posterior density of substitution rates of HA and NA. b, Posterior density of TMRCA of all HA genetic diversity and all NA genetic diversity. c, Within-subtype TMRCAs for each HA and NA subtype.

Extended Data Figure 8 Phylogenetic evidence of AIV gene flow from domestic to wild birds.

These results are subtrees for PB2 (a), PB1 (b), PA (c), HA (d), NP (e), NA (f), M1/2 (g) and NS1/2 (h) taken from an analysis of the data sets in Fig. 2, but with the addition of the three newly sequenced complete genomes (A/chicken/Japan/1925, A/duck/Manitoba/1953 and A/equine/Detroit/3/1964), as well as several additional South American PB1 sequences, using an SRD06 substitution model (full trees are available from The main eastern hemisphere avian clades are collapsed for clarity and depicted as purple triangles. Each brown circle depicts the MRCA of the 1920s/1930s sequences from domestic birds. Each blue circle represents the MRCA of the major eastern hemisphere AIV clade and the closest 1920s/1930s virus for each gene. The A/chicken/Japan/1925 HPAI strain is highlighted in red. In each case it is clear that most of the post-1940s genetic diversity within eastern hemisphere AIV (as well as the various West-2 and West-3 western hemisphere lineages that emerged relatively recently from the eastern hemisphere) descends from the clade of 1920s/1930s ‘fowl plague’ (HPAI) and 1940s low-pathogenicity avian influenza (LPAI) viruses of Eurasian domestic birds.
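The SRD06 model named here (Shapiro, Rambaut & Drummond, ref. 40) partitions protein-coding alignments by codon position. First and second positions are pooled into one partition and third positions form another, each with its own HKY+Γ substitution model. In practice this is configured in BEAUti. The sketch below shows only the partitioning step, applied to hypothetical in-frame sequences, and it assumes every sequence starts at codon position 1.

```python
def srd06_partitions(alignment):
    """Split in-frame sequences into (1st+2nd) and 3rd codon positions.

    alignment: dict of name -> sequence, all starting at codon position 1.
    Returns two dicts with the same keys, one per partition.
    """
    pos12, pos3 = {}, {}
    for name, seq in alignment.items():
        if len(seq) % 3:
            raise ValueError(f"{name}: length {len(seq)} is not a multiple of 3")
        # Keep 0-based indices 0, 1, 3, 4, ... (codon positions 1 and 2).
        pos12[name] = "".join(seq[i] for i in range(len(seq)) if i % 3 != 2)
        # Every third base starting at index 2 (codon position 3).
        pos3[name] = seq[2::3]
    return pos12, pos3


# Hypothetical two-taxon alignment (two codons each).
aln = {"taxonA": "ATGGCA", "taxonB": "ATGGCG"}
p12, p3 = srd06_partitions(aln)
print(p12)  # {'taxonA': 'ATGC', 'taxonB': 'ATGC'}
print(p3)   # {'taxonA': 'GA', 'taxonB': 'GG'}
```

Separating third positions matters because they are often largely synonymous and evolve faster than first and second positions. Giving each partition its own model parameters lets that difference in rate and pattern be captured.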

Extended Data Table 1 Dating estimates for key nodes on Fig. 2 with different substitution models, subsamples of sequences and data partitions
Extended Data Table 2 Complete or partial sweeps of eastern hemisphere-origin AIV internal genes across western hemisphere AIV in recent decades

Supplementary information

Supplementary Information

This file contains a Supplementary Discussion, Supplementary Figure 1 and Supplementary References. (PDF 7381 kb)

About this article

Cite this article

Worobey, M., Han, GZ. & Rambaut, A. A synchronized global sweep of the internal genes of modern avian influenza virus. Nature 508, 254–257 (2014).
