Review Article

Evolutionary analysis of the dynamics of viral infectious disease

Nature Reviews Genetics volume 10, pages 540–550 (2009)



Many organisms that cause infectious diseases, particularly RNA viruses, mutate so rapidly that their evolutionary and ecological behaviours are inextricably linked. Consequently, aspects of the transmission and epidemiology of these pathogens are imprinted on the genetic diversity of their genomes. Large-scale empirical analyses of the evolutionary dynamics of important pathogens are now feasible owing to the increasing availability of pathogen sequence data and the development of new computational and statistical methods of analysis. In this Review, we outline the questions that can be answered using viral evolutionary analysis across a wide range of biological scales.

Key points

  • The rapid evolution of many pathogens, particularly RNA viruses, means that their evolution and ecology occur on the same timescale, and therefore must be studied jointly to be fully understood.

  • The rapid growth in gene sequence data and the development of new analysis techniques have enabled researchers to study the evolutionary dynamics of important human pathogens such as HIV, influenza, hepatitis C and dengue virus. The term phylodynamics has come to be associated with such studies.

  • Phylodynamic questions arise in a number of practical contexts, including epidemic surveillance, outbreak control, forensics and clinical medicine.

  • Evolutionary analysis methods can be applied to the investigation of viral dynamics at different organizational scales, from global studies of pathogen dissemination among continents, to the dynamics of infection within the tissues of individual infected hosts.

  • Viral genomes are an important and independent source of information about epidemiological processes, thereby supporting and corroborating epidemiological results obtained using standard surveillance methods.

  • The introduction of next-generation sequencing technologies will greatly increase the amount of viral genetic data available for analysis. Substantial changes and improvements to analysis methodologies will be necessary to deal with this exciting change.




We would like to thank E. Holmes and three referees for commenting on the manuscript and improving it immeasurably. We thank A. Drummond and P. Lemey for providing rates of HIV-1 evolution for FIG. 4. Finally we gratefully acknowledge The Royal Society of London, which supports both authors.

Author information


  1. Department of Zoology, University of Oxford, South Parks Road, Oxford OX1 3PS, UK.

    • Oliver G. Pybus
  2. Institute for Evolutionary Biology, University of Edinburgh, Kings Buildings, Ashworth Laboratories, West Mains Road, Edinburgh EH9 3JT, UK.

    • Andrew Rambaut




Balancing selection

Any form of natural selection that results in the maintenance of genetic polymorphisms in a population, as opposed to their loss through fixation or elimination.

Diversifying selection

Any form of natural selection that generates high levels of genetic diversity; for example, recurrent positive selection or balancing selection.

Parsimony approach

A principle of evolutionary inference, based on the assumption that the best-supported evolutionary history for a characteristic is the one that requires the fewest changes in that characteristic.
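The counting of changes that this principle relies on can be sketched with Fitch's algorithm, a standard parsimony method for unordered character states on a binary tree. It is named here as an illustration and is not necessarily the method used in the Review. The function name `fitch_changes`, the nested-tuple tree encoding and the host labels are all hypothetical.

```python
def fitch_changes(tree, tip_states):
    """Return the minimum number of state changes on a rooted binary tree.

    tree: nested 2-tuples whose leaves are tip names (hypothetical encoding).
    tip_states: dict mapping each tip name to its observed state.
    """
    changes = 0

    def visit(node):
        nonlocal changes
        if isinstance(node, str):
            # A tip: its state set is just the observed state.
            return {tip_states[node]}
        left, right = (visit(child) for child in node)
        common = left & right
        if common:
            # Children agree on at least one state: no change needed here.
            return common
        # Children disagree: one change is required on this part of the tree.
        changes += 1
        return left | right

    visit(tree)
    return changes


# Hypothetical example: host species recorded for four sampled viruses.
example_tree = (("A", "B"), ("C", "D"))
example_hosts = {"A": "human", "B": "human", "C": "bat", "D": "human"}
print(fitch_changes(example_tree, example_hosts))
```

The sketch only counts changes; it does not reconstruct the ancestral states themselves, which would need a second, top-down pass.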

Molecular clock

A statistical model that describes the relationship between time and the genetic distances among nucleotide sequences. In contrast to older molecular clock models, contemporary models no longer require the assumption that the rates of nucleotide change are constant through time.
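As a rough illustration of the older, constant-rate (strict) clock that this definition contrasts with contemporary models, the sketch below regresses root-to-tip genetic distance on sampling year. The function name `fit_strict_clock` and the sample values are hypothetical and do not come from the Review. Under these assumptions the slope estimates the substitution rate and the x-intercept estimates the date of the root.

```python
def fit_strict_clock(sample_years, root_to_tip):
    """Least-squares fit of distance = rate * (year - root_year).

    Returns (rate, root_year). Assumes a single constant rate,
    which relaxed-clock models do not require.
    """
    n = len(sample_years)
    if n < 2 or n != len(root_to_tip):
        raise ValueError("need at least two samples and matching lengths")
    mean_t = sum(sample_years) / n
    mean_d = sum(root_to_tip) / n
    sxx = sum((t - mean_t) ** 2 for t in sample_years)
    if sxx == 0:
        raise ValueError("sampling years must not all be identical")
    sxy = sum((t - mean_t) * (d - mean_d)
              for t, d in zip(sample_years, root_to_tip))
    rate = sxy / sxx                      # substitutions per site per year
    if rate == 0:
        raise ValueError("no temporal signal: estimated rate is zero")
    intercept = mean_d - rate * mean_t    # distance extrapolated to year 0
    root_year = -intercept / rate         # year at which distance is zero
    return rate, root_year


# Hypothetical sampling years and root-to-tip distances (subs/site).
years = [1990.0, 1995.0, 2000.0, 2005.0]
distances = [0.02, 0.03, 0.04, 0.05]
rate, root_year = fit_strict_clock(years, distances)
print(f"rate = {rate:.4f} subs/site/year, root year = {root_year:.1f}")
```

The regression treats the tips as independent observations, which they are not because they share branches of the tree, so a fit of this kind is better read as an exploratory check for temporal signal than as a formal estimate.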

Coalescent theory

A theory that describes the shape and size of genealogies that represent the shared ancestry of sampled genes. It describes how the statistical distribution of branch lengths in genealogies depends on population processes such as size change and structure.
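The dependence of branch lengths on population size can be made concrete with the simplest case, sketched below under stated assumptions: a haploid population of constant size, time measured in generations, and the standard Kingman coalescent, in which k lineages coalesce at rate k(k-1)/(2N). The function names are hypothetical and chosen for this illustration.

```python
def expected_coalescent_intervals(n_samples, pop_size):
    """Expected time (in generations) spent with k lineages, for k = n..2.

    Assumes a haploid, constant-size population under the Kingman
    coalescent, where k lineages coalesce at rate k*(k-1)/(2*N).
    """
    if n_samples < 2 or pop_size <= 0:
        raise ValueError("need n_samples >= 2 and pop_size > 0")
    return {k: 2.0 * pop_size / (k * (k - 1))
            for k in range(n_samples, 1, -1)}


def expected_tmrca(n_samples, pop_size):
    """Expected time to the most recent common ancestor of the sample."""
    return sum(expected_coalescent_intervals(n_samples, pop_size).values())


# Doubling the population size doubles every expected interval.
for size in (100, 200):
    print(size, expected_tmrca(4, size))
```

Because every interval scales linearly with population size in this model, the lengths of the branches in a sampled genealogy carry information about that size, which is the idea that methods for changing or structured populations build on.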


Reassortment

A form of genome recombination occasionally exhibited by viruses that have a genome composed of multiple RNA molecules (genomic segments), such as influenza virus. The resulting virus produced by reassortment possesses a mixture of genomic segments from two or more parental viruses.

Selective sweep

The rapid increase in frequency of a mutation owing to positive selection for that mutation.


Zoonosis

An infectious disease transmitted from animals to humans, or the event of cross-species transmission.

Positive selection

Also known as directional selection. A form of natural selection that results in an increase in the relative frequency of one genetic variant compared with other variants. It often results in the fixation of the selected variant in the population.

Herd immunity

The protection of susceptible members of a population from infection owing to the sufficiently high prevalence of immune individuals.

Metapopulation structure

A metapopulation is composed of multiple subpopulations, among which there is gene flow. Subpopulations also arise and become extinct dynamically through time.

Fundamental epidemiological parameter (R0)

The basic reproductive number of an infectious disease, from which many epidemiological predictions can be made. It is equal to the number of secondary infections caused by a single infection in a wholly susceptible host population.
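One example of a prediction that follows from R0 is the herd immunity threshold. The derivation below rests on simplifying assumptions that are not part of the glossary itself: a homogeneously mixing population and fully protective immunity. The symbols $p$ (immune fraction), $p_c$ (critical immune fraction) and $R_{\mathrm{eff}}$ (effective reproductive number) are introduced here for illustration.

```latex
R_{\mathrm{eff}} = R_0\,(1 - p) < 1
\quad\Longrightarrow\quad
p > p_c = 1 - \frac{1}{R_0}
```

For a hypothetical pathogen with $R_0 = 4$, these assumptions would give $p_c = 1 - 1/4 = 0.75$.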

Index case

The first infection in an epidemic or outbreak, from which all subsequent infections are ultimately descended.

Cross-sectional survey

An investigation that samples a population at a specific point in time. A longitudinal survey, by contrast, samples a population at several different times.

Bayesian inference

A method of statistical inference that uses Bayes' theorem to calculate the probability of a hypothesis. Such methods combine prior information with new observations or data.
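Bayes' theorem, on which this definition rests, can be written as follows. The symbols $H$ (a hypothesis, for example a phylogeny or a rate parameter) and $D$ (the data, for example a sequence alignment) are notation chosen here rather than taken from the Review.

```latex
P(H \mid D) = \frac{P(D \mid H)\,P(H)}{P(D)}
```

Here $P(H)$ is the prior, $P(D \mid H)$ is the likelihood and $P(H \mid D)$ is the posterior probability of the hypothesis given the data.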
