A predictive fitness model for influenza

Journal name: Nature
Volume: 507
Pages: 57–61
DOI: 10.1038/nature13087

Abstract

The seasonal human influenza A/H3N2 virus undergoes rapid evolution, which produces significant year-to-year sequence turnover in the population of circulating strains. Adaptive mutations respond to human immune challenge and occur primarily in antigenic epitopes, the antibody-binding domains of the viral surface protein haemagglutinin. Here we develop a fitness model for haemagglutinin that predicts the evolution of the viral population from one year to the next. Two factors are shown to determine the fitness of a strain: adaptive epitope changes and deleterious mutations outside the epitopes. We infer both fitness components for the strains circulating in a given year, using population-genetic data of all previous strains. From fitness and frequency of each strain, we predict the frequency of its descendant strains in the following year. This fitness model maps the adaptive history of influenza A and suggests a principled method for vaccine selection. Our results call for a more comprehensive epidemiology of influenza and other fast-evolving pathogens that integrates antigenic phenotypes with other viral functions coupled by genetic linkage.
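
The prediction step described in the abstract (strain fitness and frequency in one year giving clade frequencies in the next) can be illustrated with a minimal sketch. It assumes that each strain's frequency grows by a factor exp(fitness) and that the result is renormalized over all strains; this growth law, the function name `predict_clade_frequencies`, the clade labels and all numbers are illustrative assumptions, not values or code from the paper.

```python
import math


def predict_clade_frequencies(strains):
    """Predict next-season clade frequencies from this season's strains.

    `strains` is a list of (clade_label, frequency, fitness) tuples.
    Assumption: each strain grows by exp(fitness), then frequencies
    are renormalized so that they sum to 1.
    """
    # Unnormalized weight of every strain after one season of growth.
    grown = [(clade, x * math.exp(f)) for clade, x, f in strains]
    total = sum(weight for _, weight in grown)

    # A clade's frequency is the sum of the frequencies of its strains.
    predicted = {}
    for clade, weight in grown:
        predicted[clade] = predicted.get(clade, 0.0) + weight / total
    return predicted


# Hypothetical toy data: a small, fit clade "A" and a large, unfit clade "B".
strains = [("A", 0.05, 1.0), ("A", 0.03, 1.2), ("B", 0.92, -0.5)]
predicted = predict_clade_frequencies(strains)

for clade, freq in sorted(predicted.items()):
    print(f"clade {clade}: predicted frequency {freq:.3f}")
```

In this toy example, clade A starts at a frequency of 0.08 and is predicted to grow because its strains have above-average fitness; the predicted-to-initial frequency ratio plays the role of a Wrightian fitness for the clade.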

Figures

  1. Figure 1: Evolution of influenza clades.

    The figure shows a partial influenza strain tree, which is based on strains observed in years 2002 and 2003 (bullets and circles). Each strain i has a frequency x_i in its season’s strain population. Our units of prediction are clades, which are defined as sets of strains descending from recent last common ancestors. For one of these clades, we mark its strain content in winter seasons t = 2002 and t + 1 (light-colour bullets) and its last common ancestor (blue diamond). Each clade is linked by a set of mutations to the last common ancestor of all strains in year t (black diamond); codon position and target amino acid of these mutations are indicated for the marked clade. A clade ν observed in season t has a frequency X_ν(t), which is the sum of the frequencies of its strains in season t. The marked clade grows substantially, from X_ν(t) = 0.08 to X_ν(t + 1) = 0.86.

  2. Figure 2: Year-to-year predictions of HA evolution.

    a, Wrightian fitness: the predicted frequency ratio is plotted against the posterior ratio for 188 influenza HA clades with initial frequency > 0.15 observed since 1993 (error bars due to tree reconstruction and sampling are given in Extended Data Fig. 1). The predicted frequency is indicated by colour; clades destined for fixation are shown in red. Clade growth (posterior ratio > 1) is correctly predicted in 113 of 121 cases, clade decline in 51 of 67 cases. b, Yearly numbers of HA nucleotide fixations: predicted numbers are compared to posterior numbers. c, Dynamics on the influenza strain tree: for each clade originating between 1993 and 2010, the ancestor node is coloured according to the maximum of the predicted frequency changes. Our model correctly predicts expansion along the trunk (thick line) and loss of branches off the trunk.

  3. Figure 3: Vaccine selection.

    Optimal vaccine strains predicted by our model (diamonds) and actual vaccine strains used in the Northern Hemisphere (ref. 27; squares, listed in Supplementary Information) are compared to posterior cross-immunity centre-of-mass strains (bullets) for the winter seasons from 1994 to 2012. Model predictions are obtained by maximizing the predicted cross-immunity overlap between the vaccine strain and the circulating strains, which amounts to maximizing the predicted reduction of infections (see text and Methods). Inset: yearly epitope amino acid distances of the model-selected vaccine strain (diamonds) and the actual vaccine strain (squares, update years marked by filled squares) to the posterior cross-immunity centre-of-mass strain.

  4. Figure 4: Adaptation map of influenza.

    The fitness flux, computed from the fitness model (2) and observed frequency changes, is shown for 234 clades on a tree between 2003 and 2008; see Methods for the definition and Extended Data Fig. 3a for an illustration of fitness flux. Top graph: strains within these clades are ordered by year and, within each year, by mutational distance to the last common ancestor. The mean cumulative fitness flux is shown as a dashed line; see also Extended Data Fig. 3b. This map displays a travelling fitness flux wave. Bottom graph: the same map is shown with nonsynonymous epitope mutations marked by green triangles; these mutations are mostly beneficial (refs 7, 9, 10). This gives evidence of clonal interference: successful clades are driven to fixation by multiple beneficial mutations (large green triangles; origination and fixation of one such clade are marked by arrows), whereas other beneficial mutations are driven to loss (small green triangles).

  5. Extended Data Fig. 1: Statistical errors of predicted and posterior Wrightian fitness.

    The frequency ratio plot of Fig. 2a is shown together with the standard deviation of the predicted ratio in the ensemble of reconstructed trees (vertical bars) and the standard deviation of the posterior ratio due to sampling fluctuations of population frequencies (horizontal bars). See sections 1, 3 and 4 of Methods.

  6. Extended Data Fig. 2: Fitness predictions for human influenza A/H1N1.

    a, Wrightian fitness: the predicted frequency ratio is plotted against the posterior ratio for 81 HA clades with initial frequency > 0.1. To be compared with Fig. 2a. b, Dynamics of the influenza strain tree: for each clade, the ancestor node is coloured according to the maximum of the predicted frequency changes. To be compared with Fig. 2c. The predictions are based on a sample of 2,136 unique HA1 genotypes observed between 1977 and 2009. We restrict predictions to years when this sample contains at least 12 unique HA1 strains in the winter seasons t and t + 1, which are the years 1990, 1995–1998 and 2005–2008 (see Methods, section 4). These predictions are statistically significant (P < 10^−18) but somewhat noisier than for influenza A/H3N2 (clade growth is correctly predicted in 88% of the cases, decline in 63% of the cases). The reasons include a significantly smaller and more biased strain sample and a less comprehensive knowledge of antigenic epitope sites (ref. 26). The prediction of influenza A/H1N1 evolution illustrates the broader applicability of our method and highlights the determinants of predictive power.

  7. Extended Data Fig. 3: Fitness flux in the evolution of influenza.

    a, Fitness flux measures adaptation (schematic, adapted from ref. 29). The cumulative flux Φ(t), as defined in equations (48) and (49), is an aggregate measure of fitness changes due to frequency changes in a population’s history up to a given clade ν at a given time t (shown by uphill arrows; refs 28, 29, 58). Left: in a static fitness landscape F(x), the flux Φ(t) equals the fitness difference between the initial point and the final point of this history. Right: in a time-dependent fitness seascape F(x, t), the flux Φ(t) is still a typically positive, increasing function of time, even if the population fitness does not increase. b, Mean cumulative fitness flux Φ(t) as given by equation (48) for influenza from 1993 up to season t. The mean flux inferred from our fitness model (black line) shows a continuous increase. The flux for a null model with scrambled clade fitness values (grey lines) fluctuates around 0. Vertical bars indicate the root mean squared fitness (flux) in each year’s strain sample, as given by equation (51).

  8. Extended Data Fig. 4: Strain tree with mutations.

    a–c, Four classes of HA sequence mutations are mapped onto individual branches of the influenza strain tree: synonymous mutations (a, blue), nonsynonymous epitope mutations (b, green) and nonsynonymous non-epitope mutations (c, red). Each nonsynonymous mutation marks the origination of a clade in the population; each fixed clade (highlighted by bright colours) has an origination on the trunk of the tree (shown as a thick line). The fixation probability, that is, the ratio of the number of fixations to the number of originations, is seen to be reduced for informative non-epitope changes and enhanced for nonsynonymous epitope changes compared to the baseline of synonymous changes; cf. Extended Data Fig. 5. The underlying tree (shown here from 1993 to 2012) is a sample from our ensemble of strain trees, which are constructed by maximum likelihood from the HA sequences of 3,944 strains (other equiprobable trees differ only in peripheral branches). The horizontal coordinate D of a node is its mutational distance from the root of the tree. The trunk of the tree (thick line) is the single lineage connecting past and future on timescales beyond the coalescence time.

  9. Extended Data Fig. 5: Selection on epitope and non-epitope changes.

    The frequency propagator ratio g(X) (ref. 9), as defined in equation (7), is shown for several classes of nonsynonymous HA mutations: mutations in epitopes A–D (green bullets), mutations in epitope E (green circles), mutations in sialic receptor binding sites (green diamonds), informative non-epitope mutations (red bullets) and non-informative non-epitope mutations (red circles). Error bars indicate sampling uncertainties. Mutations in epitopes A–D, including those in epitopic receptor binding sites, reach values g(X) ≈ 2.5 for large frequencies, signalling substantial positive selection. Mutations in epitope E are under weaker positive selection, with g(X) ≈ 1.5 for large frequencies. Informative non-epitope changes drop to g(X) = 0.6, signalling predominantly negative selection. Non-informative non-epitope changes evolve near the neutral baseline (g = 1, blue line), indicating weak or heterogeneous selection. See section 2 of Methods.

  10. Extended Data Fig. 6: Evolution of glycosylation.

    a, Number of epitopic glycosylation sites, n_ep, in the influenza A/H3N2 strain sample between 1968 and 2012 (green lines): population mean value (thin line), root mean squared variation (error bars), and value for trunk lineages (thick line). The same data are shown for non-epitope glycosylation sites (red lines). Epitope sites show substantial changes with a net increase in number and substantial natural variation in some years, whereas non-epitope sites had only one fixation of a glycosylation site. b, Evolution of n_ep on the influenza strain tree between 1993 and 2012. Trunk strains show a rapid increase to n_ep = 5 between 1995 and 1997 and maintain this value in later years; the mean n_ep shows a slower increase between 1995 and 2001. Off-trunk clades drop below n_ep = 5 also in later years, and there are compensatory mutations back to n_ep = 5. The data suggest an adaptive increase of n_ep up to a saturation value n_ep = 5 after 1996. These observations inform the glycosylation fitness component (equation (22)), which is used to test the predictive value of glycosylation. See section 2 of Methods.
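
The counts quoted in the Fig. 2a caption (growth correctly predicted in 113 of 121 cases, decline in 51 of 67 cases) can be converted into rates with a short worked calculation. The sketch below uses only those four numbers; the variable names and the choice to also report a class-averaged rate are illustrative and not taken from the paper.

```python
# Counts quoted in the Fig. 2a caption for influenza A/H3N2 clades.
growth_correct, growth_total = 113, 121
decline_correct, decline_total = 51, 67

# Fraction of growing and declining clades whose direction was predicted correctly.
growth_rate = growth_correct / growth_total
decline_rate = decline_correct / decline_total

# Overall fraction of correct direction calls across all clades.
total_clades = growth_total + decline_total
overall_rate = (growth_correct + decline_correct) / total_clades

# Average of the two per-class rates, which is insensitive to class imbalance.
class_averaged_rate = (growth_rate + decline_rate) / 2

print(f"clades considered:   {total_clades}")
print(f"growth predicted:    {growth_rate:.1%}")
print(f"decline predicted:   {decline_rate:.1%}")
print(f"overall:             {overall_rate:.1%}")
print(f"class-averaged rate: {class_averaged_rate:.1%}")
```

The two totals add up to the 188 clades mentioned in the caption. For comparison, the Extended Data Fig. 2 caption quotes the corresponding A/H1N1 rates directly, as 88% for growth and 63% for decline.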

Tables

  1. Extended Data Table 1: Ranking of fitness models

References

  1. Bao, Y. et al. The Influenza Virus Resource at the National Center for Biotechnology Information. J. Virol. 82, 596–601 (2008)
  2. Wiley, D. C., Wilson, I. A. & Skehel, J. J. Structural identification of the antibody-binding sites of Hong Kong influenza haemagglutinin and their involvement in antigenic variation. Nature 289, 373–378 (1981)
  3. Bush, R. M., Bender, C. A., Subbarao, K., Fox, N. J. & Fitch, W. M. Predicting the evolution of human influenza A. Science 286, 1921–1925 (1999)
  4. Plotkin, J. B., Dushoff, J. & Levin, S. A. Hemagglutinin sequence clusters and the antigenic evolution of influenza A virus. Proc. Natl Acad. Sci. USA 99, 6263–6268 (2002)
  5. Koelle, K., Cobey, S., Grenfell, B. & Pascual, M. Epochal evolution shapes the phylodynamics of interpandemic influenza A (H3N2) in humans. Science 314, 1898–1903 (2006)
  6. Wolf, Y. I., Viboud, C., Holmes, E. C., Koonin, E. V. & Lipman, D. J. Long intervals of stasis punctuated by bursts of positive selection in the seasonal evolution of influenza A virus. Biol. Direct 1, 34 (2006)
  7. Shih, A. C.-C., Hsiao, T.-C., Ho, M.-S. & Li, W.-H. Simultaneous amino acid substitutions at antigenic sites drive influenza A hemagglutinin evolution. Proc. Natl Acad. Sci. USA 104, 6283–6288 (2007)
  8. Bhatt, S., Holmes, E. C. & Pybus, O. G. The genomic rate of molecular adaptation of the human influenza A virus. Mol. Biol. Evol. 28, 2443–2451 (2011)
  9. Strelkowa, N. & Lässig, M. Clonal interference in the evolution of influenza. Genetics 192, 671–682 (2012)
  10. Illingworth, C. J. R. & Mustonen, V. Components of selection in the evolution of the influenza virus: linkage effects beat inherent selection. PLoS Pathog. 8, e1003091 (2012)
  11. Meyer, A. G., Dawson, E. T. & Wilke, C. O. Cross-species comparison of site-specific evolutionary-rate variation in influenza haemagglutinin. Phil. Trans. R. Soc. B 368, 20120334 (2013)
  12. Smith, D. J. et al. Mapping the antigenic and genetic evolution of influenza virus. Science 305, 371–376 (2004)
  13. Bloom, J. D. & Glassman, M. J. Inferring stabilizing mutations from protein phylogenies: application to influenza hemagglutinin. PLOS Comput. Biol. 5, e1000349 (2009)
  14. Wylie, C. S. & Shakhnovich, E. I. A biophysical protein folding model accounts for most mutational fitness effects in viruses. Proc. Natl Acad. Sci. USA 108, 9916–9921 (2011)
  15. Holmes, E. C. et al. Whole-genome analysis of human influenza A virus reveals multiple persistent lineages and reassortment among recent H3N2 viruses. PLoS Biol. 3, e300 (2005)
  16. Grenfell, B. T. et al. Unifying the epidemiological and evolutionary dynamics of pathogens. Science 303, 327–332 (2004)
  17. Rambaut, A. et al. The genomic and epidemiological dynamics of human influenza A virus. Nature 453, 615–619 (2008)
  18. Russell, C. A. et al. The global circulation of seasonal influenza A (H3N2) viruses. Science 320, 340–346 (2008)
  19. Gog, J. R. & Grenfell, B. T. Dynamics and selection of many-strain pathogens. Proc. Natl Acad. Sci. USA 99, 17209–17214 (2002)
  20. Tria, F., Lässig, M., Peliti, L. & Franz, S. A minimal stochastic model for influenza evolution. J. Stat. Mech. P07008 (2005)
  21. Kryazhimskiy, S., Dieckmann, U., Levin, S. A. & Dushoff, J. On state-space reduction in multi-strain pathogen models, with an application to antigenic drift in influenza A. PLOS Comput. Biol. 3, e159 (2007)
  22. Minayev, P. & Ferguson, N. Improving the realism of deterministic multi-strain models: implications for modelling influenza A. J. R. Soc. Interface 6, 509–518 (2009)
  23. Rasmussen, D. A., Ratmann, O. & Koelle, K. Inference for nonlinear epidemiological models using genealogies and time series. PLOS Comput. Biol. 7, e1002136 (2011)
  24. Kryazhimskiy, S., Dushoff, J., Bazykin, G. A. & Plotkin, J. B. Prevalence of epistasis in the evolution of influenza A surface proteins. PLoS Genet. 7, e1001301 (2011)
  25. Bogner, P. et al. A global initiative on sharing avian flu data. Nature 442, 981 (2006)
  26. Huang, J.-W., Lin, W.-F. & Yang, J.-M. Antigenic sites of H1N1 influenza virus hemagglutinin revealed by natural isolates and inhibition assays. Vaccine 30, 6327–6337 (2012)
  27. WHO Recommendations for Influenza Vaccine Composition. Retrieved from http://www.who.int/influenza/vaccines/vaccinerecommendations1/en
  28. Mustonen, V. & Lässig, M. From fitness landscapes to seascapes: non-equilibrium dynamics of selection and adaptation. Trends Genet. 25, 111–119 (2009)
  29. Mustonen, V. & Lässig, M. Fitness flux and ubiquity of adaptive evolution. Proc. Natl Acad. Sci. USA 107, 4248–4253 (2010)
  30. Desai, M. M. & Fisher, D. S. Beneficial mutation–selection balance and the effect of linkage on positive selection. Genetics 176, 1759–1798 (2007)
  31. Rouzine, I. M., Brunet, E. & Wilke, C. O. The traveling-wave approach to asexual evolution: Muller's ratchet and speed of adaptation. Theor. Popul. Biol. 73, 24–46 (2008)
  32. Schiffels, S., Szollosi, G. J., Mustonen, V. & Lässig, M. Emergent neutrality in adaptive asexual evolution. Genetics 189, 1361–1375 (2011)
  33. Good, B. H., Rouzine, I. M., Balick, D. J., Hallatschek, O. & Desai, M. M. Distribution of fixed beneficial mutations and the rate of adaptation in asexual populations. Proc. Natl Acad. Sci. USA 109, 4950–4955 (2012)
  34. Neher, R. A. & Hallatschek, O. Genealogies of rapidly adapting populations. Proc. Natl Acad. Sci. USA 110, 437–442 (2013)
  35. Gerrish, P. J. & Lenski, R. E. The fate of competing beneficial mutations in an asexual population. Genetica 102–103, 127–144 (1998)
  36. Miralles, R., Gerrish, P. J., Moya, A. & Elena, S. F. Clonal interference and the evolution of RNA viruses. Science 285, 1745–1747 (1999)
  37. Ghedin, E. et al. Large-scale sequencing of human influenza reveals the dynamic nature of viral genome evolution. Nature 437, 1162–1166 (2005)
  38. Bush, R. M., Smith, C. B., Cox, N. J. & Fitch, W. M. Effects of passage history and sampling bias on phylogenetic reconstruction of human influenza A evolution. Proc. Natl Acad. Sci. USA 97, 6974–6980 (2000)
  39. Price, M. N., Dehal, P. S. & Arkin, A. P. FastTree 2–approximately maximum-likelihood trees for large alignments. PLoS ONE 5, e9490 (2010)
  40. Swofford, D. L. PAUP*: Phylogenetic Analysis Using Parsimony (and Other Methods) 4.0 Beta (Sinauer Associates, 2002)
  41. Weis, W. et al. Structure of the influenza virus haemagglutinin complexed with its receptor, sialic acid. Nature 333, 426–431 (1988)
  42. Wilson, I. A. & Cox, N. J. Structural basis of immune recognition of influenza virus hemagglutinin. Annu. Rev. Immunol. 8, 737–787 (1990)
  43. Skehel, J. J. & Wiley, D. C. Receptor binding and membrane fusion in virus entry: the influenza hemagglutinin. Annu. Rev. Biochem. 69, 531–569 (2000)
  44. Lin, Y. P. et al. Evolution of the receptor binding properties of the influenza A(H3N2) hemagglutinin. Proc. Natl Acad. Sci. USA 109, 21474–21479 (2012)
  45. Andreasen, V., Lin, J. & Levin, S. A. The dynamics of cocirculating strains conferring partial cross-immunity. J. Math. Biol. 35, 825–842 (1997)
  46. Gog, J. R. & Swinton, J. A status-based approach to multiple strain dynamics. J. Math. Biol. 44, 169–184 (2002)
  47. Koelle, K., Khatri, P., Kamradt, M. & Kepler, T. A two-tiered model for simulating the ecological and evolutionary dynamics of rapidly evolving viruses, with an application to influenza. J. R. Soc. Interface 7, 1257–1274 (2010)
  48. Molinari, N. M. et al. The annual impact of seasonal influenza in the US: measuring disease burden and costs. Vaccine 25, 5086–5096 (2007)
  49. Myers, J. L. et al. Compensatory hemagglutinin mutations alter antigenic properties of influenza viruses. J. Virol. 87, 11168–11172 (2013)
  50. Schulze, I. T. Effects of glycosylation on the properties and functions of influenza virus hemagglutinin. J. Infect. Dis. 176 (suppl. 1), S24–S28 (1997)
  51. Blackburne, B. P., Hay, A. J. & Goldstein, R. A. Changing selective pressure during antigenic changes in human influenza H3. PLoS Pathog. 4, e1000058 (2008)
  52. Cui, J., Smith, T., Robbins, P. W. & Samuelson, J. Darwinian selection for sites of Asn-linked glycosylation in phylogenetically disparate eukaryotes and viruses. Proc. Natl Acad. Sci. USA 106, 13421–13426 (2009)
  53. Zhang, M. et al. Tracking global patterns of N-linked glycosylation site variation in highly variable viral glycoproteins: HIV, SIV, and HCV envelopes and influenza hemagglutinin. Glycobiology 14, 1229–1246 (2004)
  54. Arinaminpathy, N. & Grenfell, B. Dynamics of glycoprotein charge in the evolutionary history of human influenza. PLoS ONE 5, e15674 (2010)
  55. Ampofo, W. K. et al. Improving influenza vaccine virus selection. Report of a WHO informal consultation held at WHO headquarters, Geneva, Switzerland, 14–16 June 2010. Influenza Other Respir. Viruses 6, 147–152 (2010)
  56. Osterholm, M. T., Kelley, N. S., Sommer, A. & Belongia, E. A. Efficacy and effectiveness of influenza vaccines: a systematic review and meta-analysis. Lancet Infect. Dis. 12, 36–44 (2012)
  57. Press, W. H., Teukolsky, S. A., Vetterling, W. T. & Flannery, B. P. Numerical Recipes: the Art of Scientific Computing 3rd edn (Cambridge Univ. Press, 2007)
  58. Mustonen, V. & Lässig, M. Adaptations to fluctuating selection in Drosophila. Proc. Natl Acad. Sci. USA 104, 2277–2282 (2007)
  59. Tsimring, L. S., Levine, H. & Kessler, D. A. RNA virus evolution via a fitness-space model. Phys. Rev. Lett. 76, 4440–4443 (1996)

Author information

Affiliations

  1. Institute for Theoretical Physics, University of Cologne, Zülpicher Strasse 77, 50937 Köln, Germany

    • Marta Łuksza &
    • Michael Lässig
  2. Biological Sciences, Columbia University, 607D Fairchild Center, New York, New York 10027, USA

    • Marta Łuksza

Contributions

Both authors designed research, developed methods, analysed data, interpreted results and wrote the paper.

Competing financial interests

The authors declare no competing financial interests.

Supplementary information

Text files

  1. Supplementary Data (80 KB)

    This file contains the GenBank and GISAID accession numbers of the influenza strains used in the study.