Abstract

Understanding the spatiotemporal patterns of emergence and circulation of new human seasonal influenza virus variants is a key scientific and public health challenge. The global circulation patterns of influenza A/H3N2 viruses are well characterized1,2,3,4,5,6,7, but the patterns of A/H1N1 and B viruses have remained largely unexplored. Here we show that the global circulation patterns of A/H1N1 (up to 2009), B/Victoria, and B/Yamagata viruses differ substantially from those of A/H3N2 viruses, on the basis of analyses of 9,604 haemagglutinin sequences of human seasonal influenza viruses from 2000 to 2012. Whereas genetic variants of A/H3N2 viruses did not persist locally between epidemics and were reseeded from East and Southeast Asia, genetic variants of A/H1N1 and B viruses persisted across several seasons and exhibited complex global dynamics with East and Southeast Asia playing a limited role in disseminating new variants. The less frequent global movement of influenza A/H1N1 and B viruses coincided with slower rates of antigenic evolution, lower ages of infection, and smaller, less frequent epidemics compared to A/H3N2 viruses. Detailed epidemic models support differences in age of infection, combined with the less frequent travel of children, as probable drivers of the differences in the patterns of global circulation, suggesting a complex interaction between virus evolution, epidemiology, and human behaviour.

Main

Owing to the frequency and severity of human seasonal influenza A/H3N2 virus epidemics, recent work has focused on the global circulation dynamics of H3N2 viruses1,2,3,4,5,6,7. Studies have shown that, each year, H3N2 epidemics worldwide result from the introduction of new genetic variants from East and Southeast (E-SE) Asia, where viruses circulate via a network of temporally overlapping epidemics1,2,4,5, rather than local persistence1,3,6,7. In addition to H3N2, H1N1 viruses and two antigenically diverged lineages of influenza B viruses, B/Victoria/2/1987-like (Vic) and B/Yamagata/16/1988-like (Yam), circulate among humans with lower but substantial disease burdens8,9. Despite their importance, the global circulation dynamics of former seasonal H1N1 viruses (preceding the 2009 pandemic) and B viruses have been largely neglected.

Given that influenza A and B viruses cause similar symptoms and evolve by similar mechanisms of immune escape, we hypothesized that each would follow similar patterns of global circulation, with new genetic variants originating in East and Southeast Asia that rapidly replace existing genetic variants. To test this hypothesis we compared the global circulation patterns of the haemagglutinin (HA) genes of H3N2, former seasonal H1N1, Vic, and Yam viruses. We assembled data sets of HA sequences with complete HA1 domains for each subtype from the World Health Organization Global Influenza Surveillance and Response System and the Influenza Research Database10, covering 2000–2012. To reduce the impact of surveillance biases, we subsampled these data to more equitable spatiotemporal distributions, resulting in data sets comprising 4,006 H3N2, 2,144 H1N1, 1,999 Vic, and 1,455 Yam HA sequences (Extended Data Fig. 1). Although deficient in viruses from Africa and Eastern Europe, to our knowledge these are the most geographically and temporally comprehensive seasonal influenza virus data sets assembled to date.

By estimating temporally resolved phylogenetic trees for each subtype, we revealed faster rates of nucleotide mutation and amino acid substitution in H3N2 and H1N1 than in the B viruses (consistent with previous work11,12), but more genealogical diversity in B viruses than in A viruses (Extended Data Table 1). This inverse relationship between evolutionary rate and genealogical diversity is expected if increased mutation rate correlates with antigenic drift13 and drives increased adaptive evolution, thus purging HA genetic diversity14. By inferring geographic ancestry using Bayesian phylogeographic methods15, we found a consistent pattern for H3N2 viruses (Fig. 1a) in which viruses worldwide rapidly coalesce to the trunk of the tree (average time to trunk = 1.42 years), with trunk viruses mostly originating from East and Southeast Asia (Extended Data Fig. 2a). This finding is consistent with previously reported patterns1,2,4,5, with East and Southeast Asia acting as the source population for epidemics worldwide.

Figure 1: Maximum clade credibility trees.

a–d, Trees created with primary data sets of 4,006 H3N2 viruses (a), 2,144 H1N1 viruses (b), 1,999 Vic viruses (c) and 1,455 Yam viruses (d). Branch tips are coloured by geographic region of virus collection; internal branches are coloured by geographic region as inferred by Bayesian phylogeographic methods (region colours in persistence insets). In b, nodes 1–3 indicate co-circulating clades that diverged in 2004. In c, nodes 1 and 2 indicate divergent clades of viruses from Asia, coloured vertical bars indicate antigenic variants shown in Extended Data Fig. 5a (green, B/Malaysia/2506/2004-like; red, B/Hubei Songzi/52/2008-like; other post-2008 viruses, B/Brisbane/60/2008-like). The inset to the top left of each tree shows duration of region-specific persistence measured as the waiting time in years for a virus to leave its geographic region of origin. Circles represent mean persistence across sampled viruses, while lines show the interquartile range of persistence across sampled viruses. Region ‘China’ shows the combined persistence estimate for North China and South China together.

In addition to China and Southeast Asia, India frequently contributed viruses to the trunk of the tree, suggesting that the global circulation of H3N2 viruses is maintained by an East and Southeast Asian network that includes India. India’s role in the global dissemination of H3N2 viruses may have been similar historically, but India-wide influenza surveillance only began in 2004. There were brief periods, notably the 2007–2008 Northern Hemisphere winter, when regions outside East and Southeast Asia contributed to the trunk of the H3N2 tree. However, these instances were rare and trunk viruses from outside East and Southeast Asia descended directly from viruses within East and Southeast Asia (Fig. 1a). Quantifying the average ancestry of strains from each geographic region in the 3 years before sampling showed prominent roles for China, India, and Southeast Asia in seeding epidemics in all regions (Extended Data Fig. 3).

Surprisingly, the global circulation patterns of former seasonal H1N1 viruses differed substantially from those observed for H3N2 viruses (Fig. 1). Like H3N2, most lineages of H1N1 viruses eventually coalesced with viruses from East and Southeast Asia and India. However, this coalescence was slower than for H3N2 viruses with prolonged co-circulation of geographically segregated H1N1 lineages (Fig. 1b, Extended Data Figs 3 and 4). Geographic segregation of H1N1 viruses was particularly pronounced beginning in 2004/2005, with the emergence of three co-circulating genetic lineages (Fig. 1b, nodes 1–3) that each independently acquired HA mutations leading to antigenic evolution from the A/New Caledonia/20/1999-like phenotype to the A/Solomon Islands/3/2006-like phenotype. These lineages circulated in Southeast Asia (node 1), China (node 2) and India (node 3), with the Indian lineage eventually spreading worldwide before the emergence of H1N1pdm09 viruses.

Phylogeographic analyses of B Vic and Yam viruses revealed further differences from H3N2 viruses, with lineages frequently circulating outside of East and Southeast Asia for several years without evidence of seeding from East and Southeast Asia (Fig. 1c, d). Prominent examples include the seeding of the North American 2006/2007 Vic season directly from 2005/2006 North American viruses and the seeding of the North American 2001/2002 Yam season directly from 2000/2001 North American viruses (Extended Data Fig. 4). Similarly, lineages of viruses within East and Southeast Asia commonly circulated exclusively in East and Southeast Asia for more than 1 year. These long-circulating East and Southeast Asian lineages were most apparent for Vic viruses, where two lineages (Fig. 1c, nodes 1 and 2) persisted independently in China and Southeast Asia for over 5 years without spreading to other regions and led to the co-circulation of three distinct Vic antigenic variants in different parts of the world during 2007–2008 (Extended Data Fig. 5a).

Patterns of persistence of genetic variants differed by subtype and region, with H3N2 viruses persisting regionally for an average of 6 months, H1N1 for 9 months, Vic for 13 months and Yam for 12 months. H3N2 viruses showed comparably short durations of persistence across the world (Fig. 1), with the exceptions of India and China. Patterns within China were characterized by North and South lineages contributing jointly to persistence, as combining North and South phylogeny nodes resulted in substantially greater persistence estimates than from North or South lineages alone (Fig. 1). For H3N2, evidence for joint contributions to persistence by region pairs that exclude China is comparatively weak (Extended Data Fig. 6a, Supplementary Information). For Vic and Yam, the mean duration of persistence was longer than for H3N2 or H1N1 in most regions, particularly in India and China where mean durations were >2 years (Fig. 1, Extended Data Fig. 4). Duration of regional persistence correlated with the proportion of virus originating from that region (Extended Data Fig. 6b) and observed phylogeographic patterns were robust to subsampling assumptions (Supplementary Information, Extended Data Table 2).

To investigate differences in the global migration patterns of H3N2, H1N1 and B viruses, we used the spatiotemporally resolved phylogenies to estimate the amounts of virus movement between regions (Fig. 2). Rates of movement between pairs of regions were highly correlated between viruses with Spearman correlation coefficients ranging from 0.65 (H3N2 vs Yam) to 0.75 (H3N2 vs H1N1), suggesting similar global connectivity networks for all viruses. However, while the overall structure of the migration network was similar, H3N2 viruses moved between regions more frequently than H1N1 and B viruses (migration events per lineage per year H3N2 = 1.96, H1N1 = 1.27, Vic = 0.93, Yam = 0.97, Extended Data Table 1).

Figure 2: Estimates of mean pairwise virus migration rate.

Line thickness between regions indicates average number of migration events per lineage per year. Arrowhead size indicates the strength of directionality of migration. For clarity, only arrows corresponding to migration rates greater than 0.25 events per lineage per year are shown. Circle area indicates the global proportion of ancestry deriving from each region.

We hypothesized a relationship between rates of global movement and rates of antigenic drift: although rates of genetic evolution were similar for H3N2 and H1N1 viruses, both H1N1 and B viruses evolved antigenically more slowly than H3N2 viruses13 (Extended Data Table 1). We also hypothesized that lower rates of immune escape for B and H1N1 compared with H3N2 would lead to younger average ages of infection, as children increasingly comprise the largest pool of susceptible individuals, and smaller, less frequent epidemics owing to smaller populations of susceptible individuals13. These differences are consistent with results from several community-based cohort studies that found that children were more frequently infected with B viruses than adults8,16,17. Age of infection data covering 2002–2011 from Australia show that H1N1 and B viruses infect younger individuals than H3N2 viruses (Extended Data Fig. 5b–d, median age of infection H3N2 = 30 years, H1N1 = 20 years, B = 16 years) and epidemiological data from Australia and the United States show reduced size and frequency of H1N1 and B epidemics compared to H3N2 (Extended Data Fig. 5f–i).

Differences in age of infection may explain differences in global circulation as children travel long distances much less frequently than adults (Extended Data Fig. 5e). A previous study hypothesized that age-specific patterns of infection could lead to differences in contact rates and the spread of influenza types within the United States over the course of a single season18. Here, we hypothesized that differential global air travel provides a plausible mechanism by which H1N1 and B viruses show increased genetic differentiation and reduced rates of global migration across multiple seasons, compared to H3N2 viruses.

To test the impact of differences in age distribution of infection on global patterns of virus movement, we constructed a multi-patch transmission model. We modelled two scenarios for host movement: (1) age-independent mixing between patches; (2) age-stratified mixing with host movement derived from air travel passenger age data (Extended Data Fig. 5e). In the age-independent scenario, model parameters only differed in rate of antigenic mutation, leading to differences in observed rates of antigenic drift among viruses and hence epidemic size and frequency (Extended Data Fig. 7). Faster antigenic drift resulted in greater incidence and more adult infections (Fig. 3a, b), but only modest differences in virus lineage movement (Fig. 3c, B-like viruses differ from H3-like viruses by a factor of 1.2), consistent with slightly faster spread of antigenically novel strains. However, age-stratified mixing between patches intensified the effect of antigenic drift on migration rate and created differences in rates of movement between patches more consistent with those observed for H3N2 vs H1N1 and B (Fig. 3c, B-like viruses differ from H3-like by a factor of 1.6). In the scenario with faster antigenic drift, infections were more mobile owing to greater frequency of adult infection, causing a knock-on effect on rates of viral movement. The model also suggests that the differences in patterns of regional persistence observed in the phylogenies might be shaped by a combination of differences in rates of antigenic evolution and variation in amplitude of epidemic seasonality, with slowly evolving viruses persisting longer than rapidly evolving viruses at low amplitudes of seasonal forcing (Extended Data Fig. 8a, Supplementary Information).

Figure 3: Relationship of antigenic drift to incidence (a), proportion of childhood infections (b), and geographic migration rate (c), in a multi-strain multi-region model of influenza transmission.

Black points represent outcomes from a model in which children and adults travel between regions at equal rates. Red points represent outcomes from a model in which adults travel between regions at 5.26× the rate of children (Extended Data Fig. 5e). Solid black and red lines represent LOESS fits to the data. With 2 travel scenarios, 7 mutation rates and 8 replicates, there are 112 individual stochastic simulations (Extended Data Fig. 7). Antigenic drift was measured in cartographic units13 per year (see Methods). In a, attack rate was measured as proportion of the total population infected yearly. In c, migration rate was measured in terms of migration events per lineage per year.

In the model, varying transmission rate rather than antigenic mutation rate also resulted in differences in the observed rate of antigenic drift, with higher transmission resulting in faster drift (Extended Data Fig. 8b). The relationship between antigenic drift rate and migration rate is similar, regardless of whether drift is modulated by mutation rate or transmission rate (Extended Data Fig. 8b). This finding is in line with theoretical work showing that epidemiological processes can influence influenza virus evolution19,20. However, there are important virological differences between influenza viruses that are likely to affect the efficiency and tempo at which antigenic variation is generated and fixed, which could in turn affect epidemiology21,22,23,24 (Supplementary Information).

Regardless of the underlying drivers, there is a remarkable correspondence in model behaviour, quantified as a stable relationship between observable rate of antigenic drift and global circulation patterns. The patterns of epidemic spread observed here suggest that differences in ages of infection could explain patterns of global circulation across a variety of human viruses.

Methods

Sequence data

Haemagglutinin (HA) coding sequences for influenza A H3N2 viruses, former seasonal H1N1 viruses (preceding the 2009 pandemic), and influenza B virus lineages Victoria (Vic) and Yamagata (Yam) collected by the World Health Organization (WHO) Global Influenza Surveillance and Response System, including the National Institute of Virology, Pune, India, between 2000 and 2012 were combined with human seasonal influenza virus sequences (minimum length = 984 base pairs) covering 2000 to 2012 from the Influenza Research Database10. After removing duplicate strains and strains that were overly divergent based on root-to-tip distances, the data set contained 9,139 H3N2 sequences, 3,789 H1N1 sequences, 2,577 Vic sequences and 1,821 Yam sequences. Sampling locations for these sequences were parsed from strain names. Sequences were grouped into 9 geographic regions: USA/Canada, South America, Europe, India, North China, South China, Japan/Korea, Southeast Asia and Oceania. Specifics of this partitioning are shown in Extended Data Fig. 1. Groups were chosen to maximize available sequences within each region while still providing enough geographic diversity to ensure nearly global coverage. Sequences from Africa, Central America, the Middle East and Russia were excluded because there were too few sequences to provide estimates comparable to those for other regions.

In the raw sequence data, some regions, such as the USA, were over-represented. Additionally, more recent years were over-represented compared to years at the start of the study period. To control for these sampling biases, we subsampled the raw data randomly by location and time to create a more equitable spatiotemporal distribution. The USA consistently had more sequences available every year from 2000 to 2012, so to maintain similar total numbers of sequences for each region across the entire study period it was necessary to sample fewer sequences per year from the USA. We selected 50 sequences per region per year (40 for USA/Canada) for H3N2 and 80 sequences per region per year (45 for USA/Canada) for H1N1, Vic and Yam. This subsampling resulted in largely similar sequence counts across years and across regions for each virus, but more H3N2 sequences overall than H1N1 or B sequences, with 4,006 H3N2 sequences, 2,144 H1N1 sequences, 1,999 Vic sequences and 1,455 Yam sequences (Extended Data Fig. 1). When selecting subsampled sequences, we prioritized sequences with full day-month-year collection dates over those with less precise dates, and then longer sequences over shorter ones. HA sequence data for 1,630 H3N2 isolates, 1,600 H1N1 isolates, 1,394 Vic isolates and 881 Yam isolates have been deposited in the Influenza Research Database10, and accession numbers for all sequences used are provided as Supplementary Information.
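The subsampling rule can be sketched as a stratified draw over (region, year) cells. This is a minimal illustration rather than the pipeline in the blab/global-migration repository: the `Seq` record, its `date_precision` field and the `subsample` function are hypothetical, and the ordering (full dates first, then longer sequences, with random tie-breaking) is one reading of the description above.

```python
import random
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Seq:
    """Hypothetical sequence record."""
    name: str
    region: str
    year: int
    date_precision: str  # "day", "month" or "year" (assumed encoding)
    length: int


def subsample(seqs, per_region_year, caps=None, seed=0):
    """Keep at most `per_region_year` sequences per (region, year) cell.

    `caps` overrides the limit for specific regions (e.g. USA/Canada).
    Within a cell, fully dated sequences come first, then longer ones;
    a shuffle before the stable sort breaks remaining ties at random.
    """
    caps = caps or {}
    rng = random.Random(seed)
    cells = defaultdict(list)
    for s in seqs:
        cells[(s.region, s.year)].append(s)
    chosen = []
    for (region, _year), group in sorted(cells.items()):
        rng.shuffle(group)
        group.sort(key=lambda s: (s.date_precision != "day", -s.length))
        chosen.extend(group[: caps.get(region, per_region_year)])
    return chosen
```

For H3N2 the call would be `subsample(seqs, 50, {"USA/Canada": 40})`, and for H1N1, Vic and Yam `subsample(seqs, 80, {"USA/Canada": 45})`.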

Phylogeographic inference

Time-resolved phylogenetic trees were estimated for H3N2, H1N1, Vic and Yam using BEAST v1.8.1 (ref. 25), incorporating the SRD06 nucleotide substitution model26, a coalescent demographic model with constant effective population size, and a strict molecular clock across branches. A strict molecular clock was chosen because date of sampling and evolutionary distance were strongly correlated in all data sets, as estimated by Path-O-Gen v1.4 (http://tree.bio.ed.ac.uk/software/pathogen/). Using a strict clock also reduced the risk of model over-parameterization (for example, for the complete H3N2 data set with a relaxed clock, there would be 2 × 4,006 − 2 = 8,010 branch-specific rates). Samples with imprecise dates (known only to the month or to the year) had their dates of sampling estimated assuming a uniform prior within the known temporal bounds27. Markov chain Monte Carlo (MCMC) was run for 600 million steps, and trees were sampled every 5 million steps after a burn-in of 100 million steps, yielding a total sample of 100 trees for H1N1, Vic and Yam. With substantially more sequences, H3N2 required a longer chain to converge: MCMC was run as 2 parallel chains, each of 650 million steps sampled every 3 million steps with a burn-in of 500 million steps, and samples were combined across chains, yielding a total of 100 sampled trees. These trees were treated as independent draws from the posterior distribution of trees in the subsequent robust counting and phylogeographic analyses28. Evolutionary rates in Extended Data Table 1 were estimated using the ‘renaissance’ counting methods of Lemey et al.29.
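The sample counts quoted above follow from simple arithmetic on the chain settings. A minimal sketch, assuming one retained sample per sampling interval after burn-in (the exact off-by-one depends on BEAST's logging convention); the function names are illustrative:

```python
def retained_samples(chain_length, burnin, interval, chains=1):
    """Samples kept after discarding burn-in, one per interval, pooled over chains."""
    if not 0 <= burnin < chain_length:
        raise ValueError("burn-in must be non-negative and shorter than the chain")
    return chains * ((chain_length - burnin) // interval)


def rooted_branch_count(n_tips):
    """A rooted, fully bifurcating tree with n tips has 2n - 2 branches."""
    return 2 * n_tips - 2


# H1N1, Vic and Yam: one chain of 600M steps, 100M burn-in, sampled every 5M.
single = retained_samples(600_000_000, 100_000_000, 5_000_000)
# H3N2: two chains of 650M steps, 500M burn-in, sampled every 3M, pooled.
pooled = retained_samples(650_000_000, 500_000_000, 3_000_000, chains=2)
```

Both settings give 100 trees, and `rooted_branch_count(4006)` reproduces the 8,010 branch-specific rates a relaxed clock would add.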

Phylogeographic patterns were estimated using a discrete-state continuous time Markov chain (CTMC) model, in which transition rates were estimated between each pair of regions15. We assumed a non-reversible transition model30 consisting of 72 separate rate parameters, each with a Bayesian stochastic search variable selection (BSSVS) indicator variable, and a separate overall rate of geographic transition. We assumed an exponential prior with mean of 1 for each transition rate, a negative binomial prior with mean of 9 and standard deviation of 9 for the total number of non-zero rates and an exponential prior with mean of 1 migration event per lineage per year for the overall geographic transition rate. MCMC was run for 12 million steps with a burn-in of 2 million steps, and parameters sampled every 10,000 steps and trees sampled every 100,000 steps, yielding a total sample of 1,000 parameter states and 100 trees on which estimates were based. Pairwise migration rate estimates had an effective sample size (ESS) of 350 at the minimum and most had ESS greater than 500.
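The structure of the non-reversible discrete-trait model can be illustrated by assembling its generator matrix. With 9 regions there are 9 × 8 = 72 ordered pairs, each with its own rate and BSSVS indicator. This sketch is hypothetical: it omits BEAST's rate normalization and the negative binomial prior on the number of active indicators, and it draws rates from the Exponential(mean 1) prior and switches pairs on with probability 9/72 only to populate the example.

```python
import itertools
import random

REGIONS = ["USA/Canada", "South America", "Europe", "India", "North China",
           "South China", "Japan/Korea", "Southeast Asia", "Oceania"]
# Ordered (from, to) pairs: a non-reversible model has a separate rate for each.
PAIRS = list(itertools.permutations(range(len(REGIONS)), 2))


def generator_matrix(rates, indicators, overall_rate):
    """Q[i][j] = overall_rate * indicator_ij * rate_ij for i != j;
    each diagonal entry is set so that its row sums to zero."""
    k = len(REGIONS)
    q = [[0.0] * k for _ in range(k)]
    for i, j in PAIRS:
        q[i][j] = overall_rate * indicators[(i, j)] * rates[(i, j)]
    for i in range(k):
        q[i][i] = -sum(q[i][j] for j in range(k) if j != i)
    return q


rng = random.Random(1)
rates = {pair: rng.expovariate(1.0) for pair in PAIRS}  # Exponential(mean 1) draws
# Illustrative indicators: about 9 of 72 pairs active, echoing the prior mean.
indicators = {pair: int(rng.random() < 9 / 72) for pair in PAIRS}
q = generator_matrix(rates, indicators, overall_rate=1.0)
```

An indicator of 0 removes that directed route entirely, which is how BSSVS identifies the subset of migration routes the data support.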

This procedure yielded posterior trees with the geographic states of internal nodes resolved. We analysed these posterior trees using the program PACT v0.9.5 (https://github.com/trvrb/PACT) to compute the following summary statistics: (1) genealogical diversity14, measuring the average time it takes for two randomly chosen contemporaneous lineages to coalesce, (2) time to the most recent common ancestor (TMRCA)14, measuring the average time it takes for all contemporaneous lineages to find a common ancestor, (3) genealogical FST, measuring the degree of population structure in contemporaneous lineages calculated as FST = (πb − πw)/πb, where πw is genealogical diversity between randomly sampled lineages from the same geographic region and πb is genealogical diversity between randomly sampled lineages from different geographic regions, (4) persistence, measuring the average number of years for a tip to leave its sampled location, walking backwards up the phylogeny, (5) migration rate, measuring the average number of migration events over the phylogeny divided by total tree length to give migration events per lineage per year, (6) trunk location through time4, measuring the posterior distribution across sampled phylogenies of the trunk geographic state, where the trunk is defined as all branches ancestral to viruses sampled within 1 year of the most recent sample, (7) region-specific ancestral geographic history, measuring the distribution of geographic locations of tips belonging to a particular region traced backwards in time through the phylogeny averaged across sampled phylogenies. Statistics (1), (2), (3), (6), and (7) were calculated across 0.1 year genealogical windows. These procedures gave an estimate of credible intervals for inferred ancestral locations across posterior phylogeographic reconstructions.
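Three of these statistics reduce to simple operations on a location-annotated tree. The sketch below applies genealogical FST, migration rate and persistence to a hypothetical four-tip tree. It is not PACT's implementation. It assumes contemporaneous tips and node-level location labels, so at most one move is counted per branch. A lineage is assumed to carry its child node's location along each branch, which dates a move to the parent node, and persistence is censored at the root. It also skips the 0.1 year windows and the averaging over posterior trees.

```python
from itertools import combinations
from statistics import mean

# Hypothetical toy tree: node -> (parent, time in years, inferred location).
TREE = {
    "root": (None, 2000.0, "Southeast Asia"),
    "n1": ("root", 2001.0, "Southeast Asia"),
    "n2": ("n1", 2002.0, "Europe"),
    "A": ("n2", 2003.0, "Europe"),
    "B": ("n2", 2003.0, "Europe"),
    "C": ("n1", 2003.0, "Southeast Asia"),
    "D": ("root", 2003.0, "Southeast Asia"),
}
TIPS = ["A", "B", "C", "D"]


def par(node):
    return TREE[node][0]


def when(node):
    return TREE[node][1]


def where(node):
    return TREE[node][2]


def lineage(node):
    """The node followed by all of its ancestors up to the root."""
    path = [node]
    while par(path[-1]) is not None:
        path.append(par(path[-1]))
    return path


def coalescence_time(a, b):
    """Years from contemporaneous tips a and b back to their most recent common ancestor."""
    ancestors_a = set(lineage(a))
    mrca = next(n for n in lineage(b) if n in ancestors_a)
    return when(a) - when(mrca)


def genealogical_fst(tips):
    """FST = (pi_b - pi_w) / pi_b over within- and between-region tip pairs."""
    within, between = [], []
    for a, b in combinations(tips, 2):
        target = within if where(a) == where(b) else between
        target.append(coalescence_time(a, b))
    pi_w, pi_b = mean(within), mean(between)
    return (pi_b - pi_w) / pi_b


def migration_rate():
    """Location changes between parent and child nodes divided by total branch length."""
    branches = [n for n in TREE if par(n) is not None]
    moves = sum(where(n) != where(par(n)) for n in branches)
    length = sum(when(n) - when(par(n)) for n in branches)
    return moves / length


def persistence(tip):
    """Years walked back from a tip until its lineage leaves the tip's location.
    A move is dated to the parent node; lineages reaching the root are censored there."""
    node = tip
    while par(node) is not None and where(par(node)) == where(tip):
        node = par(node)
    stop = par(node) if par(node) is not None else node
    return when(tip) - when(stop)
```

On this toy tree, within-region pairs coalesce after 2 years on average and between-region pairs after 2.5 years, giving FST = 0.2; the single move over 9 years of branch length gives a migration rate of 1/9 events per lineage per year.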

Code and data availability

Sequence data have been deposited with the Influenza Research Database10, and accession numbers are provided as Supplementary Data. The entire bioinformatic pipeline, including data subsampling, preparing XML files for BEAST, setting up PACT analyses and rendering figures, is available at https://github.com/blab/global-migration. Analysis and data files are archived on the Dryad Digital Repository under DOI http://dx.doi.org/10.5061/dryad.pc641.

Surveillance, travel and age-structure data

We investigated epidemic size and frequency using virological isolation data between 2000 and 2012 collected by the WHO Collaborating Centre for Reference and Research on Influenza at the Victorian Infectious Diseases Reference Laboratory (VIDRL), Melbourne, Australia and the Centers for Disease Control and Prevention, Atlanta, USA (Extended Data Fig. 5f–i). These isolations were categorized by date of sampling and by virus type: H3N2, H1N1, Vic, or Yam. The data from VIDRL also contained information on patient age. The age structure of incidence was estimated by constructing a distribution of age of infection from individuals > 5 years (owing to the overrepresentation of < 5 year old patients for all subtypes) (Extended Data Fig. 5b–d). Median age of infection was 30 years (H3N2), 20 years (H1N1) and 16 years (B), and mean age of infection was 33.9 years (H3N2), 23.1 years (H1N1) and 23.2 years (B). Median age of infection was significantly different for H3N2 vs H1N1 (P = 4.6 × 10⁻²⁹, Mann–Whitney U test), H3N2 vs B (P = 1.2 × 10⁻⁶²) and H1N1 vs B (P = 0.041). The patient age data from VIDRL were potentially biased by testing strategy and the generally higher severity of H3N2 virus infections. Children and working-age adults were more likely to be tested than the elderly, but the greater severity of H3N2 virus infections might spread and flatten the patient age distribution. For this reason we additionally tested excluding individuals > 65 years and recalculating summary statistics, finding median ages of infection of 27 years (H3N2), 19 years (H1N1) and 15 years (B) and mean ages of infection of 28.0 years (H3N2), 22.2 years (H1N1) and 20.3 years (B). We classified children as 0–15 years and adults as 16 years and older, and estimated the proportion of childhood infections as 30% (H3N2), 52% (H1N1) and 60% (B). There are potentially other biases specific to individual sentinel physicians and hospitals that could affect sample collection. However, the estimate derived from the VIDRL data that 60% of influenza B virus infections occur in children is consistent with other estimates (reviewed in Glezen et al.8). Other studies similarly corroborate the estimates of lower age of infection for H1N1 viruses as compared to H3N2 (refs 31, 32).
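The age summaries can be sketched as follows, using hypothetical ages. The text does not state which subset the childhood proportion is computed on. Because 60% of B infections in 0–15 year olds would be hard to reconcile with a median age of 16 among patients over 5, this sketch assumes that the proportion is computed over all patients. Group comparisons could then use a two-sided Mann–Whitney U test, for example `scipy.stats.mannwhitneyu`.

```python
from statistics import mean, median


def age_summary(ages, max_age=None):
    """Median and mean age of infection among patients older than 5 years,
    optionally also dropping patients older than `max_age` (e.g. 65)."""
    kept = [a for a in ages if a > 5 and (max_age is None or a <= max_age)]
    return median(kept), mean(kept)


def child_fraction(ages, child_max=15):
    """Share of infections in children aged 0 to `child_max` years.
    Assumption: computed over all patients, without the > 5 year filter."""
    return sum(a <= child_max for a in ages) / len(ages)


ages = [1, 3, 8, 12, 20, 40, 70]  # hypothetical patient ages in years
```

Here `age_summary(ages)` gives a median of 20 and a mean of 30, and excluding the patient over 65 lowers both (median 16, mean 20), mirroring the direction of the sensitivity analysis above.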

Additionally, we analysed the distribution of ages of 102.5 million air passengers travelling through London Heathrow and London Gatwick airports in 2011 (Extended Data Fig. 5e), as reported by the Civil Aviation Authority of the UK (http://www.caa.co.uk/docs/81/2011CAAPaxSurveyReport.pdf). Assuming that children aged 0 to 15 make up 17% of the UK population (Office for National Statistics), this distribution suggests that children engage in air travel at 19% of the per-capita rate of adults.
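The travel comparison is a ratio of per-capita rates: each age group's share of passengers divided by its share of the population. A minimal sketch follows. The child passenger share is not given in the text, so the hypothetical inverse function back-calculates the share implied by the stated 19% (about 3.7% of passengers), as an illustration rather than a survey figure. The reciprocal, 1/0.19 ≈ 5.26, is the adult-to-child travel multiplier used in Fig. 3.

```python
def child_travel_ratio(child_passenger_share, child_population_share=0.17):
    """Children's per-capita air travel rate relative to adults'."""
    child_rate = child_passenger_share / child_population_share
    adult_rate = (1 - child_passenger_share) / (1 - child_population_share)
    return child_rate / adult_rate


def implied_child_passenger_share(ratio, child_population_share=0.17):
    """Inverse of child_travel_ratio: child passenger share implied by a ratio."""
    weighted = ratio * child_population_share
    return weighted / (weighted + 1 - child_population_share)


share = implied_child_passenger_share(0.19)       # roughly 0.037
adult_multiplier = 1 / child_travel_ratio(share)  # roughly 5.26
```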

For the modelling described below, we estimated age-structured contact rates following the empirical mixing data provided by Mossong et al.33. These contact matrices were previously validated in modelling pertussis epidemiology34. We simplified the Mossong et al. mixing matrices to record child-to-child, child-to-adult, adult-to-child and adult-to-adult contacts, where children were defined as 0 to 15 years old and adults as 16 or over. This resulted in the following mixing matrix, where rates are relative to child-to-child contact rates:

$$\begin{pmatrix} \alpha_{cc} & \alpha_{ca} \\ \alpha_{ac} & \alpha_{aa} \end{pmatrix} = \begin{pmatrix} 1.00 & 0.21 \\ 0.21 & 0.26 \end{pmatrix}$$

Epidemiological modelling

An individual-based model of influenza evolution and epidemiology was constructed following methods presented in Bedford et al.35. The model used here is identical to Bedford et al. except where specified below. The present implementation used a linear-strain space36,37, in which virus phenotype is represented by a continuous variable and cross-immunity between viruses is a function of distance between viruses in this space. We parameterized the model to compare scenarios of age-structured mixing between regions and to compare viruses with different rates of antigenic drift.

The model was simulated for 120 years with daily time steps, and the first 100 years were discarded to allow equilibrium to be reached. We modelled a metapopulation with individuals equally divided into three regions (North, Tropics, South). Individuals' ages were tracked throughout the simulation; those less than 16 years old were classified as children and those 16 or older as adults. Transmission occurred by mass action, with transmission rates modified by regional compartment and by age compartment. Thus, for example, the force of infection into children in the Tropics combined within-region transmission from infected children and adults in the Tropics with between-region transmission from infected hosts in the North and South, where βj is the seasonally forced contact rate in region j, αac represents adult-to-child transmission, mi represents between-region transmission in age class i, Iij represents the number of persons infected in age class i in region j, Sij represents the number of susceptible persons in age class i in region j, and Nj represents the total number of hosts in region j. The northern and southern regions were seasonally forced in opposite phase with a sinusoidal function of amplitude ε, while the Tropics had no seasonal forcing.
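One hypothetical form of the force of infection consistent with the definitions above is sketched below, using the baseline parameter values listed below. It combines within-region infection, weighted by βj and the α coefficients, with between-region infection weighted by mi. The cosine seasonal forcing, the use of the source region's βj for imported infection and all function names are assumptions, not code from the antigen repository. Multiplying the result by the number of susceptible children in the Tropics gives an expected daily number of new child infections there.

```python
import math

BETA = 0.88     # base transmission rate per day
EPSILON = 0.15  # seasonal forcing amplitude in north and south
# ALPHA[(source_age, target_age)]; "c" = child, "a" = adult.
ALPHA = {("c", "c"): 1.00, ("c", "a"): 0.21, ("a", "c"): 0.21, ("a", "a"): 0.26}
M = {"c": 0.0020, "a": 0.0020}  # baseline between-region transmission by target age
REGIONS = ("north", "tropics", "south")


def beta(region, day):
    """Assumed forcing: opposite-phase cosines in north and south, none in the tropics."""
    phase = 2 * math.pi * day / 365.0
    if region == "north":
        return BETA * (1 + EPSILON * math.cos(phase))
    if region == "south":
        return BETA * (1 - EPSILON * math.cos(phase))
    return BETA


def force_of_infection(age, region, day, infected, population):
    """Hypothetical per-susceptible daily force of infection into `age` in `region`.

    `infected` maps (age, region) to the number infected; `population` maps
    region to its total number of hosts N_j.
    """
    total = 0.0
    for source in ("c", "a"):
        local = beta(region, day) * infected[(source, region)] / population[region]
        remote = sum(
            beta(other, day) * infected[(source, other)] / population[other]
            for other in REGIONS
            if other != region
        )
        total += ALPHA[(source, age)] * (local + M[age] * remote)
    return total


population = {r: 15_000_000 for r in REGIONS}  # 45 million split equally
infected = {(a, r): 0 for a in ("c", "a") for r in REGIONS}
infected[("a", "tropics")] = 1_000
lam_tropics = force_of_infection("c", "tropics", 0, infected, population)
lam_north = force_of_infection("c", "north", 0, infected, population)
```

With infection seeded only in tropical adults, children in the North are exposed solely through the between-region term, so their force of infection is smaller by the factor mc.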

Each virus possessed a one-dimensional antigenic phenotype ϕν, and after recovery a host ‘remembered’ its infecting phenotype. For each contact event, the Euclidean distance from the infecting phenotype ϕν was calculated to each of the phenotypes in the host immune history. Here, one unit of antigenic distance was designed to roughly correspond to a twofold dilution of antiserum in a haemagglutination inhibition (HI) assay38. The probability ρ that infection occurred after exposure was proportional to the distance d to the closest phenotype in the host immune history, following ρ = min{ds, 1}. Each day there was a chance μ that an infection mutated to a new phenotype. This mutation rate represents a phenotypic rate, rather than a genetic mutation rate, and can be thought of as arising from multiple genetic sources. When a mutation occurred, the virus's phenotype was moved either left or right at random, with mutation size sampled from an exponential distribution with mean step size σavg. Epidemiological parameters for the baseline scenario, with notation following Bedford et al.35, were:

• Base transmission rate β = 0.88 per day
• Duration of infection 1/ν = 5 days
• Birth/death rate = 1/(50 years)
• Total population size N = 45 million
• Seasonal forcing in north and south ε = 0.15
• Antigenic scaling s = 0.07
• Antigenic mutation rate μ = 0.5 to 6.5 × 10⁻⁴ per day
• Average mutation size σavg = 0.3 units
• Child-to-child transmission αcc = 1.00
• Child-to-adult transmission αca = 0.21
• Adult-to-child transmission αac = 0.21
• Adult-to-adult transmission αaa = 0.26
• Child between-region transmission mc = 0.0020
• Adult between-region transmission ma = 0.0020

In the model with age-stratified mixing with host movement derived from air travel passenger age data, child between-region transmission mc was 0.0011 and adult between-region transmission ma was 0.0060.
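The infection and mutation rules described earlier in this section translate directly into code. In this sketch the function names are hypothetical, and treating a host with an empty immune history as fully susceptible is an assumption. With s = 0.07, a virus escapes prior immunity completely (ρ = 1) once it lies at least 1/0.07 ≈ 14.3 antigenic units from every remembered phenotype.

```python
import random

S = 0.07         # antigenic scaling s (baseline value)
SIGMA_AVG = 0.3  # mean mutation step size, antigenic units


def infection_probability(phenotype, immune_history, s=S):
    """rho = min(d * s, 1), where d is the distance to the closest remembered phenotype.
    Assumption: a host with no immune history is fully susceptible."""
    if not immune_history:
        return 1.0
    d = min(abs(phenotype - past) for past in immune_history)  # 1-D Euclidean distance
    return min(d * s, 1.0)


def daily_mutation(phenotype, mu, rng, sigma_avg=SIGMA_AVG):
    """With probability mu, shift the phenotype left or right by an exponential step."""
    if rng.random() < mu:
        step = rng.expovariate(1.0 / sigma_avg)  # exponential with mean sigma_avg
        return phenotype + rng.choice((-1.0, 1.0)) * step
    return phenotype
```

For example, a virus at phenotype 2.0 meeting a host that remembers phenotypes 0.0 and 1.5 is 0.5 units from the closest one, so the infection probability is 0.5 × 0.07 = 0.035.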

In the course of the simulation, the underlying infection history of who infects whom was recorded and output as a complete infection tree. Because the model lacks substantial within-host diversity, such as would arise from chronic infection, the complete infection tree also serves as a fully observed phylogenetic tree. Examining geographic location across the phylogenetic tree allowed us to directly calculate migration rate as total migration events observed (transitions from one region to another) divided by total opportunity (tree length).

The simulation was parameterized to model H3-like, H1-like and B-like behaviour (Extended Data Fig. 7) by modulating antigenic mutation rate μ in the primary analysis (Fig. 3) or transmission rate β as a secondary analysis (Extended Data Fig. 8b). Values for μ and β were chosen based on observed attack rate, proportion of childhood infections, and antigenic drift rate.
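The parameter sets and the simulation grid can be captured as an immutable configuration. The dataclass below is a hypothetical container rather than the antigen source. In particular, the seven mutation rates are assumed to be evenly spaced across the stated range of 0.5 to 6.5 × 10⁻⁴ per day. Together with 2 travel scenarios and 8 replicates, this gives the 112 simulations described in the Fig. 3 caption.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ModelParams:
    """Hypothetical container for the baseline parameters listed above."""
    beta: float = 0.88            # base transmission rate per day
    infectious_days: float = 5.0  # duration of infection, 1/nu
    lifespan_years: float = 50.0  # birth/death rate = 1 / lifespan
    population: int = 45_000_000  # total population size N
    epsilon: float = 0.15         # seasonal forcing amplitude (north and south)
    s: float = 0.07               # antigenic scaling
    mu: float = 0.5e-4            # antigenic mutation rate per day
    sigma_avg: float = 0.3        # mean mutation step (antigenic units)
    alpha_cc: float = 1.00
    alpha_ca: float = 0.21
    alpha_ac: float = 0.21
    alpha_aa: float = 0.26
    m_child: float = 0.0020       # between-region transmission, children
    m_adult: float = 0.0020       # between-region transmission, adults


BASELINE = ModelParams()
AGE_STRATIFIED = replace(BASELINE, m_child=0.0011, m_adult=0.0060)

# Assumption: seven evenly spaced rates from 0.5e-4 to 6.5e-4 per day.
MUTATION_RATES = [(0.5 + k) * 1e-4 for k in range(7)]
REPLICATES = 8

runs = [
    (name, replace(params, mu=mu), replicate)
    for name, params in (("equal travel", BASELINE), ("air-travel ages", AGE_STRATIFIED))
    for mu in MUTATION_RATES
    for replicate in range(REPLICATES)
]
```

A frozen dataclass keeps each run's parameters immutable, and `dataclasses.replace` derives a variant without copying field lists by hand; the secondary analysis that modulates β could be built the same way with `replace(params, beta=...)`.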

Source code for the simulation is available at https://github.com/trvrb/antigen/tree/global-migration and parameter and results files are available at https://github.com/blab/global-migration/tree/master/model.

References

  1. The global circulation of seasonal influenza A (H3N2) viruses. Science 320, 340–346 (2008).
  2. Unifying viral genetics and human transportation data to predict the global transmission dynamics of human influenza H3N2. PLoS Pathog. 10, e1003932 (2014).
  3. The genomic and epidemiological dynamics of human influenza A virus. Nature 453, 615–619 (2008).
  4. Global migration dynamics underlie evolution and persistence of human influenza A (H3N2). PLoS Pathog. 6, e1000918 (2010).
  5. Network analysis of global influenza spread. PLoS Comput. Biol. 6, e1001005 (2010).
  6. Phylogenetic analysis reveals the global migration of seasonal influenza A viruses. PLoS Pathog. 3, e131 (2007).
  7. Stochastic processes are key determinants of short-term evolution in influenza A virus. PLoS Pathog. 2, e125 (2006).
  8. The burden of influenza B: a structured literature review. Am. J. Public Health 103, e43–e51 (2013).
  9. Influenza-associated hospitalizations in the United States. J. Am. Med. Assoc. 292, 1333–1340 (2004).
  10. Influenza research database: an integrated bioinformatics resource for influenza research and surveillance. Influenza Other Respir. Viruses 6, 404–416 (2012).
  11. The evolutionary dynamics of human influenza B virus. J. Mol. Evol. 66, 655–663 (2008).
  12. The molecular epidemiology of influenza viruses. Semin. Virol. 6, 359–370 (1995).
  13. Integrating influenza antigenic dynamics with molecular evolution. eLife 3, e01914 (2014).
  14. Strength and tempo of selection revealed in viral gene genealogies. BMC Evol. Biol. 11, 220 (2011).
  15. Bayesian phylogeography finds its roots. PLoS Comput. Biol. 5, e1000520 (2009).
  16. Influenzavirus infections in Seattle families, 1975–1979. I. Study design, methods and the occurrence of infections by time and age. Am. J. Epidemiol. 116, 212–227 (1982).
  17. Estimating household and community transmission parameters for influenza. Am. J. Epidemiol. 115, 736–751 (1982).
  18. Synchrony, waves, and spatial hierarchies in the spread of influenza. Science 312, 447–451 (2006).
  19. The generation of influenza outbreaks by a network of host immune responses against a limited set of antigenic types. Proc. Natl Acad. Sci. USA 104, 7711–7716 (2007).
  20. The roles of competition and mutation in shaping antigenic and genetic diversity in influenza. PLoS Pathog. 9, e1003104 (2013).
  21. Comparison of the mutation rates of human influenza A and B viruses. J. Virol. 80, 3675–3678 (2006).
  22. Immunogenicity and reactogenicity of 1 versus 2 doses of trivalent inactivated influenza vaccine in vaccine-naive 5–8-year-old children. J. Infect. Dis. 194, 1032–1039 (2006).
  23. Probing of the receptor-binding sites of the H1 and H3 influenza A and influenza B virus hemagglutinins by synthetic and natural sialosides. Virology 196, 111–121 (1993).
  24. Hemagglutinin receptor binding avidity drives influenza A virus antigenic drift. Science 326, 734–736 (2009).
  25. Bayesian phylogenetics with BEAUti and the BEAST 1.7. Mol. Biol. Evol. 29, 1969–1973 (2012).
  26. Choosing appropriate substitution models for the phylogenetic analysis of protein-coding sequences. Mol. Biol. Evol. 23, 7–9 (2006).
  27. A Bayesian phylogenetic method to estimate unknown sequence ages. Mol. Biol. Evol. 28, 879–887 (2011).
  28. Bayesian estimation of ancestral character states on phylogenies. Syst. Biol. 53, 673–684 (2004).
  29. A counting renaissance: combining stochastic mapping and empirical Bayes to quickly detect amino acid sites under positive selection. Bioinformatics 28, 3248–3256 (2012).
  30. Ancient hybridization and an Irish origin for the modern polar bear matriline. Curr. Biol. 21, 1251–1258 (2011).
  31. Epidemiological characteristics of pandemic influenza H1N1 2009 and seasonal influenza infection. Med. J. Aust. 191, 146–149 (2009).
  32. Differences in patient age distribution between influenza A subtypes. PLoS ONE 4, e6832 (2009).
  33. Social contacts and mixing patterns relevant to the spread of infectious diseases. PLoS Med. 5, e74 (2008).
  34. Contact network structure explains the changing epidemiology of pertussis. Science 330, 982–985 (2010).
  35. Canalization of the evolutionary trajectory of the human influenza virus. BMC Biol. 10, 38 (2012).
  36. Dynamics and selection of many-strain pathogens. Proc. Natl Acad. Sci. USA 99, 17209–17214 (2002).
  37. Traveling waves in a model of influenza A drift. J. Theor. Biol. 222, 437–445 (2003).
  38. Mapping the antigenic and genetic evolution of influenza virus. Science 305, 371–376 (2004).


Acknowledgements

We thank National Influenza Centres worldwide for their contributions to influenza virus surveillance. T.B. was supported by a Newton International Fellowship from the Royal Society and through National Institutes of Health (NIH) U54 GM111274. S.R. was supported by Medical Research Council (UK, Project MR/J008761/1), Wellcome Trust (UK, Project 093488/Z/10/Z), Fogarty International Centre (USA, R01 TW008246-01), Department of Homeland Security (USA, RAPIDD program), National Institute of General Medical Sciences (USA, MIDAS U01 GM110721-01) and National Institute for Health Research (UK, Health Protection Research Unit funding). The Melbourne WHO Collaborating Centre for Reference and Research on Influenza was supported by the Australian Government Department of Health and thanks N. Komadina and Y.-M. Deng. The Atlanta WHO Collaborating Center for Surveillance, Epidemiology and Control of Influenza was supported by the US Department of Health and Human Services. NIV thanks A.C. Mishra, M. Chawla-Sarkar, A. M. Abraham, D. Biswas, S. Shrikhande, B. AnuKumar and A. Jain. Influenza surveillance in India was expanded, in part, through US Cooperative Agreements (5U50C1024407 and U51IP000333) and by the Indian Council of Medical Research. M.A.S. was supported through National Science Foundation DMS 1264153 and NIH R01 AI 107034. Work of the WHO Collaborating Centre for Reference and Research on Influenza at the MRC National Institute for Medical Research was supported by U117512723. P.L., A.R. & M.A.S. were supported by EU Seventh Framework Programme [FP7/2007-2013] under Grant Agreement no. 278433-PREDEMICS and ERC Grant agreement no. 260864. C.A.R. was supported by a University Research Fellowship from the Royal Society.

Author information

Author notes

    • Alexander Klimov: deceased.

Affiliations

  1. Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Research Center, Seattle, Washington 98109, USA

    • Trevor Bedford
  2. MRC Centre for Outbreak Analysis and Modelling, Department of Infectious Disease Epidemiology, School of Public Health, Imperial College London, London SW7 2AZ, UK

    • Steven Riley
  3. Fogarty International Center, National Institutes of Health, Bethesda, Maryland 20892, USA

    • Steven Riley
    • Andrew Rambaut
  4. World Health Organization (WHO) Collaborating Centre for Reference and Research on Influenza, Melbourne, Victoria 3000, Australia

    • Ian G. Barr
    • Aeron C. Hurt
    • Anne Kelso
  5. SGT Medical College, Hospital and Research Institute, Village Budhera, District Gurgaon, Haryana 122505, India

    • Shobha Broor
  6. National Institute of Virology, Pune 411001, India

    • Mandeep Chadha
    • Varsha Potdar
  7. WHO Collaborating Center for Reference and Research on Influenza, Centers for Disease Control and Prevention, Atlanta, Georgia 30329, USA

    • Nancy J. Cox
    • Alexander Klimov
    • Xiyan Xu
  8. WHO Collaborating Center for Reference and Research on Influenza, Medical Research Council National Institute for Medical Research (NIMR), London NW7 1AA, UK

    • Rodney S. Daniels
    • John W. McCauley
  9. King Institute of Preventive Medicine and Research, Guindy, Chennai 600032, India

    • C. Palani Gunasekaran
  10. Melbourne School of Population and Global Health, University of Melbourne, Parkville, Victoria 3010, Australia

    • Aeron C. Hurt
  11. Department of Zoology, University of Cambridge, Cambridge CB2 3EJ, UK

    • Nicola S. Lewis
    • Eugene Skepner
    • Derek J. Smith
  12. WHO Collaborating Center for Reference and Research on Influenza, National Institute for Viral Disease Control and Prevention, China CDC, Beijing 102206, China

    • Xiyan Li
    • Yuelong Shu
    • Dayan Wang
  13. WHO Collaborating Center for Reference and Research on Influenza, National Institute of Infectious Diseases, Tokyo 208-0011, Japan

    • Takato Odagiri
    • Masato Tashiro
  14. Institute of Evolutionary Biology, University of Edinburgh, Edinburgh EH9 3JT, UK

    • Andrew Rambaut
  15. Centre for Immunology, Infection and Evolution, University of Edinburgh, Edinburgh EH9 3FL, UK

    • Andrew Rambaut
  16. Department of Viroscience, Erasmus Medical Center, 3015 Rotterdam, The Netherlands

    • Derek J. Smith
  17. Department of Biostatistics, UCLA Fielding School of Public Health, University of California, Los Angeles, California 90095, USA

    • Marc A. Suchard
  18. Department of Biomathematics, David Geffen School of Medicine at UCLA, University of California, Los Angeles, California 90095, USA

    • Marc A. Suchard
  19. Department of Human Genetics, David Geffen School of Medicine at UCLA, University of California, Los Angeles, California 90095, USA

    • Marc A. Suchard
  20. Department of Microbiology and Immunology, Rega Institute, KU Leuven – University of Leuven, 3000 Leuven, Belgium

    • Philippe Lemey
  21. Department of Veterinary Medicine, University of Cambridge, Cambridge CB3 0ES, UK

    • Colin A. Russell

Authors

  1. Trevor Bedford
  2. Steven Riley
  3. Ian G. Barr
  4. Shobha Broor
  5. Mandeep Chadha
  6. Nancy J. Cox
  7. Rodney S. Daniels
  8. C. Palani Gunasekaran
  9. Aeron C. Hurt
  10. Anne Kelso
  11. Alexander Klimov
  12. Nicola S. Lewis
  13. Xiyan Li
  14. John W. McCauley
  15. Takato Odagiri
  16. Varsha Potdar
  17. Andrew Rambaut
  18. Yuelong Shu
  19. Eugene Skepner
  20. Derek J. Smith
  21. Marc A. Suchard
  22. Masato Tashiro
  23. Dayan Wang
  24. Xiyan Xu
  25. Philippe Lemey
  26. Colin A. Russell

Contributions

C.A.R. and T.B. conceived the research. C.A.R. and T.B. drafted the manuscript with substantial support from P.L. and S.R. I.G.B., S.B., M.C., N.J.C., R.S.D., C.P.G., A.C.H., A.K., A.Kl., X.L., J.W.M., T.O., V.P., Y.S., M.T., D.W. and X.X. coordinated and produced the influenza surveillance data. T.B. performed the modelling and data analyses along with C.A.R., S.R., P.L., M.A.S. and A.R. T.B. created the figures. All authors discussed the results and contributed to the revision of the final manuscript.

Competing interests

The authors declare no competing financial interests.

Corresponding author

Correspondence to Colin A. Russell.

Extended data

Extended data figures

  1. Spatial distribution of 4,006 H3N2, 2,144 H1N1, 1,999 Vic and 1,455 Yam samples.
  2. Inferred location of the trunk of the H3N2 tree through time in the primary data set (a) and in a smaller secondary data set (b).
  3. Average inferred geographic history of region-specific samples for H3N2, former seasonal H1N1, Vic and Yam viruses from 2000 to 2012.
  4. Maximum clade credibility (MCC) trees for region-specific samples from USA/Canada, India and South China for H3N2, H1N1, Vic and Yam viruses.
  5. Antigenic map of Vic viruses primarily collected in 2008 (a), age distribution of infections for H3N2 (b), H1N1 (c) and B (d) in Australia 2000–2011, age distribution of 102.5 million passengers at London Heathrow and London Gatwick airports during 2011 (e), time series of virological characterizations from 2000 to 2012 of viruses from the USA by US CDC and from Australia by VIDRL for H3N2 (f), H1N1 (g), Vic (h) and Yam (i).
  6. Combined persistence estimates across pairs of regions for H3N2, H1N1, Vic and Yam (a) and Spearman correlation of a region’s persistence vs the region’s contribution to phylogenetic ancestry for H3N2, H1N1, Vic and Yam (b).
  7. Simulation results for a model parameterized for slow antigenic drift (a), moderate antigenic drift (b), and fast antigenic drift (c).
  8. Simulation results showing the relationship between antigenic drift and persistence as a function of seasonality (a) and simulation results showing the effects of modulating transmission rate β on model behaviour (b).

Supplementary information

PDF files

  1. Supplementary Information: this file contains a Supplementary Discussion and additional references.

Excel files

  1. Supplementary Data: this file contains the genetic sequence accession numbers.

About this article

Publication history


DOI

https://doi.org/10.1038/nature14460
