Main

Major public health threats can occur when a virus that is endemic in a reservoir species is transmitted to a new host population, as occurred with severe acute respiratory syndrome-coronavirus (SARS-CoV). The need to assess the risk of such outbreaks in the future has led to calls for genetic surveillance of high-risk viruses (for example, lyssaviruses and avian influenza viruses) in their reservoir hosts (for example, bats, poultry and wild birds)1,2,3. Such approaches assume that certain genetic variants, or combinations of them, are more likely to emerge, and that these can be recognized before a host jump. For example, genomic analysis of the influenza A virus that was linked to the 1918 Spanish flu pandemic may reveal the genetic factors that made Spanish flu so transmissible and lethal, and allow real-time assessment of the risk for a pandemic and potentially prevention of avian influenza in poultry and wild birds through genetic surveillance. If ten amino acid changes were required to jump from avian hosts to humans in 1918, as proposed4, then the risk of emergence would be higher when nine of these mutations are present in viruses isolated from poultry than when none is (although note that the nature and origin of genetic changes in the 1918 pandemic remain under debate5,6).

To predict host jumps it is necessary to identify the causal genetic markers, which is not a trivial task. As host jumps can be entirely due to ecological change, a causal genetic change in the virus is not actually necessary7,8,9,10,11. Even if viral adaptation is responsible for the host jump, genomic information can be used to predict future risk only if genetic markers of adaptation to new hosts can be identified and their influence on viral spread can be characterized. As G. C. Williams first pointed out nearly 50 years ago12, analyses of adaptation are extremely difficult because of the risk of 'adaptive storytelling' (Refs 13,14): it is extremely easy to infer adaptation when it does not exist. In this Review, we discuss what is required to show that particular viral genetic changes are responsible for host jumps. Microbiological experiments are essential to distinguish between the host jump-associated changes in the viral genome that are caused by adaptation and those that are not, particularly because quantifying viral fitness is the gold standard for rigorously demonstrating adaptation15.

Host-jump mechanisms

There are different ecological and evolutionary processes involved in viral host jumps (Fig. 1). If the primary factor causing emergence is ecological, and adaptation is not required for the jump to occur, the cause of the host jump is known as an ecological driver (Fig. 1a,b). However, if genetic change in the virus is required for emergence in a new host, the cause is termed an adaptive driver (Fig. 1c,d), although an ecological driver is likely to be present in this situation as well. An adaptive driver is required when a selective 'sieve' is present in the new host population immediately after cross-species transmission, so that at least some of the virus genotypes are excluded from emergence by selection rather than by chance. More precisely, the term adaptive driver is used to describe situations in which specific viral genotypes capable of sustained spread in the new host species are selected over other genotypes that are doomed to fail. A central question for the field is how often adaptation is required for a host jump to occur.

Figure 1: Mechanisms of viral emergence in new hosts.

Host jumps and associated genetic diversity can arise through a range of ecological and evolutionary mechanisms. a | Ecological driver with founder effects. Any of the viral genotypes circulating in the reservoir are already competent for transmission in the new host; the basic reproductive number (R0) of the virus strains in the reservoir and the new host are > 1. Following host jump, neutral mutations occur during replication and transmission in the new host population. The combination of founder effects and neutral mutation results in a shifted distribution of viral genotypes in the new host, each of which has R0 > 1. b | Ecological driver with adaptive fine-tuning. The circulating viral genotypes that spill over are already adapted for transmission to the new host (R0 > 1), but adaptive substitutions occur owing to long-term selection in the new host even though they were not required for the initial emergence, such that the R0 of the adapted virus is greater than the R0 of the virus that initially spilled over. c | Adaptive mutation in an unadapted genotype. The reservoir strain that causes emergence has R0 < 1 in the new host and requires additional genetic change to reach R0 > 1. The change could occur in the initial host individual in whom spillover occurred or during stuttering transmission. Some genotypes will possess a genetic or phenotypic predisposition to acquire the necessary adaptive mutations and achieve R0 > 1, which makes them more likely to emerge. These strains are predictors of emergence risk. d | Spillover of a fortuitously adapted genotype. The genotype that spills over and causes emergence already has R0 > 1 in the new host, unlike other viral genotypes circulating in the reservoir population. Scenarios b and d differ because in d some genotypes in the reservoir host can establish in the new host, whereas others cannot; in b, the genotype composition in the reservoir host is not a predictor of emergence.

When adaptation is required for the viral host jump (Fig. 1c,d), the adaptive genetic changes can originate either in the new host (known as 'tailor-made'; Fig. 1c) or in the reservoir host (known as 'off-the-shelf'; Fig. 1d)16. If the crucial mutations occur in the new host, a continuum of scenarios can occur; at one end, genotypes circulating in the reservoir population may be genetically close to genotypes that could emerge successfully in the new host, whereas at the other end the host genotypes could be distant. The definition of close and how many close genotypes may exist, and at what frequencies, are key issues to be resolved for predicting the risk of emergence. Alternatively, all the necessary genetic material required for the viral host jump may arise in the reservoir host, in which case risk prediction depends on understanding which conditions produce the genetic variants that are ready to jump, how often they may be produced and whether they can be maintained in the reservoir host by selection (which could give rise to a particularly high risk of emergence if ecological conditions allow). Regardless of whether the final adaptive steps are taken in the reservoir or new host, at least some information about the risk of emergence can in principle be inferred from viral genetic data sampled from the reservoir population (in contrast to the processes depicted in Fig. 1a,b, in which genetic surveillance of reservoir populations is not informative about risk) — the hard part is to deduce which markers predict risk.

Identifying adaptation

Given the range of processes that can drive a host jump, the question is how to distinguish adaptive genetic changes (which are potentially predictive of emergence risk) from those derived from other evolutionary processes. This requires the demonstration that a particular genetic change affects fitness, which is challenging from the standpoint of fundamental evolutionary biology12,13,14 and applied eco-epidemiology17. Convergent evolution can be used to distinguish adaptation from neutral genetic variation on the basis of sequence data18,19 (although the absence of convergence, as has been observed for avian influenza viruses that circulate in swine20, does not rule out adaptation). The logic for this is that given the number of different possibilities for mutations and of opportunities for stochastic changes in any genotype, only a strong selective force is likely to cause multiple occurrences of the same genetic makeup originating from different starting points. In the context of host jumps, evolutionary convergence would mean the repeated evolution of the same or functionally related genetic changes associated with independent jumps of a virus into the same new host species.
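The convergence logic described above can be made concrete with a minimal sketch. It assumes that an aligned protein sequence representing the reservoir strain is available and that each sampled lineage derives from an independent host jump; the function name, the lineage labels and the toy sequences are all invented for illustration. A shared substitution flagged in this way is only a candidate: shared ancestry or mutational hotspots could produce the same pattern, so it does not by itself demonstrate adaptation.

```python
from collections import defaultdict


def convergent_substitutions(ancestor, lineages, min_lineages=2):
    """List substitutions seen in at least `min_lineages` independent lineages.

    ancestor -- aligned protein sequence assumed to represent the reservoir strain
    lineages -- dict mapping a lineage label to its aligned protein sequence
    """
    carriers = defaultdict(set)
    for label, seq in lineages.items():
        if len(seq) != len(ancestor):
            raise ValueError(f"{label} is not aligned to the ancestor")
        for pos, (old, new) in enumerate(zip(ancestor, seq), start=1):
            # Skip alignment gaps; only residue replacements are counted.
            if old != new and "-" not in (old, new):
                carriers[(pos, old, new)].add(label)
    return {
        f"{old}{pos}{new}": sorted(labels)
        for (pos, old, new), labels in sorted(carriers.items())
        if len(labels) >= min_lineages
    }


# Toy data, invented for illustration (not real viral sequences).
reservoir = "MKTLSA"
jumps = {
    "outbreak_A": "MNTLTA",
    "outbreak_B": "MNTLSA",
    "outbreak_C": "MKTLTV",
}
shared = convergent_substitutions(reservoir, jumps)
print(shared)
```

In this toy alignment, K2N and S5T each appear in two of the three lineages and are reported, whereas A6V appears only once and is not. Whether any such change affects fitness still has to be tested experimentally.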

One remarkable example of convergent evolution occurred during the early spread of SARS-CoV in humans in 2003. Viral sequences from patients in unconnected outbreaks in Beijing and Guangdong province, China, showed that a five-step mutational pathway had occurred at least twice independently following spillover from the reservoir host21,22. Although this is consistent with adaptation, these data alone do not imply that the changes were required for the host jump, as the observed evolutionary convergence occurred after cross-species transmission and could therefore indicate adaptive fine-tuning. However, evidence from an impressive series of studies provides further hints. Although numerous independent cases of SARS-CoV transmission from the reservoir host occurred, most died out after a few human cases, indicating that the introduced strain was not fit for human–human transmission23. The putative adaptive mutations, including non-synonymous changes in the spike protein responsible for host cell receptor binding, were found only in viruses transmitted between humans and not in those circulating in palm civets (the reservoir host) or in viruses that were transmitted from palm civets to humans in later spillover events that did not result in human–human transmission24,25. A representative viral strain isolated from palm civets could not replicate in cultured human airway epithelial cells, but the introduction of a single amino acid change (Lys479Asn), which is observed in all human isolates, substantially boosted this measure of fitness24. Structural analysis of this and another mutation nearby (Ser487Thr) revealed the molecular mechanism underlying this effect26. These studies illustrate the power of integrating surveillance, molecular epidemiology, bioinformatics and microbiology in making a compelling argument that the SARS-CoV host jump required viral adaptation.

Unfortunately, the type of data needed for testing for genetic convergence (for example, replicate host jumps linked to contact-tracing data) can be difficult to obtain. Thus, searches for genetic markers of host jumps often begin with large-scale genetic analyses of viral samples from many host species27,28,29,30. This identifies genetic differences of viruses in different host species but does not give information about which markers may be linked to adaptation: genetic differences between viral strains can arise in the new host as a consequence of founder effects and neutral evolution (Fig. 1a). Thus, the next step is to test whether amino acids at marker sites are the product of positive selection in the new host species31,32,33,34,35 to predict which markers are linked to adaptation (Fig. 1b–d). However, these predictions require experimental validation because most sequence-based methods of identifying positive selection rely on the assumption that non-synonymous mutations have much larger fitness effects than synonymous mutations, which can be false, especially in viruses. Indeed, viruses are known to adapt to selective pressures such as tRNA levels, polymerase efficiency controlled by secondary structuring in viral genomic RNA, RNA folding energy and changes in promoter sequences, through positive selection on synonymous mutations36,37,38,39,40,41. Thus, although a ratio of non-synonymous to synonymous mutations that is greater than 1 may indeed be indicative of positive selection, the opposite ratio is not necessarily indicative of negative selection, but could be the result of positive selection of a synonymous mutation (depending on the type of selective pressure). Screens for genetic markers of host adaptation should therefore include investigation of the primary sequence. 
Protein-based methods are a promising alternative to avoid reliance on this assumption35 but they are less well developed, and some require detailed organism-specific knowledge of protein evolutionary patterns, making them less accessible. Furthermore, a recent analysis showed that bioinformatic methods for identifying adaptive substitutions can have high rates of false positives or miss important adaptive changes42. Thus, experiments that directly measure the viral fitness effects of mutations are important to show adaptation, and, as discussed below, even more data are needed to confirm whether particular genetic markers of adaptation are drivers of host jumps (and therefore that the host jumps are not due to adaptive fine-tuning, disconfirming the mechanism in Fig. 1b in favour of those in Fig. 1c,d).
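As a hedged illustration of the first step in such sequence-based screens, the sketch below classifies codon differences between two coding sequences as synonymous or non-synonymous using the standard genetic code. It assumes uppercase, codon-aligned DNA of equal length; the function name and the toy sequences are hypothetical. The raw counts it returns are not a normalized ratio of non-synonymous to synonymous substitution rates, which would also require counting the available sites of each type. Synonymous changes are reported rather than discarded, in line with the point that they can themselves be targets of positive selection.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_codon_changes(reference, variant):
    """Return (synonymous, nonsynonymous) codon changes between two sequences."""
    if len(reference) != len(variant) or len(reference) % 3:
        raise ValueError("sequences must be codon-aligned and of equal length")
    synonymous, nonsynonymous = [], []
    for i in range(0, len(reference), 3):
        ref_codon, var_codon = reference[i:i + 3], variant[i:i + 3]
        if ref_codon == var_codon:
            continue
        ref_aa, var_aa = CODON_TABLE[ref_codon], CODON_TABLE[var_codon]
        codon_number = i // 3 + 1
        if ref_aa == var_aa:
            # Same amino acid: record the codon change itself.
            synonymous.append(f"{ref_codon}{codon_number}{var_codon}")
        else:
            nonsynonymous.append(f"{ref_aa}{codon_number}{var_aa}")
    return synonymous, nonsynonymous


# Toy sequences, invented for illustration.
reservoir_cds = "AAACTGGGT"  # Lys-Leu-Gly
new_host_cds = "AACCTAGGT"   # Asn-Leu-Gly
syn, nonsyn = classify_codon_changes(reservoir_cds, new_host_cds)
print("synonymous:", syn)
print("non-synonymous:", nonsyn)
```

Here the first codon carries a Lys to Asn replacement and the second a silent CTG to CTA change. A screen that kept only the first would miss any fitness effect of the second.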

In addition to confirming whether host species-associated genetic markers are linked to selection, it is crucial to experimentally determine adaptive evolutionary pathways and constraints. In host jumps such as the emergence of feline panleukopenia virus in dogs (now considered a separate viral species, canine parvovirus), cell culture experiments indicate that multiple genetic changes need to occur together in the receptor binding site of the viral capsid to yield viable viruses in the new host species43. In cases in which intermediate viral genotypes have low fitness in the new host, as suspected in the emergence of canine parvovirus, important genetic markers may be undetectable by sequence-based screens of adaptive sites and therefore not available for risk assessment. Identifying how mutations interact in different genetic backgrounds is thus important for predicting the risk of emergence from genetic data. Site-directed mutagenesis, a process in which specific mutations are engineered into different genetic backgrounds, can serve to obtain fitness measures of the unobserved genotypes44. By reconstructing possible evolutionary pathways, this type of reverse genetic technique has been used to reveal the number of potential adaptive trajectories that can occur by a seven-step mutational pathway during the adaptation of HIV to alternative co-receptors45, an adaptive challenge analogous to the host jumps considered here. To effectively assess emergence risk in these cases, in which intermediate genotypes have low fitness, even more emphasis must be placed on identifying how such combinations can overcome the adaptive constraints; for example, by high mutation rates (such that double mutants are likely), by recombination or reassortment (that is, mixing of genetic material)46, by the use of intermediate hosts47 in which the mutation does not have low fitness or by increased contact of the reservoir host with the new host species48.
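The effect of low-fitness intermediates on the number of available evolutionary pathways can be sketched as follows. The code enumerates every order in which a set of mutations could be acquired and keeps only those orders along which fitness increases at each step. This is a deliberately simple accessibility criterion chosen for illustration, not the method of the cited studies, and the fitness values are invented. For scale, seven mutations can be acquired in 7! = 5,040 different orders, so exhaustive enumeration is feasible only for short pathways.

```python
from itertools import permutations


def accessible_paths(mutations, fitness):
    """Return the mutation orders along which fitness rises at every step.

    mutations -- iterable of mutation labels
    fitness   -- function mapping a frozenset of mutations to a fitness value
    """
    paths = []
    for order in permutations(mutations):
        genotype = frozenset()
        previous = fitness(genotype)
        for mutation in order:
            genotype = genotype | {mutation}
            current = fitness(genotype)
            if current <= previous:
                break  # this step is not favoured by selection
            previous = current
        else:
            paths.append(order)
    return paths


# Hypothetical two-site landscape with a fitness valley: each single
# mutant is less fit than the starting genotype, the double mutant is fitter.
valley = {
    frozenset(): 1.0,
    frozenset({"A"}): 0.2,
    frozenset({"B"}): 0.3,
    frozenset({"A", "B"}): 2.0,
}
valley_paths = accessible_paths(["A", "B"], valley.__getitem__)

# Hypothetical three-site landscape in which every mutation adds fitness.
smooth_paths = accessible_paths(["A", "B", "C"], lambda g: 1.0 + 0.5 * len(g))

print(len(valley_paths), len(smooth_paths))
```

In the valley landscape no stepwise path is accessible, which mirrors the situation in which both mutations must arise together or be brought together by recombination or reassortment. In the smooth landscape all six orders are accessible.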

Measuring fitness

Although central to understanding natural selection and adaptation, fitness is frustratingly difficult to measure in natural settings. A common approach is to use surrogate measures of fitness (known as fitness components). Demonstrations of improved trait performance in the new host, or that an observed mutation improves trait performance or within-host fitness, provide compelling support that a particular substitution is adaptive. However, the relationships between individual traits, within-host fitness and virus transmission are complex and poorly understood49. For example, receptor binding affinity does not necessarily predict within-host fitness, and viral titres may not be directly proportional to transmission probability. Such measures provide valuable information, but they should be treated as surrogate measures of viral fitness. Successful emergence at the host population scale requires sustained transmission, and theoretical work has confirmed the importance of integrating within-host and between-host processes for explaining viral evolution50,51,52,53,54. Some theoretical advancements have been made towards integrating within-host microbiological insights with their population-scale genetic consequences for the virus53,54; however, these insights have yet to be applied in an empirical context. Key questions remain that must be addressed by experiments that directly measure viral transmission.

At the epidemiological scale, the basic reproductive number (R0) of a virus55 is probably a good approximation of viral fitness in the endemic case in a reservoir host56. However, during an outbreak, the number of secondary cases per case per unit time is a more appropriate measure of viral fitness than simply the total number of secondary cases per case. Strains with a greater ability to infect new hosts can be favoured as an epidemic builds even if the number of successful infections during their lifetime is lower57 (a parallel argument holds for viral replication within hosts58). On the basis of this, R0 can be used to measure viral fitness, as an R0 < 1 indicates that a strain is poorly adapted to its host and will die out59. Unfortunately, distinguishing whether R0 is above the threshold value of 1 is difficult, as viral strains with R0 < 1 can occasionally cause small outbreaks before dying out (owing to stuttering transmission60), which gives the appearance, at least temporarily, of R0 > 1 (Refs 61,62). Moreover, if a host jump arises because the virus adapts to the new host during this stuttering transmission (Fig. 1c), which may have occurred during the 2002–2003 SARS epidemic23, then evidence that the virus was initially not adapted to the new host and had R0 < 1 is essentially erased. To show that R0 < 1 for the original virus, multiple cross-species transmission events of the same viral strain, including multiple failed spillovers, have to be analysed63. Ideally, data would include contact-tracing information and host immune history to estimate the effect on R0 of heterogeneities in host susceptibility to the virus or in virus infectivity (Ref. 64) (for example, the number of individuals that have been in contact with the host or the host immune status).
This is obviously difficult but not impossible for diseases in which active surveillance efforts have identified hundreds of failed introductions, such as monkeypox virus65, H5N1 influenza virus66 and Nipah virus67. For example, H5N1 influenza transmission from avian species to humans has caused 498 cases (294 deaths) in 15 countries worldwide since 2003, but no sustained chains of human–human transmission have been observed66.
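The stuttering transmission described above can be illustrated with a simple branching-process simulation. The sketch assumes that each case produces a Poisson-distributed number of secondary cases with mean R0, with no adaptation and no host heterogeneity; the function names, the seed and the parameter values are illustrative choices and are not taken from the studies cited. Under these assumptions, standard branching-process theory gives an expected final chain size of 1/(1 − R0) when R0 < 1, and a fraction 1 − exp(−R0) of introductions produce at least one secondary case.

```python
import math
import random


def poisson(mean, rng):
    """Draw a Poisson variate using Knuth's multiplication method."""
    threshold = math.exp(-mean)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def outbreak_size(r0, rng, cap=10_000):
    """Total number of cases in one transmission chain started by one spillover."""
    active, total = 1, 1
    while active and total < cap:
        new_cases = sum(poisson(r0, rng) for _ in range(active))
        total += new_cases
        active = new_cases
    return total


rng = random.Random(2010)  # fixed seed so that the run is repeatable
r0 = 0.5                   # assumed value for a poorly adapted strain
sizes = [outbreak_size(r0, rng) for _ in range(20_000)]

mean_size = sum(sizes) / len(sizes)
with_spread = sum(s > 1 for s in sizes) / len(sizes)
print(f"mean chain size: {mean_size:.2f} (theory: {1 / (1 - r0):.2f})")
print(f"introductions with onward spread: {with_spread:.2f} "
      f"(theory: {1 - math.exp(-r0):.2f})")
print("largest chain:", max(sizes))
```

Even with R0 well below 1, a substantial fraction of introductions generate some onward transmission, and a few chains grow to many cases before dying out. This is why a single small outbreak is weak evidence about whether R0 exceeds 1, and why the distribution of chain sizes across many introductions, including the failed ones, is more informative.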

Distinguishing adaptive drivers of host jumps

Different mechanisms have been detected for several host jumps (Table 1). To provide sufficient evidence that adaptation caused a host jump (distinguishing the mechanisms in Fig. 1a,b from those in Fig. 1c,d), adaptation must be shown together with evidence that some genotypes circulating in the reservoir host have R0 < 1 in the new host and that the adaptive mutations to some of the circulating reservoir-derived strains lead to R0 > 1 in the new host. The most conceptually straightforward approach to identify the origin of the adaptive substitutions is to carry out fine-scale longitudinal sampling just before and after the host jump, followed by genome sequencing to identify new mutations and fitness assays of particular genotypes in both the reservoir and new hosts. Such data would be extremely valuable and, indeed, unique. However, there are still caveats. Although the identification of the emerged strain in the reservoir host shows that it originated there, it is challenging to draw firm conclusions if it is absent from the putative reservoir host because the identification of rare viral genotypes requires extensive sampling. In the case of segmented viruses, which frequently undergo reassortment with strains from other host species, such as influenza A virus (Ref. 5), sampling from alternative host species may be required to determine the origin of the adaptive genetic material and the mechanism by which it arises.

Table 1 The role of adaptation in host jumps to humans for selected zoonotic viruses

Experiments can also ascertain whether the strain is likely to have been derived from the reservoir or new host (distinguishing scenarios in Fig. 1c and Fig. 1d). For example, if the fitness of the emerged strain is low in the reservoir species, the adaptive change probably arose in the new host (supporting the scenario in Fig. 1c) as such a strain would be present at low frequencies in the reservoir host owing to mutation–selection balance (although note that frequent co-infection and relaxed selection could allow low-fitness variants to persist by genetic complementation68). However, if strains that are closely related to the emerged strain have high fitness in the reservoir host or are observed at high frequencies in reservoir populations, it is more likely that the emerged strain arose in the reservoir (supporting the scenario in Fig. 1d). This hypothesis could be corroborated by the demonstration that the adaptive genetic material can be produced in the reservoir host, either by gene flow, reassortment or recurrent mutation, and fitness assays would give insight into whether they could be sustained at significant frequencies. Thus, distinguishing adaptive drivers of host jumps relies on microbiological techniques, including reverse genetics, experimental measures of fitness and experimental evolution. Site-directed mutagenesis as well as recombination and reassortment studies in reservoir host models are particularly useful approaches to determine whether specific genotypes can be produced and selected in the reservoir, and replicate experimental evolution lines can assess how likely these events are and identify adaptive constraints and sources of selection (Table 2).

Table 2 Steps of investigation of examining viral host jumps*

Empirical foundation of the field

We have carried out a systematic review of the published literature on four viruses for which host-jump mechanisms are well studied: influenza A virus, SARS-CoV, canine parvovirus and Venezuelan equine encephalitis virus (VEEV) (184 publications) (see Supplementary information S1 (table), Supplementary information S2 (box) and Supplementary information S3–S7 (figures)). The survey revealed that research on viral host jumps typically unfolds in four steps (sampling of the virus, in vitro culture of the virus, studies in animal models and experiments in reservoir and new hosts (Table 2)) and that increased efforts to integrate surveillance practices and bioinformatics with microbiological experiments accelerate our ability to confirm genetic markers of viral adaptation.

Sampling and genetic analyses. To date, disease surveillance practices typically are limited to opportunistic sampling, with more systematic protocols coming after host jumps have occurred. There are three scales at which the collection of multiple samples is important: within host individuals, within host species and between host species (including both the reservoir and new host species). In existing studies, strains were most often sampled from many individuals of the host species and/or from individuals of multiple host species (Fig. 2), although many of these studies were limited to bioinformatic analyses of viral genetic data (Supplementary information S1 (table)). By contrast, a remarkably high number of experimental studies focused on detailed analyses of a single strain (and recombinants of that strain), which in practice is often a necessary trade-off to obtain the data needed to validate adaptation. Studies of strains that had previously not been investigated should be prioritized because it is crucial to examine many strains and to develop high-throughput methods for screening putative genetic markers for fitness effects. Multiple samples from a single host individual were almost never studied by any approach (Fig. 2), indicating a data gap. Cross-sectional and longitudinal sampling across both reservoir and new host species is essential for mapping where and when genetic changes take place. Not only do these samples provide the material for phenotypic assays, they are important for decreasing errors that are associated with estimates of evolutionary rates and for linking experimental results to disease characteristics and viral fitness. Experiments could improve our understanding of the propensity for viral adaptation by examining genetic variation throughout experimental infections and its effects on adaptation rates and epidemiological parameters such as infectious period.

Figure 2: Origin of strains studied in the surveyed literature.

Articles describing evolutionary data of transmission of influenza virus, severe acute respiratory syndrome-coronavirus (SARS-CoV), canine parvovirus and Venezuelan equine encephalitis virus were grouped into one of the eight categories on the basis of the host origin of the viral strains used for experimentation and/or analyses. Inclusion of multiple strains from multiple hosts is important for a comprehensive experimental design. Numbers indicate the total number of published papers surveyed.

Integrating sampling efforts in the field with the design of host-jump studies is important because the ecological and epidemiological context inherently determines changes in viral genetic variation and fitness. More than half of the studies we surveyed focused only on genetic sequence analyses and did not carry out experiments (Supplementary information S5 (figure)), and most did not integrate ecological, epidemiological and host response data (Fig. 3). This is a lost opportunity, because without data on ecological context it is extremely difficult to identify the signature of specific evolutionary drivers from sequence data and to separate changes in transmission that are due to ecology from those that are due to evolutionary processes. Similarly, a lack of epidemiological information, such as direct measures of virus incidence or data on host population serology, hampers the correlation between genetic patterns and factors such as disease virulence or selection by host immune status. For example, recent experiments on a phage–bacterium model system showed that adaptation to new hosts is extremely sensitive to contact rates between the original and new host species69. Thus, to obtain the greatest insight into viral evolutionary patterns from sequence data, information about host demographics and contact patterns should be a component of strain sampling protocols (see Ref. 70 for a detailed review on current gaps in strain sampling and sequencing strategies).

Figure 3: Types of relevant data collected by surveyed articles.

The Y axis indicates the number of published papers. Data for all viruses are shown on the left. Dashed lines in the right plot indicate the total number of papers for each virus shown. Sequence data refer to sequence analysis of genes, genome or proteins; molecular data refer to viral phenotypes, including antigenicity, receptor binding, genome replication, virion packaging and polymerase binding; kinetic data measure the time course of virion production, reflecting within-host fitness; epidemiological data measure incidence of the virus in the host population; and ecological data refer to environmental conditions, including host movement or contact patterns. Note that articles can fall into more than one category. SARS-CoV, severe acute respiratory syndrome-coronavirus; VEEV, Venezuelan equine encephalitis virus.

Reverse genetics. The three fundamental components of adaptive evolution are genetic inputs, phenotypic variation and selection by the environment. To identify and explain the forces that underlie evolutionary change in newly emerged viral populations, the three components need to be measured concurrently (Supplementary information S5 (figure)). For influenza A, site-directed mutagenesis and plasmid-mediated transfection of all eight genetic segments have been successfully applied to identify virulence-enhancing mutations that are selected in new hosts71,72. More recent studies have emphasized the importance of repeating these reverse genetic assays in other epidemic strains of the virus. For example, although a Glu627Lys substitution in influenza polymerase basic protein 2 is a virulence-enhancing marker in H5N1 (Ref. 71) and H7N7 (Ref. 72), it has no effect in the recent H1N1 pandemic strain73,74. Other challenges for future research are to expand these types of study so that multiple strains from the reservoir host and the new host are examined in both hosts, and to decrease the use of virus strains that have been heavily passaged in the laboratory (Fig. 3) (Supplementary information S4 (figure)). By testing the phenotypic effects of putative host-specificity mutations in more than one strain, other mutagenesis studies have shown that the genetic background of the virus influences interactions between mutations (known as epistasis75), the number of traits affected by a mutation (known as pleiotropy49) and the adaptability of the virus76, even when the genetic distance between strains is minimal. Measurement of mutational effects in multiple genetic backgrounds and many host models is important to identify molecular mechanisms of viral host jumps and draw general conclusions.

Beyond receptors and antigens. A further priority is to expand the scope of phenotypic analyses. Decades of research have revealed that receptor binding can be an important barrier for a host jump to overcome, and phenotypic analyses of some viruses in this Review were biased towards measuring this and other simple traits such as antigenicity (which is a logical and practical place to start) (Supplementary information S5 (figure)). In the more frequently studied viruses, other traits encoded in viral non-structural proteins (for example, replication rate, modulation of host cell factors that regulate the host innate immunity and RNA folding free energy) have been measured and found to be crucial determinants of within-host viral fitness in new hosts41,46,71,77,78 (Supplementary information S5 (figure)). Future research should be broadened to include other traits that influence within-host fitness, such as those involved in cell egress or nuclear entry79, as well as those that determine transmission directly (as done recently for influenza A80) and indirectly, including transmission-related traits such as stability outside the host, alternative routes of transmission and dissemination rates to routes of transmission; all are major determinants of viral fitness landscapes. Moreover, as experimental transmission is used to assess transmissibility of newly emerged strains in their reservoirs81, it is becoming apparent that within-host replication and between-host infectivity do not necessarily correlate82, which again raises concerns about the use of within-host fitness to measure viral population fitness.

Host models. The impact of host genotype (and more broadly, host species) on measures of viral phenotype must not be underestimated. Single viral mutations can affect viral fitness differently depending on host genotype49,83,84,85, and amino acid substitution rates and evolutionary patterns depend on both host genotype and the patterns of contact that determine how the virus encounters different host types86,87,88,89. Thus, when it is necessary to use model hosts, it is important to measure differences in viral transmission in both the reservoir and new hosts using models that mimic the natural hosts as closely as possible. In the case of influenza A, it is feasible to carry out experiments in poultry, waterfowl, passerine birds and swine, and these systems can be used for understanding strain-specific infection characteristics of new epidemic strains90,91 and the within-host evolutionary processes that generate them92. The combined results of three recent studies of H1N1 highlight the importance of using natural host systems whenever possible to assess viral characteristics93,94,95 (see ProMED mail post on influenza A H1N1 pandemic). Studies of influenza A viruses from human cases during the recent H1N1 pandemic (from May 2009 to January 2010) found that the Asp222Gly mutation in haemagglutinin caused significantly more severe disease. Earlier studies had used a glycan array to show that this substitution expanded tissue tropism in humans, which could potentially increase disease severity by allowing the virus to infect more cell types. However, the association between disease severity and the Asp222Gly substitution was not found in ferrets, a commonly used model species for mammalian influenza A93. Furthermore, infection of different bird species with H5N1 results in a wide range of morbidity, mortality and virus shedding patterns both within and between taxonomic groups91,96,97.

Replication. Distinguishing processes that drive host jumps from patterns that arise by chance requires replication at two levels: between host-jump events (examining multiple host-jump events for a single virus strain) and within an event (examining multiple strains from the same host-jump event). Replication between events identifies potential generalities of host-jump mechanisms, whereas replication within events helps to quantify the likelihood of re-occurrence, both of which can be used to quantify risk. Although desirable, replication at both levels is a challenge as host jumps are usually infrequent and unpredictable. Despite the logistical challenges involved, more than half of the published studies include some level of replication (see Supplementary information S6 (figure)). Greater investment in experiments that compare strains from independent emergence events under the same experimental conditions will help to elucidate host-jump traits that can be used as markers of emergence risk. This approach has been used to identify genetic markers of host-jump potential of VEEV. First, phylogenetic analyses of VEEV that included multiple strains from independent, successful jumps from rodents to horses (epizootic strains)98 were used to identify mutations that were unique to the epizootic strains (that is, potential genetic markers of epizootic risk). One of these mutations, Glu117Lys, alters the charge of the surface-exposed E2 glycoprotein, suggesting that it could also have fitness effects in the new host (horses). Subsequent infection experiments in horses with chimaeric viruses showed that this and other charge-altering mutations in the E2 glycoprotein are both necessary and sufficient to produce high viral titres in horses99, thus allowing for transmission to the mosquito vector and increased epizootic potential. Two charge-altering mutations (Thr213Arg and Thr213Lys) have even been repeatedly associated with VEEV host jumps98,99,100 (convergent evolution); however, whether a substitution at this position alone can produce the epizootic phenotype seems to depend on the genetic background in which the substitution occurs98,99,100, and host jumps have been observed without this particular change98.
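The first step of this approach, screening aligned sequences for residues found only in epizootic strains, can be sketched in a few lines of Python. This is a minimal illustration, not the method used in the cited studies: the function name, the five-residue sequences and the strain labels are all hypothetical, and the comparison ignores phylogenetic relatedness among strains, which a real analysis would need to account for. Sites flagged in this way are only candidate markers until fitness experiments confirm their effect.

```python
def candidate_markers(alignment, phenotype):
    """Return alignment sites at which no residue is shared between
    epizootic and enzootic strains.

    alignment: dict of strain name -> aligned amino acid sequence (equal lengths)
    phenotype: dict of strain name -> "epizootic" or "enzootic"
    Returns a list of (1-based position, enzootic residues, epizootic residues).
    """
    epizootic = [seq for name, seq in alignment.items() if phenotype[name] == "epizootic"]
    enzootic = [seq for name, seq in alignment.items() if phenotype[name] == "enzootic"]
    length = len(next(iter(alignment.values())))
    markers = []
    for pos in range(length):
        epi_residues = {seq[pos] for seq in epizootic}
        enz_residues = {seq[pos] for seq in enzootic}
        # Disjoint residue sets allow different derived residues at one site
        # (for example, Arg in one event and Lys in another).
        if epi_residues.isdisjoint(enz_residues):
            markers.append((pos + 1, sorted(enz_residues), sorted(epi_residues)))
    return markers


# Hypothetical toy alignment (invented peptides, not real VEEV sequences).
alignment = {
    "enz_A": "MTEKT", "enz_B": "MTEKT", "enz_C": "MSEKT",
    "epi_1": "MTKKR", "epi_2": "MTKKK", "epi_3": "MSKKR",
}
phenotype = {name: ("epizootic" if name.startswith("epi") else "enzootic")
             for name in alignment}

print(candidate_markers(alignment, phenotype))
```

In this toy data set, site 2 varies but is shared between the two groups and so is not flagged, whereas sites 3 and 5 separate the groups; site 5 carries two different derived residues, the pattern expected under convergent evolution at a single position.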

Experimental evolution. Experimental evolution is a powerful technique to determine the role of different evolutionary processes in defining genetic patterns101,102,103. The procedure typically involves the propagation of an isolated genotype under controlled conditions followed by genetic sequencing and, ideally, fitness measures of evolved genotypes. Viruses are ideal for this because their genomes are small enough that they can be fully sequenced many times, and they replicate fast enough that high rates of evolution are observable on short timescales (days to weeks). In phages, replication of experimental evolution (testing independent lineages of the same strain) has revealed high levels of parallel evolution104,105,106, which is not only strong evidence of adaptation but also allows quantification of the role of random variation in virus evolution — a necessary but elusive ingredient for predicting evolutionary outcome and understanding adaptive constraints. Similarly, experimental evolution of avian influenza strains in mice has revealed convergent evolution with strains isolated from humans, providing a useful way to identify host-jump markers of adaptation from large pools of genetic data107,108,109. Such experiments can also address important knowledge gaps, such as how often adaptive mutations can be produced in the reservoir host or in new hosts, which selective pressures promote or prevent fixation of adaptive mutations, the frequency of parallel and convergent evolution, and the trajectory of adaptive evolution. Intensive sequencing of replicate viral strains and evolutionary lines can reveal the extent of adaptive constraints and the impact of genetic diversity and population bottlenecks on evolutionary trajectories.
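As a rough illustration of how parallel evolution might be tallied across replicate lineages, the following Python sketch counts the fraction of independent lineages in which each mutation was detected. The function name, lineage names and mutation labels are hypothetical and invented for the example; a real analysis would also need a null expectation for how often the same mutation recurs by chance.

```python
from collections import Counter


def parallel_mutations(lineages, min_lineages=2):
    """Return the fraction of replicate lineages carrying each mutation
    that was detected in at least `min_lineages` independent lineages.

    lineages: dict of lineage name -> list of mutations found after passaging
    """
    counts = Counter()
    for mutations in lineages.values():
        # Count each mutation once per lineage, even if it is listed twice.
        counts.update(set(mutations))
    n = len(lineages)
    return {m: c / n for m, c in counts.items() if c >= min_lineages}


# Hypothetical mutations observed in four replicate evolution lines.
lineages = {
    "line1": ["A101T", "G250S"],
    "line2": ["A101T", "K300R"],
    "line3": ["A101T", "G250S", "P12L"],
    "line4": ["D77N"],
}

print(parallel_mutations(lineages))
```

Mutations that recur in many independent lines, such as the invented A101T here, are the ones that would merit follow-up fitness measurements.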

The design of experimental evolution experiments is challenging. In a recent experimental evolution study of influenza H3N2, the combination of viral gene sequence data of evolved viral populations with measures of virulence, tissue tropism and pathology revealed adaptive mutations that enabled the evolved population to replicate faster and to higher titres in mouse lungs and other tissues and cell types110. This evolution experiment involved serial lung–lung passaging in mice, starting with intranasal infection, harvesting the virus from the lungs of infected animals and subsequent infection of naive mice with homogenized lung tissue. This method was appropriate for the goals of this study, but the homogenization procedure reduces the evolution experiment to a single lineage, so no conclusions can be drawn about the adaptation or adaptive constraints of other lineages. It is also noteworthy that the evolved viral population did not transmit well in experimental transmission assays, illustrating that the selection procedure (extracting the virus from lungs and administering it directly to naive mice) did not impose selection on transmission traits and, again, that within-host fitness may not predict between-host fitness. Although this type of experimental design is useful for identifying adaptive mutations and linking them to specific functions, it does not impose selective pressures that are likely to be present in nature and thus does not facilitate the identification of genetic markers that would confer adaptation in the wild.

Conclusion

The increasing ease of large-scale genomic sequencing, together with advances in bioinformatics, molecular evolutionary theory and new statistical tools for linking viral genetic variation with epidemiology and phylogeography111,112, is providing valuable means to visualize viral emergence and generate hypotheses about evolutionary mechanisms. However, full analysis of the resulting hypotheses must also involve biological measurements. Accessible databases have become available to support the growing pool of genetic data that has resulted from increased surveillance. Alongside the genetic data, these databases should include ecological, epidemiological and phenotypic data (an effort that has been piloted for influenza A113). This would also help to establish research design standards for studying pathogen emergence so that key public health gaps can be addressed. One approach that would provide balance in data collection and integrative analyses is to develop disease-emergence funding programmes that require interdisciplinary teams (including field ecologists, microbiologists, immunologists, epidemiologists, bioinformaticians and evolutionary biologists) using multiple approaches (field sampling, laboratory experiments, data analysis and theoretical modelling). Such programmes would encourage greater balance in the types of data collected, would help to ensure that data collection is structured in a way that is conducive to analytical goals and would promote the broad collaborations needed to address overarching questions about disease emergence. Detecting adaptation is a great challenge in any context, and the case of viral host jumps is no exception. But understanding the adaptive genetic change involved in host jumps could yield large gains. Not only would we have a more complete account of the role of natural selection in host jumps, but we could also generate genetic markers for future risk of epidemic or pandemic disease.