To counter innate and adaptive immune responses, microbial pathogens have developed multiple strategies to limit immune detection or to manipulate the host immune responses to their advantage. It is thought that pathogens that elicit an adaptive immune response evade recognition and elimination by the host immune system by varying their antigenic targets. This creates selection for antigenic variation because hosts who develop immunity to one variant of an antigen may still be susceptible to infection with pathogens that lack that antigen or express a different allelic variant of it1,2,3,4,5,6,7,8,9,10,11,12,13,14. Consistent with this paradigm, antigenic determinants of viral, bacterial and protozoan pathogens are often highly variable15,16,17. However, the population structure of this antigenic diversity exhibits different patterns in space and time18. For some chronic pathogens — including HIV, the human malaria parasite Plasmodium falciparum and the trypanosome parasites — diversity occurs within a host during the course of a single infection owing to active switching between antigenic variants or serial selection resulting from immune evasion. This within-host selection can result in extreme diversity in the circulating pathogen population (for a more in-depth discussion of this topic, see Box 1). More commonly, antigenic diversity is observed at the level of the pathogen population; for these so-called multi-strain pathogen species, such as influenza virus and the bacterial pathogen Streptococcus pneumoniae, multiple clones or strains of pathogen co-circulate in the host population, each of which is defined by a particular set of antigenic determinants.

It is widely accepted that multi-strain pathogens experience immune-mediated selection pressure to diversify. Despite there being other potential causes of antigenic diversity (such as variability stimulated by host shifts, which is particularly relevant for zoonotic viruses19,20), it has become a truism that this host immune pressure selects for pathogen antigenic diversity. Nonetheless, some aspects of antigenic diversity are not fully explained. In particular, many models of multi-strain pathogen systems assume that the immune response is directed at a single antigenic target, such that diversification of that target offers a clear fitness advantage, even if the initial change impairs the function of the target and compensatory pathogen mutations are then required to repair the resulting fitness defects21,22,23,24,25. For most infections, however, host immunity involves multiple effectors targeting multiple antigens and even multiple epitopes within a single antigen. Similarly, most pathogens have complex patterns of antigenic diversity at the population level, sharing some but not all antigenic targets among strains. These observations suggest two closely related puzzles, the first of which relates to mechanisms of immune escape by the pathogen and the second of which concerns the causes of the observed patterns of pathogen diversity at the population level. First, if specific immune responses directed against any one of multiple pathogen antigenic targets are sufficient to prevent infection, how does an individual pathogen gain an advantage by varying only one or a few of these targets when this would leave it vulnerable to clearance by immune responses directed at the other targets? Second, why does a single pathogen species often have antigenic diversity at multiple loci?

These questions are striking by analogy to another selective pressure on pathogens, namely, antimicrobial treatment. It is clear how pathogen mutants resistant to a single antimicrobial agent used for treatment can rise to high frequency in a treated host. However, multiple drugs are used to treat infections such as tuberculosis precisely because mutants resistant to any one of these drugs will still be killed by the other drugs26,27,28,29,30. By analogy, one might expect that a multi-targeted host immune response would preclude any selective advantage for a pathogen variant that differs at a single locus, because responses directed at the other targets would be sufficient to eliminate the pathogen. In this scenario, host immunity targeting multiple antigens and multiple epitopes on each antigen is analogous to combination therapy with a large number of drugs. Indeed, this comparison has been invoked to explain why resistance to vaccines is relatively uncommon, whereas resistance to drugs is common31. In the context of naturally acquired immune responses, however, the existence of high levels of sequence diversity in pathogens suggests that there is some benefit of immune escape at each of the individual loci to which hosts develop immune responses. What is the mechanism by which such diversity arises and is maintained in the context of multi-faceted immune responses?

We consider five hypotheses, each of which may partially help to resolve the question of how pathogens escape from multiple immune responses and maintain diversity at multiple antigenic loci. Here, we consider the case of acute infection, where reinfection depends on how distinct the new infecting strain is from the strain of previous exposure. Different models, not discussed here, may apply to chronic infections.

Hypothesis 1 posits that pathogens do not vary only one or a few important antigenic loci at a time but rather that individual pathogen strains differ from one another at nearly all of the important antigenic loci. Hypothesis 2 questions the assumption that immune responses directed at any individual pathogen antigen are fully protective, proposing instead an additive model whereby each antigenic locus at which a pathogen can escape a pre-existing immune response of the host will quantitatively improve the ability of the pathogen to infect that host. Hypothesis 3 suggests that varying one antigenic locus can render responses to other antigenic loci less effective, thereby creating escape from all immune responses with a single change to a pathogen antigen. Hypothesis 4 suggests that hosts are heterogeneous for various genetic and non-genetic reasons, so that an immune response targeting a particular antigen that is strongly protective in one host may be absent or less protective in another host. Hypothesis 5 emphasizes that antigens are not only targets of immunity but also functional molecules that are often involved in host–pathogen interactions; it posits that the puzzling degree of sequence variation in some pathogen antigens may reflect selection for functional variability rather than antigenic escape.

These hypotheses are not mutually exclusive. For any particular pathogen species, several of these mechanisms may be operative. Although theoretical work has explored some of these ideas, most empirical studies of this issue have focused on a single hypothesis in a single pathogen. By bringing together the theoretical and empirical evidence for each hypothesis, we seek to stimulate further work to understand the phenomenon more deeply in individual pathogen species, as well as to inspire additional theoretical and comparative studies that may reveal the general mechanisms at work across host–pathogen systems.

Each hypothesis proposes that one of the common modelling assumptions about immune selection is incorrect. These hypotheses are not novel in themselves. Indeed, immunologists already know that many of these modelling assumptions do not fit some of the experimental data they observe. Rather, our goal in discussing these hypotheses as explanations for the polymorphism of multiple antigens in a single pathogen species is to suggest further directions to exploit the demonstrated synergy between pathogen population biology and immunology. In particular, immunological findings about the antigenicity of particular pathogen moieties have stimulated advances in modelling the evolutionary epidemiology of those pathogens14,32,33,34,35,36, and population-level observations about patterns of variability in pathogen antigens have stimulated mechanistic studies to understand the causes and functional consequences of those variations37,38,39 and their implications for vaccination40. With this Opinion article, we seek to highlight productive directions for future research along these lines that is directed at understanding multiple antigens and their roles in pathogen fitness, immunogenicity and immune clearance.

Multi-locus antigenic diversity

Hypothesis 1: non-overlapping antigenic repertoires

We refer here to two studies that were among the first to highlight the puzzle of antigenic diversity at multiple pathogen loci and to propose a solution12,41. The authors noted the existence of antigenic diversity at multiple loci within two microbial populations — those of the malaria parasite P. falciparum and the bacterium Neisseria meningitidis. They proposed a conceptual model in which a host is completely immune to any pathogen strain that shares an antigenic variant with a strain that previously infected the same host. Under this model, if we consider three antigenic loci, A, B and C, each having two alleles (Fig. 1a), a host who has previously been infected by a pathogen of genotype A1B1C1 would be immune to infection with any pathogen carrying variant A1 or B1 or C1 (Fig. 1b). In this setting, pre-existing immunity at each antigenic locus would prevent a host from being infected by any pathogen genotype that shared any allele with a previous infection. Thus, if the pathogen cannot escape responses directed at all loci, what is the benefit of escaping at one of the antigenic loci? The resolution offered by the original authors12,41 is that selective pressure from host immunity of this sort leads to pathogen populations within which several pathogen strains coexist, each with completely non-overlapping antigenic repertoires. Thus, strain 1 may have variant 1 of each locus A, B and C (A1B1C1), and hosts immune to this strain can be reinfected by strain A2B2C2 only, as they would be immune to strains with any shared antigenic variants, such as strain A1B2C2 (Fig. 1b). At the population level, this leads to the emergence and coexistence of pathogen strains that are completely distinct in their antigenic epitopes, and it selects against strains with partially overlapping antigens.

Fig. 1: Hypothesis 1: non-overlapping antigenic repertoires.

a | We consider three antigenic loci, A, B and C, each of which has two variants, such that A1, A2, B1, B2, C1 and C2 together make up the antigenic space for the example pathogen. b | A host who has previously been infected by a pathogen of genotype A1B1C1 generates immune responses against A1, B1 and C1 antigens and is therefore immune to infection with any pathogen with a genotype carrying A1 or B1 or C1. Therefore, such a host would be subsequently susceptible to only a pathogen expressing a different variant at each antigenic locus — namely, A2B2C2.
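The immunity rule of hypothesis 1 can be written down in a few lines. The following is a minimal sketch of the three-locus, two-variant example only; the function names and the tuple encoding of genotypes are illustrative assumptions and are not taken from the cited models.

```python
from itertools import product

LOCI = ("A", "B", "C")
VARIANTS = (1, 2)


def all_genotypes():
    """Enumerate every genotype as a tuple of variant numbers, one per locus."""
    return list(product(VARIANTS, repeat=len(LOCI)))


def is_immune(previous, challenger):
    """Hypothesis 1 rule: sharing a variant at ANY locus gives full immunity."""
    return any(p == c for p, c in zip(previous, challenger))


def label(genotype):
    """Render a genotype tuple such as (1, 2, 2) as 'A1B2C2'."""
    return "".join(f"{locus}{variant}" for locus, variant in zip(LOCI, genotype))


previous = (1, 1, 1)  # host previously infected with A1B1C1
susceptible_to = [label(g) for g in all_genotypes() if not is_immune(previous, g)]
print(susceptible_to)  # ['A2B2C2']
```

Of the eight possible genotypes, only the one that differs at every locus can reinfect the host, which is the non-overlapping pattern shown in panel b.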

Some pathogens may indeed conform to this prediction. In particular, N. meningitidis has patterns of non-overlap between different antigenic loci that are consistent with the prediction3,14. However, it is challenging to evaluate the generality of this hypothesis because, for most pathogen species, the relative importance and identity of antigenic loci remain poorly defined. Also, the hypothesis predicts that recombination and mutation will constantly generate some pathogens that deviate from the non-overlapping pattern (although these would be eliminated by selection), which raises the practical question of what degree of deviation from purely non-overlapping antigenic repertoires would be required to falsify the hypothesis. For example, in S. pneumoniae, another bacterial pathogen with multiple antigens, there is evidence of association (linkage disequilibrium) between alleles at different antigenic loci, but this association is much weaker for some pairs of antigens than that observed in N. meningitidis42. A final challenge arises because in pathogens with low or moderate rates of recombination, some nonrandom associations between alleles at different loci may exist because of clonal descent rather than because of selection acting to structure such variation42, although the non-overlapping pattern of antigens would be unlikely to be generated by chance among antigenic loci with many allelic variants. Stochastic mathematical models suggest that complex antigen patterns of overlapping structure are expected under conditions of moderate immune selection when there is competition for susceptible hosts or antigenic loci have variable levels of diversity35,43. However, more theoretical work is needed to quantify how much non-overlap of antigen repertoires is expected under different scenarios and to definitively separate clonal descent from selection as an explanation.

Hypothesis 2: additive effects of host immunity

The variation of multiple pathogen antigens to simultaneously escape from multiple immune responses is most puzzling if each immune response targeting a single antigen is individually lethal — that is, if a pathogen expressing one or more antigenic variants that have been seen before by a host is unable to infect that host. For several pathogen species, this seems not to be the case. Empirical evidence from pneumococcal colonization indicates that a single exposure to a particular strain does not provide sterilizing immunity to other strains that share one or more antigenic variants with the first strain, although it may reduce the rate of acquisition and/or increase the rate of clearance44,45,46. Thus, part of the answer to the puzzle of multi-locus antigenic diversity may be that varying each antigen individually provides an incremental advantage to a pathogen strain encountering a host who has previously developed immunity to a strain that shares several alleles with the present strain. To illustrate, we consider three antigenic loci, A, B and C, each of which has two variants (Fig. 2a). In this setting, diversification at each of these loci increases the antigenic distance between strains, such that A1B1C1 and A2B2C2 are the farthest apart (Fig. 2a). Under this hypothesis, increasing the antigenic distance incrementally increases the probability that a pathogen strain can infect a previously infected host (Fig. 2b). Intuitively, it seems that this could support antigenic variation at multiple sites, as variation at each site would contribute to pathogen fitness in a previously exposed host.

Fig. 2: Hypothesis 2: additive effects of host immunity.

a | We consider three antigenic loci, A, B and C, each of which has two variants, such that A1, A2, B1, B2, C1 and C2 together make up the antigenic space for the example pathogen. Sequence diversification of these loci increases the antigenic distance between strains, such that a pathogen of genotype A2B2C2, which is marked by antigenic diversity in all three loci, has the greatest antigenic distance from a pathogen of genotype A1B1C1. b | Increases in antigenic distance between strains incrementally increase the probability that a pathogen strain can infect a previously infected host. Although the relationship shown here between antigenic distance and probability of infection is linear, it could also be nonlinear — for example, as a result of a synergistic relationship between immune effectors such as antibodies101.
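The additive relationship in panel b can be sketched as a linear function of antigenic distance. This is an illustration under stated assumptions: antigenic distance is taken to be the number of loci at which two strains differ, and the endpoint probabilities `p_min` and `p_max` are hypothetical placeholders rather than estimates from any data set.

```python
def antigenic_distance(strain_a, strain_b):
    """Count the loci at which two strains carry different variants."""
    return sum(x != y for x, y in zip(strain_a, strain_b))


def infection_probability(previous, challenger, p_min=0.1, p_max=0.9):
    """Linear map from antigenic distance to probability of infection.

    p_min and p_max are hypothetical placeholder values, not estimates.
    """
    n_loci = len(previous)
    distance = antigenic_distance(previous, challenger)
    return p_min + (p_max - p_min) * distance / n_loci


previous = (1, 1, 1)  # A1B1C1
for challenger in [(1, 1, 1), (2, 1, 1), (2, 2, 1), (2, 2, 2)]:
    d = antigenic_distance(previous, challenger)
    p = infection_probability(previous, challenger)
    print(f"distance={d}  P(infection)={p:.2f}")
```

Each additional locus at which the challenger differs raises the probability of infection by the same increment. A nonlinear relationship, as mentioned in the caption, would replace the last line of `infection_probability` with a different function of `distance`.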

Hypothesis 3: variation at one locus affects responses to other loci

As stated, the puzzling observations that motivate this article presume that variation at one antigenic locus decreases the effectiveness of immune responses directed at that locus but not at other loci. However, if changes at one locus could reduce the effectiveness of immune responses directed at several loci, that single change would have a clear selective advantage even when responses are still directed at other loci. Under this model, we again consider three antigenic loci, A, B and C. For simplicity, locus A has two variants (A1 and A2), whereas locus B and locus C each have a single variant (B1 and C1, respectively) (Fig. 3a). A host who has previously experienced infection by a pathogen of genotype A1B1C1 generates specific immunity against loci A1, B1 and C1, thereby becoming immune to subsequent infection by a pathogen with the same genotype (Fig. 3b). Under hypothesis 3, variation of the antigenic locus A reduces the effectiveness of immune responses against additional loci by interfering with the access or effectiveness of antibodies against locus B and locus C. In these circumstances, a host who has been previously infected with a pathogen of genotype A1B1C1 would be susceptible to a pathogen of genotype A2B1C1, despite pre-existing immunity to loci B1 and C1 (Fig. 3b).

Fig. 3: Hypothesis 3: variation at one locus affects responses to other loci.

a | We consider three antigenic loci, A, B and C. Locus A has two variants (A1 and A2), whereas locus B and locus C have a single variant each (B1 and C1, respectively). Therefore, A1, A2, B1 and C1 together make up the antigenic space for the example pathogen. b | A host who has previously been infected by a pathogen of genotype A1B1C1 generates specific immunity against loci A1, B1 and C1 and is therefore immune to subsequent infection by a pathogen with the same genotype. Under the model of hypothesis 3, the single variable antigenic locus A with two variants (A1 and A2) could reduce the effectiveness of immunity against additional loci (locus B and locus C), thereby rendering the A1B1C1-immune host susceptible to any pathogen with a genotype containing A2, despite pre-existing immunity to loci B and C.
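The logic of panel b can be expressed as a small rule in which a change at locus A scales down the responses to loci B and C. The sketch below is illustrative only: the `masking` parameter (1.0 meaning that responses to B and C are fully abolished, 0.0 meaning no interference) is a hypothetical quantity introduced here and does not come from the cited studies.

```python
def effective_responses(previous, challenger, masking=1.0):
    """Per-locus effectiveness of pre-existing immunity (1.0 = fully effective).

    Genotypes are tuples ordered (A, B, C). If locus A has changed, responses
    to the other loci are scaled by (1 - masking). masking is hypothetical.
    """
    a_changed = previous[0] != challenger[0]
    effectiveness = []
    for index, (p, c) in enumerate(zip(previous, challenger)):
        value = 1.0 if p == c else 0.0
        if index > 0 and a_changed:
            value *= 1.0 - masking
        effectiveness.append(value)
    return effectiveness


def is_susceptible(previous, challenger, masking=1.0):
    """Host is susceptible when no effective response remains."""
    return sum(effective_responses(previous, challenger, masking)) == 0.0


immune_history = (1, 1, 1)  # A1B1C1
print(is_susceptible(immune_history, (1, 1, 1)))               # False
print(is_susceptible(immune_history, (2, 1, 1)))               # True
print(is_susceptible(immune_history, (2, 1, 1), masking=0.0))  # False
```

With `masking=0.0` the rule reduces to independent per-locus immunity, in which the A2B1C1 strain is still blocked by responses to B1 and C1; with full masking the single change at locus A is enough for escape.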

Studies of the haemagglutinin (HA) protein of influenza A virus provide an example of how this may occur47. The HA protein, which is displayed on the surface of influenza A virus particles, binds sialic acid on the host cell surface and thereby promotes viral entry into the host cell48. HA is the most important target of antibody-mediated immunity (both natural and vaccine-induced immune responses), and it is also the fastest-evolving antigen of influenza A virus. Mutational changes within the HA protein become fixed in the virus population between successive years of influenza A virus epidemics, which enables individuals to be repeatedly infected with the same virus subtype49. It is typically posited that these amino acid changes in the HA protein enable repeated infection by allowing the virus to escape from neutralizing antibodies that can occlude the receptor-binding site and impede viral entry into the host cell50. However, data suggest that virus escape from the polyclonal antibody response that is observed in human hosts would be unusual, as the number of virus mutants with simultaneous point mutations at multiple epitopes of HA seems to be low51. By what mechanism, then, does sequence diversification in the HA protein of influenza A virus promote immune escape and enable repeated human infection?

A possible explanation is that antibody escape can occur without individually changing the antibody-binding affinity of each targeted pathogen epitope, by instead altering the kinetics of host receptor binding by the virus. The immunogenic regions of HA fall within the globular domain of the protein, a 3D structure that is crucial for its biological function of binding to the sialic acid receptor on host cell surfaces50. Mutations that increase the avidity of HA for the sialic acid receptor could facilitate rapid binding to host cells before antibodies can access the epitope regions of the molecule. Consequently, a single change to HA that promotes high receptor avidity would enable escape from the effects of polyclonal antibodies even in the absence of diversification in other epitopes47. Indeed, the data are consistent with this idea, as they show a positive correlation between receptor–HA binding avidity and escape from antibody responses47. If this model is generally applicable, it could solve the paradox of escaping from multiple antibody responses because a single change to HA that increases receptor-binding avidity could lead to a simultaneous reduction in the effectiveness of antibodies directed at all epitopes of HA. Little is known about the evolutionary or physical constraints on the receptor avidity of HA, but these would undoubtedly have an important role in limiting the extent of immune escape via this mechanism52. One could imagine in other systems that a mutation that downregulates expression of a target could have a similar function of reducing the effectiveness of all immune responses directed at that target. We are unaware of examples or tests of this hypothesis related to natural immunity, although one report suggests that downregulated expression of meningococcal capsule may be a means of escape from vaccine-induced, polyclonal antibodies directed at the capsule53.

Hypothesis 4: host heterogeneity

Host heterogeneity in terms of the pathogen antigens that are targeted by immune responses may provide another answer to the puzzle of multi-locus antigenic diversity. If individual hosts tend to respond primarily to one or a few pathogen antigens but different hosts focus responses on different antigens, the variability of pathogens at multiple loci may reflect adaptation to their heterogeneous hosts, with variation at each locus being most important for adaptation to the hosts who most strongly respond to the antigen encoded by that locus. An example of host heterogeneity that is relevant in this context could be diversity within the HLA locus and differences in the host immune repertoire54,55,56,57. This pattern of host heterogeneity in immune recognition has been found in human antibody responses during tuberculosis infection, where the identity of serologically reactive antigens differs greatly from individual to individual58. It has been proposed that mechanisms regulating antigen availability and antigen presentation, as well as the process of immune exhaustion, underlie these between-host differences and shape the dynamics of individual host immune responses59,60,61,62,63 (for a more in-depth discussion of this topic, see Box 2). Another example of host heterogeneity is the existence of predominant germline mutations that are shared among some individuals and that specifically target functional antigens. For example, the IGHV1-69 germline mutation is associated with broadly neutralizing antibodies against the HA stem region of the influenza A virus, and the IGHV1-2*02 germline mutation is associated with antibodies against the gp120 CD4-binding site of HIV-1 (refs64,65,66).

Other, non-genetic factors, such as age, may add further diversity to the selective environments that are imposed by hosts on pathogens. For example, very young, old or immunocompromised hosts typically mount weak immune responses, whereas other age groups generate immunologically more robust and efficacious responses that clear the infection quickly67,68,69. To the extent that age-related changes affect immune responses to different antigens in different hosts, this could be an additional source of host heterogeneity that provides pressure to vary different antigenic loci of the pathogen.
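The host heterogeneity argument can be illustrated with a toy population in which each host responds to only a subset of antigenic loci. Everything in this sketch is assumed for illustration: the five-host population in `HOST_FOCUS`, the premise that every host was previously exposed to A1B1C1, and the rule that a host blocks a strain only through shared variants at the loci it targets.

```python
LOCI = ("A", "B", "C")

# Hypothetical host population: each set lists the loci a host responds to.
HOST_FOCUS = [{"A"}, {"A"}, {"B"}, {"C"}, {"B", "C"}]


def can_infect(focus, previous, challenger):
    """A host blocks the challenger only via shared variants at targeted loci."""
    return all(previous[locus] != challenger[locus] for locus in focus)


def fraction_susceptible(previous, challenger, hosts=HOST_FOCUS):
    """Fraction of previously exposed hosts that the challenger can infect."""
    infected = sum(can_infect(focus, previous, challenger) for focus in hosts)
    return infected / len(hosts)


previous = {"A": 1, "B": 1, "C": 1}  # all hosts assumed exposed to A1B1C1
for locus in LOCI:
    challenger = dict(previous, **{locus: 2})  # vary one locus at a time
    print(locus, fraction_susceptible(previous, challenger))
```

In this toy population a change at any single locus opens up some fraction of hosts, so variation at each locus is individually advantageous even though no single change escapes every host.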

Hypothesis 5: selection for functional diversity

A final candidate hypothesis to resolve these puzzles is that antigenic diversity of pathogens reflects selection for functional diversity rather than, or in addition to, immune escape. This alternative hypothesis addresses some examples of antigens that undergo sequence diversification without measurably escaping immunity70. A good example of this is the pneumococcal surface protein C (PspC) of S. pneumoniae — a highly polymorphic protein that exists as multiple antigenic variants, each of which is marked by extensive sequence polymorphism and structural variation71,72. Like other surface antigens, PspC is commonly assumed to be under diversifying selection to escape antibody-mediated immune responses. This conjecture seems to be supported by recent findings that individuals with strong antibody responses to a particular variant of PspC are more likely to carry a pneumococcal strain expressing a different PspC variant44. Despite this, mechanistic studies do not always show measurable specificity of PspC-targeted antibody responses. For example, in a study of three recombinant PspC immunogens, PspC-binding antibodies were highly variant specific in one case but substantially less so in others70. Antibodies to PspC8 were shown to bind, by western blot and flow cytometry, only to a bacterial strain containing that allele. By contrast, antibodies to PspC3 also bound to multiple other PspC alleles, and antibodies to PspC4 also showed binding to PspA, a different antigenic protein with strong homology to PspC70. These findings of cross-reactive antibody responses against different PspC variants and the homologous PspA protein cast doubt on the classical immune selection hypothesis as being the sole explanation for antigenic diversity in S. pneumoniae.

If immune escape is not the only mechanism driving diversification of PspC, then an alternative mechanism is selection for functional diversity. This hypothesis is supported by data showing that variation across PspC alleles reflects between-variant differences in the ability to bind the host immunomodulatory factor H complement protein73,74,75. Similarly, for influenza virus HA, selection for functional diversity in receptor binding activity may be involved because single mutations in HA have been shown to affect both antigenicity and receptor-binding avidity76. If selection is maintaining functional diversity in pathogen molecules that are also antigenic, then what mechanisms underlie this selection? At present this is unknown, but some form of host heterogeneity — now referring to host heterogeneity in factors that would favour particular functional variants, rather than particular immune variants of pathogen antigens — must underlie such selection.

Pathogens with low antigenic diversity

The hypotheses presented so far have all dealt with antigenically diverse pathogen species, and our questions have related to why such high levels of diversity should emerge and be maintained. However, given the many advantages of evading host immune responses discussed above, an important part of the puzzle lies at the other end of the spectrum: why do some successful pathogens have almost no antigenic diversity? There are a few notable examples of antigenically stable pathogens, including the viruses that cause smallpox, mumps, rubella and measles, as well as several important bacterial pathogens. The basis for the absence of antigenic diversity in this group is incompletely understood. We discuss here two pathogens of public health importance in which antigens are highly conserved, namely, Mycobacterium tuberculosis and the measles virus.

M. tuberculosis does not seem to use antigenic diversity as a mechanism of immune evasion. Two recent studies confirm that antigenic diversity is almost absent in M. tuberculosis, even among phylogenetically divergent strains77,78. A possible explanation is that there is positive selection for epitope conservation, probably because recognition of these epitopes by the host immune system confers a fitness benefit on M. tuberculosis79. One possibility is that this fitness benefit is mediated through increased transmission rates resulting from immune-driven lung tissue damage in the presence of activated T cells.

Similarly, the measles virus is antigenically stable, and a single exposure to measles virus confers lifelong immunity80. However, in vitro experiments show that neutralizing monoclonal antibodies can select for escape variants in measles virus and suggest that, in laboratory settings at least, the virus has the capacity to diversify in the presence of immune pressure81. Why, then, does measles virus not exhibit antigenic diversity during natural infection? A possible explanation is that the polyclonal nature of host immunity, which simultaneously targets multiple antigens, diminishes the benefit of antigenic diversity in measles virus80. Effectively, it seems that measles virus does indeed suffer from being targeted by multiple immune responses that are individually highly effective, which means that it cannot escape them all. It is perhaps the best example of why the highly variable nature of other pathogens is indeed a puzzle in terms of immune escape.

From an evolutionary perspective, an alternative explanation is that antigenic diversity may be less beneficial for highly transmissible viruses such as measles and that, instead, a high reproductive number may be more important for these pathogen groups82. Therefore, it is plausible that in a trade-off between antigenic diversity and high reproductive number, such viruses may favour specialization in terms of the latter. A theoretical study has suggested that certain features of the transmission network, as well as overall infectiousness, may promote such specialization in measles virus and other childhood infections9. An important limitation of that model, however, is that it does not explicitly consider reinfection of partially immune hosts but focuses on the pattern of within-host diversification of the virus and its impact on transmission in a naive population as measured by the basic reproductive number.

Other mechanisms of immune evasion

Pathogens use various strategies to evade or subvert the host immune response. Antigenic variation, latency, resistance to immune effector mechanisms and suppression of the immune response all contribute to immune evasion, and many pathogens use one or a combination of these strategies. From the point of view of the host, each of these mechanisms contributes to the same result: the pathogen avoids normal host defence mechanisms and causes recurrent or persistent disease. For example, S. pneumoniae and N. meningitidis circumvent complement-mediated immunity by limiting complement-mediated opsonization and subsequent phagocytosis by host immune cells83,84, which probably allows these bacteria to persist during nasopharyngeal carriage and thus increases their fitness. Similarly, P. falciparum uses additional immune evasion strategies (other than antigenic variation) to subvert host immunity; these include dendritic cell modulation, apoptosis of parasite-specific T cells and activation of T cell inhibitory pathways85. It is worth pointing out that some of these other immune evasion mechanisms may affect some of the observations made in relation to pathogen antigenic variation that we have discussed here. In particular, any of these processes may directly or indirectly promote genetic or phenotypic variability in pathogen antigens and favour different variants with improved functionality. In this context, antigenic diversity may be an incidental outcome of host–pathogen interaction rather than a product of immune pressure.

Concluding remarks

More work remains to be done on evolutionary models of antigenic variability to understand the set of biologically plausible assumptions under which multiple antigens can experience diversifying selection even in the presence of multiple immune responses. For example, many of the classical models of immune selection make the assumption that a pathogen expressing one or more antigenic variants that have been seen before by a host is unable to infect that host or has a much reduced probability of doing so. However, for many pathogens, as we discuss in relation to hypothesis 2, a host with immunity to several antigenic variants may still be readily infected or colonized by a strain expressing these variants because immunity is partial even to homologous variants. We have found, for example, that mice exposed to pneumococci can be recolonized by the identical strain despite having generated an immune response; the duration of that colonization episode declines with repeated exposures but is not zero even after repeated colonization with the identical strain86. In epidemiological data, immunity to repeated colonization with identical serotypes is only partial, perhaps reducing the probability of acquisition by ~30% for many serotypes34,45. Similarly, weak epidemiological protection seems to occur for variant-specific responses to PspC44. Future mathematical modelling should explore the evolutionary dynamics of multiple antigens under weak diversifying selection, whereby each variant that escapes existing immune responses modestly increases the fitness of the strain in an additive manner.

From a practical perspective, a better understanding of cross reactivity in the context of variants of a single antigen can be valuable in advancing our efforts to find associations between the expression of specific antigenic variants and the generation of protective immune responses. In turn, this association could inform the future design and development of efficacious vaccines against highly variable pathogens. To uncover nuances of cross reactivity, we argue for the use of truncated proteins as immunogens to reflect the variable and unique antigenic portions of each antigen. This is a promising approach that has already been successfully applied in some contexts and permits the generation of variant-specific antibodies that can distinguish between antigenic variants in both binding and functional assays73,87. Epitope mapping could be another valuable approach as highly parallel assays for serological reactivity to individual protein fragments and peptides become more accessible42,88. In addition, more detailed mapping of antibody binding could help to disentangle epitope-specific immune responses from cross reactivity against whole antigens. A better understanding of antibody function and the design of functional assays would further address these limitations.

Another challenge in understanding antibody cross reactivity lies in finding the right tools to study it. Typically, in vitro techniques, such as western blotting, are used to measure binding between antigenic variants and their antibodies to evaluate patterns of cross reactivity. These assays are fairly crude, use denatured proteins and assess binding in isolation, away from the true physiological environment in which these interactions occur. Therefore, these in vitro techniques may be too blunt to detect biologically meaningful specificity in immune responses, and results should be critically evaluated on a case-by-case basis.

A strong body of research suggests that host immunity has a role in the maintenance of antigenic diversity. However, some questions remain about the interplay between host immune pressure and sequence diversification in variable antigens. From the host side, it would be interesting to dissect the differences, if any, between the repertoires of antibody responses to different antigenic variants. From the pathogen side, it would be useful to know how much distance in terms of sequence diversity between antigenic variants translates into immunologically distinct antigenic targets. Many questions remain but, going forward, both experimental work and theoretical models can be invaluable tools in deciphering these puzzles and advancing our knowledge of antigenic diversity and its biological principles.