Individuals display variable ability to fight infections, as well as variable susceptibility to inflammatory and autoimmune diseases. Evidence that has accumulated since the 1950s suggests that such heterogeneity partly reflects differences in the genetic make-up of the human host1,2,3. Particularly in the past decade, genetic studies have provided numerous examples of genes accounting for differences in susceptibility to rare or common infectious diseases. The advent of new technologies, such as DNA microarrays and next-generation sequencing, has greatly accelerated the field of human genetics, making it possible to evaluate the contribution of genetic diversity to differences in immunity to infection at the level of the entire genome. For example, recent studies have revealed the power of whole-exome sequencing for dissecting the immunological mechanisms that underlie the pathogenesis of severe, rare infectious diseases4,5,6. Likewise, some genome-wide association studies on viral, bacterial and parasitic infections have increased knowledge on the genetic basis of susceptibility to common infectious diseases at the population level3,7. Although these studies have provided the proof-of-concept that increased susceptibility to infectious diseases may result from various types of inborn errors of immunity, only a few cases are fully understood at the genetic level1.

Population genetics can be applied to investigate how natural selection has shaped the variability of host defence genes in the human population8,9,10. The fields of comparative immunology and evolutionary immunobiology have greatly improved our knowledge of the origins and evolution of innate and adaptive immune systems in multiple organisms11,12. Here, we do not attempt to review these topics, but instead focus on how humans have adapted to the pressure imposed by microorganisms through variation in their genomes. Among the different types of adaptation, those affecting immune function are among the most dynamic because the counter-evolution of pathogens drives the need for continuous adaptive change (Box 1). It is precisely because of this strong and ever-renewing demand for adaptation that genes in the immune system acquire unique and clear signatures of natural selection.

Innate immunity is at the front line of host defence against pathogens and is also important for the symbiotic partnership between the host and its microbiota. Thus, innate immunity-associated genes provide an excellent model for the study of the selective pressure that is exerted by microorganisms on the host genome. The innate immune systems of plants, invertebrates and vertebrates involve genes that can be classified into various groups on the basis of their function in immune surveillance, signal transduction or effector response to microorganisms13,14,15,16, and these genes have evolved under the constant environmental pressure of microorganisms. In addition, in vertebrates, innate immunity precedes and shapes adaptive immune responses, so variations in vertebrate innate immunity genes may have important consequences for the quantity and quality of downstream B and T cell responses. Thus, the crucial roles of the proteins encoded by innate immunity genes make them ideal targets of natural selection and valuable tools for population genetics studies.

With the emergence of data sets of genomic variation in an increasing number of human populations17,18, we can now scan genomes for signatures of selection and exploit them to identify the innate immune mechanisms that have a major biological role in host defence. This Review summarizes some of the major findings regarding how selection has shaped the evolution of several families of innate immunity genes in humans and underscores how selection has refined human antimicrobial defences. We also describe how population genetics studies have provided important insights in terms of the biological relevance of the mechanisms triggered by innate immunity molecules, and we discuss the contrasting selection patterns detected in innate immune genes involved in single-gene and complex disease risk.

Natural selection in the human genome

The different modes of selection. There are different forms of selection (Fig. 1), and each of them leaves a distinctive molecular signature in the targeted genomic region. Such signatures can be detected by an increasing number of statistical tests10,19 (Box 2). Selection often maintains the prototype gene by eliminating mutations that compromise organism fitness, as most mutations are more likely to perturb rather than improve protein function. The process by which deleterious mutations are culled from the population is called purifying selection or negative selection and is the most pervasive form of selection acting on genomes. At the population level, genes tend to carry the same number of synonymous mutations (no amino acid change) and non-synonymous mutations (amino acid change), despite the fact that random mutagenesis actually generates nearly twice as many non-synonymous changes. The reduced number of non-synonymous single-nucleotide polymorphisms (SNPs) observed as compared with the non-synonymous mutation rate indicates the elimination of many non-synonymous mutations through purifying selection20.
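The reasoning in the last two sentences can be written out as a short calculation. The sketch below is illustrative only: it assumes that synonymous variants are effectively neutral (so that they calibrate the baseline), and it rounds the mutational input to a 2:1 non-synonymous to synonymous ratio and the observed ratio to 1:1, whereas the text says only "nearly twice". The function name is hypothetical.

```python
def fraction_nonsyn_removed(expected_ratio: float, observed_ratio: float) -> float:
    """Estimate the fraction of non-synonymous mutations removed by purifying selection.

    expected_ratio: non-synonymous / synonymous ratio among newly arising mutations.
    observed_ratio: non-synonymous / synonymous ratio among polymorphisms seen in the population.
    Assumption: synonymous variants are neutral and are therefore not removed.
    """
    if expected_ratio <= 0:
        raise ValueError("expected_ratio must be positive")
    return 1.0 - observed_ratio / expected_ratio


# Rounded, illustrative values taken from the paragraph above (2:1 expected, 1:1 observed).
removed = fraction_nonsyn_removed(expected_ratio=2.0, observed_ratio=1.0)
print(f"Estimated fraction of non-synonymous mutations removed: {removed:.2f}")
```

Under these rounded assumptions, about half of all newly arising non-synonymous mutations would have been eliminated before reaching detectable population frequencies.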

Figure 1: Types of selection and their legacy on the human genome.

The evolutionary fate of different types of mutations is represented in a sample of eight chromosomes. Blue circles indicate neutral polymorphisms. a | Purifying selection removes deleterious alleles (indicated by black circles) from the population. The pace at which deleterious mutations are purged from the population depends on their effect on host survival, which can range from lethal (immediately removed from the population) to mildly deleterious (tolerated but kept at low population frequencies). These mutations tend to be associated with rare, severe disorders (for example, Mendelian susceptibility to infection at the individual level). b | Positive selection increases the frequency of an advantageous mutation (indicated by a red circle) in the population. Advantageous mutations can be fixed (completed selective sweep) or polymorphic (ongoing selective sweep; not shown) in the population. Positively selected mutations are often associated with common traits (for example, higher resistance to infection at the population level), which present complex modes of inheritance. c | Balancing selection maintains polymorphism in the population as a result of heterozygote advantage or frequency-dependent advantage (not shown). In the illustrated example, a mutation (indicated by a purple circle) confers a selective advantage in the heterozygous state, so individuals who are heterozygous at this particular position (for example, individuals who possess the sickle-cell anaemia-associated haemoglobin S (HbS) allele and are exposed to Plasmodium falciparum) have a greater fitness than homozygous individuals.

Selection may also occur when a novel mutation is favourable in a population, which results in an increase in the frequency of the mutation. This is referred to as positive Darwinian selection, and it is thought to be one of the ways in which adaptive evolution occurs. The spread of the lactase persistence allele, which is associated with the ability to digest milk as adults, in various human populations that adopted dairying practices is an emblematic example of positive selection21. It is also possible that variants have an advantage when rare in the population (frequency-dependent selection) or that heterozygotes for a specific mutation have the highest fitness (heterozygote advantage). These two situations give rise to balancing selection22: a selective regime that maintains variation in the population. A prime example of balancing selection is provided by the MHC region, in which high levels of polymorphism are preserved through various mechanisms, including sexual selection and pathogen diversity23,24,25,26.
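The contrast between positive selection and heterozygote advantage can be illustrated with the classical one-locus, two-allele recursion for allele frequency. The sketch below is a deterministic toy model that assumes an infinite, randomly mating population with no drift, mutation or migration; the fitness values, starting frequency and function names are hypothetical choices for illustration and are not estimates for any human gene.

```python
def next_freq(p: float, w_AA: float, w_Aa: float, w_aa: float) -> float:
    """Frequency of allele A after one generation of viability selection."""
    q = 1.0 - p
    # Mean fitness under Hardy-Weinberg genotype proportions.
    w_bar = p * p * w_AA + 2 * p * q * w_Aa + q * q * w_aa
    return p * (p * w_AA + q * w_Aa) / w_bar


def trajectory(p0: float, w_AA: float, w_Aa: float, w_aa: float, generations: int) -> list:
    """Allele-frequency trajectory of A, starting from frequency p0."""
    freqs = [p0]
    for _ in range(generations):
        freqs.append(next_freq(freqs[-1], w_AA, w_Aa, w_aa))
    return freqs


# Positive selection: each copy of A adds a 5% fitness advantage (illustrative values).
sweep = trajectory(p0=0.01, w_AA=1.10, w_Aa=1.05, w_aa=1.00, generations=500)

# Heterozygote advantage: both homozygotes are less fit than the heterozygote.
s, t = 0.1, 0.3  # hypothetical fitness costs of the AA and aa genotypes
balanced = trajectory(p0=0.01, w_AA=1 - s, w_Aa=1.0, w_aa=1 - t, generations=500)

print(f"positive selection, final frequency of A: {sweep[-1]:.4f}")
print(f"heterozygote advantage, final frequency of A: {balanced[-1]:.4f}")
print(f"analytical equilibrium t / (s + t): {t / (s + t):.4f}")
```

In the first scenario the advantageous allele rises towards fixation, which corresponds to a selective sweep. In the second, the frequency settles at the intermediate equilibrium t / (s + t), so both alleles are retained indefinitely, which is the hallmark of balancing selection.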

The legacy of selection on our genomes. Scanning the human genome for balancing selection events is challenging and, except for the MHC locus, few genes show the molecular signature of this form of selection27.

Interspecies and intraspecies studies of protein-coding sequences have helped to identify proteins that are rapidly evolving as a result of positive selection. One of the first genome-wide selection scans compared protein-coding sequences from the human, chimpanzee and mouse genomes, and found that immunity-related genes are among the functional classes that display the clearest evidence of positive selection28. Subsequent studies have confirmed these early observations and indicated essential immunity-related genes that have undergone strong recurrent changes in both humans and non-human primates29,30,31. Similarly, interspecies studies have established that extensive purifying selection has prevented the accumulation of functional diversity on genes throughout the genome, including many genes involved in innate immunity31.

In contrast to interspecies studies, which detect old selective events, population genetic approaches (that is, intraspecific studies) detect more recent selective events within a species by making use of polymorphism data. The advantage of intraspecific approaches is that they study selection within human populations and pinpoint functionally important genomic regions that may account for phenotypic variation in health and disease. The need to secure a large collection of genome-wide polymorphism data gave rise to the International HapMap Project17 and the more recent launch of the 1000 Genomes Project18,32, which enable the detection of selection with the greatest resolution.
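One classical way of combining between-species divergence with within-species polymorphism is the McDonald–Kreitman test. The sketch below shows only the basic 2×2 form of the test (the neutrality index and the derived quantity usually called alpha); it is not the Poisson random field extension and not the procedure of any specific study cited here. The counts are hypothetical, and the function name is an assumption made for illustration.

```python
def mcdonald_kreitman(dn: int, ds: int, pn: int, ps: int) -> tuple:
    """Return (neutrality index, alpha) from the four McDonald-Kreitman counts.

    dn, ds: non-synonymous and synonymous fixed differences between species.
    pn, ps: non-synonymous and synonymous polymorphisms within a species.
    """
    if min(dn, ds, pn, ps) <= 0:
        raise ValueError("this simple sketch requires all four counts to be positive")
    neutrality_index = (pn / ps) / (dn / ds)
    # alpha: estimated proportion of non-synonymous substitutions driven by positive selection.
    alpha = 1.0 - neutrality_index
    return neutrality_index, alpha


# Hypothetical counts for a single gene.
ni, alpha = mcdonald_kreitman(dn=20, ds=40, pn=5, ps=40)
print(f"neutrality index = {ni:.2f}, alpha = {alpha:.2f}")
```

A neutrality index below 1 indicates an excess of non-synonymous divergence, as expected under recurrent positive selection, whereas a value above 1 indicates an excess of non-synonymous polymorphism, as expected when mildly deleterious variants segregate under purifying selection. In practice the 2×2 table is also tested for significance, which is omitted here.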

Numerous genome-wide selection scans in humans have been performed to date (reviewed in Ref. 33). Similarly to interspecies studies, intraspecific approaches have shown that positively selected regions of the genome are often involved in immunity34,35,36,37. More than 300 immunity-related genes were detected as putative targets of positive selection8, although a fraction of them could be spurious signals and further validation is needed33. Nevertheless, these observations suggest that functional variations in a large proportion of immunity genes have conferred a specific selective advantage for host survival, including protection from pathogens and tolerance to microbiota.

Selection and innate immunity in humans

Besides genome-wide scans, another highly effective means of gaining insight into the role of selection on immune processes is to take a targeted approach that focuses on genes with particular immune functions: such an approach would make subsequent functional investigations more feasible. Innate immunity involves the coordinated action of families of receptors, known as pattern-recognition receptors (PRRs) or microbial sensors, that respond to a wide range of microorganisms through the detection of specific, conserved microbial patterns or molecules14,15,38. The biological importance of these microbial sensors is highlighted by the fact that many of them share remarkable structural and functional similarities among vertebrates, invertebrates and plants16 (Box 3). In humans, several receptor families are involved in the cellular arm of innate immunity, including the membrane-bound Toll-like receptors (TLRs) and C-type lectin receptors (CLRs), and the cytosolic RIG-I-like receptors (RLRs), NOD-like receptors (NLRs) and other DNA sensors39,40,41,42,43. In addition, complement receptors and ficolins are circulating proteins that constitute the humoral arm of innate immunity44. Following ligand binding, receptors induce the activation of distinct signalling pathways that involve adaptor molecules, which mediate signalling, and effector molecules, such as interferons (IFNs) or antimicrobial peptides (AMPs), which are required for the eradication of pathogens or danger signals.

In this section, we discuss the major differences in the evolution of some families of innate receptors and their downstream molecules. We focus on the genes that have been most studied by population genetic analyses in humans and closely related primate species. Moreover, we explain how the observed differences in the intensity and form of selection can inform us about the functional relevance of these innate immunity genes8,9,10,45. Notably, genes targeted by strong purifying selection are likely to fulfil functions that are essential. By contrast, genes evolving under more relaxed constraints (such as weaker purifying selection or neutrality) may have variably redundant functions in immune signalling. Finally, genes targeted by positive or balancing selection are possibly involved in mechanisms in which genetic variation has conferred a selective advantage to the population.

Membrane-bound receptors: the case of TLRs. Population genetics studies have provided important insights in terms of the evolution and function of TLRs45,46,47,48,49,50,51,52,53. In humans, TLRs are broadly subdivided into two groups: those primarily expressed at the cell surface (TLR1, TLR2, TLR4, TLR5, TLR6 and TLR10), which detect predominantly pathogen-associated molecular patterns from bacteria, fungi and protozoa, and those expressed within endosomes (TLR3, TLR7, TLR8 and TLR9), which primarily recognize nucleic acids from viruses and bacteria14,38,39.

Comparative interspecies studies have shown that species-wide positive selection has accelerated the divergence of some TLRs among primates, with the strongest evidence obtained for TLR1 and TLR4 (Refs 48, 51, 53). These two TLRs contain the highest number of positively selected codons, several of which are involved in ligand binding or in interaction with the TLR4 co-receptor MD2 (also known as LY96)51. This high interspecies divergence suggests that the presence or absence of pathogens that are sensed by TLR1 and TLR4 may be one of the most important variables in the ecological niches of the different primate species.

Within species, and in humans in particular, balancing selection was initially proposed to be pervasive among innate immunity genes52, including some TLRs. However, most intraspecific studies thus far converge towards the opposite conclusion: TLRs, taken as a set, have mainly evolved under the action of purifying selection46,47,49,51, which highlights their indispensability. More precisely, intracellular TLRs evolve under the strongest purifying selection (Fig. 2), and neither nonsense nor damaging missense mutations are tolerated at these genes46. By contrast, the selective constraints on cell-surface TLRs are much more relaxed, and these TLRs present high levels of genetic and functional diversity across populations46,54. In the human population, up to 23% of individuals have damaging missense mutations and up to 16% present a nonsense mutation in at least one cell-surface TLR (Refs 46, 50), suggesting higher redundancy. The case of TLR5, the receptor for flagellin, is particularly extreme, as the R392X nonsense variant, which has a dominant-negative effect55, is present at high frequencies (up to 23%) in Europeans and South Asians46,50. This suggests that TLR5 is not crucial for immunity against at least some flagellated bacteria, and that accessory mechanisms of flagellin recognition, such as those involving the cytosolic receptor Ice protease-activating factor (IPAF; also known as NLRC4)56, may provide sufficient protection.

Figure 2: Evolutionary dynamics and biological relevance of innate immunity genes.

Genes that have undergone purifying selection are shown in red, and those that have evolved under weaker selective constraints are shown in blue. Genes for which no significant evidence of a selective constraint was observed are shown in grey. Colours reflect the intensity of the selective constraints on amino acid-altering variation, as obtained by the McDonald–Kreitman Poisson random field method (omega and gamma)31 (Box 2). Genes presenting robust signatures of positive selection in all humans are outlined with a thick black line, whereas genes presenting robust signatures of positive selection that are restricted to specific human populations are outlined with a dashed black line. Endosomal Toll-like receptors (TLRs) show signs of stronger purifying selection than do cell-surface TLRs. Myeloid differentiation primary response protein 88 (MYD88) is the TLR adaptor molecule that has evolved under the strongest purifying selection, which indicates its central role as a pan-adaptor molecule. Furthermore, all adaptors have been targeted by positive selection, either in the entire human lineage (MYD88 and sterile alpha and TIR motif-containing protein (SARM)) or in specific human populations (TIR domain-containing adapter molecule 1 (TRIF), TIR domain-containing adapter molecule 2 (TRAM) and Toll/interleukin-1 receptor domain-containing adapter protein (MAL)), suggesting advantages in terms of immunity that are shared by all humans or that are due to geographically restricted microbial exposure, respectively. Purifying selection has driven the evolution of most NACHT, LRR and pyrin domain-containing proteins (NALPs), whereas other cytosolic microbial sensors, such as the NOD-like receptors (NLRs) Ice protease-activating factor (IPAF) and MHC class II transactivator (CIITA), most NOD subfamily members and the RIG-I-like receptors (RLRs), have evolved under weaker constraints.
NLR family, apoptosis inhibitory protein (NAIP) is not represented in the figure as no population genetic data are available. DAMPs, damage-associated molecular pattern molecules; HA, haemagglutinin; IFNs, interferons; IRFs, IFN-regulatory factors; LAM, lipoarabinomannan; LPS, lipopolysaccharide; LTA, lipoteichoic acid; MAPK, mitogen-activated protein kinase; MDA5, melanoma differentiation-associated protein 5; NF-κB, nuclear factor-κB; NLRC, NLR family CARD domain containing; NLRX1, NLR family member X1; NOD, nucleotide-binding oligomerization domain-containing; PG, peptidoglycan; PLM, phospholipomannan; RIG-I, retinoic acid-inducible gene I protein; tGPI, Trypanosoma cruzi-derived glycosylphosphatidylinositol.

The evolutionary relaxation characterizing some TLRs appears to be specific to humans when compared with other primates51. This observation has been associated with reductions in the size of human populations (during the out-of-Africa migration), which compromised the efficiency of purifying selection and resulted in an increased frequency of deleterious alleles57. However, it is tempting to speculate that such an evolutionary relaxation might also indicate that pathogens recognized by cell-surface TLRs have imposed a milder burden on humans than on other primates. In some cases, genetic variation at cell-surface TLRs may be advantageous for the host, as attested by the signature of positive selection detected in the TLR6–TLR1–TLR10 cluster (Box 4).
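The link between population size and the efficiency of purifying selection can be illustrated with Kimura's diffusion approximation for the fixation probability of a new mutation. This is only a sketch of the general principle: it uses fixation probability as a proxy for the fate of mildly deleterious alleles, assumes that the effective population size equals the census size, and uses a hypothetical selection coefficient and hypothetical population sizes that are not estimates for human populations.

```python
import math


def fixation_probability(s: float, n: int) -> float:
    """Kimura's approximation for a new mutation in a diploid population of size n.

    s: selection coefficient of the heterozygote (negative if deleterious);
       the homozygote is assumed to have twice this effect.
    Assumption: effective population size equals n.
    """
    p0 = 1.0 / (2 * n)  # starting frequency of a single new copy
    if abs(s) < 1e-12:
        return p0  # neutral mutation: fixation probability equals its starting frequency
    # (1 - exp(-4*n*s*p0)) / (1 - exp(-4*n*s)), written with expm1 for numerical accuracy.
    return math.expm1(-4 * n * s * p0) / math.expm1(-4 * n * s)


s = -0.001  # hypothetical mildly deleterious mutation
for n in (100, 1000, 10000):
    relative = fixation_probability(s, n) / fixation_probability(0.0, n)
    print(f"N = {n:>5}: fixation probability relative to a neutral mutation = {relative:.3g}")
```

With these illustrative values, the mildly deleterious mutation behaves almost like a neutral one in the smallest population but has a vanishingly small chance of fixation in the largest, which is the sense in which a reduction in population size compromises purifying selection.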

Together, the major differences in the intensity of selection between intracellular and cell-surface TLRs indicate that the two groups differ in their biological relevance, with intracellular TLRs being essential and cell-surface TLRs being more redundant. This evolutionary dichotomy is likely to reflect the differences of the two TLR groups in terms of their subcellular localization, the type of microorganisms and ligands that they target, and their self-reactive potential14,38,39. Cell-surface TLRs detect microbial molecules, such as bacterial cell-surface lipopolysaccharides or flagellin, which are usually very distinct from host molecules. By contrast, intracellular TLRs principally sense nucleic acids, such as double-stranded RNA of viruses or the unmethylated CpG islands of bacterial and viral DNA. Although these microbial nucleic acids can be distinguished from host nucleic acids on the basis of specific chemical modifications58, there is still the risk that intracellular TLRs could be stimulated 'inappropriately' by self-derived nucleic acids, which may lead to autoimmunity59,60,61,62. Thus, the extreme conservation of intracellular TLRs might reflect the very narrow window they have to ensure effective pathogen sensing while preventing dangerous recognition of host nucleic acids and thereby minimizing the risk of autoimmunity.

Cytosolic receptors: the RLRs and NLRs. Recent population genetics data have increased our understanding of the relative importance of the different members of the RLR and NLR families63,64,65. Similarly to intracellular TLRs, RLRs and NLRs detect signs of infection or danger within the cell14,40,41,42. The three RLR members, retinoic acid-inducible gene I protein (RIG-I; also known as DDX58), melanoma differentiation-associated protein 5 (MDA5; also known as IFIH1) and LGP2 (also known as DHX58), detect viral RNA and induce inflammatory cytokines and type I and type III IFNs41. RIG-I and MDA5 directly sense viral molecules, whereas LGP2 acts as a regulator of RIG-I and MDA5 signalling. The NLR proteins, which have a structure that closely resembles that of the resistance proteins (also known as R proteins) in plants16, are encoded by a family of 22 genes, which include the 14 members of the large NACHT, LRR and pyrin domain-containing protein (NALP) subfamily (NALP1–NALP14), the five nucleotide-binding oligomerization domain-containing (NOD) proteins (NOD1, NOD2, NLR family CARD domain containing 3 (NLRC3), NLRC5 and NLR family member X1 (NLRX1)) and other proteins, such as MHC class II transactivator (CIITA), IPAF and NLR family, apoptosis inhibitory protein (NAIP)40,42. NLRs either activate nuclear factor-κB and mitogen-activated protein kinases to induce inflammatory responses or participate in large cytoplasmic protein complexes called inflammasomes40,42.

RLRs evolve under weaker constraints (weak purifying selection or neutrality) compared with other families of innate receptors, and this suggests that there is some degree of redundancy in their roles (Fig. 2). Within the RLR family, RIG-I displays the most constrained amino acid-altering variation, particularly in the carboxy-terminal and helicase domains (which are involved in RNA recognition)65. This may reflect the fact that RIG-I senses a broader range of RNA viruses than MDA5, which only recognizes picornaviruses. Moreover, the particular structure of RIG-I viral substrates (short double-stranded RNAs and 5′-triphosphate single-stranded RNAs)41 may have stricter binding requirements than MDA5 substrates (long double-stranded RNAs).

Among NLRs, most NALPs have evolved under strong purifying selection (10 of the 14 NALPs)64 (Fig. 2), reflecting the rapid elimination from the population of mutations at these genes as a result of their highly deleterious effects. These observations are consistent with the idea that NALPs have acquired a function that is essential and non-redundant in host survival. By contrast, the weaker selective constraints characterizing IPAF, CIITA and most NOD subfamily members64 testify to their higher degree of redundancy.

In some cases, certain genetic variants at some RLRs or NLRs (for example MDA5, LGP2, NALP1, NALP14 or CIITA) have been targeted by positive selection in specific human populations, suggesting that functional variation at these genes and the resulting changes in the downstream immunological mechanisms have allowed for increased host survival under specific environmental pressures63,64,65. Taken together, population genetics studies have shown that the NALPs have evolved under much stronger selective constraints than other NLR members and RLRs, and the extreme NALP conservation may reflect the type of ligands that are likely to be sensed by these molecules and/or the important nature of the responses triggered by them. These hypotheses now need to be experimentally validated.

Soluble receptors: the humoral arm. In contrast to the cellular receptors described above, there are few population genetics studies on secreted receptors, such as complement receptors, collectins, ficolins and pentraxins, which are all involved in highly diverse functions (for example, phagocytosis, opsonization and apoptosis)44,66,67. However, the case of the mannose-binding lectin (MBL), an extensively studied collectin that binds a broad range of microorganisms and activates the lectin–complement pathway66,68, neatly illustrates the value of the population genetics approach for determining the ecological relevance of genes and mechanisms in immunity to infection. Conflicting results have been obtained by clinical and epidemiological genetic studies as to the detrimental or beneficial effects of MBL deficiency69,70. MBL deficiency has been associated, although not conclusively, with increased susceptibility to several infectious diseases, such as meningococcal infection or HIV infection71. However, MBL-deficiency alleles are common worldwide (they occur in up to 30% of individuals)69,72,73, suggesting that they may also have a protective effect, as reported for infections with intracellular pathogens such as Mycobacterium tuberculosis and Leishmania spp.74,75.

Some population genetics studies have proposed that balancing selection has driven MBL2 diversity47,72, but others have failed to detect any evidence of selection favouring the increased frequency of deficiency alleles73,76. Indeed, when analysing larger population data sets, the pattern of MBL2 variation appears to be consistent with neutral evolution73. These population genetics observations provide novel insights that inform the long-standing and highly controversial debate as to whether MBL is protective, deleterious or both, and they suggest instead that the immunological mechanisms triggered by this lectin are largely redundant. So, other molecules or pathways, such as the ficolins or the C1q-dependent classical pathway, may compensate for MBL deficiency. Future studies exploring whether the genes involved in these pathways have compensatory mutations in MBL-deficient individuals should help to substantiate this hypothesis.

Adaptor molecules: the TIR domain-containing adaptors. Linking the evolutionary fate of innate immune receptors with their corresponding adaptors, which associate with receptor proteins and initiate downstream signalling14, can further increase our knowledge of the biological relevance of specific immune pathways. This is best exemplified by population genetics data on the five Toll/IL-1R (TIR) domain-containing adaptors: myeloid differentiation primary response protein 88 (MYD88), Toll/interleukin-1 receptor domain-containing adapter protein (MAL; also known as TIRAP), TIR domain-containing adapter molecule 1 (TRIF; also known as TICAM1), TIR domain-containing adapter molecule 2 (TRAM; also known as TICAM2) and sterile alpha and TIR motif-containing protein (SARM), which are involved in the initiation of signalling downstream of TLRs and IL-1R-like receptors14,77. Across primates, the five adaptors have evolved under strong selective constraints48. In humans, MYD88 and TRIF display the highest degree of purifying selection (Fig. 2), indicating that the signals mediated by these two molecules are essential and non-redundant78. Conversely, MAL, TRAM and SARM are subject to more relaxed constraints, which indicates that their functions are more dispensable. The selective regime that has constrained the evolution of adaptor proteins to different extents has not prevented these genes from undergoing some episodes of positive selection78,79.

Collectively, the integration of population genetics data from the TLRs and TIR-containing adaptors has shed light on the mechanisms and pathways triggered by these molecules78 (Fig. 2). MYD88 has a pan-adaptor role77, so a strong constraint is expected, but TRIF has also evolved under strong purifying selection despite not functioning as a pan-adaptor. TRIF is involved in signalling downstream of TLR3 and TLR4 (Ref. 77), and of these two only TLR3 evolves under purifying selection46. Indeed, clinical genetic studies have revealed that rare mutations affecting the TLR3–TRIF pathway underlie herpes simplex virus 1 (HSV-1) encephalitis in childhood45,80,81. This suggests that the entire pathway activated by TLR3 in a TRIF-dependent manner is non-redundant in host defence, at least against HSV-1 (Refs 45, 78). Together with immunological, clinical and epidemiological approaches, future population genetics studies aiming to dissect the selective signatures at the level of entire signalling pathways will allow us to further delineate the most critical mechanisms involved in innate immunity to infection.

Effector molecules: interferons and AMPs. The signalling pathways initiated by adaptor proteins culminate in the expression of cytokines, including interleukins and IFNs, chemokines, members of the tumour necrosis factor (TNF) family and growth factors. Among these effector molecules, IFNs have been extensively studied82, but the biological relevance of the different IFN subtypes for host survival remains largely unknown. Like other families of innate immunity genes, IFNs belong to a multigene family of highly paralogous loci, so gene conversion can markedly alter their genetic diversity, making population genetics data more difficult to interpret83. Recent analyses have shown that the three IFN families and their individual members have followed very different evolutionary trajectories, ranging from highly constrained to redundant and expendable84,85.

Some type I IFNs (such as IFNα6, IFNα8, IFNα13 and IFNα14) and the type II IFNγ have evolved under strong purifying selection, displaying very low levels or even a complete absence, as in the case of IFNγ, of amino acid-altering genetic variation (Fig. 3a). Clinical genetic studies have shown that immunodeficiencies due to defects in the type I IFN pathway strongly affect antiviral immunity, and disorders of the IFNγ circuit are associated with Mendelian susceptibility to mycobacterial disease (MSMD)82. The integration of population and clinical genetic data thus indicates that at least a subgroup of type I IFNs and IFNγ have a crucial, non-redundant role in antiviral and anti-mycobacterial immunity, respectively82,84,85.

Figure 3: Major differences in selective pressures characterize the evolution of the human interferon families.

a | Type I, type II and type III interferons (IFNs) display different levels of functional diversity. The circles represent the proportion of chromosomes for each IFN subtype carrying different types of functional variants in the general population. For each circle, the proportion of chromosomes carrying at least one non-synonymous polymorphism is shown in red, and the proportion carrying at least one nonsense polymorphism is shown in black. The blue segment corresponds to the proportion of chromosomes carrying neither non-synonymous nor nonsense polymorphisms. The IFN subtypes in boxes are those for which statistical significance of strong purifying selection was obtained84,85. A schematic representation of the signalling pathways activated by the interaction of type I, type II and type III IFNs with their corresponding receptors is also presented. b | Type III IFNs are the only group of IFNs that have evolved under the action of positive selection, specifically in European and Asian populations. The scheme shows the distribution of the genetic variants under positive selection across the genomic region in which the three type III IFN genes are located. Note that one of the positively selected single-nucleotide polymorphisms (SNPs) in the interleukin-28B (IL28B) region (SNP −3180A>G, rs12979860) has recently been found to lie within the newly discovered IFNL4 gene, which is located upstream of IL28B (Ref. 123). Highlighted mutations result in amino acid changes, whereas the rest are non-coding SNPs. Signatures of positive selection were detected in Asia for the variants in IL28A and IL28B, and in both Asia and Europe for that in IL29. Figure is modified from Ref. 84.
GAS, IFNγ-activated site; IFNAR, IFNα/β receptor; IFNγR, IFNγ receptor; IFNλR1, IFNλ receptor 1 (also known as IL-28Rα); IL-10Rβ, IL-10 receptor-β; IRF9, IFN regulatory factor 9; ISRE, interferon-stimulated response element; JAK, Janus kinase; STAT, signal transducer and activator of transcription; TYK, tyrosine kinase.
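
The proportions plotted in panel a can be computed from phased haplotype data along the following lines. This is a minimal sketch rather than the procedure used in the cited studies: the input layout, the category labels and the function names are hypothetical, and we assume that a chromosome carrying both a nonsense and a non-synonymous variant is counted as nonsense so that the three segments of each circle are mutually exclusive.

```python
from collections import Counter

CATEGORIES = ("nonsynonymous", "nonsense", "neither")


def classify_chromosome(variant_classes):
    """Assign one sampled chromosome to a single category.

    variant_classes: set of functional classes of the variant alleles
    carried by this chromosome at one IFN gene (hypothetical layout).
    Assumption: nonsense takes precedence over non-synonymous.
    """
    if "nonsense" in variant_classes:
        return "nonsense"
    if "nonsynonymous" in variant_classes:
        return "nonsynonymous"
    return "neither"  # includes chromosomes with only synonymous variants


def category_proportions(chromosomes):
    """Return the proportion of chromosomes falling in each category."""
    if not chromosomes:
        raise ValueError("at least one chromosome is required")
    counts = Counter(classify_chromosome(c) for c in chromosomes)
    total = len(chromosomes)
    return {cat: counts[cat] / total for cat in CATEGORIES}


# Toy data for one gene: eight chromosomes, invented for illustration only.
example = [
    set(), set(), {"synonymous"}, {"nonsynonymous"},
    {"nonsynonymous", "synonymous"}, {"nonsense"}, set(),
    {"nonsense", "nonsynonymous"},
]
print(category_proportions(example))
```

Under this convention, a gene evolving under strong purifying selection would show almost all chromosomes in the "neither" category, whereas a redundant gene would show sizeable non-synonymous or nonsense fractions.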

By contrast, other type I IFN subtypes (mainly IFNα10 and IFNɛ, but also IFNα1, IFNα4, IFNα7, IFNα16 and IFNα17) have accumulated missense or nonsense mutations at high population frequencies (Fig. 3a), suggesting that their functions largely overlap with those of other IFN subtypes84. The varying degrees of genetic diversity and redundancy displayed by type I IFNs suggest that there may be variability in the antiviral potencies and/or expression pattern of the different IFN subtypes. Finally, genetic variants at type III IFNs have been targeted by positive selection, conferring a selective advantage to Eurasian populations84 (Fig. 3b).

Another set of effector molecules that have been thoroughly studied with population genetics approaches are the AMPs. In particular, the defensins are an AMP family with broad antimicrobial activities and non-immune functions, such as in reproduction or cell signalling86. In humans, the α- and β-defensin multigene families contain multiple paralogous genes, with some genes displaying copy-number variability87,88. Comparative analyses of α- and β-defensins in human and non-human primates have revealed that episodes of purifying, positive and balancing selection have driven the evolution of these gene families88,89,90,91. These complex patterns may reflect the need to preserve the functional integrity of these molecules while favouring, in several cases, functional diversity to provide responses to a wide range of pathogens. An interesting example of increased functional diversity is provided by the human β-defensin gene DEFB1, which has evolved under the action of long-term balancing selection91, a process that is thought to be extremely rare outside the MHC genes. The balancing selective event has been localized to the promoter region of DEFB1, where increased functional variation (through heterozygosity) appears to have conferred a selective advantage, possibly against severe sepsis91,92.
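
The heterozygote advantage invoked for DEFB1 can be illustrated with the textbook single-locus model of overdominance. The sketch below is not the analysis used in the cited studies; the fitness values, the selection coefficients s and t and the function names are hypothetical and chosen only for illustration. It iterates the deterministic allele-frequency recursion under random mating and shows that, when heterozygotes are the fittest genotype, both alleles are retained at an intermediate equilibrium frequency of t/(s + t).

```python
def next_frequency(p, s, t):
    """One generation of selection at a biallelic locus with random mating.

    Hypothetical relative fitnesses: AA = 1 - s, Aa = 1, aa = 1 - t.
    p is the frequency of allele A before selection; the return value is
    its frequency after selection.
    """
    q = 1.0 - p
    w_AA, w_Aa, w_aa = 1.0 - s, 1.0, 1.0 - t
    mean_fitness = p * p * w_AA + 2 * p * q * w_Aa + q * q * w_aa
    return p * (p * w_AA + q * w_Aa) / mean_fitness


def equilibrium_frequency(p0, s, t, generations=2000):
    """Iterate the recursion from starting frequency p0."""
    p = p0
    for _ in range(generations):
        p = next_frequency(p, s, t)
    return p


s, t = 0.1, 0.3  # illustrative selection coefficients against each homozygote
for p0 in (0.01, 0.99):
    print(p0, equilibrium_frequency(p0, s, t))
print("predicted equilibrium:", t / (s + t))
```

Whether allele A starts rare or common, its frequency converges to the same intermediate value. In this simple model, that stable polymorphism is what distinguishes balancing selection from directional selection, which would drive one allele towards fixation.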

Overall, population genetics studies of effector molecules such as IFNs or AMPs emphasize the complex events of selection characterizing these molecules. Although the nature of such variable pressures is not fully understood, the identification of individual genes, such as specific IFN subtypes, that are subject to strong constraints or to positive selection paves the way for additional studies to evaluate the potential of these molecules for use in vaccination, diagnosis and treatment.

Selection patterns and immunological relevance

The examples discussed above illustrate how the delineation of the selection mode that has driven the diversity of innate immunity genes can complement immunological, clinical and epidemiological genetic studies. Receptors such as endosomal TLRs and most NALPs, adaptors such as MYD88 and TRIF, and effector molecules such as a subset of type I IFNs and IFNγ have been targeted by strong purifying selection46,48,51,64,78,84,85, highlighting the essential and non-redundant nature of the immunological mechanisms involved. Moreover, the fact that selective constraints acting on RLRs are weaker than those acting on endosomal TLRs46,51,65, even though both types of receptors detect nucleic acids, might reflect some degree of redundancy for RLR-mediated antiviral immunity. Similarly, the weaker constraints acting on most NOD, IPAF and CIITA subfamily members and on cell-surface TLRs, compared with those acting on NALPs46,64, suggest greater redundancy in the pathways triggered by the former molecules in response to microbial products and stress signals (Fig. 2). Extreme cases of redundancy are provided by molecules such as MBL or TLR5, for which the frequency of loss-of-function alleles can increase in the population by genetic drift46,50,73,76. This contrasts with other situations in which gene loss has conferred an advantage to the host. For example, the increased frequencies of loss-of-function alleles of DARC (which encodes Duffy antigen/chemokine receptor) and CASP12 (which encodes inactive caspase 12) are a result of positive selection that is likely to be due to their protective effects against Plasmodium vivax infection and sepsis, respectively93,94.

The occurrence of positive selection attests to more dynamic immunological mechanisms, variation in which has been beneficial for the host8,9,10,45. These positive selection events can be ancient and shared by all humans. Such is the case for the adaptor MYD88 (Ref. 78): a functional change in this protein may have favoured the survival of the entire human species. The human MYD88 differs from its primate orthologue by a single amino acid substitution (H80Q), which might represent the target of selection48,78. This variant is located within the DEATH domain, which is crucial for downstream protein–protein interactions, and mutations in this domain have been associated with defective immunity to infection95,96. Conversely, other events of positive selection seem to be more recent and restricted to specific populations, such as those detected at some RLRs, NLRs, TIR-containing adaptors and type III IFNs63,64,65,78,79,84. So, the advantage conferred by such events is more dependent on environmental variables, possibly related to exposure to pathogens. For example, a variant in MDA5 (R460H) has been targeted by positive selection in Europeans and Asians63,65, suggesting an advantage to host defence that could be related to the sensing of particular RNA viruses by this sensor. Population genetics thus helps us to differentiate between genes with a high degree of redundancy and genes that are essential and non-redundant. The identification of essential genes is particularly important for molecules with poorly described functions, such as most NALPs, as this helps to prioritize which genes should be studied from an immunological standpoint.

Does the evolution of innate immunity genes and their signatures of selection reflect a quest for improved host defence against pathogenic microorganisms? The answer is yes, but not exclusively. Pathogens are likely to be a major force exerting pressure on host genes26,97. However, we coexist with millions of symbiotic, generally non-pathogenic microorganisms, and these can also be recognized by innate immunity receptors98,99,100,101,102. Furthermore, these receptors are involved not only in sensing microorganisms but also in functions that ensure tissue development and tissue homeostasis, including inflammatory control, autophagy and apoptosis. For example, some TLRs appear to be involved in central nervous system development, and some NALPs seem to have roles in intestinal homeostasis, early development and reproduction40,103,104. Moreover, innate receptors, including TLRs and RLRs, have also been implicated in autoimmune pathogenesis60,61,105,106,107. In light of this, the traditional view of pathogens as the only force driving the evolution of immunity genes may be too simplistic, as non-infection-related factors may have further contributed to the patterns of selection observed at these genes. For example, given the increasing range of functions attributed to NALP3, which acts as both a microbial sensor and a regulator of intestinal homeostasis40,104, its extreme conservation may not only attest to a major role in controlling infection but also reflect the evolutionary equilibrium reached by the host to maintain a peaceful coexistence with the symbiotic microbiota.

Selection and disease susceptibility

Unequal selective pressures are expected to be exerted on genes associated with Mendelian, single-gene disorders or with complex disease risks31,108. At the genome-wide level, genes associated with Mendelian disease are enriched in signs of purifying selection31,34,109, as their deleterious mutations are usually not transmitted to the next generation as a result of early death, and thus have low (<1%) population frequencies110 (Fig. 1).

In the context of innate immunity, for example, mutations in the strongly constrained TIR–MYD88 and TLR3–TRIF pathways have been associated with increased susceptibility to life-threatening infections by pyogenic bacteria96,111 and HSV-1 (Refs 80, 81), respectively. Likewise, mutations in NALP3, which is subject to the highest degree of purifying selection among NALPs64, have been associated with severe inflammatory diseases104,112,113. Finally, mutations in genes affecting the production or activity of IFNγ, which is subject to the strongest purifying selection of all IFNs84,85, have been associated with MSMD82,114. These clinical examples further support the notion that immunity-related genes that evolve under purifying selection are of major biological relevance in host survival, and their mutations are likely to predispose individuals to early-onset, life-threatening disease115.

Mutations associated with complex disease risk generally have a lower penetrance and are therefore able to reach higher frequencies in the population (>5–10%) than single-gene disease susceptibility alleles20,108. Genes with alleles that contribute to complex diseases in adults display signs of less pervasive purifying selection109,115. For example, although missense mutations are tolerated in some innate immunity genes (for example, TLR1, TLR4 and TLR10), weak negative selection keeps them at low population frequency (Fig. 2), attesting to their non-negligible impact on host fitness46,49. Likewise, mutations in weakly constrained or even neutrally evolving genes might subtly modulate complex susceptibility to disease, as illustrated by the case of MBL, variation in which may have an effect in particular, narrow conditions, such as co-existent morbidity116. Thus, weakly constrained genes generally have a more modest impact on host survival and may be involved in complex susceptibility to infection at the population level.

Finally, genes associated with complex disease susceptibility are generally enriched for signals of positive selection31,37,109,117. Specifically, positively selected variants in TLR1, MDA5, CIITA or NALP1 (Refs 46, 63, 64, 65) have been associated with infectious, autoimmune or inflammatory diseases106,118,119,120,121,122. Likewise, variation in type III IFNs, including five SNPs in IL28B (also known as IFNL3), two in IL28A (also known as IFNL2) and one in IL29 (also known as IFNL1), has been targeted by positive selection84 (Fig. 3b). Interestingly, the positively selected SNPs in the IL28B region, one of which now lies within the newly discovered IFNL4 gene123, have been associated with spontaneous clearance of hepatitis C virus (HCV) and a better response to treatment for chronic HCV infection123,124,125,126,127. This suggests that the nature of the selective advantage conferred by these IFNs is increased resistance to viral infection. Collectively, the overlap between the positive selection events detected at some genes and their association with susceptibility to complex disease provides an important proof-of-concept for the added, predictive value of population genetics. This is particularly important for evaluating the potential implications in human disease of other, as-yet-uncharacterized variants targeted by positive selection.

Furthermore, there is increasing evidence to suggest that, in some cases, positively selected mutations lead, directly or indirectly, to maladaptation and are associated with immune dysfunction, such as autoimmunity and inflammation8,128. For example, the positively selected MDA5 R460H variant seems to increase the risk of psoriasis106. Similarly, positive selection has increased the frequency of the V1059M mutation in NALP1 among Europeans64, despite also increasing the frequency of the linked L155H NALP1 mutation, which is associated with autoimmune disorders including Addison's disease, type 1 diabetes and vitiligo118,129. Likewise, DEFB1 haplotypes that have been proposed to be maintained by balancing selection because they confer protection against sepsis91,92 seem to predispose to asthma and atopy130. Although further clinical and epidemiological genetics work is needed, these examples provide additional evidence in favour of the hygiene hypothesis, according to which the current increased incidence of autoimmune and inflammatory disorders may result, at least partially, from past events of selection that increased host resistance to infection128.

The distinction between genes involved in single-gene disorders and complex diseases, as well as between the type of selective patterns that are expected to characterize them, is not always clear-cut. For example, genes associated with inflammatory bowel disease (IBD), the genetic basis of which is complex, are enriched in genes involved in Mendelian primary immunodeficiencies, including MSMD131. Likewise, genes can display complex signatures of selection, such as those detected at NOD2, variation in which has been associated with Crohn's disease131,132. At the gene level, the diversity of NOD2 is largely consistent with neutrality64, but some NOD2 Crohn's disease-risk alleles have been proposed to be maintained in the European population by balancing or recent positive selection131,133, although this remains highly controversial. Such conflicting signatures of selection may reflect the complex, multiple phenotypes with which NOD2, or its interacting gene networks, have been associated, including Crohn's disease, ulcerative colitis, Blau syndrome and mycobacterial infections131.

Future perspectives

Here, we have described specific examples of how population genetics studies can offer important functional insights into innate immunity. The evolutionary dissection of innate immunity genes allows us to rank them with respect to their biological relevance and also informs us about their respective contribution to immunity-related disorders. However, immune systems do not evolve one gene at a time; instead, selective pressures are likely to affect multiple interacting genes, resulting in polygenic adaptation134. Furthermore, such pressures can be numerous and related to various diseases simultaneously, as in the case of NOD2 alleles. Thus, genes in functional networks might show coordinated signatures of selection, as recently proposed for a group of antibacterial innate immunity genes (that is, some TLR pathways), in which the position of each gene in the network conditions its evolvability and therefore its adaptability135. The recent advent of whole-genome sequencing data sets from human populations provides us with tools to test these hypotheses and also enables the analysis of additional families of innate immunity molecules, including CLRs, cytosolic DNA sensors, chemokines and members of the TNF family. For this, improved statistical methods for detecting epistatic interactions or selection in gene networks are needed.

Population genetics studies of selection have traditionally followed a gene-centric view, focusing on qualitative changes at the protein level. However, selection can also target non-coding regions of the genome, such as regulatory elements136. For example, changes in gene expression levels can indeed constitute a target for adaptive evolution, as illustrated by the fact that polymorphisms in regulatory elements (that is, expression quantitative trait loci (eQTLs)) are enriched for signs of positive selection137. Furthermore, regulatory variation is increasingly documented as being associated with phenotypic variation, both benign and disease-associated136,138,139. Multidisciplinary approaches (such as those used by the ENCODE Consortium136) that combine whole-genome sequencing data, expression and proteomic profiles, chromatin accessibility and DNA methylation marks from different cell types challenged with different immune stimuli should facilitate an unbiased assessment of the immunological mechanisms favouring our past and present survival in the natural setting.

Finally, future population genetics studies may help us to understand diverse aspects of immune function, including interaction with the gut microbiota, interplay with viruses and transposable elements, and gene–environment interactions. Indeed, the extent to which host genetic diversity drives, or co-evolves with, changes in the microbiota remains to be explored in much further detail. Similarly, detailed genotype-to-phenotype studies of large population cohorts from different ethnic backgrounds and exposed to different environments will help us to delineate the relative contribution of the modern environment to the present-day natural variation of immune responses. Notably, the change in dietary habits and the use of domesticated animals since the advent of agriculture 10,000 years ago (which is an instant in evolutionary time), life in large cities and the extended use of modern antibiotics have radically altered our exposure to microorganisms. As our immune systems evolved in a totally different context of diet and microbial exposure, interaction of our genetic make-up with these new environmental factors may lead to phenotypes associated with disease states, such as inappropriate inflammatory responses or autoimmunity. So, the integration of all these data into a population genetics framework will help us to identify immunological mechanisms with adaptive or maladaptive roles in the immunity of modern human populations.

The challenge that lies ahead now is to apply knowledge of population genetics to infer molecular details of immune responses and design effective therapies.