Autoimmune diseases (ADs) are a family of more than 80 chronic, and often disabling, illnesses broadly characterized by immune system dysfunction leading to loss of tolerance to self-antigens, increased levels of autoantibodies and of inflammatory and mediatory cells, and resulting chronic inflammation. The family of ADs is remarkable for its complexity and similar underlying mechanisms despite the involvement of multiple organ systems.1 Patients often endure lifelong debilitating symptoms, loss of organ function, reduced productivity at work and high medical expenses. Importantly, because many ADs present before or during a woman’s reproductive years, they can affect fetal2, 3 and maternal outcomes, such as pregnancy loss in women with systemic lupus erythematosus,3, 4 vasculitis3 and type 1 diabetes,5 and infertility in women with rheumatoid arthritis.4 Collectively, ADs are common, affecting between five and nine percent of the population, and impose a considerable personal and public health burden.1, 6 The reasons for their high prevalence, gender and ethnic disparities and rising incidence and prevalence remain unclear.1

The genetic basis of disease is influenced by individual and population variation. Population-level phenomena such as mutation, migration, genetic drift and natural selection have left an imprint on genetic variation that is likely to influence phenotypic expression in specific populations.7 Because it studies the forces that drive genetic variation, population genetics can help elucidate human genetic diversity and, consequently, disease etiology.

Left untreated, most ADs impair the ability to raise offspring that successfully reproduce, and thus reduce reproductive fitness. Alternative forces must therefore exist that permit the relatively high frequency of risk alleles. As immune and inflammatory responses can be highly sensitive to environmental change,8 evolutionary adaptation to specific environments might have driven selection on immune-related genetic variants, impacting variant frequencies and leaving signatures of selection in the genome. Given that infectious organisms are strong agents of natural selection,9, 10 it is plausible that alleles selected for protection against infection confer increased risk of autoimmune and allergic diseases, or their complications,11 as the ‘hygiene hypothesis’12 postulates. It is thought that the adaptation to pathogen pressure through functional variation in immune-related genes conferred a specific selective advantage for host survival, including protection from pathogens and tolerance to microbiota.13 However, the emergence of such variation conferring resistance to pathogens also influences AD risk in specific populations.

In the past decade, multiple genome scans for signatures of selection on common variation have identified many immune-related loci.14, 15, 16, 17, 18 Similarly, over 130 genome-wide association studies have established AD-associated alleles. The evidence that AD-associated variants are also under selection is growing.18, 19, 20, 21, 22, 23 This review summarizes the evidence for AD-associated loci under selection and the candidate selective pressures. Collectively, these results support the hypothesis that AD is, in part, a pleiotropic consequence of alleles conferring pathogen resistance24 and help explain the number of shared autoimmune variants conferring both disease risk and protection, as well as the health disparities in individuals and populations. Given that genomic variation can have clinically important consequences,7 elucidating the patterns of variation and the functional role of the selective pressure might contribute to a better understanding of disease etiology and the development of new therapies for improved disease management.

Genetic etiology of ADs

The family of ADs is remarkable for its complexity and similar underlying mechanisms despite the involvement of multiple organ systems.1 Most ADs exhibit marked gender and ethnic disparities. They disproportionately afflict women and are among the leading causes of death for young and middle-aged women.1 African Americans are at higher risk than European Americans for systemic lupus erythematosus and scleroderma (systemic sclerosis), which they tend to develop earlier in life and with more severe disease, but are at lower risk for type 1 diabetes, thyroiditis and multiple sclerosis.1 Although most of these diseases are individually rare, in the aggregate they affect between five and nine percent of the population,1, 6 and, for reasons that are poorly understood, their incidence and prevalence are rising.1 Although prevalence, incidence and disease severity are known to vary among ethnic groups, little is known about the genetic etiology of these diseases in different populations, and the reasons for the ethnic disparities remain elusive.

Multiple lines of evidence suggest some degree of common genetic etiology in ADs, including clustering of multiple ADs in families and in individuals, and the number of confirmed genetic regions predisposing to several ADs. This genetic overlap is exemplified by the well-known associations of the human leukocyte antigen (HLA) region with all ADs, as well as other loci associated with multiple ADs, such as IL23R, TNFAIP3 and IL2RA.25 Nevertheless, the heritability of ADs is extremely variable, ranging from very high in Crohn's disease or ankylosing spondylitis to almost negligible in systemic sclerosis.26 Genome-wide association studies have proved particularly powerful for ADs.27 Table 1 summarizes the ADs with published genome-wide association studies and the number of disease-associated loci uncovered by these studies.

Table 1 Autoimmune diseases with published GWAS and respective number of associated loci

It is the general consensus that there is a common genetic background predisposing to autoimmunity, and that further combinations of more disease-specific variation at HLA and non-HLA genes, in interaction with epigenetic and environmental factors, contribute to disease and its clinical manifestations. Results from the most comprehensive evaluation of shared genetic autoimmune loci to date suggest that instead of resulting from common risk factors, autoimmunity may result from specific and multiple different pleiotropic effects.25 This suggests that different population genetic factors (for example, natural selection with coevolution with pathogens, random mutation, isolations, migrations and interbreeding) in similar or distinct environments led to the establishment of the current plethora of loci that predispose to autoimmunity.25 As the authors conclude, these results generate an appreciation for potential interplay between population genetic factors and environmental factors.25 This appreciation has been renewed in a recent systematic analysis of loci shared among multiple ADs, confirming that SNP or haplotype sharing is complex, that often the most associated variant at a given locus differs and, even when shared, the same allele often has opposite associations.28 Population-level phenomena are the likely reason behind this complexity of gene effects in different ADs.

Natural selection and adaptation

Human genome variation at the population level is typically thought to be shaped by five evolutionary processes: sexual reproduction, mutation, migration, random genetic drift and natural selection. In addition, recombination is an often underappreciated driver of genomic diversity and genome evolution,29 as recombination events (the exchange of genetic material between homologous chromosomes) are accompanied by biased gene conversion, which causes allele frequency shifts that mimic the effects of natural selection,30, 31 shapes patterns of diversity in human genomes and contributes to substantially increased risks of hereditary disease.31 Natural selection is the process by which a trait becomes either more or less common in a population depending on the differential reproductive success of those with the trait. Natural selection drives adaptation, the evolutionary process whereby over generations the members of a population become better suited to survive and reproduce in their environment.

Negative (or purifying) selection decreases the prevalence of traits by purging deleterious alleles, and is the most common mechanism of selection. Negatively selected variants are usually associated with rare Mendelian disorders and have low population frequencies. As an example, endosomal Toll-like receptors (TLR3, TLR7, TLR8 and TLR9), which are involved in the sensing of nucleic acids, mostly from viruses, have been subject to negative selection.32

Positive selection increases the prevalence of adaptive traits by increasing the frequency of favorable alleles. Positively selected variation is often associated with common complex traits and can achieve higher population frequencies.33 A classic example of positive selection is the lactase (LCT) gene, in which independent variants associated with lactase persistence rose to high frequency in different populations owing to the strong selective force of adult milk consumption.34 The enrichment for signals of positive selection among genes associated with complex traits is well documented.15, 35, 36, 37

Balancing selection favors genetic diversity by retaining variation in the population as a result of heterozygote advantage or frequency-dependent advantage. Few loci show the signature of this type of selection. One classical example is the hemoglobin-β locus and the high prevalence of the hemoglobin-β mutation that causes sickle cell anemia in regions where malaria is endemic: while individuals homozygous for the wild-type hemoglobin-β are susceptible to malaria, and individuals homozygous for the hemoglobin-β mutation have sickle cell anemia, heterozygous individuals are resistant to malaria and otherwise healthy. Another pertinent example of balancing selection is the HLA (also known as major histocompatibility complex (MHC)) region,38, 39 where highly polymorphic loci have a central role in the recognition and presentation of antigens to the immune system. The high levels of polymorphism result from pathogen-driven balancing selection.40 The heterozygote advantage against multiple pathogens contributes to the evolution of HLA diversity, which confers resistance against multiple pathogens and explains the persistence of alleles conferring susceptibility to AD.41 Nevertheless, there is also recent evidence that positive selection might be acting on specific HLA alleles in a deme due to unique environmental pressures.42
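The heterozygote advantage described for the hemoglobin-β locus can be illustrated with the standard one-locus, two-allele viability selection model. The sketch below is a minimal illustration, not an analysis from the studies cited here: the function names and the selection coefficients s and t are hypothetical values chosen only to show that the mutant allele settles at an intermediate frequency instead of being lost or fixed.

```python
def next_allele_freq(q, s, t):
    """One generation of viability selection at a biallelic locus.

    q is the frequency of the mutant allele (S); p = 1 - q is the wild type (A).
    Assumed relative fitnesses: AA = 1 - s, AS = 1, SS = 1 - t.
    """
    p = 1.0 - q
    # Mean fitness of the population under Hardy-Weinberg genotype proportions.
    w_bar = p * p * (1.0 - s) + 2.0 * p * q + q * q * (1.0 - t)
    # Mutant alleles after selection: all alleles from SS, half from AS.
    return (q * q * (1.0 - t) + p * q) / w_bar


def equilibrium_freq(s, t):
    """Analytic stable equilibrium of the mutant allele: q* = s / (s + t)."""
    return s / (s + t)


def simulate(q0, s, t, generations):
    """Iterate the recursion from a starting frequency q0."""
    q = q0
    for _ in range(generations):
        q = next_allele_freq(q, s, t)
    return q


if __name__ == "__main__":
    s, t = 0.1, 0.8  # hypothetical costs: malaria susceptibility (AA), anemia (SS)
    for q0 in (0.01, 0.5):
        print(q0, round(simulate(q0, s, t, 2000), 4), round(equilibrium_freq(s, t), 4))
```

Under these assumed coefficients, both a rare and a common starting frequency converge to the same intermediate value s / (s + t), which is the sense in which balancing selection retains a deleterious allele in the population.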

Natural selection leaves a distinctive molecular signature in the targeted genomic region. Different statistical methods have been developed to detect signatures of selection. Quintana-Murci and Clark13 provide a comprehensive description of popular statistical methods to detect the different types of selection. There are three main classes of tests that detect regions under selection: (i) site frequency spectrum tests summarize the allele frequency distribution of polymorphisms, (ii) linkage disequilibrium tests uncover high-frequency haplotypes with extended linkage disequilibrium and (iii) population differentiation statistics detect local increases in the magnitude of population structure due to geographically restricted selection.
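As an illustration of the first class of tests, Tajima's D, a widely used site frequency spectrum statistic, compares two estimates of diversity: the mean number of pairwise differences and the number of segregating sites scaled by sample size. The sketch below is a minimal implementation under stated assumptions: haplotypes are given as equal-length strings of '0' and '1' (a hypothetical input format), sites are biallelic, and no significance testing is attempted. The function name is illustrative.

```python
import math


def tajimas_d(haplotypes):
    """Tajima's D for a list of equal-length 0/1 haplotype strings."""
    n = len(haplotypes)
    if n < 4:
        raise ValueError("need at least four haplotypes")
    length = len(haplotypes[0])
    # Count of allele '1' at each site.
    counts = [sum(h[i] == "1" for h in haplotypes) for i in range(length)]
    seg = [k for k in counts if 0 < k < n]
    s_sites = len(seg)
    if s_sites == 0:
        raise ValueError("no segregating sites")
    # Mean number of pairwise differences, summed over sites.
    pi = sum(2.0 * k * (n - k) / (n * (n - 1)) for k in seg)
    # Constants that depend only on sample size (Tajima 1989).
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / (i * i) for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / (a1 * a1)
    e1 = c1 / a1
    e2 = c2 / (a1 * a1 + a2)
    variance = e1 * s_sites + e2 * s_sites * (s_sites - 1)
    return (pi - s_sites / a1) / math.sqrt(variance)


if __name__ == "__main__":
    # Toy data: intermediate-frequency variants versus singletons only.
    balanced = ["1100", "1100", "0011", "0011"]
    singletons = ["1000", "0100", "0010", "0001"]
    print(tajimas_d(balanced), tajimas_d(singletons))
```

In the toy data, an excess of intermediate-frequency variants gives a positive D, the pattern expected under balancing selection, whereas an excess of rare variants gives a negative D, the pattern expected after a selective sweep. Demographic events such as population expansion or contraction can produce similar values, so D alone does not establish selection.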

Allele frequency spectrum tests (for example, Tajima's D, Fay and Wu's H-test) can detect recent selective events (<250 000 years) and are powerful for detecting completed or quasi-completed selective sweeps; haplotype-based tests (for example, integrated haplotype score (iHS), cross population-extended haplotype homozygosity (XP-EHH) test) are very powerful to detect very recent events of positive selection (<30 000 years); population structure statistics (for example, FST test) are powerful for detecting selective differences among populations, especially those that occurred after the out-of-Africa dispersals (<60 000 years).13 As recently reviewed,43 allele frequency spectrum and haplotype-based tests are powered to detect classic selective sweeps, in which a novel adaptive variant arises de novo in a population and rapidly increases in frequency to (or near) fixation, purging variation at linked sites as it spreads. Soft sweeps, in which selection acts on variants already polymorphic in a population (standing variation), are more difficult to detect using these tests, especially when an adaptive trait is influenced by multiple loci.43 However, in a geographically restricted population subjected to adaptive pressures, selection is likely to leave a weaker signature of adaptation, which may include subtle shifts in allele frequencies at multiple loci, resulting in an excess of allele frequency differentiation.43 Recent studies suggest that such soft sweeps may have been the most common mechanism of adaptation in recent human evolution.44, 45 Interestingly, immune response pathways show evidence of polygenic adaptation,44 the process in which adaptation occurs by simultaneous selection on variants at many loci. Hence, methods based on population structure may be more useful in studies of recent human natural selection as they can detect more subtle changes in allele frequency.46
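To make the population differentiation statistic concrete, the sketch below computes Wright's FST for a single biallelic SNP as the proportional reduction in expected heterozygosity within subpopulations relative to the pooled population. It is a simplified illustration: it assumes equally sized populations and known allele frequencies, ignores sampling corrections used by estimators applied to real data, and the SNP names and frequencies in the example are hypothetical.

```python
def fst(freqs):
    """Wright's FST for one biallelic SNP.

    freqs: allele frequencies of the same allele in each subpopulation,
    assuming equally sized subpopulations.
    """
    p_bar = sum(freqs) / len(freqs)
    # Expected heterozygosity in the pooled (total) population.
    h_total = 2.0 * p_bar * (1.0 - p_bar)
    if h_total == 0.0:
        return 0.0  # monomorphic everywhere: no differentiation to measure
    # Mean expected heterozygosity within subpopulations.
    h_within = sum(2.0 * p * (1.0 - p) for p in freqs) / len(freqs)
    return (h_total - h_within) / h_total


def rank_by_differentiation(snp_freqs):
    """Sort SNPs from most to least differentiated across populations."""
    return sorted(snp_freqs, key=lambda name: fst(snp_freqs[name]), reverse=True)


if __name__ == "__main__":
    # Hypothetical allele frequencies in two populations.
    snps = {
        "snp_a": [0.50, 0.52],
        "snp_b": [0.20, 0.80],
        "snp_c": [0.45, 0.60],
    }
    for name in rank_by_differentiation(snps):
        print(name, round(fst(snps[name]), 3))
```

Genome scans typically compare each locus against the genome-wide distribution of FST, because demographic history raises differentiation at all loci, whereas geographically restricted selection is expected to produce local outliers.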

As reviewed, it has been hypothesized that, in addition to genetic (sequence) variation, heritable epigenetic modifications can affect rates of fitness increase, as well as patterns of genotypic and phenotypic change during adaptation.47 However, the role of epigenetic variation in the response to natural selection has not been formally assessed, as the methodology to test signatures of natural selection on epigenetic variation is just emerging.48

Natural selection and ADs

Given that, if untreated, ADs can diminish reproductive potential and impair the ability to raise offspring that successfully reproduce, some evolutionary process must sustain the relatively high frequency of risk alleles seen in current populations around the world. As the human genome is shaped by adaptation to environmental pressures at the population level, one plausible reason for the higher frequency of disease risk alleles may be the direct effect of population-specific natural selection. This hypothesis is supported by the experimental evidence for MHC heterozygote superiority against multiple pathogens, a mechanism that would contribute to the evolution of HLA diversity and explain the persistence of alleles conferring susceptibility to disease.41

There is compelling evidence that natural selection is acting on a significant fraction of the human genome.16, 49, 50, 51, 52, 53 Immune function genes and pathways are consistently reported in tests for natural selection. As a result of several genome-wide scans, over 300 immune-related genes have been suggested as putative targets of positive selection.14, 15, 16, 17, 18 Although the challenge of validating the true signals remains,54 several genes involved in immune-related functions have been shown to be under selection.22, 55

Over 40 regions with evidence for selection and associated with at least one AD have been reported (Table 2). Although slightly over half of these regions seem to be associated with a single disease, close to half are shared among ADs, including the HLA region, PTPN22, TNFSF4, ARHGAP31-CD80, TNIP1 and TYK2. A few ADs have reported loci with signatures of positive selection: inflammatory bowel disease (Crohn's disease and ulcerative colitis),19 celiac disease18, 20 and systemic lupus erythematosus.21 Several of the genes in Table 2 show patterns of genetic variation that are consistent with recent positive selection, such as the PTPN22, ITPR3 and BLK regions,22 or SNPs in CLEC16A and UHRF1BP1.23 As discussed in the next section, resistance to protozoa and to tuberculosis infection have been implicated as the selective pressures for PTPN22 and UHRF1BP1, respectively. Interestingly, the systemic lupus erythematosus susceptibility allele in UHRF1BP1 is associated with decreased UHRF1BP1 RNA expression in different cell subsets, suggesting that the disease risk allele under selection has a regulatory effect.23 It is noteworthy that alleles under selective pressure can confer increased risk of AD manifestations, as shown by the evidence that variants within the APOL1 gene associated with resistance to human African trypanosomiasis (also known as sleeping sickness) in some African populations predispose to end-stage kidney disease in systemic lupus erythematosus.11

Table 2 Autoimmune disease regions with the evidence for selection and implicated agents of selection

Agents of selection

The wide variety of environments inhabited by human populations is likely exerting different selective pressures that lead to adaptation through natural selection. Climatic factors such as altitude, latitude, ultra-violet radiation levels, temperature, as well as diet and pathogens have been reported as agents of selection driving adaptations to these environments and lifestyles.

For example, several correlations between temperature, body mass and metabolic rates have been reported,56, 57, 58 correlations between latitude and skin pigmentation are well known,59, 60 and latitude combined with specific genetic variation has been reported to increase susceptibility to hypertension.61, 62 Signals of natural selection driven by annual photoperiod variation have been reported for schizophrenia, bipolar disorder and restless legs syndrome risk variants.63 In a comprehensive genome-wide test for adaptations to continuous climate variables, Hancock et al.24 found correlations between SNPs and climate variables, including an enrichment of individual SNPs involved in pigmentation and immune response, as well as of pathways related to UV radiation, infection and immunity, and cancer. They note that their enrichment for pigmentation and immune response phenotypes contrasts with the metabolic traits that commonly show evidence for adaptations to diet, subsistence and ecoregion.24 In a recent analysis of correlations between genetic risk of multiple diseases and worldwide migration trajectories, the authors report that variants associated with primary biliary cirrhosis, alopecia areata, inflammatory bowel disease, systemic lupus erythematosus, systemic sclerosis, ulcerative colitis and vitiligo have undergone genetic risk differentiation associated with migration.64

In addition to these general correlations between climatic variables and particular traits, examples of selective pressures driving natural selection of specific loci include malaria resistance and β-globin gene mutation,65 lactose tolerance and the lactase (LCT) gene variation,66, 67 skin pigmentation and the SLC24A5 gene variation,68 high-altitude tolerance and the EPAS1 gene variation,69 and adaptation to cold resistance and the uncoupling protein UCP3.70 Interestingly, expression QTLs (eQTLs) from immune function and metabolism genes are enriched in signals of environmental adaptation,71 which highlights the importance of regulatory variations in local adaptation.

Nevertheless, the strongest effect of climate is in shaping the spatial pattern and species diversity of human pathogens,9 which is directly relevant to AD predisposition. As recently reviewed,72 in the constant co-evolutionary battle between host and pathogen, pathogens that diminish reproductive potential, either through death or poor health, drive selection on genetic variants that affect pathogen resistance. According to the ‘hygiene hypothesis’ first proposed by Strachan in 1989,12 the increased prevalence of autoimmune and allergic diseases in industrialized countries may be due to modern society’s limited pathogen exposure. The hygiene hypothesis posits that humans have adapted to infectious exposures that were the norm in the past and that this exposure was protective against AD. Over many generations, environmental pressure may have favored alleles that affect resistance to pathogenic microorganisms, allowing humans to respond to immune system challenges differently but resulting in an increased risk of ADs. As Hancock et al.24 suggested, it is likely that selection signals in immune-related loci implicate variants evolving under a model of antagonistic pleiotropy, where the selective pressure was pathogen resistance, and the autoimmune disorder is a pleiotropic consequence of the resistance allele. This could hence be a mechanism explaining why AD risk alleles are common in the population.

Indeed, pathogens have been the main selective pressure through human evolution.10 In an analysis that included climate, diet regimes and pathogen loads, Fumagalli et al.10 showed that the diversity of the local pathogenic environment is the predominant driver of local adaptation, and that climate conditions had only a relatively minor role. In addition, they reported an enrichment of genes associated with ADs, such as celiac disease, type 1 diabetes and multiple sclerosis, which supports the hypothesis that some susceptibility alleles for ADs may be maintained in human populations due to past selective processes.10 The enrichment for signals of positive selection in inflammatory-disease susceptibility loci has been recently corroborated.23

Recent reviews of selection signatures left by pathogen-exerted pressure, including immune-related genes, can be found elsewhere.72, 73

Genetic regions associated with susceptibility to different ADs and evidence of selection that has been attributed to host–pathogen coevolution are shown in Table 2. In slightly less than half of the regions with evidence for selection and AD-association, known pathogens have been implicated as the selective pressure. Variation in the HLA and SH2B3 has been reported as a protective factor against bacterial infection.20, 40, 74, 75 Variants in the IFIH1 gene, whose protein is a cytoplasmic helicase that recognizes RNA of picornaviruses and mediates induction of interferon response to viral RNA, have been shown to affect IFIH1 function and host antiviral response.76 In the context of systemic lupus erythematosus (SLE) predisposing loci, Clatworthy et al.77 have shown that FCGR2B is important in controlling the immune response to Plasmodium falciparum, the parasite responsible for the most severe form of malaria, and suggest that the higher frequency of human FCGR2B polymorphisms predisposing to SLE in Asians and Africans may be maintained because these variants reduce susceptibility to malaria. Machado et al.78 have suggested that helminth infection has driven positive selection of FCGR variation. Grossman et al.22 implicated Salmonella typhimurium and other exposures that directionally drive selection of the toll-like receptor 5 (TLR5) gene,79 which is involved in recognition of flagellated bacteria. Unlike endosomal TLRs, such as TLR7 and TLR8, that have been subject to purifying selection, cell-surface TLRs involved in pathogen recognition (TLR1, TLR2, TLR4, TLR5, TLR6 and TLR10) experienced more relaxed constraints.32 UHRF1BP1 has been shown to be significantly differentially expressed in primary dendritic cells upon Mycobacterium tuberculosis infection,80 suggesting that response to tuberculosis has shaped genetic variation at this locus.
NOD2 shows complex signatures of selection, with some variation consistent with neutrality81 and other variation consistent with balancing or recent positive selection,19, 82 probably reflecting its association with multiple traits (Crohn's disease, ulcerative colitis and mycobacterial infections).19 Another complex pattern is that observed for the nonsynonymous variant in PTPN22 that increases the risk of SLE, rheumatoid arthritis, type 1 diabetes, vitiligo, autoimmune thyroid disease and ulcerative colitis, but is protective against Crohn's disease, despite ulcerative colitis and Crohn's disease being closely related phenotypes.28 Karlsson et al.83 have recently reported that cholera has exerted strong selective pressure on proinflammatory pathways.

It is important to highlight examples where there is strong statistical evidence of selection plus an experimentally validated phenotypic effect likely to predispose to autoimmunity or its complications, as these examples provide the most convincing evidence of positive selection increasing AD susceptibility. These include the SLE-associated polymorphism of the inhibitory receptor FCGR2B, which is found at higher frequency in African and Asian populations where malaria is endemic and enhances phagocytosis of Plasmodium falciparum-infected erythrocytes, demonstrating that FCGR2B is important in controlling the immune response to malarial parasites.77 Experimental characterization of a nonsynonymous variant in TLR5, another SLE-associated gene, showed that it leads to altered NF-κB signaling in response to bacterial flagellin.22 Functional analysis revealed that cells from individuals carrying the SH2B3 risk allele display increased proinflammatory cytokine production in response to muramyl dipeptide, a component of all bacterial cell walls, indicating that SH2B3 has a role in protection against bacterial infection.20 As previously noted, variants within the APOL1 gene known to be under strong selective pressure in some African populations predispose to a manifestation of SLE, end-stage kidney disease.11 APOL1 has a role in innate immunity by protecting against infection with Trypanosoma brucei, the parasitic protozoan transmitted by the tsetse fly that causes human African trypanosomiasis (also known as sleeping sickness).
There is compelling evidence that the kidney disease risk alleles’ frequencies increased rapidly because of their potential protective effects against human African trypanosomiasis.84 Given that these disease risk adaptive alleles have been functionally validated, these examples support the hypothesis that the increased prevalence of ADs may result, at least partially, from past events of selection that increased host resistance to infection.85


This review summarizes the genetic regions associated with susceptibility to different ADs and the concomitant evidence for selection, including the agents of selection when known. Uncovering these AD-associated loci under selection underscores the importance of population genetics and how the understanding of human genetic diversity is crucial to understanding disease etiology or treatment response at both the population and individual levels.

Progress achieved in recent years is a direct consequence of large-scale projects such as the HapMap86 and HGDP.87 However, the common variation analyzed by these projects does not capture the novel, deleterious or functional variants that, together with common variation, distinguish global human populations.7 Now, with genome sequence information available on a population scale from projects like the 1000 Genomes,88 the role of selection in shaping AD risk can be assessed with unprecedented detail in an unbiased fashion.89, 90 Recent studies using whole-genome sequence data have helped identify the established functional SNPs in known loci and report novel candidate regions for positive selection.23, 91 In the future, analyses of heritable epigenetic variation may unveil signatures of natural selection on epigenetic variation.

The complexity of gene effects in different ADs is remarkable, as shown, for example, by the complex signatures of selection seen at NOD2,19, 81, 82 and the plethora of shared variants that increase AD predisposition concordantly or discordantly, such as for PTPN22.28 These observations illustrate that the model whereby AD risk alleles are positively selected due to their protective effects against infections is naive. A combination of population-level phenomena, possibly including bottlenecks, migration, admixture, natural selection and random genetic drift, likely contributes to this complexity of gene effects. Given the complex history of selective pressures acting on humans, unequal selective pressures are expected to have acted on susceptibility loci for ADs, and a diverse spectrum of evolutionary models is plausible.33 It is likely that several pathogens have exerted pressure on the same loci and that selection can vary in form, intensity, time and space, which is consistent with the observation that both risk and protective alleles for ADs increased in frequency due to selection.18 For most regions, the exact selective pressure leaving the signature of selection is unclear. Moreover, these signatures are not necessarily the result of adaptation, but might be a consequence of random genetic drift. In any case, regardless of the population phenomenon shaping current human genetic diversity, this genetic variation is the basis of clinically relevant traits at both the individual and population levels.7

An important next step in delineating the selective advantage conferred by these AD risk variants is to perform functional studies, using in vitro experiments and model organisms, that identify the underlying functional variants and quantify the phenotypic consequences of the candidate adaptive alleles. Human–pathogen coevolution is ongoing and, despite the emergence of new pathogens (for example, HIV), potential pathogens driving these host-specific adaptations are expected to have long-standing relationships with humans, including those that cause malaria, smallpox, cholera, tuberculosis and leprosy,92 as well as the human microbiome.93 Regardless of the agent of selection and the reasons for the emergence of both common and rare AD-causing alleles, incorporating population genetics to understand human genetic diversity will lead to a better understanding of the causes of health disparities, identification of functional variants and discovery of cellular mechanisms, and will contribute to the development of new therapies.