

Balancing immunity and tolerance: genetic footprint of natural selection in the transcriptional regulatory region of HLA-G


Human leukocyte antigen-G (HLA-G) has well-recognized immunosuppressive properties modulating the activity of many immune system cells, and polymorphisms observed at the HLA-G 5′ upstream regulatory region (5′URR) may influence gene transcriptional regulation. In this study, we characterized the sequence variation and haplotype structure of the HLA-G 5′URR in worldwide populations to investigate the evolutionary history of the HLA-G promoter and shed some light on the mechanisms that may underlie HLA-G expression control. A 1.4-kb region, encompassing the known HLA-G regulatory elements, was sequenced in three African populations from Senegal, Benin and Congo, and data were combined with those available in the literature, resulting in a total of 1411 individuals from 21 worldwide populations. High levels of nucleotide and haplotype diversity, an excess of intermediate-frequency variants and reduced population differentiation were observed at this locus when compared with the background genomic variation. These features support a strong molecular signature of balancing selection at the HLA-G 5′URR, probably as a result of the competing needs to maintain both maternal–fetal immune tolerance and an efficient host immune response to invading pathogens during human evolution. An extended analysis of a 300-kb region surrounding HLA-G revealed that the HLA-G 5′URR is not involved in a hitchhiking effect and may be the direct target of selection.


The human leukocyte antigen (HLA) complex located in the 6p21.3 chromosome region is a high-density gene region of the human major histocompatibility complex, containing many genes involved in the initiation of innate and adaptive immune responses. At the telomeric end, this region includes both classical class Ia (HLA-A, HLA-B and HLA-C) and non-classical class Ib (HLA-E, HLA-F and HLA-G) genes, as well as two pseudogenes (HLA-H and HLA-J). Classical class Ia genes are highly polymorphic in their coding region and encode soluble and membrane-bound proteins that act primarily as antigen-presenting molecules to CD8 T lymphocytes. In contrast, non-classical class Ib genes have limited polymorphism in their coding region and restricted tissue distribution, and act primarily by modulating the function of many immune system cells. HLA-G is the most studied non-classical class Ib gene because of its crucial role in immune tolerance. It encodes molecules (seven isoforms) that are known ligands for inhibitory receptors found on the surface of different immune cell sub-populations, such as antigen-presenting cells, natural killer cells, and B and T lymphocytes.1, 2, 3, 4, 5 Besides its relevant physiological role in the induction of maternal–fetal tolerance, HLA-G can be expressed in numerous pathological situations, being either beneficial (autoimmune and inflammatory diseases, transplantation) or detrimental (infectious diseases, cancer) to the patient.6, 7, 8 Tight regulation of HLA-G expression from fetal to adult life is therefore required to fine-tune the immune response according to the physiopathological context. Although some progress has been made, the complex mechanisms underlying HLA-G expression control have not been fully elucidated.

In contrast to the relatively conserved coding region, the 5′ upstream regulatory region (5′URR) and 3′ untranslated region (3′UTR) of the HLA-G gene exhibit substantial polymorphism that may have a pivotal role in the regulation of HLA-G expression at the transcriptional and post-transcriptional levels, respectively.7, 9, 10, 11, 12, 13, 14 HLA-G 3′UTR nucleotide variations have been extensively studied in both experimental and pathological conditions, and several of these polymorphic sites have been shown to influence HLA-G production by modifying the binding of specific microRNAs or the stability of the HLA-G mRNA.15, 16, 17 Notably, the 14-bp insertion/deletion (indel, rs371194629) has been associated with a wide range of clinical conditions, including transplantation, tumors, pregnancy disorders, autoimmune diseases and chronic viral infections.7, 8 In contrast, the HLA-G 5′URR has been poorly studied. Only a few reports have surveyed sequence variation in this gene region in healthy individuals,4, 9, 12, 18, 19, 20, 21 providing limited knowledge of the worldwide genetic diversity at this regulatory gene segment. Similarly, there have been few attempts to associate HLA-G 5′URR variations with modulation of HLA-G expression and susceptibility to diseases.4, 18, 19, 22, 23, 24, 25, 26, 27, 28 Ober et al.18 focused on the triallelic single-nucleotide variant (SNV) −725 C>G/T (rs115535764) and reported that the fetal loss rates were highest among couples who carried the −725 G allele, which was further associated with increased HLA-G transcription.24 Nicolae et al.22 reported a significant association between bronchial asthma and the −964 G>A SNV (rs115338359). 
The −486 A>C SNV (rs114626623) was associated with increased risk of miscarriage4 and end-stage renal disease.28 Finally, Costa et al.19 found a significant association between HLA-G 5′URR haplotypes (including notably the putatively functional −1140 A>T SNV (rs115371574)) and implantation outcome in couples undergoing assisted reproduction treatment. Considering the unique features of the HLA-G 5′URR among HLA class I genes,29 and the fact that nucleotide changes in the cis-acting regulatory elements of this region may determine its differential responsiveness to transcriptional factors,30, 31 further genetic investigations are needed to clarify the role of HLA-G 5′URR polymorphisms in the spatio-temporal expression profile of HLA-G.

In this study, we have surveyed nucleotide sequence variation at the HLA-G 5′URR, extending nearly 1400 bp upstream of the first translated codon, in ethnically diverse global populations. Knowledge of the pattern of genetic diversity and haplotype structure at this locus in worldwide populations has important implications for understanding how HLA-G promoter polymorphisms influence HLA-G expression, and consequently its tissue distribution in physiological and pathological conditions, and how they contribute to disease phenotypes. Our results suggest that balancing selection has maintained haplotype variation at this locus, and that the −1140 A>T SNV (rs115371574) may have some functional role in the control of HLA-G expression.


Patterns of sequence variation at the HLA-G 5′URR

Sequence variation at the HLA-G 5′URR was surveyed in a 1.4-kb region upstream of the start codon that contains the known regulatory elements of HLA-G expression (Figure 1a). This region was fully sequenced in three population samples from sub-Saharan Africa, including 28 Yansi from Congo, 30 Serer from Senegal and 30 Tori from Benin (Supplementary Figure S1). A total of 27 variable sites, including 25 SNVs and 2 indels, were identified (Figures 1a and b, Supplementary Table S1). All these variants have been described in previous studies.9, 12, 20 Analysis of HLA-G 5′URR sequences from 1089 individuals in 14 worldwide populations (Supplementary Figure S1), generated by the 1000 Genomes (1KG) project,32 allowed the identification of five new variants. All of these variants occurred at low frequencies in all individual populations (minor allele frequency (MAF)<0.06). One of them was found exclusively in the British sample (−1310 T); three others occurred in several populations within the same continent (−539 G in Yoruba and Luhya, −521 A in Japanese and in both Han Chinese samples, and −1098 A in Colombians and Mexicans). The fifth one (−355 A) had a more cosmopolitan distribution and was found in several continental regions: Africa (Yoruba and Luhya), Europe (British and Utah residents with European ancestry) and East Asia (both Han Chinese samples). The observed genotype frequencies at all variable sites in both African and 1KG samples were consistent with Hardy–Weinberg equilibrium when tested using Fisher's exact tests (data not shown).
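The Hardy–Weinberg check described above can be sketched for a single biallelic site. The text names only "Fisher's exact tests", so the implementation below is an assumption: it follows the exact-test recursion of Wigginton, Cutler and Abecasis (2005), which enumerates every heterozygote count compatible with the observed allele counts. The function name `hwe_exact_p` is hypothetical, and a multiallelic site such as the triallelic −725 C>G/T would need a different test.

```python
def hwe_exact_p(n_het: int, n_hom1: int, n_hom2: int) -> float:
    """Two-sided exact Hardy-Weinberg test for one biallelic site (sketch).

    Probabilities of every possible heterozygote count are computed,
    conditional on the observed allele counts; the P-value is the summed
    probability of configurations no more likely than the observed one.
    """
    n_rare_hom, n_common_hom = sorted((n_hom1, n_hom2))
    n = n_het + n_rare_hom + n_common_hom          # number of individuals
    if n == 0:
        raise ValueError("no genotypes")
    n_rare = 2 * n_rare_hom + n_het                 # copies of the minor allele
    probs = [0.0] * (n_rare + 1)

    # Start near the most likely heterozygote count; its parity must match n_rare.
    mid = n_rare * (2 * n - n_rare) // (2 * n)
    if (n_rare - mid) % 2:
        mid += 1
    probs[mid] = 1.0                                # unnormalised weight

    # Walk down in steps of two heterozygotes.
    het, hom_r = mid, (n_rare - mid) // 2
    hom_c = n - het - hom_r
    while het >= 2:
        probs[het - 2] = probs[het] * het * (het - 1) / (4.0 * (hom_r + 1) * (hom_c + 1))
        het, hom_r, hom_c = het - 2, hom_r + 1, hom_c + 1

    # Walk up in steps of two heterozygotes.
    het, hom_r = mid, (n_rare - mid) // 2
    hom_c = n - het - hom_r
    while het <= n_rare - 2:
        probs[het + 2] = probs[het] * 4.0 * hom_r * hom_c / ((het + 2) * (het + 1))
        het, hom_r, hom_c = het + 2, hom_r - 1, hom_c - 1

    total = sum(probs)                              # normalising constant
    p_obs = probs[n_het]
    return min(1.0, sum(p for p in probs if p <= p_obs) / total)
```

For example, `hwe_exact_p(0, 100, 100)` (no heterozygotes despite a 50/50 allele split) returns a vanishingly small P-value, whereas `hwe_exact_p(50, 25, 25)` returns a value close to 1.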

Figure 1

Regulatory elements and genetic variations across the HLA-G 5′URR. (a) Schematic representation of the cis-acting regulatory elements and transcription factors known to be relevant for the transcriptional control of HLA-G expression. They are located in a region of about 1450 bp upstream of the ATG translation start codon, where the adenine is designated as nucleotide +1. The proximal region of the HLA-G 5′URR contains several regulatory modules: the enhancer A (EnhA), reported to bind the p50/p50 homodimer, the interferon-stimulated response element (ISRE) and the SXY module, a target for the multiprotein complex regulatory factor X (RFX). These regulatory elements have acquired mutations or deletions, rendering the proximal HLA-G gene promoter unresponsive to classical transcriptional inducers of major histocompatibility complex (MHC) class I genes. The binding of a repressor factor RREB1 (Ras-responsive element binding 1) and of a progesterone receptor (PR) to a Ras response element (RRE) and a progesterone response element (PRE), respectively, may also influence HLA-G expression. In the distal region of the promoter, the next module is a cis heat shock element (HSE) that binds heat shock factor 1 (HSF1), a stress-induced regulatory protein. The interferon gamma-activated site (GAS) is another regulatory sequence, located at nucleotide position −734 and reported to be unresponsive in HLA-G. Close to GAS, at position −746, there is another ISRE, which is responsive to interferon-γ and modulates HLA-G expression through the binding of interferon regulatory factor 1 (IRF1). Activating transcription factor 1 (ATF1) and/or cAMP response element binding protein 1 (CREB1) can bind to three cAMP response elements (CRE). One of these is found in a candidate locus control region (LCR), along with another RRE.
The LCR is critical for determining when and where HLA-G is expressed.29 The extent of the region explored in previous and present sequencing surveys is represented by a gray bar for each study. (b) Nucleotide sequence variations identified in the previous and present sequencing surveys. The presence of a genetic variant in a given study is represented by a gray square. Polymorphisms not evaluated in the corresponding study are indicated by an ‘X’.

These sequence data were then combined with those previously published for four human populations, including 47 European Americans, 44 African Americans, 43 Han Chinese and 100 Southeastern Brazilians.9, 12 Altogether, 1411 individuals representing 21 worldwide populations were evaluated for their sequence variability at the HLA-G 5′URR (Supplementary Figure S1). A total of 35 variable sites (33 SNVs and 2 indels) were identified across the whole segment, 26 of which reached polymorphic frequencies (>0.01 at the global level). Most of them are located within or very close to previously described HLA-G regulatory elements (Figure 1a). The number of variable sites identified in each individual sample was generally high (25 on average), ranging from 17 in Iberians to 28 in Colombians and Mexicans. With more than one-third of variants (13 out of 35) having a MAF within the 0.40–0.50 range, the HLA-G 5′URR displays an unusual excess of intermediate-frequency variants compared with the frequency distribution observed for the remaining variation of the genome (Figure 2).
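The MAF summary above, and the comparison shown in Figure 2, amount to binning variants by minor allele frequency. A minimal sketch, assuming biallelic sites and ten equal-width MAF bins of 0.05 (the bin width used for Figure 2 is not stated in this section); all function names are hypothetical.

```python
def minor_allele_frequency(alt_count: int, n_chromosomes: int) -> float:
    """Frequency of the less common allele at a biallelic site."""
    f = alt_count / n_chromosomes
    return min(f, 1.0 - f)


def maf_spectrum(mafs, n_bins: int = 10) -> list[int]:
    """Count variants in equal-width MAF bins spanning 0 to 0.5.

    With n_bins=10 the last bin covers 0.45-0.50; a MAF of exactly 0.5
    is placed in that last bin.
    """
    counts = [0] * n_bins
    for maf in mafs:
        idx = min(int(maf * 2 * n_bins), n_bins - 1)
        counts[idx] += 1
    return counts


def fraction_intermediate(mafs, low: float = 0.40) -> float:
    """Share of variants whose MAF lies in [low, 0.5]."""
    return sum(1 for m in mafs if m >= low) / len(mafs)
```

Applied to the HLA-G 5′URR variants, `fraction_intermediate` with `low=0.40` corresponds to the 13/35 (about 37%) figure quoted above; the same binning applied to genome-wide 1KG SNVs gives the reference spectrum.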

Figure 2

Frequency distribution of HLA-G 5′URR variants. For comparison purposes, the frequency distribution of 36 382 854 SNVs identified in the whole genome of 1089 individuals from the 1KG project is shown at the bottom left.

Patterns of linkage disequilibrium (LD) across the HLA-G 5′URR region

Global LD patterns across the HLA-G 5′URR were determined by computing the D′ statistic between all pairs of variants in each individual population sample, including the three new sub-Saharan African samples and the 14 samples from 1KG (Supplementary Figure S2), as well as in each continental region (Figure 3). Strong and significant LD was observed across the whole gene region in all population samples, with 71.4% of pairs of markers in significant LD (P<0.05) in the worldwide sample. This proportion remained high even after applying a Bonferroni correction for the number of pairs tested (52.3% with P<0.00008) and was consistently high in all continents (62.1% in Africa, 62.4% in Europe, 55.0% in East Asia and 61.6% in America). A single LD block including 33 of the 35 variants identified, spanning from the −1305 A>G to the +15 A>G SNV, was defined in all continental regions (Figure 3) and in all individual samples studied, with the exception of Tori, for which two distinct blocks were inferred (Supplementary Figure S2). As the two variable sites −1377 T>G and −1310 C>T were systematically excluded from the LD blocks defined, they were not included in the statistical inference of HLA-G 5′URR haplotypes.
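Pairwise LD expressed as D′ can be computed directly from haplotype frequencies. Haploview estimates those frequencies from genotypes, so the sketch below assumes that phase is already known and that each site is coded 0/1; the function names are hypothetical. It also shows where a Bonferroni cut-off of the kind quoted above comes from: 35 variants give 595 pairs, and 0.05/595 ≈ 8.4 × 10⁻⁵, in line with the P<0.00008 threshold (that the correction used exactly this number of pairs is an assumption).

```python
from math import comb


def d_prime(haplotypes) -> float:
    """Lewontin's |D'| between two biallelic sites from phased haplotypes.

    haplotypes: iterable of (allele_at_site1, allele_at_site2) pairs coded 0/1.
    """
    haps = list(haplotypes)
    n = len(haps)
    p_a = sum(a for a, _ in haps) / n            # frequency of allele 1 at site 1
    p_b = sum(b for _, b in haps) / n            # frequency of allele 1 at site 2
    p_ab = sum(1 for a, b in haps if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                         # raw disequilibrium coefficient
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    if d_max == 0:
        raise ValueError("D' is undefined when a site is monomorphic")
    return abs(d) / d_max


def bonferroni_threshold(n_markers: int, alpha: float = 0.05) -> float:
    """Per-test significance level after correcting for all marker pairs."""
    return alpha / comb(n_markers, 2)
```

A D′ of 1 means that at most three of the four possible two-site haplotypes are observed, which is why D′ is often reported alongside a LOD score for the strength of the evidence.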

Figure 3

LD patterns across the HLA-G 5′URR in four continental regions. LD plot generated by Haploview71 showing LD values (depicted as D′) for each pair of variants. Population samples were grouped according to their predominant component of ancestry: Africa (YAN, TOR, SER, YRI, LWK and ASW), Europe (IBS, TSI, CEU, GBR, FIN), East Asia (JPT, CHB, CHS) and the Americas (CLM, MXL, PUR). For comparison purposes, the 35 variable sites identified at the global level are represented, even if some of them are not present in a given continental region. Areas in dark red indicate strong LD (LOD≥2, D′=1), shades of pink indicate moderate LD (LOD≥2, D′<1), blue indicates weak LD (LOD<2, D′=1) and white indicates no LD (LOD<2, D′<1). The haplotype blocks were defined by the confidence intervals method of Gabriel.73 D′ values different from 1.0 are represented within each diamond as percentages. AA-Tan, Chicago residents of African descent; AE-Tan, Chicago residents of European descent; ASW, people of African ancestry from the southwestern United States; CEU, Utah residents with Northern and Western European ancestry; CHB, Han Chinese from Beijing; CHS, Han Chinese from South China; CN-Tan, Han Chinese from South China; CLM, Colombians from Medellín, Colombia; FIN, Finnish from Finland; GBR, British from England and Scotland; IBS, Iberian populations from Spain; JPT, Japanese from Tokyo, Japan; LOD, log of the odds; LWK, Luhya from Webuye, Kenya; MXL, people of Mexican ancestry from Los Angeles, California; PUR, Puerto Ricans from Puerto Rico; SER, Serer from Niakhar, Senegal; TOR, Tori from Tori-Bossito, Benin; TSI, Toscani from Italy; YAN, Yansi from Bandundu, Democratic Republic of the Congo; YRI, Yoruba from Ibadan, Nigeria.

HLA-G 5′URR haplotype diversity and genealogy

HLA-G 5′URR haplotypes were reconstructed using the PHASE software33 in the three African samples and the 14 samples from 1KG, by considering the 33 variable sites included in the previously defined LD block (from −1305 A>G to +15 G>A). We also included the three samples of Tan et al.9 in the haplotype analysis, but discarded the Brazilian sample of Castelli et al.12 because of missing genotypes at three SNVs (−990G>A, −964A>G and −810C>T) located in a fragment not covered by their sequencing survey (from positions −1049 to −764; Figure 1a). A total of 68 haplotypes were inferred across the 20 samples, of which 9 occurred at a frequency above 1% at the global level (Supplementary Tables S2 and S3). The number of inferred haplotypes varied greatly across continental regions, with as many as 48 distinct haplotypes in Africa, whereas only 21, 19 and 17 haplotypes were identified in Europe, America and East Asia, respectively. Ten haplotypes had a frequency ≥0.05 in at least one population and ≥0.01 in at least one continent. Their worldwide distribution across populations and continents is shown in Figure 4. Whereas the most common ones (Prom-1, Prom-2, Prom-3 and Prom-4) are extensively shared among continents and show no obvious geographical structuring, others exhibit a more restricted geographic distribution: for instance, Prom-5 occurs exclusively in East Asia and America, Prom-6 is mostly found in Europe and America, and Prom-7 mainly occurs in Africa and America (Figure 4). Notably, only three haplotypes accounted for >75% of the total haplotype frequency at the worldwide level: Prom-1 (31.3%), Prom-2 (29.7%) and Prom-3 (15.2%). These haplotypes are homogeneously distributed across human populations and have a cumulative frequency ranging from 45 to 89% depending on the population.
A high intra-population haplotype diversity was consistently observed in all population samples (mean value=0.77±0.06 s.d.), ranging from 0.67 in Iberians to 0.92 in Yansi (Supplementary Table S3). Europeans and East Asians displayed slightly lower values (0.71±0.03 and 0.73±0.03, respectively) than Africans and Americans (0.82±0.06 and 0.83±0.02, respectively).
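The section reports haplotype diversity values without naming the estimator. A common choice, assumed here, is Nei's unbiased gene diversity, H = n/(n − 1) × (1 − Σ pᵢ²), where pᵢ is the frequency of haplotype i among n chromosomes. A minimal sketch with a hypothetical function name:

```python
from collections import Counter


def haplotype_diversity(haplotypes) -> float:
    """Nei's unbiased haplotype diversity: n/(n-1) * (1 - sum of p_i^2).

    haplotypes: one haplotype label (or sequence string) per chromosome.
    """
    counts = Counter(haplotypes)
    n = sum(counts.values())
    if n < 2:
        raise ValueError("need at least two chromosomes")
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1.0 - sum_sq)
```

H is the probability that two chromosomes drawn at random (without replacement) carry different haplotypes, so values of 0.67 to 0.92, as reported above, mean that most random pairs of chromosomes differ at the HLA-G 5′URR.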

Figure 4

HLA-G 5′URR haplotypes. Only haplotypes occurring at a frequency ≥0.05 in at least one population and ≥0.01 in at least one continental region are represented. (a) HLA-G 5′URR haplotype sequences. The derived allele of each SNV is highlighted in blue. Haplotypes were grouped according to their sequence similarity, revealing the presence of four distinct lineages. Prom-1, Prom-2, Prom-3, Prom-4, Prom-5, Prom-6, Prom-7, Prom-8 and Prom-9 correspond to the haplotypes previously named Promo-G010101a, Promo-G010102a, Promo-G0104a, Promo-G010101b, Promo-G0104b, Promo-G010101c, Promo-G010101d, Promo-G0103d and Promo-G0103e, respectively, in Castelli et al.12 (b) Worldwide frequency distribution of HLA-G 5′URR haplotypes. Apart from the 10 most common haplotypes, all other haplotypes were pooled into a single group (‘others’, represented in white). Africa (YAN (Yansi from Bandundu, Democratic Republic of the Congo), TOR (Tori from Tori-Bossito, Benin), SER (Serer from Niakhar, Senegal), YRI (Yoruba in Ibadan, Nigeria), LWK (Luhya in Webuye, Kenya), ASW (African Ancestry in South-West USA), AA-Tan (Chicago residents of African descent)); Europe (IBS (Iberian population in Spain), TSI (Toscani in Italia), CEU (Utah residents with ancestry from northern and western Europe), AE-Tan (Chicago residents of European descent), GBR (British from England and Scotland), FIN (Finnish in Finland)); East Asia (JPT (Japanese in Tokyo, Japan), CHB (Han Chinese in Beijing, China), CHS and CN-Tan (Han Chinese from South China)); America (CLM (Colombians in Medellin, Colombia), MXL (Mexican Ancestry in Los Angeles, California, USA), PUR (Puerto Ricans in Puerto Rico)); AFR, Africa; EUR, Europe; ASN, East Asia; AMR, America; ALL, all individuals studied.

The genealogy of human HLA-G 5′URR haplotypes revealed two main haplotype clusters separated by long branches, containing six and four of the most common haplotypes, respectively (Figure 5). Prom-1 and Prom-2 each belong to one of these two main lineages. Although separated by as many as 14 mutational events, they occur at similar frequencies in the global human population and are evenly distributed across the four continental regions. Prom-3 belongs to the same lineage as Prom-2 and differs from it by only two nucleotide substitutions. The inferred ancestral sequence of human haplotypes occupies an intermediate position between the two lineages and differs by 9, 12 and 53 mutational events from the orthologous sequences of gorilla, chimpanzee and orangutan, respectively.

Figure 5

Median-joining network of HLA-G 5′URR haplotypes. The mutational relationships among the 10 most common haplotypes (frequency ≥0.05 in at least one population and ≥0.01 in at least one continental region) were inferred using the Network software. Circle areas are proportional to the global haplotype frequency and branch lengths to the number of mutations separating haplotypes. Circles are color-coded according to relative frequencies in continental regions (blue, Africa; red, Europe; yellow, Asia; green, America). Labels of haplotypes are indicated next to the corresponding circles, and labels of mutations on the network branches. The highly divergent orthologous sequence of the orangutan is included in the network shown in the embedded box. Prom-1, Prom-2, Prom-3, Prom-4, Prom-5, Prom-6, Prom-7, Prom-8 and Prom-9 correspond to the haplotypes previously named Promo-G010101a, Promo-G010102a, Promo-G0104a, Promo-G010101b, Promo-G0104b, Promo-G010101c, Promo-G010101d, Promo-G0103d and Promo-G0103e, respectively, in Castelli et al.12

Population genetic differentiation

The level of population genetic differentiation at the HLA-G 5′URR locus was quantified using the fixation index FST,34 which measures the proportion of genetic variance explained by allele frequency differences among populations. Low levels of genetic differentiation among populations and among continents were generally observed across the whole region, with an overall interpopulation FST of 0.036 and intercontinental FST of 0.028. These values are well below the average for the human genome (Supplementary Table S4).35, 36 Thirty of the 33 variable sites (91%) exhibited interpopulation and intercontinental FST scores below or very close to 0.05 (Supplementary Table S4).
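The FST estimator of reference 34 is not reproduced in this section, so the exact formula used is not known from the text. As a simpler stand-in that captures the same idea (the share of total heterozygosity attributable to allele frequency differences among populations), the sketch below computes Nei's G_ST = (H_T − H_S)/H_T for one biallelic site, weighting populations equally. Estimators such as Weir and Cockerham's θ additionally correct for sample size and will give somewhat different values; the function name is hypothetical.

```python
def nei_gst(freqs) -> float:
    """Nei's G_ST for one biallelic site (a simplified stand-in for FST).

    freqs: frequency of one allele in each population; populations are
    weighted equally and sample-size corrections are ignored.
    """
    k = len(freqs)
    h_s = sum(2 * p * (1 - p) for p in freqs) / k   # mean within-population heterozygosity
    p_bar = sum(freqs) / k                            # pooled allele frequency
    h_t = 2 * p_bar * (1 - p_bar)                     # heterozygosity of the pooled population
    if h_t == 0:
        raise ValueError("site is monomorphic across all populations")
    return (h_t - h_s) / h_t
```

For example, allele frequencies of 0.4 and 0.6 in two populations give G_ST = 0.04, which is in the range of the low values reported above for most HLA-G 5′URR variants, whereas fixed differences (0 and 1) give 1.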

To determine whether such low FST values are atypical when compared with the genome average, we adopted an outlier approach by constructing genome-wide distributions of FST scores using the whole-genome variation data available from the 1KG project (see Materials and methods section). The FST scores of the 29 variants segregating in the 14 populations from 1KG were recalculated by considering only these 14 samples. They were then compared with three empirical FST distributions constructed from a large number of independent SNVs drawn from the whole genome or located in genic or 5′URR regions, thereby providing three empirical P-values for each variant (Figure 6, Supplementary Figure S3 and Supplementary Table S5). Although most HLA-G 5′URR genetic variants displayed low FST values at both the population and continental levels, the −1140 A>T variant was the only one to exhibit significant empirical P-values in all three distributions tested. While the P-values of interpopulation FST scores at this locus were very close to the 0.05 significance threshold, those for intercontinental FST were highly significant (1E−05, 2E−05 and 3E−03 for the genome-wide, genic and 5′URR distributions, respectively) and remain so after applying a Bonferroni correction for the number of variants tested (corrected P=0.0017). Such an atypical pattern of allele frequency differentiation therefore suggests that this locus may be subject to balancing or species-wide directional selection.
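The outlier approach above, as described in the Figure 6 caption, takes the empirical P-value as the proportion of background SNVs in the same MAF bin whose FST is lower than or equal to the observed value (a lower-tail test, since balancing selection is expected to reduce differentiation). A minimal sketch, assuming ten equal-width MAF bins (the actual binning is not given here) and hypothetical function names:

```python
import bisect
from collections import defaultdict


def maf_bin(maf: float, n_bins: int = 10) -> int:
    """Index of an equal-width MAF bin on [0, 0.5]; MAF=0.5 goes in the last bin."""
    return min(int(maf * 2 * n_bins), n_bins - 1)


def build_reference(background):
    """Group background (maf, fst) pairs by MAF bin, each bin sorted by FST."""
    ref = defaultdict(list)
    for maf, fst in background:
        ref[maf_bin(maf)].append(fst)
    for values in ref.values():
        values.sort()
    return ref


def empirical_p_low_fst(maf: float, fst: float, ref) -> float:
    """Share of background SNVs in the same MAF bin with FST <= the observed value."""
    values = ref.get(maf_bin(maf))
    if not values:
        raise ValueError("no background SNVs in this MAF bin")
    return bisect.bisect_right(values, fst) / len(values)
```

Matching on MAF matters because FST depends on allele frequency; comparing a common variant with a background dominated by rare variants would bias the empirical P-value. With 29 variants tested, a Bonferroni-corrected 5% threshold is 0.05/29 ≈ 0.0017.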

Figure 6

Interpopulation and intercontinental FST scores and their empirical P-values for the HLA-G 5′URR variants identified in the 14 population samples from the 1KG project. Empirical P-values are derived from the genome-wide empirical distribution built from 26 680 716 SNVs. Empirical P-values were estimated, within each MAF bin, as the proportion of FST scores in the empirical distribution that are lower than or equal to the observed value at the variant of interest. The gray vertical bars represent the FST scores at each variable site and the black dots the corresponding empirical P-values (represented as −log10(P-value)). The gray line indicates the 5% significance threshold.

Detecting natural selection based on nucleotide diversity and site frequency spectrum

Different summary statistics representing several aspects of variation were computed in each of the 20 population samples included in the worldwide survey, and several neutrality tests based on these statistics were carried out to detect departures from neutral expectations in the distribution pattern of diversity among individuals (Table 1). The statistical significance of the tests was assessed using (i) coalescent simulations (10 000 runs) of a standard, neutral equilibrium model, and (ii) empirical distributions of the test statistics constructed using either a set of 100 unlinked non-coding regions (expected to be neutrally evolving) or a set of 100 unlinked 5′URR regions, each of the same size as the region under study (see Materials and methods section). This latter approach allows better discrimination between the influences of population history and natural selection, and avoids the difficulty of interpreting significant deviations from the standard neutral model. It was only applied to the 14 population samples from 1KG for which genome-wide variation data were available.

Table 1 Summary diversity statistics and results of neutrality tests at the HLA-G 5′URR in the 20 worldwide population samples

A considerable amount of genetic diversity at the HLA-G 5′URR locus was consistently observed in all population samples, with a mean nucleotide diversity π of 0.60±0.05%. In most samples, this value was significantly higher than those observed for the 100 non-coding and 100 5′URR regions sampled in the same set of individuals (Table 1). It is also about eight times higher than the average value (0.073%) estimated for 16 489 5′URR regions dispersed throughout the human genome.37 In contrast, human–chimpanzee divergence at the HLA-G 5′URR locus (1.23±0.04% on average) was within the range of values observed for most human nuclear loci,37, 38 pointing to an unusually high ratio of diversity to divergence that is suggestive of balancing selection.
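Nucleotide diversity π is the mean number of pairwise differences between sequences, divided by the number of sites. A minimal sketch, assuming aligned sequences of equal length with no gaps or missing data (how indels and missing calls were handled is not described here); the function name is hypothetical.

```python
from itertools import combinations


def nucleotide_diversity(seqs) -> float:
    """Per-site nucleotide diversity (pi): mean pairwise differences / length."""
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to equal length")
    # Number of mismatching sites for every unordered pair of sequences.
    diffs = [sum(a != b for a, b in zip(s1, s2)) for s1, s2 in combinations(seqs, 2)]
    return sum(diffs) / len(diffs) / length
```

Called with just a human and a chimpanzee sequence, the same function returns the uncorrected per-site divergence (p-distance), the other half of the diversity-to-divergence comparison above; published divergence estimates may additionally correct for multiple substitutions. The "about eight times" figure follows from 0.60/0.073 ≈ 8.2.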

Positive values of Tajima’s D and Fu and Li’s D* and F* were observed in all studied samples, indicating a skew in the frequency spectrum toward an excess of intermediate-frequency variants (Table 1). Statistically significant P-values (P<0.05), as assessed by simulations of the neutral equilibrium model, were observed for one or more statistics in all populations except Japanese, in which Tajima’s D and Fu and Li’s F* were not significant but close to the 0.05 threshold (P=0.055 and P=0.057, respectively). When the empirical distributions, instead of the coalescent-based neutral model, were used to assess statistical significance, all populations had at least one statistic ranking among the top 5% of the highest values. These observations strengthen the hypothesis that balancing selection has influenced the patterns of variation at the HLA-G 5′URR.
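Tajima's D contrasts two estimates of θ: π (mean pairwise differences) and Watterson's S/a₁ (based on the number of segregating sites S). An excess of intermediate-frequency variants inflates π relative to S/a₁ and gives positive values, as observed here. A minimal sketch using the standard constants of Tajima (1989), assuming aligned sequences and the infinite-sites model; Fu and Li's D* and F* are not shown, and the significance step (comparison with simulated or empirical distributions) is left out. The function name is hypothetical.

```python
from itertools import combinations
from math import sqrt


def tajimas_d(seqs) -> float:
    """Tajima's D from aligned sequences (Tajima 1989 normalising constants)."""
    n = len(seqs)
    if n < 4:
        raise ValueError("Tajima's D needs at least four sequences")
    # Segregating sites: alignment columns carrying more than one allele.
    s = sum(1 for col in zip(*seqs) if len(set(col)) > 1)
    if s == 0:
        raise ValueError("no segregating sites")
    pairs = list(combinations(seqs, 2))
    pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)

    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n ** 2 + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    return (pi - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))
```

For instance, four sequences split 2:2 at both variable sites give D ≈ 1.89 (intermediate-frequency variants), whereas two singleton sites give D ≈ −0.71.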

EHH breakdown around HLA-G 5′URR

To determine whether the genetic footprint of natural selection detected at the HLA-G 5′URR is due to balancing selection acting directly on this locus, or whether it results from a genetic hitchhiking effect caused by a neighboring HLA gene known to be under strong balancing selection, such as HLA-A, we carried out an extensive study of the patterns of LD across a 300-kb genomic region spanning the HLA-F, HLA-G, HLA-H and HLA-A class I genes, using the phased sequence data from the 1KG project. If HLA-G Prom-1, Prom-2 and Prom-3 haplotypes had been maintained at high frequencies in human populations through a hitchhiking effect caused by balancing selection acting directly on the HLA-A locus, one would expect these haplotypes to be surrounded by long stretches of haplotype homozygosity and to be in strong LD with HLA-A alleles. To assess whether such long-range association exists, we measured the extended haplotype homozygosity (EHH) on both sides of each core haplotype at the HLA-G 5′URR locus (considered as the ‘core’ region) across a 300-kb region. The EHH score on both sides of the three major promoter haplotypes (Prom-1, Prom-2 and Prom-3) drops to near zero long before reaching the neighboring HLA genes (HLA-F and HLA-H), and therefore well before HLA-A (Figure 7). The HLA-G 5′URR is thus probably not involved in a hitchhiking effect and may rather represent a direct target of balancing selection, independently of other genes in the region.
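EHH, as introduced by Sabeti et al. (2002), is the probability that two randomly chosen chromosomes carrying a given core haplotype are identical over the whole interval from the core to a given marker. A minimal sketch, assuming phased chromosomes encoded as strings with one allele per ordered marker and a core spanning a contiguous block of markers; distances are in marker steps, so mapping them to genomic positions (as in Figure 7) is left to the caller. The function name and its parameters are hypothetical.

```python
from collections import Counter
from math import comb


def ehh_decay(haplotypes, core_start, core_end, core_hap, direction=1):
    """EHH of one core haplotype, extended marker by marker away from the core.

    haplotypes: phased chromosomes, each a string with one allele per marker.
    core_start, core_end: the core occupies markers [core_start, core_end).
    direction: +1 extends toward higher marker indices, -1 toward lower ones.
    Returns EHH values, starting with 1.0 at the core itself.
    """
    carriers = [h for h in haplotypes if h[core_start:core_end] == core_hap]
    n = len(carriers)
    if n < 2:
        raise ValueError("EHH needs at least two chromosomes carrying the core")
    total_pairs = comb(n, 2)
    if direction == 1:
        positions = range(core_end, len(carriers[0]))
    elif direction == -1:
        positions = range(core_start - 1, -1, -1)
    else:
        raise ValueError("direction must be +1 or -1")

    values = [1.0]  # all carriers share the core by definition
    for pos in positions:
        if direction == 1:
            extended = Counter(h[core_start:pos + 1] for h in carriers)
        else:
            extended = Counter(h[pos:core_end] for h in carriers)
        # Pairs of carriers still identical from the core out to this marker.
        homozygous_pairs = sum(comb(k, 2) for k in extended.values())
        values.append(homozygous_pairs / total_pairs)
    return values
```

Under the hitchhiking scenario discussed above, EHH for Prom-1, Prom-2 and Prom-3 would stay high out to HLA-A; a rapid decay toward zero, as in Figure 7, argues against long-range association.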

Figure 7

EHH decay over physical distance for each core haplotype at the HLA-G 5′URR locus across a 300-kb region encompassing the HLA-F, HLA-G, HLA-H and HLA-A genes, in the four continental regions. Prom-1 (red), Prom-2 (blue) and Prom-3 (green) were contrasted to the remaining haplotypes pooled together (gray). Prom-1, Prom-2 and Prom-3 correspond to the haplotypes previously named Promo-G010101a, Promo-G010102a and Promo-G0104a, respectively, in Castelli et al.12 The black vertical line indicates the position of the HLA-G 5′URR core region. Coding RNAs, pseudogenes and non-coding RNAs are represented below as black, dark gray and light gray boxes, respectively. The genomic position (in megabases) on chromosome 6 is indicated on the horizontal axis (human GRCh37/hg19 assembly). The EHH score on both sides of Prom-1, Prom-2 and Prom-3 haplotypes reaches a value close to zero well before reaching HLA-A.


The 5′URR of classical HLA class I genes consists of two groups of juxtaposed cis-acting elements that can be viewed as regulatory modules. Both modules are divergent in HLA-G, rendering the HLA-G 5′URR unique among HLA genes.29, 39 The known regulatory elements of HLA-G present distinctive structural characteristics and extend up to ~1500 bp upstream of the first translated codon, instead of only 500 bp in other class I genes. These unusual features may be responsible for the non-classical transcriptional regulation of HLA-G and its restricted expression in specific tissues, such as the trophoblast or thymic epithelial cells, in non-pathological conditions.40, 41, 42 The proximal promoter region includes the enhancer A and the SXY module (Figure 1a). Modifications of these elements through mutations and the lack of a functional interferon-stimulated response element, usually present in other class I genes, render the proximal HLA-G gene promoter unresponsive to the classical transcriptional inducers of HLA class I genes (nuclear factor-κB, interferon-γ and class II trans-activator).29, 43, 44, 45 The distal region of the HLA-G 5′URR includes several important regulatory loci such as a heat shock element,46 an interferon-stimulated response element,47 located next to a nonfunctional GAS (interferon gamma-activated site) element, and the locus control region, also referred to as ‘tissue-specific regulatory element’, harbored at nucleotide positions −1438 to −1185.31, 48

Here, we present a comprehensive study of global human variation across a 1.4-kb region of the HLA-G 5′URR, encompassing the known regulatory elements that may influence the variable, tissue-specific expression observed for HLA-G. Our survey included several African populations that have been underrepresented in prior studies despite the greater genetic diversity and genetic heterogeneity observed across the African continent.49 A total of 35 segregating sites (33 SNVs and 2 indels) were identified across the whole region (Figure 1 and Supplementary Table S1). Haplotype reconstruction based on 33 segregating sites, included in a single LD block, identified 68 distinct HLA-G 5′URR haplotypes. Only three of them (Prom-1, Prom-2 and Prom-3) accounted for ~75% of the global haplotype variation (Figure 4, Supplementary Tables S2 and S3). All populations, whether African or non-African, displayed exceptionally high and similar levels of nucleotide and haplotype diversity. Moreover, an atypically low level of population differentiation (Figure 6) and an excess of polymorphism relative to divergence were observed (Table 1), suggesting non-neutral evolution of this gene region. Statistical tests of neutrality based on allele frequency distributions revealed an excess of intermediate-frequency variants (Table 1), further supporting a history of balancing selection at the HLA-G 5′URR locus.

Evidence of balancing selection at the HLA-G promoter has already been suggested in two previous studies.9, 12 However, these reports included only a few population samples (three samples of African, European and Chinese ancestry in the study by Tan et al.9 and one Brazilian sample in the study by Castelli et al.12) and used a theoretical approach based on coalescent simulations to detect significant deviations from a standard neutral equilibrium model. Because this model makes unrealistic assumptions about population demographic history (a randomly mating population of constant size), it is difficult to interpret any significant deviation from the null model unambiguously as evidence for natural selection. For example, both balancing selection and a reduction in population size lead to an excess of intermediate-frequency alleles relative to the expectation under a standard neutral model. In this study, we used an empirical approach to separate the contributions of demographic and selective forces, comparing patterns of variation at the HLA-G 5′URR with those observed at a large number of unlinked loci dispersed throughout the genome in the same set of individuals. Demography affects the whole genome, whereas selection acts on specific loci. The HLA-G 5′URR was an outlier in the empirical distributions of several test statistics (Wright’s fixation index FST, Tajima’s D and Fu and Li’s D* and F*), thus providing unequivocal evidence that balancing selection has shaped patterns of variation at this locus. Interestingly, the neutrality tests based on nucleotide diversity and the site frequency spectrum were slightly more significant when the 5′URR regions, rather than the non-coding regions, were used as the reference distribution, especially in the non-African samples (Table 1).
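This observation suggests that 5′URRs across the genome may be subject to selective pressures whose effects oppose those of balancing selection, such as purifying selection, which tends to eliminate deleterious mutations and thereby produces an excess of rare variants.50, 51 Such pressures would shift the 5′URR reference distributions toward lower values of these statistics, making the HLA-G 5′URR stand out more sharply against that background.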

An extended analysis of LD patterns across a 300-kb region surrounding the HLA-G gene did not reveal any long-range associations between the HLA-G Prom-1, Prom-2 and Prom-3 haplotypes and HLA-A alleles, suggesting that the selection signature at HLA-G is not a hitchhiking effect driven by a nearby linked gene. These data corroborate the findings of Tan et al.,9 who showed that the strong balancing selection evidenced at HLA-A is unlikely to have influenced the patterns of diversity observed at the HLA-G locus. However, we cannot rule out the possibility that the selection signature identified at the HLA-G 5′URR results from epistatic selection arising from the interaction of HLA-G with its various partners, such as CD8, the leukocyte immunoglobulin-like receptors B1/B2 (LILRB1 and LILRB2) and the killer cell immunoglobulin-like receptor KIR2DL4. Such epistatic effects have already been demonstrated experimentally between KIR receptors and their HLA ligands.52 Similarly, we cannot exclude a possible influence of polymorphisms in the HLA-G 3′UTR, as this gene region has also been described as shaped by balancing selection.14 We did find strong LD between 5′URR and 3′UTR haplotypes in the 1KG populations, with the Prom-1, Prom-2 and Prom-3 haplotypes strongly associated with UTR-1, UTR-2 and UTR-3, respectively (Supplementary Figure S4). However, the signatures of balancing selection evidenced in this study were much more pronounced than those described at the 3′UTR, both in statistical significance and in the number of populations affected, which points to the HLA-G 5′URR as the most likely target of selection.

One effect of balancing selection is to preserve two or more lineages over an extended period of time. Consistent with this, the genealogy of HLA-G promoter haplotypes revealed two major clades separated by long branches, each containing one common haplotype (Prom-1 and Prom-2, respectively) (Figure 5). Prom-1 is the most divergent from the inferred ancestral haplotype and is likely to be the most recent. Prom-1 and Prom-2 occur at similarly high frequencies (~30% each) and are evenly distributed across human populations. If these two haplotypes are indeed maintained at high frequency in most human populations by balancing selection, they may be associated with different transcriptional activities, resulting in different levels of HLA-G expression. The only HLA-G 5′URR variant known to influence HLA-G expression levels (−725C>G,T)24 carries the same allele (−725C) in both haplotypes. However, the two haplotypes differ at 14 variable sites, many of which coincide with or lie close to known regulatory elements and may thus affect the binding of the corresponding regulatory factors (Figure 1). Maintaining two haplotypes with different promoter activities may benefit individuals carrying both, by increasing the flexibility of the immune response through allele-specific modulation of gene expression in different cell types and in response to diverse stimuli. This would meet the competing needs to maintain both maternal–fetal immune tolerance and an efficient host immune response to invading pathogens during human evolution. Notably, the 14 variable sites that differ between Prom-1 and Prom-2 were monomorphic in 74 individuals from six great ape species (Pan troglodytes, Pan paniscus, Gorilla beringei, Gorilla gorilla, Pongo abelii, Pongo pygmaeus).38 A similar observation was made for the 35 genetic variants of the human HLA-G 5′URR.
Therefore, the hypothesis of trans-species polymorphisms maintained for several million years at this locus, through shared and long-term balancing selection pressures, seems unlikely.

An atypical pattern of geographic differentiation was observed at position −1140, probably as a result of balancing selection, pointing to a potential functional relevance of the −1140 A>T polymorphism. This variant lies very close to the locus control region/tissue-specific regulatory element, situated at least 1.2 kb upstream of the first translated ATG, which controls HLA-G expression in a tissue-specific manner through interactions with various cis-regulatory elements (Figure 1).53 The recent identification of a new mechanism regulating HLA-G expression, involving a direct interaction of the Hedgehog pathway signal transducer GLI-3 with the HLA-G promoter at a very nearby position (−1108),54 suggests that the −1140 A>T SNV may lie in an important regulatory region for the control of HLA-G expression. Interestingly, the two main 5′URR haplotypes suspected to be maintained by balancing selection carry different alleles at position −1140 (A in Prom-1 and T in Prom-2). If these alleles confer different binding affinities for transcription factors, they may be associated with different levels of HLA-G mRNA transcription. Given the strong allelic associations between Prom-1 and 3′UTR-1 and between Prom-2 and 3′UTR-2 (Supplementary Figure S4), and given that 3′UTR-1 has been associated with higher HLA-G production than 3′UTR-2,55 Prom-2 could be associated with lower HLA-G expression, in contrast to Prom-1, which could be considered a high-expressing haplotype. However, results on the functional effects of 3′UTR-1 and 3′UTR-2 on HLA-G production remain controversial,29 and few data are currently available on the functionality of the −1140 SNV. Further investigations are therefore needed to test this hypothesis.

Interestingly, the 14-bp indel located in the HLA-G 3′UTR also showed an atypical geographic differentiation profile and was likewise proposed as a putative target of balancing selection.14 In contrast to the −1140 A>T SNV, the functionality of this indel is well established, with significant effects on HLA-G mRNA stability, the magnitude of HLA-G production and the level of inhibition of natural killer cell cytotoxicity.56, 57, 58, 59, 60, 61 Strong LD between this indel and the −1140 A>T promoter variant may therefore explain the pattern observed at position −1140. A significant allelic association between these two variants was indeed observed in the 1KG data, with moderate-to-strong LD depending on the population analyzed (r² values ranging from 0.57–0.62 in Africa and America to 0.88–0.92 in Europe and East Asia) (A Sabbagh, unpublished data). Such strong LD makes it difficult to identify the true causal variant when a significant association with a given phenotype is detected. Only functional assays assessing the impact of this polymorphism on HLA-G promoter activity will determine whether it has a functional effect independent of the 14-bp indel. Note, however, that the −1140 A>T SNV is by far the variant with the most extreme geographic differentiation pattern, at both the interpopulation and intercontinental levels, among all variants across the entire human HLA-G gene (A Sabbagh, unpublished data). This further supports the possible functional relevance of this specific variant.

Although the 3′UTR seems to be subject to the same pattern of selective pressures as the HLA-G promoter, the selective forces acting on the coding region appear rather different. Limited protein diversity has indeed been described at the HLA-G coding region in human populations.7, 12, 21, 62, 63, 64 Although the HLA-G gene harbors several variable sites, most of them are either intronic or synonymous substitutions, yielding only 16 different full-length proteins and two truncated molecules (null alleles), of which only five are frequently observed in worldwide populations.7, 21 One of them, the full-length molecule known as G*01:01, largely predominates, representing about 60% of all HLA-G coding haplotypes.21 The relatively low variability, the excess of synonymous changes and the frequency spectrum suggest that the HLA-G coding region may be under strong selective pressure to conserve the protein sequence. Although the frequency distributions of HLA-G coding alleles were consistent with neutral expectations in three populations,65 strong evidence of purifying selection was reported in a more recent study.63 Interestingly, the two major HLA-G 5′URR haplotypes, Prom-1 and Prom-2, are both mostly associated with coding haplotypes encoding the same G*01:01 molecule,21 suggesting that HLA-G function may be regulated mainly through variability in the regulatory regions.

In conclusion, we characterized global patterns of HLA-G 5′URR variation in a wide range of worldwide populations and provided unequivocal evidence that natural selection has shaped patterns of genetic diversity at this gene segment. Our findings contribute to our understanding of how variation at the HLA-G 5′URR may have been adaptive in dealing with both fetal tolerance and exposure to pathogens during human evolution, and have important implications for understanding the role of promoter variants in the control of HLA-G expression.

Materials and methods

African population samples and Sanger sequencing

HLA-G 5′URR nucleotide variation was surveyed in 88 individuals from three sub-Saharan African population samples. The first comprised 28 Yansi individuals from the Bandundu province of the Democratic Republic of the Congo, about 400 km east of Kinshasa. The second comprised 30 Serer individuals living in the Niakhar area, 150 km southeast of Dakar, Senegal. The third included 30 Tori individuals living in the district of Tori-Bossito, a semi-rural area on the coastal plain of southern Benin, 40 km northeast of Cotonou, the economic capital. All individuals were unrelated, randomly selected adults previously used as controls in case–control or family studies.66, 67, 68 The relevant local research ethics committee of each area (Ethics Committee of the Health Minister of Senegal, Ethics Committee of the Public Health Ministry of the Democratic Republic of the Congo, and Institutional Ethics Committee of the Faculté des Sciences de la Santé at the University of Abomey-Calavi in Benin) reviewed and approved the study protocol. All participants gave written informed consent before blood sampling.

A PCR-amplified fragment of approximately 1800 bp spanning nucleotides −1446 (5′URR) to +388 (exon 2), with the adenine of the first translated ATG codon numbered +1, was produced for each individual using primers GPROMO.S (5′-ACATTCTAGAAGCTTCACAAGAATG-3′) and GPROMO.R (5′-GCCTTGGTGTTCCGTGTCT-3′) and the PCR conditions described previously.12 The amplified products were visualized on silver-stained 7% polyacrylamide gels and sequenced directly on an ABI310 Genetic Analyzer (Applied Biosystems, Foster City, CA, USA) using the PCR primers and nine internal primers, as described elsewhere.12 Sequence variation was evaluated over the 1456-bp fragment encompassing nucleotides −1421 to +35, which contains the regulatory elements known to be relevant for the transcriptional control of HLA-G expression.39, 42 Tori DNA samples were amplified and sequenced by Genoscreen (Lille, France) following the same protocol.

Data retrieval

The HLA-G 5′URR sequence data of 1089 unrelated individuals drawn from 14 populations in Europe, East Asia, sub-Saharan Africa and the Americas were downloaded directly from the 1000 Genomes Project website (phase 1 integrated release, version 3, April 2012).32 For comparison with the sequence data generated in the three African samples, we considered the same 1456-bp gene segment encompassing nucleotides −1421 to +35 (chr6:29 794 201–29 795 656 in the human GRCh37/hg19 assembly). Low-coverage whole-genome polymorphism data from the same individuals (~36 million variants) were also downloaded to perform the tests of natural selection described below.
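The position numbering used in this section (adenine of the first translated ATG = +1, with no position 0) can be checked with simple arithmetic. The Python sketch below is illustrative only: it assumes that HLA-G lies on the forward strand of GRCh37 and anchors +1 at chr6:29 795 622, a value derived here from the stated mapping of −1421 to +35 onto chr6:29 794 201–29 795 656 rather than taken from a gene annotation. The constant and function names are hypothetical.

```python
# Minimal sketch: convert HLA-G positions numbered relative to the first
# translated ATG (A = +1, no position 0) into GRCh37 chromosome 6 coordinates.
# ASSUMPTION: forward-strand gene, with +1 anchored at 29_795_622, a value
# derived from the mapping of -1421..+35 onto chr6:29_794_201-29_795_656.
ATG_PLUS1_GRCH37 = 29_795_622  # hypothetical constant derived from the text


def atg_to_grch37(pos: int) -> int:
    """Map a nonzero ATG-relative position to a GRCh37 chr6 coordinate."""
    if pos == 0:
        raise ValueError("this numbering has no position 0")
    # Positive positions start at +1, so they shift by one; negative ones do not.
    return ATG_PLUS1_GRCH37 + pos - 1 if pos > 0 else ATG_PLUS1_GRCH37 + pos


def segment_length(start: int, end: int) -> int:
    """Length in bp of the inclusive segment start..end, skipping position 0."""
    length = end - start + 1
    # A segment crossing from negative to positive numbering loses one bp.
    return length - 1 if start < 0 < end else length


print(atg_to_grch37(-1421), atg_to_grch37(35))  # ends of the analyzed segment
print(segment_length(-1421, 35))                # analyzed segment length
print(segment_length(-1446, 388))               # PCR product length
```

Under the same assumption, the −1446 to +388 PCR product works out to 1834 bp, consistent with the ‘approximately 1800 bp’ given above. Any coordinate derived this way (for example, for position −1140) should be checked against the 1KG variant records before use.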

Sequence data available in the literature for the same promoter region were also included in this study. A 1321-bp fragment encompassing the −1305 and +15 nucleotides was sequenced by Tan et al.9 in 47 European American, 44 African American and 43 Han Chinese individuals. Another sequencing survey of HLA-G 5′URR concerned a 1151-bp fragment spanning nucleotides from −1399 to −1049 and from −764 to +35 in 100 individuals from Southeastern Brazil (Ribeirão Preto, State of São Paulo).12 Other previously published data were not included in the present analysis because of (a) a lack of information on the ethnicity of sampled individuals,69 (b) a selection bias when defining the individuals to be sequenced for the HLA-G promoter region,18 (c) a lack of frequency data provided in the published paper19, 20 and (d) a lack of allele frequency data for SNVs with MAF<0.02.4, 28

LD, tagging and haplotype analyses

The strength of LD between pairs of markers was measured as D′70 using Haploview v4.2,71 and the statistical significance of LD was assessed by Fisher’s exact test using the Arlequin v. software.72 LD blocks were defined using the confidence interval method of Gabriel73 as implemented in Haploview. Haplotypes were reconstructed using the Bayesian method74 implemented in PHASE v2.1.1,33 with default parameter values for the Markov chain Monte Carlo simulations and the ‘multiallelic loci’ option to account for the triallelic −725C>G,T SNV. For each population sample, we ran the algorithm 10 times with different random seeds and checked the consistency of results across the independent runs to verify that the algorithm had not converged to a local, rather than global, mode of the posterior distribution. We retained the results of the run with the best average goodness-of-fit of the estimated haplotypes to the underlying coalescent model. Haplotype networks describing the mutational relationships among the inferred haplotypes were generated using the median-joining algorithm75 of the Network software. Analysis of molecular variance76 and haplotype diversity, based on estimated haplotype frequencies, were computed using the Arlequin v. software.72
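As an illustration of the quantities named above, the following minimal Python sketch computes pairwise D′ and r² (the LD measure quoted in the Discussion) from phased biallelic data, together with Nei’s unbiased haplotype diversity from haplotype labels. It assumes complete, phased 0/1 allele calls and uses plain sample frequencies; it is not a reimplementation of Haploview, Arlequin or PHASE, and the toy inputs are invented for demonstration.

```python
from collections import Counter


def ld_stats(alleles_a, alleles_b):
    """Return (D', r^2) for two biallelic loci.

    alleles_a, alleles_b: 0/1 allele calls at each locus, one entry per
    phased chromosome, listed in the same chromosome order.
    """
    n = len(alleles_a)
    p_a = sum(alleles_a) / n            # frequency of allele 1 at locus A
    p_b = sum(alleles_b) / n            # frequency of allele 1 at locus B
    # Frequency of chromosomes carrying allele 1 at both loci.
    p_ab = sum(1 for a, b in zip(alleles_a, alleles_b) if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                # raw disequilibrium coefficient D
    # D' rescales |D| by its maximum possible value given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else float("nan")
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else float("nan")
    return d_prime, r2


def haplotype_diversity(haplotypes):
    """Nei's unbiased haplotype diversity: n/(n-1) * (1 - sum of squared freqs)."""
    n = len(haplotypes)
    counts = Counter(haplotypes)
    return n / (n - 1) * (1 - sum((c / n) ** 2 for c in counts.values()))


print(ld_stats([1, 1, 0, 0], [1, 0, 0, 0]))   # D' = 1 but r^2 = 1/3
print(haplotype_diversity(["Prom-1", "Prom-1", "Prom-2", "Prom-3"]))
```

The toy pair shows why the two measures can disagree: D′ reaches 1 whenever one of the four possible two-locus haplotypes is absent, whereas r² also depends on how similar the allele frequencies at the two loci are.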

Test for selective neutrality based on population genetic differentiation

To determine whether the HLA-G 5′URR locus shows an atypical pattern of population genetic differentiation, we used the fixation index FST,34 which quantifies the proportion of genetic variance explained by allele frequency differences among populations. FST ranges from 0 (genetically identical populations) to 1 (completely differentiated populations). We calculated FST at each locus using the BioPerl module PopGen77 at two levels: (i) among populations and (ii) among continental regions, that is, with individual samples grouped into major geographical regions (Africa, Europe, East Asia and America) according to their predominant ancestry component. Extreme FST values can result from natural selection but also from nonselective processes such as demographic changes and genetic drift. Because these nonselective processes act on the whole genome, they are expected to affect all loci similarly on average, whereas natural selection affects population differentiation in a locus-specific manner. The genome-wide variation data of the 14 1KG populations can thus be used to detect the action of natural selection through an outlier approach.78 For that purpose, we built empirical distributions from three sets of autosomal SNVs located throughout the genome. The first set comprised all SNVs detected in the genomes of the 1089 unrelated 1KG individuals (~36 million SNVs). An LD-based pruning procedure was applied to these genome-wide genotype data using Plink79 with default parameters (pruning based on a variance inflation factor of two within each sliding window of 50 SNVs, with a step of five SNVs), leaving a total of 26 680 716 independent autosomal SNVs, hereafter referred to as ‘GW’. The second set of SNVs was built from the subset of variants annotated in dbSNP (build 137) as located in gene regions.
The SNVs remaining after LD-based pruning with the same parameters formed the ‘Genic’ set, containing 11 286 949 independent genic SNVs. The third set was built from the SNVs annotated in dbSNP as belonging to 5′URRs: 24 395 SNVs remained after LD-based pruning, making up the ‘5′URR’ set. FST was computed for all SNVs in the three sets, providing reference distributions against which to assess whether the pattern of genetic differentiation at a locus of interest is atypical. As we were interested in detecting signals of balancing selection, we focused on the lower tail of the FST distributions: the empirical P-value is the proportion of FST values in the empirical distribution that are lower than or equal to the value observed at the locus of interest. Because FST correlates strongly with heterozygosity,36, 80, 81 empirical P-values were calculated within bins of SNVs grouped by MAF. A total of 27 bins covered the whole MAF range: 10 bins of width 0.001 for MAF between 0 and 0.01, 9 bins of width 0.01 for MAF between 0.01 and 0.10, and 8 bins of width 0.05 for MAF between 0.10 and 0.50.
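The binned, lower-tail outlier test described above can be sketched in a few lines of Python with NumPy. This is a minimal illustration under stated assumptions: FST is estimated as (HT − HS)/HT from per-population allele frequencies with equal population weights, which is not necessarily the estimator implemented in the BioPerl PopGen module, and MAF bins are assumed to be closed on the left. The function names and toy values are hypothetical.

```python
import numpy as np

# ASSUMPTION: 27 MAF bins closed on the left: 10 of width 0.001 (0-0.01),
# 9 of width 0.01 (0.01-0.10) and 8 of width 0.05 (0.10-0.50).
# Rounding removes floating-point noise from linspace.
MAF_EDGES = np.round(np.concatenate([
    np.linspace(0.0, 0.01, 11),
    np.linspace(0.01, 0.10, 10)[1:],
    np.linspace(0.10, 0.50, 9)[1:],
]), 3)


def fst_biallelic(pop_freqs):
    """Simple FST = (HT - HS) / HT for one biallelic SNV (equal pop weights)."""
    p = np.asarray(pop_freqs, dtype=float)
    h_s = np.mean(2 * p * (1 - p))   # mean within-population heterozygosity
    p_bar = p.mean()
    h_t = 2 * p_bar * (1 - p_bar)    # heterozygosity of the pooled sample
    return float((h_t - h_s) / h_t) if h_t > 0 else float("nan")


def maf_bin(maf):
    """Index (0-26) of the MAF bin that contains each value of maf."""
    idx = np.searchsorted(MAF_EDGES, maf, side="right") - 1
    return np.clip(idx, 0, len(MAF_EDGES) - 2)


def empirical_p_lower(obs_fst, obs_maf, ref_fst, ref_maf):
    """Share of reference SNVs in the same MAF bin with FST <= obs_fst."""
    ref_fst = np.asarray(ref_fst, dtype=float)
    same_bin = maf_bin(np.asarray(ref_maf, dtype=float)) == maf_bin(obs_maf)
    if not same_bin.any():
        return float("nan")          # no reference SNVs with comparable MAF
    return float(np.mean(ref_fst[same_bin] <= obs_fst))


# Toy reference set: four SNVs with MAF 0.31 and two rare ones.
ref_fst = [0.01, 0.05, 0.10, 0.20, 0.00, 0.30]
ref_maf = [0.31, 0.31, 0.31, 0.31, 0.005, 0.005]
print(fst_biallelic([0.2, 0.8]), empirical_p_lower(0.05, 0.32, ref_fst, ref_maf))
```

In practice, the observed values would come from an HLA-G 5′URR variant and the reference arrays from one of the GW, Genic or 5′URR sets; restricting the comparison to one MAF bin keeps rare, low-heterozygosity SNVs from dominating the lower tail.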

Tests for selective neutrality based on nucleotide diversity and site frequency spectrum

To assess whether natural selection has altered the frequency spectrum of mutations relative to that expected under the standard neutral model, we computed summary diversity statistics and performed several neutrality tests, including Tajima’s D82 and Fu and Li’s D* and F*,83 with DnaSP v.5.10.84 These statistics compare different estimators of the population mutation rate θ = 4Nμ that have the same expectation under neutrality. Tajima’s D compares estimates of θ derived from the average number of pairwise differences (π) and from the number of segregating sites (S). Fu and Li’s D* compares estimates of θ derived from the number of segregating sites (S) and the number of singleton mutations (ηs, alleles appearing only once in the sample), and Fu and Li’s F* compares estimates of θ derived from the average number of pairwise differences (π) and the number of singleton mutations (ηs). The statistical significance of the tests was first estimated from 10 000 coalescent simulations of an infinite-sites locus, conditioned on the sample size and including recombination (with the recombination parameter estimated from the observed sequence data). Because rejection of these tests may be caused by violation of any of the assumptions of the null hypothesis (neutrality, constant size, panmixia), the effects of natural selection and demographic history are not easy to disentangle. We therefore opted for an empirical approach and selected two sets of 100 regions in the human genome to build two reference distributions for each summary diversity statistic, using the genome-wide data available for the 14 1KG populations. The first set includes 100 unlinked non-coding regions expected to be evolving mostly neutrally.
More specifically, we selected 100 autosomal regions dispersed throughout the human genome that met the following criteria: (i) located at least 100 kb away from any known or predicted gene, expressed sequence tag or region transcribed into mRNA; (ii) located outside any segmental duplication, region transcribed into a long non-coding RNA, or conserved non-coding element (as defined in Woolfe et al.85); (iii) separated from each other by at least 100 kb and not in LD with each other; (iv) of the same size as the HLA-G 5′URR studied (1400 bp); and (v) containing at least eight SNVs. The second set includes 100 5′URRs that met criteria (iii), (iv) and (v). For each region, Tajima’s D and Fu and Li’s D* and F* were computed to obtain the null distribution of these test statistics in each population sample, for both the neutrally evolving regions and the 5′URRs of the genome. To test for balancing selection, empirical P-values were estimated within each distribution as the proportion of regions with a test statistic greater than or equal to the value observed at the HLA-G 5′URR. P-values below 0.05 were considered significant.
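To show how Tajima’s D summarizes the site frequency spectrum, the Python sketch below implements the standard Tajima’s D formula from aligned haplotype sequences, together with the upper-tail empirical P-value used above to test for balancing selection. It is a minimal sketch, not the DnaSP implementation: it assumes complete data under the infinite-sites model, ignores indels and missing data, and does not cover Fu and Li’s D* and F*. The toy sequences are invented.

```python
from itertools import combinations
from math import sqrt


def tajimas_d(seqs):
    """Tajima's D from equal-length aligned sequences (at least four)."""
    n = len(seqs)
    if n < 4:
        raise ValueError("Tajima's D needs at least four sequences")
    # S: number of segregating (polymorphic) sites in the alignment.
    s = sum(len(set(column)) > 1 for column in zip(*seqs))
    if s == 0:
        return float("nan")  # D is undefined without polymorphism
    # pi: average number of pairwise differences between sequences.
    pairs = list(combinations(seqs, 2))
    pi = sum(sum(x != y for x, y in zip(a, b)) for a, b in pairs) / len(pairs)
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n ** 2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1, e2 = c1 / a1, c2 / (a1 ** 2 + a2)
    # D > 0 when pi exceeds S / a1 (excess of intermediate-frequency variants);
    # D < 0 signals an excess of rare variants.
    return (pi - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))


def empirical_p_upper(observed, reference):
    """Share of reference regions whose statistic is >= the observed value."""
    return sum(r >= observed for r in reference) / len(reference)


rare = ["AAAA", "AAAA", "AAAA", "AAAT"]          # one singleton variant
intermediate = ["AAAA", "AAAA", "AAAT", "AAAT"]  # one variant at 50%
print(tajimas_d(rare), tajimas_d(intermediate))  # negative, then positive
```

A single singleton yields a negative D, whereas a single variant at 50% frequency yields a positive D, the direction expected under balancing selection; an observed value would then be compared against the 100 reference regions with `empirical_p_upper`.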

Decay of extended haplotype homozygosity (EHH) on either side of the HLA-G 5′URR locus

To assess how LD breaks down with increasing distance from a specific core haplotype at the HLA-G 5′URR locus (designated as the ‘core’ region), we used the extended haplotype homozygosity (EHH) statistic,86 which measures the extent of SNV homozygosity at increasing distances from that core. For instance, for a population of individuals sharing the Prom-1 haplotype, EHH at a distance ‘X’ from the locus is the probability that two randomly chosen chromosomes carrying the Prom-1 haplotype are identical by descent over the entire interval from the locus to ‘X’. EHH ranges from 0 (no homozygosity: all extended haplotypes differ) to 1 (complete homozygosity: all extended haplotypes are identical). EHH was computed on both sides of each of the three 5′URR haplotypes suspected to be targeted by balancing selection (Prom-1, Prom-2 and Prom-3) across a 300-kb region encompassing the HLA-F, HLA-G, HLA-H and HLA-A genes, using the rehh package in R. The analysis was performed separately for each continental region (Africa, Europe, East Asia and America) using the phased sequence data from the 1KG project.
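The EHH definition above translates directly into code: among chromosomes carrying a given core haplotype, extended haplotype homozygosity (EHH) at marker x is the proportion of carrier pairs whose alleles are identical over the whole interval from the core to x. The Python sketch below is a minimal illustration of that definition on toy allele strings; it is not the rehh implementation and ignores missing data, physical distances and integrated statistics such as iHH. The function name and data are hypothetical.

```python
from collections import Counter
from math import comb


def ehh_profile(haplotypes, core_carriers, core_start, core_end):
    """EHH at every marker for the chromosomes carrying one core haplotype.

    haplotypes: equal-length allele strings, one per phased chromosome.
    core_carriers: indices of the chromosomes that carry the core haplotype.
    core_start, core_end: first and last marker columns of the core (inclusive).
    Returns one EHH value per marker; markers inside the core get the EHH of
    the core itself (1.0 when all carriers share it).
    """
    chroms = [haplotypes[i] for i in core_carriers]
    n = len(chroms)
    if n < 2:
        raise ValueError("EHH needs at least two core-carrying chromosomes")
    total_pairs = comb(n, 2)
    profile = []
    for x in range(len(chroms[0])):
        # Interval running from the core out to marker x, on either side.
        lo, hi = min(core_start, x), max(core_end, x)
        counts = Counter(c[lo:hi + 1] for c in chroms)
        # Share of carrier pairs that are identical over the whole interval.
        profile.append(sum(comb(k, 2) for k in counts.values()) / total_pairs)
    return profile


haps = ["0000", "0011", "0000", "0011", "1111"]  # toy data, 5 chromosomes
print(ehh_profile(haps, core_carriers=[0, 1, 2, 3], core_start=0, core_end=0))
```

In the toy data, the four core carriers split into two extended haplotypes beyond marker 1, so EHH drops from 1 to 1/3; a slow decay of this kind around a common core is what a recent sweep would produce, whereas long-standing balanced haplotypes are expected to show ordinary decay.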


1. Rouas-Freiss N, Goncalves RM, Menier C, Dausset J, Carosella ED. Direct evidence to support the role of HLA-G in protecting the fetus from maternal uterine natural killer cytolysis. Proc Natl Acad Sci USA 1997; 94: 11520–11525.

2. Le Gal FA, Riteau B, Sedlik C, Khalil-Daher I, Menier C, Dausset J et al. HLA-G-mediated inhibition of antigen-specific cytotoxic T lymphocytes. Int Immunol 1999; 11: 1351–1356.

3. Ristich V, Liang S, Zhang W, Wu J, Horuzsko A. Tolerization of dendritic cells by HLA-G. Eur J Immunol 2005; 35: 1133–1142.

4. Berger DS, Hogge WA, Barmada MM, Ferrell RE. Comprehensive analysis of HLA-G: implications for recurrent spontaneous abortion. Reprod Sci 2010; 17: 331–338.

5. Naji A, Menier C, Morandi F, Agaugue S, Maki G, Ferretti E et al. Binding of HLA-G to ITIM-bearing Ig-like transcript 2 receptor suppresses B cell responses. J Immunol 2014; 192: 1536–1546.

6. Larsen MH, Hviid TV. Human leukocyte antigen-G polymorphism in relation to expression, function, and disease. Hum Immunol 2009; 70: 1026–1034.

7. Donadi EA, Castelli EC, Arnaiz-Villena A, Roger M, Rey D, Moreau P. Implications of the polymorphism of HLA-G on its function, regulation, evolution and disease association. Cell Mol Life Sci 2011; 68: 369–395.

8. Gonzalez A, Rebmann V, LeMaoult J, Horn PA, Carosella ED, Alegre E. The immunosuppressive molecule HLA-G and its clinical implications. Crit Rev Clin Lab Sci 2012; 49: 63–84.

9. Tan Z, Shon AM, Ober C. Evidence of balancing selection at the HLA-G promoter region. Hum Mol Genet 2005; 14: 3619–3628.

10. Alvarez M, Piedade J, Balseiro S, Ribas G, Regateiro F. HLA-G 3’-UTR SNP and 14-bp deletion polymorphisms in Portuguese and Guinea-Bissau populations. Int J Immunogenet 2009; 36: 361–366.

11. Castelli EC, Mendes-Junior CT, Deghaide NH, de Albuquerque RS, Muniz YC, Simoes RT et al. The genetic structure of 3’untranslated region of the HLA-G gene: polymorphisms and haplotypes. Genes Immun 2010; 11: 134–141.

12. Castelli EC, Mendes-Junior CT, Veiga-Castelli LC, Roger M, Moreau P, Donadi EA. A comprehensive study of polymorphic sites along the HLA-G gene: implication for gene regulation and evolution. Mol Biol Evol 2011; 28: 3069–3086.

13. Lucena-Silva N, Monteiro AR, de Albuquerque RS, Gomes RG, Mendes-Junior CT, Castelli EC et al. Haplotype frequencies based on eight polymorphic sites at the 3’ untranslated region of the HLA-G gene in individuals from two different geographical regions of Brazil. Tissue Antigens 2012; 79: 272–278.

14. Sabbagh A, Luisi P, Castelli EC, Gineau L, Courtin D, Milet J et al. Worldwide genetic variation at the 3’ untranslated region of the HLA-G gene: balancing selection influencing genetic diversity. Genes Immun 2014; 15: 95–106.

15. Tan Z, Randall G, Fan J, Camoretti-Mercado B, Brockman-Schneider R, Pan L et al. Allele-specific targeting of microRNAs to HLA-G and risk of asthma. Am J Hum Genet 2007; 81: 829–834.

16. Yie SM, Li LH, Xiao R, Librach CL. A single base-pair mutation in the 3’-untranslated region of HLA-G mRNA is associated with pre-eclampsia. Mol Hum Reprod 2008; 14: 649–653.

17. Castelli EC, Moreau P, Oya e Chiromatzo A, Mendes-Junior CT, Veiga-Castelli LC, Yaghi L et al. In silico analysis of microRNAS targeting the HLA-G 3’ untranslated region alleles and haplotypes. Hum Immunol 2009; 70: 1020–1025.

18. Ober C, Aldrich CL, Chervoneva I, Billstrand C, Rahimov F, Gray HL et al. Variation in the HLA-G promoter region influences miscarriage rates. Am J Hum Genet 2003; 72: 1425–1435.

19. Costa CH, Gelmini GF, Wowk PF, Mattar SB, Vargas RG, Roxo VM et al. HLA-G regulatory haplotypes and implantation outcome in couples who underwent assisted reproduction treatment. Hum Immunol 2012; 73: 891–897.

20. Martinez-Laso J, Herraiz MA, Penaloza J, Barbolla ML, Jurado ML, Macedo J et al. Promoter sequences confirm the three different evolutionary lineages described for HLA-G. Hum Immunol 2013; 74: 383–388.

21. Castelli EC, Ramalho J, Porto I, Lima TH, Felicio LP, Sabbagh A et al. Insights into HLA-G genetics provided by worldwide haplotype diversity. Front Immunol 2014; 5: 476.

22. Nicolae D, Cox NJ, Lester LA, Schneider D, Tan Z, Billstrand C et al. Fine mapping and positional candidate studies identify HLA-G as an asthma susceptibility gene on chromosome 6p21. Am J Hum Genet 2005; 76: 349–357.

23. Doherty VL, Rush AN, Brennecke SP, Moses EK. The -56T HLA-G promoter polymorphism is not associated with pre-eclampsia/eclampsia in Australian and New Zealand women. Hypertens Pregnancy 2006; 25: 63–71.

24. Ober C, Billstrand C, Kuldanek S, Tan Z. The miscarriage-associated HLA-G -725G allele influences transcription rates in JEG-3 cells. Hum Reprod 2006; 21: 1743–1748.

25. Kroner A, Grimm A, Johannssen K, Maurer M, Wiendl H. The genetic influence of the nonclassical MHC molecule HLA-G on multiple sclerosis. Hum Immunol 2007; 68: 422–425.

26. Kim SK, Hong MS, Shin MK, Uhm YK, Chung JH, Lee MH. Promoter polymorphisms of the HLA-G gene, but not the HLA-E and HLA-F genes, is associated with non-segmental vitiligo patients in the Korean population. Arch Dermatol Res 2011; 303: 679–684.

27. Kim SK, Chung JH, Kim DH, Yun DH, Hong SJ, Lee KH. Lack of association between promoter polymorphisms of HLA-G gene and rheumatoid arthritis in Korean population. Rheumatol Int 2012; 32: 509–512.

28. Misra MK, Prakash S, Kapoor R, Pandey SK, Sharma RK, Agrawal S. Association of HLA-G promoter and 14-bp insertion-deletion variants with acute allograft rejection and end-stage renal disease. Tissue Antigens 2013; 82: 317–326.

29. Castelli EC, Veiga-Castelli LC, Yaghi L, Moreau P. Transcriptional and posttranscriptional regulations of the HLA-G gene. J Immunol Res 2014; 2014: 734068.

30. Schmidt CM, Ehlenfeldt RG, Athanasiou MC, Duvick LA, Heinrichs H, David CS et al. Extraembryonic expression of the human MHC class I gene HLA-G in transgenic mice. Evidence for a positive regulatory region located 1 kilobase 5’ to the start site of transcription. J Immunol 1993; 151: 2633–2645.

31. Moreau P, Paul P, Gourand L, Prost S, Dausset J, Carosella E et al. HLA-G gene transcriptional regulation in trophoblasts and blood cells: differential binding of nuclear factors to a regulatory element located 1.1 kb from exon 1. Hum Immunol 1997; 52: 41–46.

  32. 32

    Abecasis GR, Auton A, Brooks LD, DePristo MA, Durbin RM, Handsaker RE et al. An integrated map of genetic variation from 1,092 human genomes. Nature 2012; 491: 56–65.

  33.

    Stephens M, Donnelly P . A comparison of bayesian methods for haplotype reconstruction from population genotype data. Am J Hum Genet 2003; 73: 1162–1169.

  34.

    Wright S . The genetical structure of populations. Ann Eugen 1951; 15: 323–354.

  35.

    Akey JM, Zhang G, Zhang K, Jin L, Shriver MD . Interrogating a high-density SNP map for signatures of natural selection. Genome Res 2002; 12: 1805–1814.

  36.

    Elhaik E . Empirical distributions of F(ST) from large-scale human polymorphism data. PLoS ONE 2012; 7: e49837.

  37.

    Mu XJ, Lu ZJ, Kong Y, Lam HY, Gerstein MB . Analysis of genomic variation in non-coding elements using population-scale sequencing data from the 1000 Genomes Project. Nucleic Acids Res 2011; 39: 7058–7076.

  38.

    Prado-Martinez J, Sudmant PH, Kidd JM, Li H, Kelley JL, Lorente-Galdos B et al. Great ape genetic diversity and population history. Nature 2013; 499: 471–475.

  39.

    Solier C, Mallet V, Lenfant F, Bertrand A, Huchenq A, Le Bouteiller P . HLA-G unique promoter region: functional implications. Immunogenetics 2001; 53: 617–625.

  40.

    Kovats S, Main EK, Librach C, Stubblebine M, Fisher SJ, DeMars R . A class I antigen, HLA-G, expressed in human trophoblasts. Science 1990; 248: 220–223.

  41.

    Crisa L, McMaster MT, Ishii JK, Fisher SJ, Salomon DR . Identification of a thymic epithelial cell subset sharing expression of the class Ib HLA-G molecule with fetal trophoblasts. J Exp Med 1997; 186: 289–298.

  42.

    Moreau P, Flajollet S, Carosella ED . Non-classical transcriptional regulation of HLA-G: an update. J Cell Mol Med 2009; 13: 2973–2989.

  43.

    Gobin SJ, Peijnenburg A, Keijsers V, van den Elsen PJ . Site alpha is crucial for two routes of IFN gamma-induced MHC class I transactivation: the ISRE-mediated route and a novel pathway involving CIITA. Immunity 1997; 6: 601–611.

  44.

    Gobin SJ, Keijsers V, van Zutphen M, van den Elsen PJ . The role of enhancer A in the locus-specific transactivation of classical and nonclassical HLA class I genes by nuclear factor kappa B. J Immunol 1998; 161: 2276–2283.

  45.

    Gobin SJ, van Zutphen M, Woltman AM, van den Elsen PJ . Transactivation of classical and nonclassical HLA class I genes through the IFN-stimulated response element. J Immunol 1999; 163: 1428–1434.

  46.

    Ibrahim EC, Morange M, Dausset J, Carosella ED, Paul P . Heat shock and arsenite induce expression of the nonclassical class I histocompatibility HLA-G gene in tumor cell lines. Cell Stress Chaperones 2000; 5: 207–218.

  47.

    Lefebvre S, Berrih-Aknin S, Adrian F, Moreau P, Poea S, Gourand L et al. A specific interferon (IFN)-stimulated response element of the distal HLA-G promoter binds IFN-regulatory factor 1 and mediates enhancement of this nonclassical class I gene by IFN-beta. J Biol Chem 2001; 276: 6133–6139.

  48.

    Schmidt CM, Chen HL, Chiu I, Ehlenfeldt RG, Hunt JS, Orr HT . Temporal and spatial expression of HLA-G messenger RNA in extraembryonic tissues of transgenic mice. J Immunol 1995; 155: 619–629.

  49.

    Campbell MC, Tishkoff SA . African genetic diversity: implications for human demographic history, modern human origins, and complex disease mapping. Annu Rev Genomics Hum Genet 2008; 9: 403–433.

  50.

    Bamshad M, Wooding SP . Signatures of natural selection in the human genome. Nat Rev Genet 2003; 4: 99–111.

  51.

    Nielsen R, Hellmann I, Hubisz M, Bustamante C, Clark AG . Recent and ongoing selection in the human genome. Nat Rev Genet 2007; 8: 857–868.

  52.

    Single RM, Martin MP, Gao X, Meyer D, Yeager M, Kidd JR et al. Global diversity and evidence for coevolution of KIR and HLA. Nat Genet 2007; 39: 1114–1119.

  53.

    Li Q, Peterson KR, Fang X, Stamatoyannopoulos G . Locus control regions. Blood 2002; 100: 3077–3086.

  54.

    Deschaseaux F, Gaillard J, Langonne A, Chauveau C, Naji A, Bouacida A et al. Regulation and function of immunosuppressive molecule human leukocyte antigen G5 in human bone tissue. FASEB J 2013; 27: 2977–2987.

  55.

    Martelli-Palomino G, Pancotto JA, Muniz YC, Mendes-Junior CT, Castelli EC, Massaro JD et al. Polymorphic sites at the 3’ untranslated region of the HLA-G gene are associated with differential HLA-G soluble levels in the Brazilian and French population. PLoS ONE 2013; 8: e71742.

  56.

    Hiby SE, King A, Sharkey A, Loke YW . Molecular studies of trophoblast HLA-G: polymorphism, isoforms, imprinting and expression in preimplantation embryo. Tissue Antigens 1999; 53: 1–13.

  57.

    O’Brien M, McCarthy T, Jenkins D, Paul P, Dausset J, Carosella ED et al. Altered HLA-G transcription in pre-eclampsia is associated with allele specific inheritance: possible role of the HLA-G gene in susceptibility to the disease. Cell Mol Life Sci 2001; 58: 1943–1949.

  58.

    Rebmann V, van der Ven K, Passler M, Pfeiffer K, Krebs D, Grosse-Wilde H . Association of soluble HLA-G plasma levels with HLA-G alleles. Tissue Antigens 2001; 57: 15–21.

  59.

    Hviid TV, Hylenius S, Rorbye C, Nielsen LG . HLA-G allelic variants are associated with differences in the HLA-G mRNA isoform profile and HLA-G mRNA levels. Immunogenetics 2003; 55: 63–79.

  60.

    Rousseau P, Le Discorde M, Mouillot G, Marcou C, Carosella ED, Moreau P . The 14 bp deletion-insertion polymorphism in the 3’ UT region of the HLA-G gene influences HLA-G mRNA stability. Hum Immunol 2003; 64: 1005–1010.

  61.

    Svendsen SG, Hantash BM, Zhao L, Faber C, Bzorek M, Nissen MH et al. The expression and functional activity of membrane-bound human leukocyte antigen-G1 are influenced by the 3’-untranslated region. Hum Immunol 2013; 74: 818–827.

  62.

    Castelli EC, Mendes-Junior CT, Donadi EA . HLA-G alleles and HLA-G 14 bp polymorphisms in a Brazilian population. Tissue Antigens 2007; 70: 62–68.

  63.

    Mendes-Junior CT, Castelli EC, Meyer D, Simoes AL, Donadi EA . Genetic diversity of the HLA-G coding region in Amerindian populations from the Brazilian Amazon: a possible role of natural selection. Genes Immun 2013; 14: 518–526.

  64.

    Santos KE, Lima TH, Felicio LP, Massaro JD, Palomino GM, Silva AC et al. Insights on the HLA-G evolutionary history provided by a nearby Alu insertion. Mol Biol Evol 2013; 30: 2423–2424.

  65.

    Di Cristofaro J, Buhler S, Frassati C, Basire A, Galicher V, Baier C et al. Linkage disequilibrium between HLA-G*0104 and HLA-E*0103 alleles in Tswa Pygmies. Tissue Antigens 2011; 77: 193–200.

  66.

    Courtin D, Milet J, Jamonneau V, Yeminanga CS, Kumeso VK, Bilengue CM et al. Association between human African trypanosomiasis and the IL6 gene in a Congolese population. Infect Genet Evol 2007; 7: 60–68.

  67.

    Milet J, Nuel G, Watier L, Courtin D, Slaoui Y, Senghor P et al. Genome wide linkage study, using a 250K SNP map, of Plasmodium falciparum infection and mild malaria attack in a Senegalese population. PLoS ONE 2010; 5: e11616.

  68.

    Le Port A, Cottrell G, Martin-Prevel Y, Migot-Nabias F, Cot M, Garcia A . First malaria infections in a cohort of infants in Benin: biological, environmental and genetic determinants. Description of the study site, population methods and preliminary results. BMJ Open 2012; 2: e000342.

  69.

    Cervera I, Herraiz MA, Penaloza J, Barbolla ML, Jurado ML, Macedo J et al. Human leukocyte antigen-G allele polymorphisms have evolved following three different evolutionary lineages based on intron sequences. Hum Immunol 2010; 71: 1109–1115.

  70.

    Lewontin RC . The interaction of selection and linkage. I. General considerations; heterotic models. Genetics 1964; 49: 49–67.

  71.

    Barrett JC, Fry B, Maller J, Daly MJ . Haploview: analysis and visualization of LD and haplotype maps. Bioinformatics 2005; 21: 263–265.

  72.

    Excoffier L, Laval G, Schneider S . Arlequin (version 3.0): an integrated software package for population genetics data analysis. Evol Bioinform Online 2005; 1: 47–50.

  73.

    Gabriel SB, Schaffner SF, Nguyen H, Moore JM, Roy J, Blumenstiel B et al. The structure of haplotype blocks in the human genome. Science 2002; 296: 2225–2229.

  74.

    Stephens M, Smith NJ, Donnelly P . A new statistical method for haplotype reconstruction from population data. Am J Hum Genet 2001; 68: 978–989.

  75.

    Bandelt HJ, Forster P, Rohl A . Median-joining networks for inferring intraspecific phylogenies. Mol Biol Evol 1999; 16: 37–48.

  76.

    Excoffier L, Smouse PE, Quattro JM . Analysis of molecular variance inferred from metric distances among DNA haplotypes: application to human mitochondrial DNA restriction data. Genetics 1992; 131: 479–491.

  77.

    Stajich JE, Block D, Boulez K, Brenner SE, Chervitz SA, Dagdigian C et al. The Bioperl toolkit: Perl modules for the life sciences. Genome Res 2002; 12: 1611–1618.

  78.

    Kelley JL, Madeoy J, Calhoun JC, Swanson W, Akey JM . Genomic signatures of positive selection in humans and the limits of outlier approaches. Genome Res 2006; 16: 980–989.

  79.

    Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MA, Bender D et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet 2007; 81: 559–575.

  80.

    Beaumont MA, Nichols RA . Evaluating loci for use in the genetic analysis of population structure. Proc R Soc Lond B 1996; 263: 1619–1626.

  81.

    Barreiro LB, Laval G, Quach H, Patin E, Quintana-Murci L . Natural selection has driven population differentiation in modern humans. Nat Genet 2008; 40: 340–345.

  82.

    Tajima F . Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics 1989; 123: 585–595.

  83.

    Fu YX, Li WH . Statistical tests of neutrality of mutations. Genetics 1993; 133: 693–709.

  84.

    Librado P, Rozas J . DnaSP v5: a software for comprehensive analysis of DNA polymorphism data. Bioinformatics 2009; 25: 1451–1452.

  85.

    Woolfe A, Goode DK, Cooke J, Callaway H, Smith S, Snell P et al. CONDOR: a database resource of developmentally associated conserved non-coding elements. BMC Dev Biol 2007; 7: 100.

  86.

    Sabeti PC, Reich DE, Higgins JM, Levine HZ, Richter DJ, Schaffner SF et al. Detecting recent positive selection in the human genome from haplotype structure. Nature 2002; 419: 832–837.



Acknowledgements

This work was supported by the binational collaborative research program CAPES-COFECUB (grant #653/09) and by the Spanish National Institute for Bioinformatics. LG was supported by grants from Région Ile-de-France. PL was supported by a PhD fellowship from ‘Acción Estratégica de Salud, en el Marco del Plan Nacional de Investigación Científica, Desarrollo e Innovación Tecnológica 2008–2011’ from Instituto de Salud Carlos III. BP was supported by a PhD fellowship from the doctoral program in Public Health of Paris Sud University. ECC was supported by the Brazilian Council for Scientific and Technological Development (CNPq; grant #304471/2013-5). EAD was supported by a project grant from the Brazilian Council for Scientific and Technological Development (CNPq; grant #476036/2013-5).

Author information



Corresponding author

Correspondence to A Sabbagh.

Ethics declarations

Competing interests

The authors declare no conflict of interest.

Additional information

Supplementary Information accompanies this paper on the Genes and Immunity website.


About this article


Cite this article

Gineau, L., Luisi, P., Castelli, E. et al. Balancing immunity and tolerance: genetic footprint of natural selection in the transcriptional regulatory region of HLA-G. Genes Immun 16, 57–70 (2015).
