The HLA-G (human leukocyte antigen-G) molecule plays a pivotal role in immune tolerance by inhibiting different cell subsets involved in both innate and adaptive immunity. Besides its primary function in maintaining maternal–fetal tolerance, HLA-G has been implicated in a wide range of pathological conditions, where it can be either favorable or detrimental to the patient depending on the nature of the pathology. Although several studies have demonstrated the importance of the 3′ untranslated region (3′UTR) for the HLA-G expression profile, limited data exist on the sequence variability of this gene region in human populations. In this study, we characterized the genetic diversity and haplotype structure of the HLA-G 3′UTR by resequencing 444 individuals from three sub-Saharan African populations and retrieving data from the 1000 Genomes Project and the literature. A total of 1936 individuals representing 21 worldwide populations were combined and jointly analyzed. Our data revealed a high level of nucleotide diversity, an excess of intermediate-frequency variants and extremely low population differentiation, strongly supporting a history of balancing selection at this locus. The 14-bp insertion/deletion polymorphism was further identified as the likely target of selection, emphasizing its potential role in the post-transcriptional regulation of HLA-G expression.
HLA-G (human leukocyte antigen-G) is one of the most interesting nonclassical major histocompatibility complex class I antigens. It plays a pivotal role in immune tolerance by inhibiting different cell subsets involved in both innate and adaptive immunity. Because of its preferential expression on trophoblast cells at the fetomaternal interface during pregnancy, this HLA molecule has primarily been associated with maternal–fetal tolerance: it enables the fetus to survive unharmed in a genetically foreign host during successful pregnancies and thereby plays a crucial role in the outcome of pregnancy.1, 2 Later observations indicated that HLA-G expression in normal tissues is broader than originally described, including different tissues of the male reproductive system,3 although it remains mainly restricted to immune-privileged sites such as the thymus, cornea or proximal nail matrix.4, 5, 6 More interestingly, HLA-G can be induced in numerous pathological conditions, where it is either beneficial (autoimmune and inflammatory diseases, transplantation) or harmful (infectious diseases, cancer) to the patient.7, 8 The tolerogenic properties of HLA-G have gained considerable interest over the past decade, and the potential use of HLA-G protein expression as a diagnostic, prognostic and possibly therapeutic tool is the focus of extensive research.9, 10, 11, 12
Consistent with its distinct function, the HLA-G gene presents a very different pattern of nucleotide variation from the neighboring classical HLA class I genes, suggesting that it may have evolved under a different selective regime. Unlike the highly polymorphic classical class I loci, which are among the best-known examples of diversifying selection in the human genome, HLA-G displays very limited sequence polymorphism in the coding region, in particular at the protein level, suggesting a strong selective pressure for invariance (that is, purifying selection).8, 13 By contrast, a higher degree of variation is observed in the 5′ upstream regulatory (or promoter) region and in the 3′ untranslated region (3′UTR), which may influence the magnitude of HLA-G expression at both the transcriptional and post-transcriptional levels.13, 14, 15, 16, 17, 18 Many association studies have therefore focused on these regulatory regions, particularly on the 3′UTR polymorphic sites, which seem to play a pivotal role in the regulation of HLA-G expression by influencing the binding of specific microRNAs or the stability of HLA-G mRNA.19, 20, 21 Some HLA-G 3′UTR polymorphisms have been shown to influence HLA-G production, and a clear link between 3′UTR sequence variation and the level of inhibition of natural killer cell cytotoxicity was recently demonstrated.22 This gene segment has been studied in a wide range of pathological conditions, including transplantation, tumors, pregnancy disorders, autoimmune diseases and viral infections.7, 8 More recently, a possible role of HLA-G 3′UTR polymorphisms in the outcome of parasitic infections has been suggested.23, 24, 25 Nonetheless, many of the association studies published so far have yielded conflicting results across human populations, which may stem from a lack of knowledge of the genetic structure of this gene region and its geographic variation.
Spurious associations can indeed arise from unrecognized population structure, and significant differences in allele frequencies and haplotype composition between populations may explain some of the conflicting association reports.26 Consequently, characterizing patterns of genetic diversity in worldwide populations is an essential prerequisite for association studies aiming to better understand the role of the HLA-G 3′UTR in susceptibility to complex diseases.
Most association studies genotype only a limited set of presumably functional variants, whereas only a few population genetic surveys have investigated the full sequence variation of the HLA-G 3′UTR in unrelated healthy individuals.16, 17, 18 In particular, African populations have been understudied, even though the African continent is one of the most ethnically and genetically diverse regions of the world, with extensive genetic variation even among geographically close populations.27
In the present study, we aimed to characterize the sequence variation and haplotype structure of the HLA-G 3′UTR in several worldwide populations in order to better understand the global patterns of variation at this locus and to provide further insights into the mechanisms that may underlie the control of HLA-G expression. To detect novel variants and further characterize African diversity at this locus, we sequenced this region in 444 individuals from three sub-Saharan African populations (Serer from Senegal, Yansi from the Democratic Republic of the Congo and Tori from Benin). These data were then pooled with those available from the 1000 Genomes Project and the literature, yielding a total of 21 worldwide population samples for our population genetic analyses. Beyond simple description, this study also aimed to investigate the evolutionary history of this HLA-G regulatory region by disentangling the relative influences of natural selection and human demography on its present-day variability.
HLA-G 3′UTR nucleotide sequence variation and linkage disequilibrium patterns
The 3′UTR of HLA-G was screened for sequence variation in 444 individuals from three population samples from sub-Saharan Africa: 239 Serer from Senegal, 175 Yansi from Congo and 30 Tori from Benin. A total of 14 variation sites (the 14-bp insertion/deletion (indel) and 13 single-nucleotide variants (SNVs)) were identified, 4 of which are reported here for the first time (Table 1). Three of these new variants were found exclusively in Yansi (+3032G/C, +3107C/G and +3121T/C), whereas the fourth was detected in both Yansi and Serer (+3052C/T); all occurred at frequencies below 0.02. The 10 remaining variants (14-bp indel, +3001C/T, +3003T/C, +3010C/G, +3027C/A, +3035C/T, +3111G/A, +3142G/C, +3187A/G and +3196C/G) have all been described in previous studies. Observed genotype frequencies at all variation sites conformed to Hardy–Weinberg equilibrium expectations (Fisher's exact tests).
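Fisher's exact test for Hardy–Weinberg equilibrium at a biallelic site can be computed by enumerating every heterozygote count compatible with the observed allele counts. The following is a minimal Python sketch of that exact test, assuming biallelic variants and a two-sided P-value obtained by summing the probabilities of all configurations no more likely than the observed one. It is an illustration only, not necessarily the implementation used in this study, and the function name `hwe_exact_p` is hypothetical.

```python
from math import exp, lgamma, log


def hwe_exact_p(n_aa, n_ab, n_bb):
    """Two-sided exact test of Hardy-Weinberg equilibrium for one biallelic site."""
    n = n_aa + n_ab + n_bb          # number of individuals
    n_a = 2 * n_aa + n_ab           # copies of allele a
    n_b = 2 * n - n_a               # copies of allele b

    def log_prob(het):
        # log P(het heterozygotes | n individuals, allele counts n_a and n_b)
        hom_a = (n_a - het) // 2
        hom_b = (n_b - het) // 2
        return (het * log(2) + lgamma(n + 1) - lgamma(hom_a + 1)
                - lgamma(het + 1) - lgamma(hom_b + 1)
                + lgamma(n_a + 1) + lgamma(n_b + 1) - lgamma(2 * n + 1))

    rare = min(n_a, n_b)
    # Heterozygote counts must share the parity of the allele counts
    probs = [exp(log_prob(h)) for h in range(rare % 2, rare + 1, 2)]
    p_obs = exp(log_prob(n_ab))
    # Sum all configurations at most as likely as the observed one (small float tolerance)
    return min(1.0, sum(p for p in probs if p <= p_obs * (1 + 1e-7)))


# Illustrative genotype counts (not data from this study)
print(hwe_exact_p(25, 50, 25))   # balanced counts, close to HWE
print(hwe_exact_p(50, 0, 50))    # no heterozygotes, strong deviation
```

Working on the log scale with `lgamma` avoids overflow of the factorials for sample sizes such as the 239 Serer individuals.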
We combined these new sequence data with those generated by the 1000 Genomes (1KG) Project28 and those previously published in the literature16, 17, 18 to evaluate the sequence variation of this gene region on a worldwide scale. Altogether, 1936 individuals representing 21 populations from Europe, East Asia, sub-Saharan Africa and the Americas were examined (Table 1). A total of 16 variants were observed in the HLA-G 3′UTR, 9 of which reached polymorphic frequencies (>0.01 at the global level). Five of them were particularly frequent in human populations (14-bp indel, +3010C/G, +3142G/C, +3187A/G and +3196C/G), with global frequencies >0.25. Ancestral alleles were generally more frequent than derived alleles, with the notable exceptions of the 14-bp indel and +3035C/T variants. The number of variation sites per sample averaged 9, ranging from 7 in Tori to 13 in Yansi (Table 2). Interestingly, two of the four new variants identified in the Yansi and Serer samples were also detected at low frequencies (⩽0.02; Table 1) in 1KG samples (+3032G/C in Luhya from Kenya, African Americans and British, and +3121T/C in Toscani, British, Han Chinese from Beijing and Puerto Ricans). Two other SNVs not previously reported were found only in 1KG populations: +3092G/T occurred in the heterozygous state in two Luhya individuals, and +3227G/A was found in 11 different samples, with the frequency of the +3227A allele ranging from 0.011 in Finnish to 0.073 in Puerto Ricans.
The global patterns of linkage disequilibrium (LD) across the HLA-G 3′UTR were then examined in each population sample, including the three new samples from sub-Saharan Africa and the 14 samples from 1KG, by computing the r2 measure of LD between pairs of variants (Table 2 and Supplementary Figure 1). Figure 1 depicts the pairwise LD matrix in the four main geographical regions after grouping samples by continent of origin. Overall, LD was low across the whole region, with only one, two, four and five pairs of variants showing high allelic association (r2>0.60) in America, Africa, Asia and Europe, respectively. Only two variants, +3010C/G and +3142G/C, were in very strong LD in all worldwide populations (r2⩾0.91). Populations within continental groups displayed roughly similar levels of LD, with no significant differences in mean pairwise r2 values between populations from the same continent (Kruskal–Wallis test, all P-values >0.17). By contrast, the level of LD differed significantly between geographical groups (Kruskal–Wallis test, P=0.029). In particular, Asians (mean pairwise r2=0.33) showed higher LD than Americans (0.21) and Africans (0.23) (Mann–Whitney test, P=0.034 and P=0.045, respectively). This translates into a more efficient tagging procedure in Asians, where only four tag markers are needed to capture the common genetic variation across the 3′UTR (Table 2).
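For two biallelic variants with allele frequencies pA and pB and haplotype frequency pAB, r2 = D2/(pA(1−pA)pB(1−pB)), where D = pAB − pApB. The sketch below assumes phased haplotypes coded as 0/1 tuples, a simplification since the analysis above started from unphased genotypes. It pairs r2 with a simple greedy pairwise tagging heuristic using an assumed r2 threshold of 0.8. The helper names and threshold are hypothetical, and this is not necessarily the tagging software used in the study.

```python
def r_squared(haplotypes, i, j):
    """r2 between variants i and j from phased haplotypes coded as 0/1 tuples."""
    n = len(haplotypes)
    p_a = sum(h[i] for h in haplotypes) / n
    p_b = sum(h[j] for h in haplotypes) / n
    p_ab = sum(h[i] * h[j] for h in haplotypes) / n
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    return d * d / denom if denom > 0 else float("nan")  # undefined if monomorphic


def greedy_tags(r2, threshold=0.8):
    """Greedy pairwise tagging: pick markers until every variant has r2 >= threshold with a tag.

    r2 is a square matrix with 1.0 on the diagonal, so every marker tags itself.
    """
    n = len(r2)
    untagged = set(range(n))
    tags = []
    while untagged:
        # Choose the marker capturing the most untagged variants (lowest index on ties)
        best = max(range(n), key=lambda k: (sum(r2[k][v] >= threshold for v in untagged), -k))
        tags.append(best)
        untagged -= {v for v in untagged if r2[best][v] >= threshold}
    return tags


# Toy haplotypes: variants 0 and 1 are perfectly correlated, variant 2 is independent
haps = [(0, 0, 1), (1, 1, 0), (0, 0, 0), (1, 1, 1)]
m = len(haps[0])
r2 = [[1.0 if a == b else r_squared(haps, a, b) for b in range(m)] for a in range(m)]
print(greedy_tags(r2))  # [0, 2]
```

Higher average r2, as observed in Asians, lets each selected tag capture more variants, so fewer tags are needed.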
Worldwide diversity and genealogy of HLA-G 3′UTR haplotypes
HLA-G 3′UTR haplotypes were reconstructed from the unphased genotype data at the 16 ascertained variation sites in each of the 21 population samples included in the haplotype analysis (Table 3 and Supplementary Table 1). A total of 44 haplotypes were inferred, 20 of which have been previously described in the literature.13, 16, 17, 18 Of the 44 haplotypes, 20 (45%) were found in more than one population and 17 (39%) in more than one continental region. Only 8 occurred at a frequency above 1% at the global level. The number of distinct haplotypes varied greatly across samples, from 5 in Han Chinese from South China to 23 in Yansi, as well as across continental regions: only 7 distinct haplotypes were identified in Asia, compared with 17 in Europe, 18 in America and as many as 35 in Africa. Within-population haplotype diversity was consistently high in all worldwide samples (mean=0.78±0.04 s.d.), ranging from 0.66 in Japanese to 0.84 in Yansi. Asian samples displayed slightly lower values (0.71±0.06 s.d.) than Africans (0.81±0.02 s.d.) and Americans (0.82±0.01 s.d.). Notably, three major haplotypes (UTR-1, UTR-2 and UTR-3) account for ∼70% of all chromosomes, ranging from 61% in Yansi to 94% in Japanese (Figure 2), and are evenly distributed among all human populations. Although the two most frequent haplotypes, UTR-1 and UTR-2, occur at roughly similar frequencies at the global level (∼25% each), they differ at the five variation sites (14-bp indel, +3010C/G, +3142G/C, +3187A/G and +3196C/G) identified as the most polymorphic variants in the worldwide panel (Figure 3). UTR-3 occupies an intermediate position, differing from UTR-2 at two sites (14-bp indel and +3196C/G) and from UTR-1 by three nucleotide substitutions (+3010C/G, +3142G/C and +3187A/G).
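Within-population haplotype diversity is usually estimated with Nei's unbiased formula, H = n/(n−1)·(1 − Σ pi²), where pi is the frequency of the i-th haplotype among n chromosomes. The sketch below assumes that this is the estimator behind the values reported above. The sample of haplotype labels is an illustrative toy, not data from Table 3.

```python
from collections import Counter


def haplotype_diversity(haplotypes):
    """Nei's unbiased haplotype diversity: n/(n-1) * (1 - sum of squared frequencies)."""
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two chromosomes")
    freqs = [count / n for count in Counter(haplotypes).values()]
    return n / (n - 1) * (1 - sum(p * p for p in freqs))


# Toy sample of eight chromosomes labelled by haplotype name (illustrative only)
sample = ["UTR-1", "UTR-2", "UTR-1", "UTR-3", "UTR-2", "UTR-1", "UTR-5", "UTR-2"]
print(round(haplotype_diversity(sample), 3))  # 0.786
```

Because a few haplotypes at similar frequencies already push H toward its maximum, high diversity can coexist with only three major haplotypes accounting for most chromosomes.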
The full network, including the orthologous sequences from chimpanzee, gorilla and orangutan, points to UTR-5 as the most likely ancestral haplotype (Figure 3), consistent with previous observations.13, 29 This haplotype differs by 2, 4 and 11 mutational events from the chimpanzee, gorilla and orangutan sequences, respectively. It occurs at a global frequency of ∼6% and is shared among all continents.
Population genetic structure at HLA-G 3′UTR
Using hierarchical F-statistics computed through an analysis of molecular variance (AMOVA), we examined the effect of population subdivision relative to the total population. Dividing the total population into the four main continental groups and into 21 subpopulations yielded statistically significant FST estimates of 1.9% and 3.6%, respectively, for multilocus genotypes (Table 4). These values are well below the ∼10–15% typically observed for most nuclear loci in the genome.30, 31 Similarly, low levels of genetic differentiation among populations and among continents were observed for all individual variants across the 3′UTR, with most FST values below 5%. These low values are consistent with the lack of geographical structuring of haplotypes observed in the haplotype analysis.
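To illustrate what a low FST means for a single variant, the sketch below uses the heterozygosity-based definition FST = (HT − HS)/HT, where HT is the expected heterozygosity of the pooled population and HS the mean within-population heterozygosity. This is a simplified stand-in, not the AMOVA used here: AMOVA partitions molecular variance hierarchically and accounts for sample sizes, whereas this sketch assumes equal population weights and known allele frequencies.

```python
def fst_from_frequencies(freqs):
    """Wright/Nei F_ST = (H_T - H_S) / H_T for one biallelic variant.

    freqs: frequency of the same allele in each population (equal weights assumed).
    """
    k = len(freqs)
    p_bar = sum(freqs) / k
    h_t = 2 * p_bar * (1 - p_bar)                     # expected heterozygosity, pooled
    h_s = sum(2 * p * (1 - p) for p in freqs) / k     # mean within-population heterozygosity
    return (h_t - h_s) / h_t if h_t > 0 else 0.0


# Hypothetical allele frequencies, not values from Table 4
print(fst_from_frequencies([0.45, 0.50, 0.55, 0.50]))  # similar frequencies, low F_ST
print(fst_from_frequencies([0.10, 0.90]))              # strongly differentiated
```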
Detecting signatures of natural selection
To investigate whether present-day HLA-G 3′UTR variation patterns solely reflect stochastic events of human evolution or have been distorted by balancing selection, we used two complementary approaches based on the level of population genetic differentiation and on the allele frequency spectrum of segregating sites. We first determined whether the low levels of genetic differentiation observed at HLA-G 3′UTR variation sites are atypically low compared with the genome-wide distribution of FST scores. Empirical distributions representing the average genetic differentiation of human populations, corrected for heterozygosity, were constructed from the genome-wide variation data available from 1KG (see Materials and methods), and an outlier approach was adopted to identify variants with atypical patterns of allele frequency differentiation. The FST scores computed for each 3′UTR variation site at both the population and continental levels for the 14 populations of 1KG are shown in Table 5, along with the empirical P-values estimated from the genome-wide, genic and 3′UTR distributions. Most variants had very low FST values, particularly the 14-bp indel, whose FST scores were lower than or very close to the 5th percentile of the genome-wide distributions (P=0.058 and P=0.009 at the population and continental levels, respectively; Table 5). Very similar results were obtained when the genic and 3′UTR distributions were used as references (Table 5). Such extremely low levels of population genetic differentiation suggest that this locus may be subject to balancing or species-wide directional selection.
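One way to correct for heterozygosity is to bin genome-wide variants by expected heterozygosity 2p(1−p) and read the empirical P-value of an observed FST as the fraction of variants in the same bin whose FST is at or below it. The sketch below assumes a bin width of 0.05 and a list of (heterozygosity, FST) pairs as input. Both are assumptions, and the binning described in Materials and methods may differ.

```python
import bisect


def empirical_lower_p(obs_fst, obs_het, genome, bin_width=0.05):
    """Fraction of genome-wide variants in the same heterozygosity bin with F_ST <= obs_fst.

    genome: iterable of (heterozygosity, fst) pairs; heterozygosity = 2p(1 - p).
    bin_width is an assumed value, not necessarily the binning used in the study.
    """
    target = int(obs_het // bin_width)
    reference = sorted(f for h, f in genome if int(h // bin_width) == target)
    if not reference:
        raise ValueError("no genome-wide variants in this heterozygosity bin")
    # bisect_right returns how many sorted reference values are <= obs_fst
    return bisect.bisect_right(reference, obs_fst) / len(reference)


# Toy genome-wide reference: 100 variants near heterozygosity 0.42, 50 in another bin
genome = [(0.42, x / 100) for x in range(100)] + [(0.12, 0.0)] * 50
print(empirical_lower_p(0.045, 0.41, genome))  # 0.05
```

Conditioning on heterozygosity matters because FST depends on allele frequency: comparing a common variant such as the 14-bp indel with rare genome-wide variants would bias the outlier test.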
Summary statistics capturing several aspects of variation were calculated for each of the 21 population samples included in the worldwide diversity survey (Table 6). Considering all data, the HLA-G 3′UTR displays a considerable amount of genetic diversity, with a mean nucleotide diversity (π) of 0.79±0.07%, considerably higher than that of most variable nuclear human loci investigated thus far. For comparison, this value is ∼9–14 times higher than the average estimated for 19 502 3′UTRs dispersed throughout the human genome in the same four populations as those included in the present analysis (0.078% in Yoruba (YRI), 0.061% in Europeans (CEU) and 0.056% in Asians (CHB+JPT)).32 Furthermore, this value is consistently high in all populations studied, with no significant difference between the four continents (0.75% in Africa, 0.85% in Europe, 0.74% in Asia and 0.81% in America; Kruskal–Wallis test, P>0.05). By contrast, human–chimpanzee divergence at the HLA-G 3′UTR (1.49%±0.10% on average) is within the range of values observed for the 19 502 3′UTRs of the human genome (mean of 1.12% in CEU and 1.11% in YRI and JPT+CHB),32 pointing to an unusually high ratio of diversity to divergence, suggestive of balancing selection.
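Nucleotide diversity π is the mean number of pairwise differences between sequences divided by the number of sites, and divergence can be estimated analogously against an outgroup sequence. The sketch below assumes aligned sequences of equal length in which the 14-bp indel is coded as a single character (an assumption made for illustration). The toy alignment is hypothetical, not data from this study.

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """pi: mean pairwise differences per site among aligned, equal-length sequences."""
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs)
    return diffs / (len(pairs) * len(seqs[0]))


def divergence(seqs, outgroup):
    """Mean per-site differences between each sequence and an outgroup sequence."""
    diffs = sum(sum(a != b for a, b in zip(s, outgroup)) for s in seqs)
    return diffs / (len(seqs) * len(outgroup))


# Toy alignment; '-' stands in for the 14-bp deletion coded as one character
human = ["ACGT-A", "ACGTTA", "ATGTTA", "ATGT-A"]
chimp = "ACGCTA"
pi, k = nucleotide_diversity(human), divergence(human, chimp)
print(round(pi, 4), round(k, 4), round(pi / k, 4))
```

With the averages quoted above, the diversity-to-divergence ratio is roughly 0.79/1.49 ≈ 0.53 for the HLA-G 3′UTR, compared with 0.078/1.11 ≈ 0.07 for the genome-wide 3′UTR average in YRI.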
Three formal tests of natural selection, based on Tajima's D and Fu and Li's D* and F* statistics, were then applied to the nucleotide sequence data of each of the 21 population samples to detect possible distortions of the allele frequency spectrum relative to neutral expectations. Positive values of all three statistics were observed in all populations analyzed, indicating a skew in the frequency spectrum toward an excess of intermediate-frequency variants (Table 6). Statistical significance was first assessed from coalescent simulations (10 000 runs) of a standard neutral equilibrium model. In all populations except four (Yansi, African Americans, Iberians and Japanese), significant deviations from neutral expectations were observed for one or more statistics (P<0.05). In African Americans, Tajima's D was close to the significance threshold (P=0.066). However, because the null model used to assess significance makes unrealistic assumptions about population demographic history (that is, a constant-size, randomly mating population), it is difficult to interpret these data unambiguously as evidence for natural selection. Significantly positive values of these statistics can reflect a recent population bottleneck, population subdivision or some form of balancing selection.33 Therefore, to discriminate between the influences of population history and natural selection, we adopted an empirical approach analogous to that used for FST, considering sequence variation data at multiple unlinked loci in the 14 populations from 1KG. We sampled 100 independent noncoding autosomal regions dispersed throughout the genome that are expected to be shaped mainly by neutral demographic processes (see Materials and methods), and calculated the percentile rank of the HLA-G 3′UTR values in the distributions of Tajima's D and Fu and Li's D* and F* statistics computed for this set of regions (Table 6).
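Tajima's D compares π (mean pairwise differences) with Watterson's estimator S/a1, standardized by an estimate of its variance, so an excess of intermediate-frequency variants gives D > 0. The sketch below computes D and an upper-tail P-value from coalescent simulations of a standard neutral, constant-size model. As an assumption, the simulations condition on the observed number of segregating sites S and place mutations on a Kingman coalescent genealogy without recombination. Other implementations may instead condition on θ. Fu and Li's D* and F* are omitted for brevity, and all function names are hypothetical.

```python
import math
import random
from itertools import combinations


def tajimas_d(n, s, pi):
    """Tajima's D for n sequences, s segregating sites and mean pairwise differences pi."""
    if s == 0:
        return float("nan")  # undefined without segregating sites
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1, e2 = c1 / a1, c2 / (a1 ** 2 + a2)
    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))


def tajimas_d_from_sequences(seqs):
    """Aligned sequences; each column with more than one state is one segregating site."""
    s = sum(1 for column in zip(*seqs) if len(set(column)) > 1)
    pairs = list(combinations(seqs, 2))
    pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)
    return tajimas_d(len(seqs), s, pi)


def simulate_site_counts(n, s, rng):
    """Derived-allele counts of s sites placed on one neutral Kingman coalescent tree."""
    leaves = [1] * n          # descendant leaf count of each active lineage
    lengths = [0.0] * n       # branch length accumulated by each active lineage
    branches = []             # completed branches as (length, leaf_count)
    while len(leaves) > 1:
        k = len(leaves)
        t = rng.expovariate(k * (k - 1) / 2)   # waiting time to the next coalescence
        lengths = [x + t for x in lengths]
        i, j = sorted(rng.sample(range(k), 2), reverse=True)
        branches += [(lengths[i], leaves[i]), (lengths[j], leaves[j])]
        merged = leaves[i] + leaves[j]
        for idx in (i, j):    # delete the higher index first so indices stay valid
            del leaves[idx], lengths[idx]
        leaves.append(merged)
        lengths.append(0.0)
    # Infinite sites: drop s mutations on branches with probability proportional to length
    picks = rng.choices(branches, weights=[b[0] for b in branches], k=s)
    return [count for _, count in picks]


def neutral_p_value(n, s, observed_d, reps=10_000, seed=1):
    """Upper-tail P-value of Tajima's D under a constant-size neutral model, given S."""
    rng = random.Random(seed)
    n_pairs = n * (n - 1) / 2
    hits = 0
    for _ in range(reps):
        counts = simulate_site_counts(n, s, rng)
        pi = sum(c * (n - c) for c in counts) / n_pairs
        hits += tajimas_d(n, s, pi) >= observed_d
    return hits / reps


seqs = ["AA", "AA", "TT", "TT"]  # toy alignment with intermediate-frequency variants
d_obs = tajimas_d_from_sequences(seqs)
print(d_obs, neutral_p_value(len(seqs), 2, d_obs, reps=1000))
```

This null model embodies exactly the constant-size, randomly mating assumptions criticized above, which is why its P-values are compared with the empirical percentile ranks.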
Values for the HLA-G 3′UTR ranked above the 95th percentile for one or more statistics in 9 of the 14 populations analyzed (and Tajima's D ranked at the 92nd percentile in a tenth, African Americans), a finding consistent with the hypothesis of balancing selection. Interestingly, in two populations (Colombians and British), Tajima's D was no longer significant when the empirical distributions were used instead of the coalescent-based neutral model.
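The empirical ranking can be sketched as the percentage of reference regions (here, the 100 noncoding regions) whose statistic is at or below the HLA-G value, flagging the locus when any statistic ranks above the 95th percentile. Counting ties as "at or below" is one of several percentile conventions and is an assumption here. The function names and toy reference values are hypothetical.

```python
def percentile_rank(observed, reference):
    """Percentage of reference values less than or equal to the observed value."""
    if not reference:
        raise ValueError("empty reference distribution")
    return 100.0 * sum(v <= observed for v in reference) / len(reference)


def flag_outliers(observed_stats, reference_stats, cutoff=95.0):
    """Percentile rank of each statistic and whether any exceeds the cutoff."""
    ranks = {name: percentile_rank(obs, reference_stats[name])
             for name, obs in observed_stats.items()}
    return ranks, any(r > cutoff for r in ranks.values())


# Toy reference values for 100 regions (not the 1KG distributions)
ref = [float(i) for i in range(100)]
ranks, is_outlier = flag_outliers({"D": 97.5, "D*": 50.5}, {"D": ref, "D*": ref})
print(ranks, is_outlier)
```

Because demography shifts the reference distribution and the candidate locus alike, this ranking needs no explicit demographic model, unlike the coalescent null.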
The present study provides a comprehensive analysis of genetic variation in the 3′UTR segment of the HLA-G gene on a worldwide scale. Overall, the nucleotide sequence data of 1936 individuals representing 21 populations from four continental regions (Africa, Europe, Asia and America) were combined and jointly analyzed. Population genetic analyses of these data revealed a high level of polymorphism in this region, with as many as 16 segregating sites (the 14-bp indel and 15 SNVs) identified in a region of only 358 bp. High intrapopulation genetic diversity was consistently observed in all populations investigated. This contrasts with the patterns of diversity usually observed at other loci of the genome, which show a gradual loss of diversity with increasing distance from Africa as a consequence of the successive colonization bottlenecks that accompanied the spread of our species across the world.34, 35 The lack of relationship between genetic diversity and geography implies that patterns of human diversity at the HLA-G 3′UTR are unlikely to be explained by the simple interaction of drift and geographically structured gene flow, as expected under neutral evolution. A total of 44 haplotypes were reconstructed, 3 of which (UTR-1, UTR-2 and UTR-3) account for ∼70% of all chromosomes and are evenly distributed across human populations. This translates into a low level of genetic differentiation among populations and among continents at both the nucleotide and haplotype levels. The most common haplotypes, UTR-1 and UTR-2, each occurring at a global frequency of ∼25%, differ at the five variation sites with the highest heterozygosities in humans (14-bp indel, +3010C/G, +3142G/C, +3187A/G and +3196C/G). UTR-1 is the most divergent from the ancestral haplotype and is likely to be the most recent.
Altogether, the features of molecular diversity documented at the HLA-G 3′UTR suggest that patterns of human variation at this locus may not simply be the result of human population history. The elevated nucleotide diversity, excess of polymorphism relative to divergence, reduced population differentiation and presence of two divergent haplotypes maintained at high frequencies in all human populations instead support a history of balancing selection at this locus.
Robust inference of natural selection at a particular genomic region is difficult because of the confounding effects of population demographic history. Demographic and selective events tend to leave very similar signatures in sequence data. For example, both balancing selection and a reduction in population size lead to an excess of intermediate-frequency alleles relative to the expectations of a standard neutral model. One way to circumvent this problem is to exploit the fact that demography affects the whole genome, whereas selection acts on specific loci. Thus, comparing patterns of variation at the locus of interest with those observed at a large number of unlinked loci expected to represent the neutral genomic background can clarify the relative contributions of these two evolutionary forces, while removing the need to assume any precise demographic history for the population under study.36, 37, 38, 39 We applied such an empirical approach to determine whether the HLA-G 3′UTR has been shaped by natural selection. This gene region clearly appeared as an outlier in the empirical distributions of several test statistics (Wright's fixation index FST, Tajima's D and Fu and Li's D* and F*), with unusually low population differentiation and an excess of intermediate-frequency variants, both hallmarks of balancing selection.
For neutrality tests based on the frequency spectrum, we were able to compare the results of the empirical approach with those relying on coalescent simulations under a standard neutral model. Notably, both approaches yielded very similar results, with close P-values, in the three African populations (Yoruba, Luhya and African Americans), whereas significance changed in non-African populations with the empirical approach (Table 6). Such a contrast between African and non-African populations may result from demographic factors specific to these populations. Whereas non-African populations experienced one or several bottlenecks during the migration out of Africa, African populations are thought to have undergone moderate but uninterrupted population expansion.35, 40, 41, 42 Whole-genome sequencing data confirmed that individuals of European or East Asian origin carry far fewer low-frequency variants (0.5–5% frequency) than those with substantial African ancestry, reflecting these ancestral bottlenecks.28 At the same time, non-Africans show an enrichment of rare variants (<0.5%), reflecting recent explosive increases in population size.28 We therefore expect genome-wide Tajima's D and Fu and Li's F* values to be increased, and Fu and Li's D* values decreased, in non-Africans relative to expectations under a constant-size population model. This may explain why, with the empirical approach, tests for balancing selection at the HLA-G 3′UTR in non-African populations gave overall less significant results with Tajima's D and Fu and Li's F* and more significant ones with Fu and Li's D*. These findings reinforce the importance of adopting an empirical approach when testing for natural selection: assessing statistical significance using coalescent simulations of a standard neutral model may yield false positives in non-Africans, at least for Tajima's D and Fu and Li's F*.
This is particularly relevant because Tajima's D is the most widely used statistic for interpreting patterns of nucleotide sequence variation. With the empirical approach, 9 of the 14 samples have at least one significant test statistic at the 0.05 threshold, and this number increases to 10 at a threshold of 0.10. Although these results strongly support the hypothesis of balancing selection, they are somewhat unexpected given the very small size of the region surveyed (<400 bp), which provides little power to detect significant deviations from neutral expectations. Simulation studies have indeed shown that the number of mutations in the sample is the main factor affecting test power, with more segregating sites conferring more power.43 Many of the estimated P-values are very close to, but do not reach, the significance threshold (0.05⩽P<0.10; Table 6), which may simply reflect the limited power of neutrality tests in such a small region.
Altogether, our results suggest that two or more HLA-G 3′UTR haplotypes have been maintained at high frequencies in human populations through balancing selection. Considering that high HLA-G expression can be advantageous in some circumstances (maintenance of maternal–fetal immune tolerance during pregnancy and decreased risk of miscarriage and pregnancy-associated disorders) and detrimental in others (less efficient host immune response against invading pathogens and tumor cells), one can readily postulate that maintaining both high-expressing and low-expressing haplotypes may benefit the individual. Tan et al.14 previously reported evidence of balancing selection at the HLA-G locus, describing two divergent lineages of promoter haplotypes in human populations that might be associated with different promoter activities. Such balancing selection may arise through heterozygote advantage, spatially and temporally varying selection, or both, allowing individuals to meet different immunologic needs depending on cell type or developmental stage and to face different and changing environmental conditions such as exposure to pathogens. This selective scenario implies that the haplotypes targeted by selection are associated with different patterns of HLA-G expression. UTR-1 and UTR-2, the most likely candidates for balancing selection, differ at five variation sites, three of which have been associated with the regulation of HLA-G expression levels (14-bp indel, +3142G/C and +3187A/G). UTR-2 carries the 14-bp insertion, the +3142G allele and the +3187A allele, all of which have been associated with decreased HLA-G production, by modifying either microRNA affinity or HLA-G mRNA stability.8 By contrast, UTR-1 carries the opposite alleles and is considered a 'high producer' haplotype.
A recent study further confirmed that UTR-1 is associated with higher production of soluble HLA-G.44 The maintenance of UTR-1 and UTR-2 at high and similar frequencies in all human populations studied is likely to translate into an intermediate level of HLA-G expression at the population level. This intermediate level may represent a fine balance between expression levels that allow both fetal tolerance and an appropriate defense against pathogens. Because the UTR-1 and UTR-2 haplotypes, and the individual polymorphisms defining them, have been associated with variable risks of developing various adult immune-mediated diseases and pregnancy-associated disorders, such as severe pre-eclampsia,45 and because they occur at roughly similar frequencies worldwide, all human populations would be expected to be similarly prone to these clinical conditions. Notably, the 14-bp indel was specifically singled out as a variant with an atypical pattern of geographic differentiation compared with the genome-wide distribution of FST (Table 5) and might therefore be the direct target of balancing selection. The compelling evidence for the functionality of this polymorphism in alternative splicing and its potential role in post-transcriptional regulation support this hypothesis.21, 22, 46, 47, 48, 49 This is further corroborated by recent evidence of balancing selection acting on this locus in several South Amerindian tribes.50, 51
Because high LD has been demonstrated across the whole HLA-G gene,13 we cannot rule out the possibility that the signal of balancing selection observed in our data results from a hitchhiking effect caused by LD between 3′UTR polymorphisms and other regions of the gene. The extent of LD between the 3′UTR and the coding region has been extensively evaluated in worldwide populations.13, 15, 17, 52, 53 The UTR-1 and UTR-2 haplotypes have been found to be mostly associated with the coding alleles HLA-G*01:01:01:01 and HLA-G*01:01:02, respectively, which encode the same HLA-G protein. This makes it unlikely that the coding region contributes to the selection signal observed, especially as the variation patterns of the coding sequence have been shown to be more consistent with purifying selection.29 LD was also detected between 3′UTR polymorphisms and the HLA-G 5′ regulatory region in a Brazilian population,13 but further studies of the LD patterns between these two regions in a wider range of populations are needed to address this specific question. Finally, highly significant LD has been reported between HLA-G and some specific alleles of the HLA-A gene, located only 110 kb away from HLA-G.54, 55, 56, 57, 58 However, balancing selection at the HLA-A locus was shown to be unlikely to influence the patterns of diversity at the HLA-G locus.14 Moreover, a thorough examination of LD patterns across a 300-kb genomic region encompassing the HLA-F, HLA-G, HLA-H and HLA-A genes, using phased sequence data from the 1000 Genomes Project, did not reveal any long-range association between the HLA-G 3′UTR haplotypes UTR-1 and UTR-2 and HLA-A alleles, thus corroborating the findings of Tan et al.14 (Supplementary Figure 2).
This study also provides a detailed description of the magnitude and patterns of LD across the HLA-G 3′UTR in a large set of human populations, representing a useful resource for future genetic association studies. Significant differences in LD patterns were observed among geographical groups, which might explain some of the contradictory reports of positive or negative associations between 3′UTR polymorphisms and specific phenotypes in the literature. For instance, the 14-bp indel was in high LD with several other HLA-G 3′UTR variants in Europe (namely, +3010C/G, +3142G/C and +3196C/G), whereas its LD with these polymorphisms was much lower in other continental regions (Figure 1 and Supplementary Figure 1). Similarly, the +3142G/C and +3187A/G variants were in strong LD in East Asia (r2=0.89) but displayed little allelic association elsewhere. Other LD patterns, however, were shared among all populations examined. This is the case for the +3010C/G and +3142G/C variants, which were in nearly complete LD in all samples, and for the 14-bp indel and +3187A/G variants, which were always in weak LD (mean r2=0.21±0.09).
In conclusion, this study provides an exhaustive characterization of the genetic variation and haplotype structure of the HLA-G 3′UTR in worldwide human populations, a useful resource for future epidemiological or anthropological studies involving the HLA-G gene. In addition, taking advantage of the whole-genome sequence data generated by 1KG, we performed a number of neutrality tests to clarify the selective pressures that have acted on this regulatory region. We provided strong evidence that balancing selection has shaped patterns of variation at this locus, but further investigations are warranted to determine whether the HLA-G 3′UTR is the genuine target of balancing selection.
Materials and methods
African population samples and Sanger sequencing
Sequence variation was surveyed from DNA samples collected from three sub-Saharan African populations. The first comprised 239 Serer individuals (109 males and 130 females) from the Niakhar area, located 150 km southeast of Dakar, Senegal. The second comprised 175 Yansi individuals (80 males and 95 females) living in the Bandundu province of the Democratic Republic of the Congo, ∼400 km east of Kinshasa. The third comprised 30 Tori individuals (28 males and 2 females) living in the district of Tori-Bossito, a semi-rural area on the coastal plain of Southern Benin, 40 km northeast of Cotonou, the economic capital. All individuals were unrelated, randomly selected adults previously used as controls in case–control or family studies.59, 60, 61 In all studies, informed written consent was provided by all participants before blood withdrawal, and the study protocols were reviewed and approved by the relevant Local Research Ethics Committee (Ethics Committee of the Health Minister of Senegal, Ethics Committee of the Public Health Ministry of the Democratic Republic of the Congo and Institutional Ethics Committee of the Faculté des Sciences de la Santé at the University of Abomey-Calavi in Benin).
The 3′UTR of HLA-G was PCR amplified in each individual using the primers HLAG8F (5′-TGTGAAACAGCTGCCCTGTGT-3′) and HLAG8R (5′-GTCTTCCATTTATTTTGTCTCT-3′), under the PCR conditions described elsewhere.17 The amplified products were visualized on silver-stained 7% polyacrylamide gels and directly sequenced on an ABI310 Genetic Analyzer (Applied Biosystems, Foster City, CA, USA) using the reverse primer HLAG8R, to prevent overlapping sequence reads in samples heterozygous for the 14-bp indel. This fragment contains all the genetic variants known to be relevant for the post-transcriptional control of HLA-G expression.8 All singletons were verified by PCR reamplification and sequencing of the new PCR products in both directions.
The HLA-G 3′UTR sequence data of 1089 unrelated individuals generated by the 1000 Genomes Project (1KG) were downloaded directly from the project website (http://www.1000genomes.org), using the phase 1 integrated release version 3 (April 2012).28 The 1089 individuals are drawn from 14 populations from sub-Saharan Africa, Europe, East Asia and the Americas: Yoruba in Ibadan, Nigeria (YRI); Luhya in Webuye, Kenya (LWK); people with African Ancestry in the Southwest United States (ASW); Utah residents with Northern and Western European ancestry (CEU); Toscani in Italy (TSI); British in England and Scotland (GBR); Finnish in Finland (FIN); Iberians in Spain (IBS); Han Chinese in Beijing, China (CHB); Southern Han Chinese in China (CHS); Japanese in Tokyo, Japan (JPT); people with Mexican Ancestry in Los Angeles, California (MXL); Colombians in Medellin, Colombia (CLM); and Puerto Ricans in Puerto Rico (PUR). For comparison with the sequence data generated in the three African samples, we considered the 358-bp gene segment spanning nucleotides +2924 to +3281 (chr6:29798545-29798889 in the human GRCh37/hg19 assembly), numbering from the adenine of the first translated ATG codon as +1 and assuming the presence of the 14-bp fragment. Low-coverage whole-genome polymorphism data from the same individuals (∼36 million variants) were also downloaded to perform tests of natural selection, as described below.
Data available in the literature for this gene region were also retrieved and included in the population genetic analyses. Previous sequencing surveys of the HLA-G 3′UTR included 283 unrelated healthy individuals from two Brazilian populations, namely 128 samples from Northeastern Brazil (Recife, State of Pernambuco)18 and 155 samples from Southeastern Brazil (Ribeirão Preto, State of São Paulo),17 as well as 60 individuals of Portuguese nationality and 60 natives of Guinea-Bissau.16
LD, tagging and haplotype analyses
The strength of LD between pairs of markers was measured as r2,62 using Haploview v4.1,63 and the statistical significance of LD was assessed using Fisher’s exact tests followed by Bonferroni correction. To test whether the amount of LD differs between human populations, the mean pairwise r2 values computed for variants with a minor allele frequency (MAF) >0.05 were compared using Mann–Whitney or Kruskal–Wallis tests. The Tagger program implemented in Haploview was used to select the subset of tag markers capturing all common variation (MAF >0.05) across the HLA-G 3′UTR, requiring a minimum estimated r2 of ⩾0.80 between each tagged marker and its tag.
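The pairwise r2 measure, the mean pairwise r2 among common variants and the tag-selection step can be illustrated with a short sketch. It assumes phased haplotypes coded as 0/1 allele vectors (one entry per chromosome) and uses a simplified greedy pairwise tagging rule; this illustrates the idea only and is not Haploview's or Tagger's actual implementation. All function names are hypothetical.

```python
from itertools import combinations


def r_squared(hap_a, hap_b):
    """Hill-Robertson r^2 between two biallelic markers.

    hap_a, hap_b: equal-length sequences of 0/1 alleles, one entry per
    phased chromosome.
    """
    n = len(hap_a)
    p_a = sum(hap_a) / n
    p_b = sum(hap_b) / n
    # Frequency of the haplotype carrying allele 1 at both markers
    p_ab = sum(1 for a, b in zip(hap_a, hap_b) if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b  # LD coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    return 0.0 if denom == 0 else d * d / denom


def minor_allele_freq(hap):
    p = sum(hap) / len(hap)
    return min(p, 1 - p)


def mean_pairwise_r2(markers, min_maf=0.05):
    """Mean r^2 over all pairs of markers with MAF > min_maf."""
    common = [h for h in markers.values() if minor_allele_freq(h) > min_maf]
    pairs = list(combinations(common, 2))
    if not pairs:
        return float("nan")
    return sum(r_squared(a, b) for a, b in pairs) / len(pairs)


def greedy_tags(markers, threshold=0.8):
    """Simplified greedy pairwise tagging (hypothetical helper).

    Repeatedly picks the marker that captures the most still-untagged
    markers at r^2 >= threshold; every marker always captures itself.
    """
    names = list(markers)
    captures = {
        t: {m for m in names
            if m == t or r_squared(markers[t], markers[m]) >= threshold}
        for t in names
    }
    untagged = set(names)
    tags = []
    while untagged:
        best = max(names, key=lambda t: len(captures[t] & untagged))
        tags.append(best)
        untagged -= captures[best]
    return tags


# Toy haplotypes: m1 and m2 are in complete LD, m3 is independent of both
markers = {
    "m1": [1, 1, 0, 0],
    "m2": [0, 0, 1, 1],
    "m3": [1, 0, 1, 0],
}
print(mean_pairwise_r2(markers))  # about 0.333
print(greedy_tags(markers))       # ['m1', 'm3']
```

A greedy rule is used here because it is easy to follow; the real Tagger program offers additional options (such as multimarker tests) that this sketch does not attempt to reproduce.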
Haplotype reconstruction was performed using the Bayesian method implemented in PHASE v.2.1.1,64 with the default parameter values for the Markov chain Monte Carlo simulations. For each population sample, we ran the algorithm 10 times with different seeds for the random number generator and checked the results for consistency across the independent runs, to verify that the algorithm had not converged to a local, rather than global, mode of the posterior distribution. We retained the results of the run displaying the best average goodness of fit of the estimated haplotypes to the underlying coalescent model. Haplotype networks describing the mutational relationships among the inferred haplotypes were generated with the median joining algorithm of the Network software (http://www.fluxus-engineering.com/). Analysis of molecular variance65 and haplotype diversity, based on the estimated haplotype frequencies, were computed using Arlequin v.3.11.66
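As an illustration of the haplotype diversity measure, the sketch below applies the standard unbiased gene diversity estimator of Nei (1987) to a list of haplotype labels, one per chromosome, such as those produced by PHASE. We assume this is the estimator reported by Arlequin; the haplotype labels are hypothetical.

```python
from collections import Counter


def haplotype_diversity(haplotypes):
    """Unbiased haplotype (gene) diversity, Nei (1987):
    H = n / (n - 1) * (1 - sum(p_i^2)), with n = number of chromosomes.
    """
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two chromosomes")
    counts = Counter(haplotypes)
    sum_p2 = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1 - sum_p2)


# Hypothetical sample of 4 chromosomes carrying 3 distinct 3'UTR haplotypes
sample = ["UTR-1", "UTR-1", "UTR-2", "UTR-3"]
print(round(haplotype_diversity(sample), 4))  # 0.8333
```

The n/(n - 1) factor corrects the small-sample bias of the naive 1 - sum(p_i^2); it matters little for the sample sizes analyzed here but is noticeable for very small samples.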
Imputation of genotypes for the 14-bp insertion/deletion
As the HLA-G 3′UTR 14-bp indel polymorphism was not present in the 1KG database, the genotypes of this variant were imputed in each of the 14 populations of the project using the hidden Markov model implemented in MaCH v1.0.67 The 3′UTR haplotypes inferred in the three African populations sampled in the present study (Serer, Yansi and Tori) and those found in the four populations previously studied in the literature (Northeastern and Southeastern Brazilians, Portuguese and Guineans) were used as reference haplotypes for genotype imputation. Briefly, this method combines the genotype data of the study samples with the reference haplotype data and then probabilistically infers the genotypes of the untyped marker. The most frequently sampled genotype is retained as the final imputed (most likely) genotype. The program was run with default settings, except for the number of iterations of the Markov chain sampler, which was set to 200 to ensure convergence. MaCH produces two quality measures for individual genotypes that are considered good predictors of imputation accuracy.67 The first is the marker quality score, which estimates the accuracy of each imputed genotype as the proportion of iterations in which the final imputed genotype was sampled, averaged across all individuals in the sample. The second is an estimate of the r2 correlation between imputed and true genotypes, obtained by comparing the distribution of sampled genotypes in each iteration with the estimated allele counts that result from averaging over all iterations. Very high quality scores (0.972±0.011 s.d., ranging from 0.951 in Southern Han Chinese to 0.985 in Iberians) and r2 values (0.936±0.030 s.d., ranging from 0.874 in Southern Han Chinese to 0.972 in Iberians) were obtained for all 14 populations. The imputed genotype data for the 14-bp indel were thus considered of sufficient quality to be confidently included in the analysis.
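The two imputation quality measures can be approximated from per-individual genotype posterior probabilities, as in the sketch below. It assumes the posteriors (probabilities of carrying 0, 1 or 2 copies of the imputed allele) are already available, and it uses a widely used approximation: quality as the mean probability of the most likely genotype, and r2 as the ratio of the observed variance of the allele dosage to its expected binomial variance. This is a hypothetical helper in the spirit of MaCH's measures, not MaCH's own code.

```python
def imputation_summary(posteriors):
    """Summarize imputed genotypes at one biallelic marker.

    posteriors: list of (p0, p1, p2) genotype probabilities per individual,
    where the index is the number of copies of the imputed allele.
    Returns (most-likely genotypes, mean max-posterior quality, r^2 estimate).
    """
    n = len(posteriors)
    most_likely = [max(range(3), key=lambda g: p[g]) for p in posteriors]
    # Quality: average probability of the most likely genotype
    quality = sum(max(p) for p in posteriors) / n
    # Expected allele dosage per individual
    dosages = [p[1] + 2 * p[2] for p in posteriors]
    mean_dosage = sum(dosages) / n
    freq = mean_dosage / 2
    observed_var = sum((d - mean_dosage) ** 2 for d in dosages) / n
    expected_var = 2 * freq * (1 - freq)  # binomial variance under HWE
    rsq = observed_var / expected_var if expected_var > 0 else 0.0
    return most_likely, quality, rsq


# Perfectly certain imputation versus completely uninformative imputation
certain = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0), (0.0, 1.0, 0.0)]
uncertain = [(0.25, 0.5, 0.25)] * 4
print(imputation_summary(certain))    # ([0, 1, 2, 1], 1.0, 1.0)
print(imputation_summary(uncertain))  # ([1, 1, 1, 1], 0.5, 0.0)
```

When the posteriors carry no information, every individual receives the same dosage, so the dosage variance (and hence the r2 estimate) collapses to zero even though a "most likely" genotype can still be reported.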
Tests for selective neutrality
To determine whether natural selection has shaped patterns of genetic variation at the HLA-G 3′UTR locus, we used two complementary approaches, based on the level of population genetic differentiation and on the site frequency spectrum. First, we used the fixation index FST,68 which quantifies the proportion of genetic variance explained by allele frequency differences among populations. FST ranges from 0 (genetically identical populations) to 1 (completely differentiated populations). We calculated FST values at each locus using the BioPerl module PopGen,69 at two levels: (1) among populations and (2) among continental regions, that is, with individual samples grouped into major geographical regions (Africa, Europe, Asia and America) according to their predominant ancestry component. Extreme FST values can result not only from natural selection but also from nonselective events such as demographic changes and genetic drift. Because these latter processes act randomly across the genome, they are expected to have the same average effect on all loci, whereas natural selection affects population differentiation in a locus-specific manner. The genome-wide variation data of the 14 populations from 1KG can thus be used to detect the action of natural selection through an outlier approach.70 For that purpose, we built empirical distributions from three sets of autosomal SNVs located throughout the genome. The first set comprises all SNVs detected in the genomes of the 1089 unrelated individuals included in 1KG (∼36 million SNVs): an LD-based pruning procedure was applied to these genome-wide genotype data using PLINK,71 with default parameters (pruning based on a variance inflation factor of 2 within each sliding window of 50 SNVs, shifting the window by 5 SNVs at each step), leaving a total of 26 680 716 independent autosomal SNVs, hereafter referred to as ‘GW’.
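For readers less familiar with FST, the sketch below computes it for a single biallelic SNV from per-population allele frequencies, using Nei's heterozygosity-based formulation FST = (HT - HS)/HT with equal population weights. This formulation is an assumption for illustration: it is not necessarily the estimator implemented in the BioPerl PopGen module, and it ignores sample-size corrections. Computing FST among continental regions would additionally require pooling population frequencies by region, which is not shown.

```python
def fst_biallelic(freqs):
    """Nei-style FST = (H_T - H_S) / H_T for one biallelic SNV.

    freqs: frequency of the same allele in each population (equal weights).
    """
    k = len(freqs)
    p_bar = sum(freqs) / k
    # Expected heterozygosity of the pooled (total) population
    h_t = 2 * p_bar * (1 - p_bar)
    # Mean expected heterozygosity within populations
    h_s = sum(2 * p * (1 - p) for p in freqs) / k
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t


print(fst_biallelic([0.5, 0.5, 0.5]))       # 0.0 (identical populations)
print(fst_biallelic([0.0, 1.0]))            # 1.0 (fixed for different alleles)
print(round(fst_biallelic([0.2, 0.4]), 3))  # 0.048
```

The two extreme calls reproduce the 0 and 1 bounds described in the text; balancing selection, which keeps allele frequencies similar across populations, is expected to push values toward the lower bound.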
The second set of SNVs was built from the subset of variants annotated in dbSNP (build 137) as located in gene regions. After LD-based pruning with the same parameters as described above, this ‘Genic’ set contained 198 804 independent genic SNVs. The third set was built from the SNVs annotated in dbSNP as located in a 3′UTR: 11 286 949 SNVs remained after LD-based pruning, making up the ‘3′UTR’ set. FST scores were computed for all SNVs in the three sets, providing reference distributions against which to assess whether the genetic differentiation observed at a locus of interest is atypical. As we were interested in detecting signals of balancing selection, which is expected to reduce differentiation among populations, we focused on the lower tail of the FST distributions. The empirical P-value therefore corresponds to the proportion of FST scores in the empirical distribution that are lower than or equal to the value observed at the locus of interest. As FST strongly correlates with heterozygosity,31, 72, 73 empirical P-values were calculated within bins of SNVs grouped according to MAF. A total of 27 bins covered the whole MAF range: 10 bins of width 0.001 between 0 and 0.01, 9 bins of width 0.01 between 0.01 and 0.10, and 8 bins of width 0.05 between 0.10 and 0.50.
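The MAF-binned, lower-tail empirical P-value described above can be sketched as follows. The bin edges reproduce the 27 bins listed in the text; treating bins as half-open intervals (with the last bin closed at 0.50) is our assumption, as are the function names and the toy reference data.

```python
import bisect

# 28 edges defining the 27 MAF bins described in the text:
# 10 bins of width 0.001 on [0, 0.01), 9 of width 0.01 on [0.01, 0.10),
# and 8 of width 0.05 on [0.10, 0.50]; rounding avoids float drift
EDGES = (
    [round(i * 0.001, 3) for i in range(11)]
    + [round(0.01 + i * 0.01, 2) for i in range(1, 10)]
    + [round(0.10 + i * 0.05, 2) for i in range(1, 9)]
)


def maf_bin(maf):
    """Index (0-26) of the MAF bin; bins are half-open except the last."""
    if not 0.0 <= maf <= 0.5:
        raise ValueError("MAF must lie in [0, 0.5]")
    return min(bisect.bisect_right(EDGES, maf) - 1, len(EDGES) - 2)


def empirical_lower_p(obs_fst, obs_maf, reference):
    """Lower-tail empirical P-value within the MAF bin of the test SNV.

    reference: iterable of (fst, maf) pairs from a genome-wide SNV set.
    """
    target = maf_bin(obs_maf)
    in_bin = [f for f, m in reference if maf_bin(m) == target]
    if not in_bin:
        return float("nan")
    return sum(1 for f in in_bin if f <= obs_fst) / len(in_bin)


# Hypothetical reference SNVs as (FST, MAF) pairs; the last one falls in a
# different MAF bin and is therefore ignored for a test SNV with MAF 0.32
reference = [(0.0, 0.31), (0.02, 0.33), (0.005, 0.32), (0.5, 0.01)]
print(len(EDGES) - 1)                            # 27
print(empirical_lower_p(0.01, 0.32, reference))  # 0.6666666666666666
```

Conditioning on MAF in this way means that a common variant is only compared with other common variants, so its low FST is not mistaken for an outlier merely because of its high heterozygosity.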
Second, we determined whether the frequency spectrum of mutations conformed to the expectations of the standard neutral model at the population level by computing summary diversity statistics and performing several neutrality tests, including Tajima’s D and Fu and Li’s D* and F*,74, 75 with DnaSP v.5.10.76 Tajima’s test is based on the difference between the average number of pairwise nucleotide differences between sequences (π) and Watterson’s estimator θw,77 which is based on the number of segregating sites. Fu and Li’s tests compare an estimator of θ (θw for D* and π for F*) with the number of unique mutations (singletons) in external branches of the genealogy (ηe). The statistical significance of the tests was first estimated from 10 000 coalescent simulations of an infinite-sites locus, conditioned on the sample size and including recombination (with the recombination parameter estimated from the observed sequence data). As rejection of these tests may be caused by the violation of any of the assumptions of the null hypothesis (neutrality, constant population size and panmixia), it is difficult to disentangle the effects of natural selection from those of demographic history. We therefore opted for an empirical approach, selecting a set of unlinked noncoding regions expected to be evolving mostly neutrally to compute a reference distribution for each summary diversity statistic from the genome-wide data available for the 14 populations from 1KG.
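The comparison between π and θw underlying Tajima's test can be made concrete with a short sketch that computes the number of segregating sites, π and Tajima's D (with the standard Tajima 1989 normalizing constants) from a toy alignment. It assumes gap-free, equal-length sequences and does not reproduce DnaSP's handling of missing data, Fu and Li's statistics or the coalescent-based significance assessment; the function names are hypothetical.

```python
from itertools import combinations
from math import sqrt


def diversity_stats(seqs):
    """Return (n, S, pi) for aligned, gap-free sequences of equal length."""
    n = len(seqs)
    seg_sites = sum(1 for col in zip(*seqs) if len(set(col)) > 1)
    pairs = list(combinations(seqs, 2))
    # pi: mean number of pairwise differences per pair of sequences
    pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)
    return n, seg_sites, pi


def tajimas_d(n, s, pi):
    """Tajima's (1989) D from sample size n, segregating sites s and pi."""
    if n < 4:
        raise ValueError("need at least four sequences")
    if s == 0:
        return float("nan")
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    theta_w = s / a1  # Watterson's estimator
    return (pi - theta_w) / sqrt(e1 * s + e2 * s * (s - 1))


# Toy alignment: 4 sequences, 3 segregating sites
seqs = ["AAAA", "AAAT", "AATT", "ATTT"]
n, s, pi = diversity_stats(seqs)
print(n, s, round(pi, 4))             # 4 3 1.6667
print(round(tajimas_d(n, s, pi), 4))  # 0.1677
```

A positive D, as here, indicates that π exceeds θw, that is, an excess of intermediate-frequency variants; this is the direction expected under balancing selection.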
We selected 100 autosomal regions dispersed throughout the human genome that met the following criteria: (1) located at least 100 kb away from any known or predicted gene, expressed sequence tag or region transcribed into mRNA; (2) located outside any segmental duplication, region transcribed into a long noncoding RNA or conserved noncoding element (as defined in Woolfe et al.78); (3) separated from each other by at least 100 kb and not in LD with each other; (4) of the same size as the HLA-G 3′UTR region studied (358 bp); and (5) containing at least 8 SNVs. For each region, Tajima’s D and Fu and Li’s D* and F* were computed to obtain the null (selectively neutral) distribution of these test statistics in each population sample. To test for balancing selection, empirical P-values were estimated as the proportion of regions showing a test statistic greater than or equal to the value observed at the HLA-G 3′UTR locus. P-values <0.05 were considered significant.
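Because balancing selection is expected to inflate Tajima's D and Fu and Li's D* and F*, the empirical test described above uses the upper tail of the neutral-region distribution. A minimal sketch, assuming the per-region statistics have already been computed (the scores below are hypothetical placeholders):

```python
def empirical_upper_p(observed, null_values):
    """Fraction of neutral reference regions whose statistic is >= observed."""
    if not null_values:
        raise ValueError("empty null distribution")
    return sum(1 for v in null_values if v >= observed) / len(null_values)


# Hypothetical null distribution: scores from 100 neutral regions
null_scores = list(range(100))
p = empirical_upper_p(97, null_scores)
print(p, p < 0.05)  # 0.03 True
```

With 100 reference regions, the P-value moves in steps of 0.01, so the 0.05 threshold corresponds to the observed value being matched or exceeded by at most 4 of the neutral regions.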
Hviid TV . HLA-G in human reproduction: aspects of genetics, function and pregnancy complications. Hum Reprod Update 2006; 12: 209–232.
Rizzo R, Vercammen M, van de Velde H, Horn PA, Rebmann V . The importance of HLA-G expression in embryos, trophoblast cells, and embryonic stem cells. Cell Mol Life Sci 2011; 68: 341–352.
Larsen MH, Bzorek M, Pass MB, Larsen LG, Nielsen MW, Svendsen SG et al. Human leukocyte antigen-G in the male reproductive system and in seminal plasma. Mol Hum Reprod 2011; 17: 727–738.
Crisa L, McMaster MT, Ishii JK, Fisher SJ, Salomon DR . Identification of a thymic epithelial cell subset sharing expression of the class Ib HLA-G molecule with fetal trophoblasts. J Exp Med 1997; 186: 289–298.
Le Discorde M, Moreau P, Sabatier P, Legeais JM, Carosella ED . Expression of HLA-G in human cornea, an immune-privileged tissue. Hum Immunol 2003; 64: 1039–1044.
Menier C, Rabreau M, Challier JC, Le Discorde M, Carosella ED, Rouas-Freiss N . Erythroblasts secrete the nonclassical HLA-G molecule from primitive to definitive hematopoiesis. Blood 2004; 104: 3153–3160.
Larsen MH, Hviid TV . Human leukocyte antigen-G polymorphism in relation to expression, function, and disease. Hum Immunol 2009; 70: 1026–1034.
Donadi EA, Castelli EC, Arnaiz-Villena A, Roger M, Rey D, Moreau P . Implications of the polymorphism of HLA-G on its function, regulation, evolution and disease association. Cell Mol Life Sci 2011; 68: 369–395.
Carosella ED . The tolerogenic molecule HLA-G. Immunol Lett 2011; 138: 22–24.
Deschaseaux F, Delgado D, Pistoia V, Giuliani M, Morandi F, Durrbach A . HLA-G in organ transplantation: towards clinical applications. Cell Mol Life Sci 2011; 68: 397–404.
Yie SM, Hu Z . Human leukocyte antigen-G (HLA-G) as a marker for diagnosis, prognosis and tumor immune escape in human malignancies. Histol Histopathol 2011; 26: 409–420.
González A, Rebmann V, LeMaoult J, Horn PA, Carosella ED, Alegre E . The immunosuppressive molecule HLA-G and its clinical implications. Crit Rev Clin Lab Sci 2012; 49: 63–84.
Castelli EC, Mendes-Junior CT, Veiga-Castelli LC, Roger M, Moreau P, Donadi EA . A comprehensive study of polymorphic sites along the HLA-G gene: implication for gene regulation and evolution. Mol Biol Evol 2011; 28: 3069–3086.
Tan Z, Shon AM, Ober C . Evidence of balancing selection at the HLA-G promoter region. Hum Mol Genet 2005; 14: 3619–3628.
Hviid TV, Rizzo R, Melchiorri L, Stignani M, Baricordi OR . Polymorphism in the 5' upstream regulatory and 3' untranslated regions of the HLA-G gene in relation to soluble HLA-G and IL-10 expression. Hum Immunol 2006; 67: 53–62.
Alvarez M, Piedade J, Balseiro S, Ribas G, Regateiro F . HLA-G 3'-UTR SNP and 14-bp deletion polymorphisms in Portuguese and Guinea-Bissau populations. Int J Immunogenet 2009; 36: 361–366.
Castelli EC, Mendes-Junior CT, Deghaide NH, de Albuquerque RS, Muniz YC, Simões RT et al. The genetic structure of 3'untranslated region of the HLA-G gene: polymorphisms and haplotypes. Genes Immun 2010; 11: 134–141.
Lucena-Silva N, Monteiro AR, de Albuquerque RS, Gomes RG, Mendes-Junior CT, Castelli EC et al. Haplotype frequencies based on eight polymorphic sites at the 3' untranslated region of the HLA-G gene in individuals from two different geographical regions of Brazil. Tissue Antigens 2012; 79: 272–278.
Tan Z, Randall G, Fan J, Camoretti-Mercado B, Brockman-Schneider R, Pan L et al. Allele-specific targeting of microRNAs to HLA-G and risk of asthma. Am J Hum Genet 2007; 81: 829–834.
Yie SM, Li LH, Xiao R, Librach CL . A single base-pair mutation in the 3’-untranslated region of HLA-G mRNA is associated with pre-eclampsia. Mol Hum Reprod 2008; 14: 649–653.
Castelli EC, Moreau P, Oya E, Chiromatzo A, Mendes-Junior CT, Veiga-Castelli LC et al. In silico analysis of microRNAS targeting the HLA-G 3' untranslated region alleles and haplotypes. Hum Immunol 2009; 70: 1020–1025.
Svendsen SG, Hantash BM, Zhao L, Faber C, Bzorek M, Nissen MH et al. The expression and functional activity of membrane-bound human leukocyte antigen-G1 are influenced by the 3'-untranslated region. Hum Immunol 2013; 74: 818–827.
Courtin D, Milet J, Sabbagh A, Massaro JD, Castelli EC, Jamonneau V et al. HLA-G 3' UTR-2 haplotype is associated with Human African trypanosomiasis susceptibility. Infect Genet Evol 2013; 17: 1–7.
Garcia A, Milet J, Courtin D, Sabbagh A, Massaro JD, Castelli EC et al. Association of HLA-G 3'UTR polymorphisms with response to malaria infection: a first insight. Infect Genet Evol 2013; 16: 263–269.
Sabbagh A, Courtin D, Milet J, Massaro JD, Castelli EC, Migot-Nabias F et al. Association of HLA-G 3' untranslated region polymorphisms with antibody response against Plasmodium falciparum antigens: preliminary results. Tissue Antigens 2013; 82: 53–58.
Goldstein DB, Chikhi L . Human migrations and population structure: what we know and why it matters. Annu Rev Genomics Hum Genet 2002; 3: 129–152.
Campbell MC, Tishkoff SA . African genetic diversity: implications for human demographic history, modern human origins, and complex disease mapping. Annu Rev Genomics Hum Genet 2008; 9: 403–433.
The 1000 Genomes Project Consortium Abecasis GR, Auton A, Brooks LD, DePristo MA, Durbin RM et al. An integrated map of genetic variation from 1,092 human genomes. Nature 2012; 491: 56–65.
Castro MJ, Morales P, Martinez-Laso J, Allende L, Rojo-Amigo R, Gonzalez-Hevilla M et al. Evolution of MHC-G in humans and primates based on three new 3'UT polymorphisms. Hum Immunol 2000; 61: 1157–1163.
Akey JM, Zhang G, Zhang K, Jin L, Shriver MD . Interrogating a high-density SNP map for signatures of natural selection. Genome Res 2002; 12: 1805–1814.
Elhaik E . Empirical distributions of FST from large-scale human polymorphism data. PLoS One 2012; 7: e49837.
Mu XJ, Lu ZJ, Kong Y, Lam HY, Gerstein MB . Analysis of genomic variation in non-coding elements using population-scale sequencing data from the 1000 Genomes Project. Nucleic Acids Res 2011; 39: 7058–7076.
Maruyama T, Fuerst PA . Population bottlenecks and non-equilibrium models in population genetics. II. Number of alleles in a small population that was formed by a recent bottleneck. Genetics 1985; 111: 675–689.
Tishkoff SA, Reed FA, Friedlaender FR, Ehret C, Ranciaro A, Froment A et al. The genetic structure and history of Africans and African Americans. Science 2009; 324: 1035–1044.
Henn BM, Cavalli-Sforza LL, Feldman MW . The great human expansion. Proc Natl Acad Sci USA 2012; 109: 17758–17764.
Nielsen R . Statistical tests of selective neutrality in the age of genomics. Heredity 2001; 86: 641–647.
Teshima KM, Coop G, Przeworski M . How reliable are empirical genomic scans for selective sweeps? Genome Res 2006; 16: 702–712.
Yu F, Keinan A, Chen H, Ferland RJ, Hill RS, Mignault AA et al. Detecting natural selection by empirical comparison to random regions of the genome. Hum Mol Genet 2009; 18: 4853–4867.
Akey JM . Constructing genomic maps of positive selection in humans: where do we go from here? Genome Res 2009; 19: 711–722.
Laval G, Patin E, Barreiro LB, Quintana-Murci L . Formulating a historical and demographic model of recent human evolution based on resequencing data from noncoding regions. PLoS One 2010; 5: e10284.
McEvoy BP, Powell JE, Goddard ME, Visscher PM . Human population dispersal ‘Out of Africa’ estimated from linkage disequilibrium and allele frequencies of SNPs. Genome Res 2011; 21: 821–829.
Pritchard JK . Whole-genome sequencing data offer insights into human demography. Nat Genet 2011; 43: 923–925.
Pluzhnikov A, Donnelly P . Optimal sequencing strategies for surveying molecular genetic diversity. Genetics 1996; 144: 1247–1262.
Martelli-Palomino G, Pancoto JA, Muniz YCN, Mendes-Junior CT, Castelli EC, Massaro JD et al. Polymorphic sites at the 3’ untranslated region of the HLA-G gene are associated with differential HLA-G soluble levels in the Brazilian and French population. PLoS One 2013; 8: e71742.
Larsen MH, Hylenius S, Andersen AM, Hviid TV . The 3'-untranslated region of the HLA-G gene in relation to pre-eclampsia: revisited. Tissue Antigens 2010; 75: 253–261.
O’Brien M, McCarthy T, Jenkins D, Paul P, Dausset J, Carosella ED et al. Altered HLA-G transcription in pre-eclampsia is associated with allele specific inheritance: possible role of the HLA-G gene in susceptibility to the disease. Cell Mol Life Sci 2001; 58: 1943–1949.
Hviid TV, Hylenius S, Rorbye C, Nielsen LG . HLA-G allelic variants are associated with differences in the HLA-G mRNA isoform profile and HLA-G mRNA levels. Immunogenetics 2003; 55: 63–79.
Rousseau P, Le Discorde M, Mouillot G, Marcou C, Carosella ED, Moreau P . The 14-bp Deletion-Insertion polymorphism in the 3’ UT region of the HLA-G gene influences HLA-G mRNA stability. Hum Immunol 2003; 64: 1005–1010.
Gonzalez A, Alegre E, Torres MI, Díaz-Lagares A, Lorite P, Palomeque T et al. Evaluation of HLA-G5 plasmatic levels during pregnancy and relationship with the 14-bp polymorphism. Am J Reprod Immunol 2010; 64: 367–374.
Mendes-Junior CT, Castelli EC, Simões RT, Simões AL, Donadi EA . HLA-G 14-bp polymorphism at exon 8 in Amerindian populations from the Brazilian Amazon. Tissue Antigens 2007; 69: 255–260.
Veit TD, Cazarolli J, Salzano FM, Schiengold M, Chies JA . New evidence for balancing selection at the HLA-G locus in South Amerindians. Genet Mol Biol 2012; 35: 919–923.
Hviid TV, Rizzo R, Christiansen OB, Melchiorri L, Lindhard A, Baricordi OR . HLA-G and IL-10 in serum in relation to HLA-G genotype and polymorphisms. Immunogenetics 2004; 56: 135–141.
Santos KE, Lima THA, Felício LP, Massaro JD, Palomino GM, Silva ACA et al. Insights on the HLA-G evolutionary history provided by a nearby Alu insertion. Mol Biol Evol 2013; 30: 2423–2434.
Ober C, Rosinsky B, Grimsley C, van der Ven K, Robertson A, Runge A . Population genetic studies of HLA-G: allele frequencies and linkage disequilibrium with HLA-A1. J Reprod Immunol 1996; 32: 111–123.
van der Ven K, Skrablin S, Engels G, Krebs D . HLA-G polymorphisms and allele frequencies in Caucasians. Hum Immunol 1998; 59: 302–312.
Hviid TV, Christiansen OB . Linkage disequilibrium between human leukocyte antigen (HLA) class II and HLA-G—possible implications for human reproduction and autoimmune disease. Hum Immunol 2005; 66: 688–699.
Castelli EC, Mendes-Junior CT, Viana de Camargo JL, Donadi EA . HLA-G polymorphism and transitional cell carcinoma of the bladder in a Brazilian population. Tissue Antigens 2008; 72: 149–157.
Park Y, Park Y, Kim YS, Kwon OJ, Kim HS . Allele frequencies of human leukocyte antigen-G in a Korean population. Int J Immunogenet 2012; 39: 39–45.
Courtin D, Milet J, Jamonneau V, Yeminanga CS, Kumeso VK, Bilengue CM et al. Association between human African trypanosomiasis and the IL6 gene in a Congolese population. Infect Genet Evol 2007; 7: 60–68.
Milet J, Nuel G, Watier L, Courtin D, Slaoui Y, Senghor P et al. Genome wide linkage study, using a 250K SNP map, of Plasmodium falciparum infection and mild malaria attack in a Senegalese population. PLoS One 2010; 5: e11616.
Le Port A, Cottrell G, Martin-Prevel Y, Migot-Nabias F, Cot M, Garcia A . First malaria infections in a cohort of infants in Benin: biological, environmental and genetic determinants. Description of the study site, population methods and preliminary results. BMJ Open 2012; 2: e000342.
Hill WG, Robertson A . Linkage disequilibrium in finite populations. Theor Appl Genet 1968; 38: 226–231.
Barrett JC, Fry B, Maller J, Daly MJ . Haploview: analysis and visualization of LD and haplotype maps. Bioinformatics 2005; 21: 263–265.
Stephens M, Donnelly P . A comparison of bayesian methods for haplotype reconstruction from population genotype data. Am J Hum Genet 2003; 73: 1162–1169.
Excoffier L, Smouse PE, Quattro JM . Analysis of molecular variance inferred from metric distances among DNA haplotypes: application to human mitochondrial DNA restriction data. Genetics 1992; 131: 479–491.
Excoffier L, Laval G, Schneider S . Arlequin ver. 3.0: an integrated software package for population genetics data analysis. Evol Bioinform Online 2005; 1: 47–50.
Li Y, Willer CJ, Ding J, Scheet P, Abecasis GR . MaCH: using sequence and genotype data to estimate haplotypes and unobserved genotypes. Genet Epidemiol 2010; 34: 816–834.
Wright S . The genetical structure of populations. Ann Eug 1951; 15: 323–354.
Stajich JE, Block D, Boulez K, Brenner SE, Chervitz SA, Dagdigian C et al. The Bioperl toolkit: Perl modules for the life sciences. Genome Res 2002; 12: 1611–1618.
Kelley JL, Madeoy J, Calhoun JC, Swanson W, Akey JM . Genomic signatures of positive selection in humans and the limits of outlier approaches. Genome Res 2006; 16: 980–989.
Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MAR, Bender D et al. PLINK: a toolset for whole-genome association and population-based linkage analysis. Am J Hum Genet 2007; 81: 559–575.
Barreiro LB, Laval G, Quach H, Patin E, Quintana-Murci L . Natural selection has driven population differentiation in modern humans. Nat Genet 2008; 40: 340–345.
Beaumont MA, Nichols RA . Evaluating loci for use in the genetic analysis of population structure. Proc R Soc Lond B 1996; 263: 1619–1626.
Tajima F . Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics 1989; 123: 585–595.
Fu YX, Li WH . Statistical tests of neutrality of mutations. Genetics 1993; 133: 693–709.
Librado P, Rozas J . DnaSP v5: a software for comprehensive analysis of DNA polymorphism data. Bioinformatics 2009; 25: 1451–1452.
Watterson GA . On the number of segregating sites in genetical models without recombination. Theor Popul Biol 1975; 7: 256–276.
Woolfe A, Goode DK, Cooke J, Callaway H, Smith S, Snell P et al. CONDOR: a database resource of developmentally associated conserved non-coding elements. BMC Dev Biol 2007; 7: 100.
Nei M . Molecular Evolutionary Genetics. Columbia University Press: New York, 1987.
This work was supported by the binational collaborative research program CAPES-COFECUB (Grant 653/09) and by the Spanish National Institute for Bioinformatics (www.inab.org). PL is supported by a PhD fellowship from ‘Acción Estratégica de Salud, en el Marco del Plan Nacional de Investigación Científica, Desarrollo e Innovación Tecnológica 2008–2011’ from Instituto de Salud Carlos III. We deeply thank Txema Heredia, Angel Carreno and Jordi Rambla for computational support.
The authors declare no conflict of interest.
Supplementary Information accompanies this paper on the Genes and Immunity website.
Sabbagh, A., Luisi, P., Castelli, E. et al. Worldwide genetic variation at the 3′ untranslated region of the HLA-G gene: balancing selection influencing genetic diversity. Genes Immun 15, 95–106 (2014). https://doi.org/10.1038/gene.2013.67
- linkage disequilibrium
- balancing selection