HLA-G has an important role in the modulation of the maternal immune system during pregnancy, and evidence that balancing selection acts in the promoter and 3′UTR regions has been previously reported. To determine whether selection acts on the HLA-G coding region in the Amazon Rainforest, exons 2, 3 and 4 were analyzed in a sample of 142 Amerindians from nine villages of five isolated tribes that inhabit the Central Amazon. Six previously described single-nucleotide polymorphisms (SNPs) were identified and the Expectation-Maximization (EM) and PHASE algorithms were used to computationally reconstruct SNP haplotypes (HLA-G alleles). A new HLA-G allele, which originated in Amerindian populations by a crossing-over event between two widespread HLA-G alleles, was identified in 18 individuals. Neutrality tests showed that natural selection has a complex role in the HLA-G coding region. Although balancing selection is the type of selection that shapes variability at a local level (Native American populations), we have also shown that purifying selection may occur on a worldwide scale. Moreover, balancing selection does not seem to act on the coding region as strongly as it acts on the flanking regulatory regions, and such a coding signature may actually reflect a hitchhiking effect.
The nonclassical human leukocyte antigen (HLA) class I molecules HLA-E, HLA-F and HLA-G differ from classical class I molecules in terms of cellular and tissue expression patterns, peptide binding properties and functions.1 Although a variety of normal human adult tissues express HLA-G transcripts at low levels,2 upregulation of these genes may occur in several pathological conditions, including inflammatory, neoplastic and infectious disorders.3, 4, 5, 6 Based on its abundant expression at the maternal–fetal interface, particularly in the extravillous cytotrophoblast, the HLA-G molecule has been primarily associated with maternal–fetal tolerance.3 Alternative splicing of a single HLA-G messenger RNA (mRNA) may generate four membrane-bound isoforms (HLA-G1 to G4) and three soluble isoforms (HLA-G5 to G7).7 The membrane-bound variant HLA-G1 suppresses CD4+ T-cell proliferation and the soluble variant HLA-G5 may induce apoptosis in activated CD8+ T cells.8 As HLA-G participates in the inhibition of maternal cytotoxic T-lymphocyte and natural killer cytolytic function,2, 9 in the prevention of proliferation of CD4+ T cells,8 and in dendritic cell tolerization,10 this molecule has an important role in the modulation of the maternal immune system during pregnancy.2, 11, 12 Moreover, HLA-G expression has been extensively evaluated in several disorders of distinct etiologies, including chronic viral infections, autoimmune disorders and tumors.13
The 5′-upstream regulatory region (5′URR) and the 3′-untranslated region (3′UTR) of HLA-G transcripts control the constitutive and inducible expression of this gene. The 5′URR exhibits unique structural and functional characteristics in terms of regulatory sequences, trans-acting factors and regulatory pathways when compared with other HLA class I genes,14 whereas 3′UTR may influence mRNA stability and mRNA affinity for several microRNAs.15, 16 In contrast to the modest level of variation typical of the coding region, the regulatory regions of the HLA-G gene are quite polymorphic.17, 18
The HLA-G coding region is less polymorphic than the coding region of the classical class I loci (which are the most polymorphic in the human genome),19 and limited variability has been observed in different populations.13 To date, 50 HLA-G alleles have been officially recognized by the World Health Organization and the International Immunogenetics Database (IMGT, July 2013, database release 3.13.1), presenting sequence variations in coding and noncoding regions. Despite the 50 recognized HLA-G alleles, the existence of only 16 different membrane-bound full-length proteins indicates that most of the known alleles contain synonymous or intronic substitutions. In addition, there are two truncated nonfunctional proteins encoded by two null alleles carrying premature stop codons.20, 21 The first null allele, G*01:05 N, is defined by a ΔC deletion at the first base of codon 130 in exon 3 of G*01:01:02:01,18 which shifts the reading frame.22, 23 The second null allele, G*01:13 N, is defined by a C→T transition at the first base of codon 54 in exon 2, which creates a premature TAG stop codon;24 this allele is apparently associated with an incomplete formation of all HLA-G membrane-bound and soluble isoforms.
The HLA-G immunoregulatory function probably places it under a different selective regime19 as compared with classical HLA molecules, which experience strong balancing selection at codons involved in peptide binding. Such lower degree of polymorphism in HLA-G may ensure that the allogeneic fetus survives, providing an invariable tolerance mechanism. Evidence that HLA-G alleles are associated with recurrent miscarriage has been reported: increased frequencies of the G*01:01:03 (Germany),25 G*01:04 (USA),26 and G*01:06 (France)27 alleles as well as higher frequency of the 14-bp insertion (USA)28 have been observed. The first two studies also revealed association of the G*01:05 N allele with recurrent miscarriage,25, 26 suggesting that a reduced level of HLA-G molecules is a risk factor for spontaneous miscarriage.12 The frequencies of these alleles associated with increased susceptibility vary across geographic regions. Most of the populations analyzed so far present low G*01:05 N frequencies, ranging from 0 to 2.3% in Europe and from 0 to 1.4% in Eastern Asia. However, the allele reaches high frequencies in India (15%) and in sub-Saharan Africa (ranging from 4.8% in Ghana to 11.1% in Zimbabwe—reviewed by Donadi et al.13). The high frequency of the null allele in some populations and the putative role of HLA-G in placental development suggest that maintenance of this mutation may have been subject to non-neutral evolutionary forces.29 The fact that the highest frequencies of G*01:05 N occur in populations from areas with a historically high pathogen load indicates that intrauterine pathogens may act as selective agents, increasing the survival of heterozygous fetuses carrying the G*01:05 N allele in pregnancies happening in the presence of infections. 
In this case, reduced HLA-G expression in G*01:05 N heterozygous placentas may raise the number of T cells available in the uterus and reverse the inhibition of natural killer cells, improving the intrauterine defense against infections,29, 30 without leading to a miscarriage. Nonetheless, sequence analyses of HLA-G coding regions have shown either weak or no support for selection, as judged from data on substitution rates or site frequencies (e.g., Tajima’s D).18, 20, 21 However, the pattern of variation in both the 5′URR and 3′UTR HLA-G regulatory regions is consistent with balancing selection,17, 18, 31, 32 culminating in highly frequent divergent HLA-G lineages. These lineages may exhibit different HLA-G expression profiles and might participate in a fine balance between high-expressing and low-expressing HLA-G haplotypes.17, 18
Isolated Amerindian populations in whom pregnancies have been rarely monitored via specialized medical care may represent a good model to study HLA-G gene evolution. South America, particularly the Amazon Rainforest, is home to many pathogens33 that can trigger infections accounting for abortions and stillbirths.34, 35 Such intrauterine infections could more markedly affect fetuses carrying high HLA-G expression genotypes. In fact, analysis of the 14-bp insertion/deletion polymorphism in HLA-G 3′UTR,36 whose insertion allele has been associated with lower HLA-G mRNA production11 and more stable HLA-G transcripts,37 has revealed relatively low insertion frequency in Amerindian populations from the Amazon Rainforest and a trend toward balancing selection (possibly in the form of heterozygote advantage) at this locus.31 However, in further studies that aimed to determine the possible influence of balancing selection in favor of G*01:05 N or G*01:13 N heterozygotes, neither allele was encountered in isolated Amerindian populations from Brazil, a fact probably due to demographic and/or selective factors.20, 21
In this new attempt to investigate the role of natural selection in the HLA-G coding region in the Amazon Rainforest, exons 2, 3 and 4—which are responsible for encoding the extracellular domains α1, α2 and α3 of the HLA-G molecule, respectively2—were sequenced in 142 Amerindians from five isolated tribes that inhabit the Central Amazon (Katukina, Kaxináwa, Marúbo, Tikúna and Yaminawa).
Definition of HLA-G alleles and genotypes
At present, 37 single-nucleotide polymorphisms (SNPs) spread across exons 2 (14 SNPs), 3 (14 SNPs) and 4 (9 SNPs) define the official HLA-G alleles determined by exon polymorphism. Only six of the SNPs identified to date were observed in the Amerindian sample analyzed here. These SNPs are in strong linkage disequilibrium with one another and were, therefore, phased into haplotypes by the EM38 and PHASE39 algorithms (Table 1).
The EM and PHASE methodologies estimated the HLA-G allele composition of each individual with mean probabilities of 0.997 and 0.993, respectively. Of the 142 individuals submitted to computational haplotype reconstruction by EM and PHASE, only 5.63% and 7.75%, respectively, did not have their most probable haplotype constitutions reconstructed with a probability >99%. Only two (1.41%) individuals had their most probable haplotypes inferred with probabilities of <95%. Nevertheless, all the samples had the same haplotype combination determined by both EM and PHASE.
The resulting haplotypes were then converted into HLA-G alleles according to the official nomenclature. Eighteen samples presented the same new HLA-G allele in addition to an officially recognized one. The EM and PHASE methodologies estimated the HLA-G allele composition of these 18 samples with probabilities in the range of 0.982–1.000 and 0.995–1.000, respectively. A parsimony-based analysis indicated that this new allele (designated HLA-G*new throughout this paper) originated from crossing-over between alleles identified in this population sample. Exon 2 probably derived from either HLA-G*01:01:02, G*01:01:03, G*01:01:08 or G*01:04:01 alleles, whereas exon 4 derived from either G*01:01:06 or G*01:03:01 alleles (Table 1); the crossing-over event may have taken place along intron 2, exon 3 or intron 3. Cloning the HLA-G*new allele and sequencing its introns may contribute to disclosing the origin of such allele by revealing which of the above alleles underwent the proposed/putative crossing-over event. A second crossing-over event inside the HLA-G gene may have occurred between HLA-G*01:01:02 and another HLA-G allele that lacked the 14-bp sequence in the 3′UTR, generating the haplotype HLA-G*01:01:02/−14bp. One and two copies of this haplotype were found in the Bom Jardim (Tikúna) and Vida Nova (Marúbo) villages, respectively, an association that has not been found in worldwide populations analyzed to date.18, 40
HLA-G and Amerindian population genetics
Population genetics analysis was based on HLA-G haplotypes (Table 1). Allele frequencies are shown in Table 2; only 8 of the 41 officially recognized HLA-G alleles determined by exon polymorphism were found. Considering HLA-G alleles, three departures (in a total of nine tests conducted in individual villages) from Hardy–Weinberg equilibrium owing to heterozygote deficiency were observed (Table 3). However, a single deviation (Marúbo) persisted after correcting for multiple tests. Observed heterozygosity was lower than expected in all the analyzed villages and tribes (Table 3). Genotyping error is an additional concern when we find deviation from Hardy–Weinberg equilibrium. In the present study we used an average of two reads for each exon and inferred haplotypes with high accuracy. Amplification and sequencing reactions for each HLA-G exon were repeated whenever necessary to obtain high-quality electropherograms, with completely distinguishable peaks and without significant baseline noise. It should be emphasized that homozygous samples originate from amplification of three different DNA fragments, which could undermine the possibility of preferential amplification and null alleles. Moreover, this heterozygote deficiency was also observed for the HLA-G 3′UTR 14-bp insertion/deletion polymorphism (Table 3). Various factors may be responsible for this observed heterozygote deficiency, but genetic drift (due to the isolation coupled with small population effective sizes), intra-population structure and nonrandom mating (inbreeding), and natural selection (due to the functional role of this polymorphism) are the most probable ones in the present scenario. The disequilibrium observed when all the villages are considered together is certainly due to the amalgamation of highly heterogeneous populations.
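The comparison between observed and expected heterozygosity described above can be illustrated with a minimal sketch. It assumes direct counting of alleles from unphased diploid genotypes and the unbiased gene-diversity estimator (the 2n/(2n − 1) correction), which may differ in detail from what the software used in the study reports. The function name and the genotypes are hypothetical and are not data from the study.

```python
from collections import Counter


def heterozygosity(genotypes):
    """Return (allele_freqs, observed_het, expected_het) by direct counting.

    genotypes: list of (allele_a, allele_b) tuples, one per diploid individual.
    expected_het assumes the unbiased estimator 2n/(2n-1) * (1 - sum(p_i^2)).
    """
    n = len(genotypes)
    total = 2 * n  # number of gene copies sampled
    counts = Counter(allele for pair in genotypes for allele in pair)
    freqs = {allele: c / total for allele, c in counts.items()}
    # Observed heterozygosity: fraction of individuals with two different alleles.
    h_obs = sum(1 for a, b in genotypes if a != b) / n
    # Expected heterozygosity under Hardy-Weinberg proportions.
    h_exp = total / (total - 1) * (1 - sum(p * p for p in freqs.values()))
    return freqs, h_obs, h_exp


# Hypothetical genotypes (illustration only).
sample = [("A", "A"), ("A", "B"), ("B", "B"), ("A", "A")]
freqs, h_obs, h_exp = heterozygosity(sample)
deficiency = h_obs < h_exp  # True signals fewer heterozygotes than expected
```

A value of h_obs below h_exp, as in this toy sample, is the pattern reported as heterozygote deficiency; whether the gap is significant still requires an exact test such as the one cited in the methods.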
The exact test of population differentiation based on HLA-G allele frequencies revealed statistically significant differences (P=0.0000±0.0000) between the Tikúna tribe and the Pano linguistic group (represented here by the Katukina, Kaxináwa, Marúbo and Yaminawa tribes), which is consistent with the already described tribal isolation that characterizes the Tikúna tribe.41 Moreover, no significant differences were observed in any of the six pairs of Tikúna villages (P>0.05), which is consistent with the intra-tribal exogamy of the previously described Tikúna social system.41, 42, 43 On the other hand, statistically significant differences (P<0.05) were observed between villages from the Pano linguistic group in six of the ten possible comparisons, including the comparison between the two Katukina villages (P=0.0231±0.0008). Although at a first glance the latter finding may seem quite unexpected, it corroborates previous results concerning mtDNA haplogroup data obtained by our research group (Mendes-Junior CT, unpublished data), suggesting that the strong ethnic, linguistic and cultural homogeneity that characterizes the Pano linguistic group44, 45, 46 does not imply genetic homogeneity. An analysis of molecular variance was performed assuming a hierarchical structure, in which the first group, Tikúna, was composed of all Tikúna villages (Bom Jardim, Campo Alegre, Marajá and Nova Itália), whereas the second group, Pano, was composed of all Pano villages (Cana Brava, Morada Nova, Sete Estrelas, Vida Nova and the Yaminawa village). Differences between the two groups accounted for 4.92% of the variance of HLA-G allele frequencies, whereas only 2.15% of the variance occurred as a consequence of differences between the villages that belong to the same group. These figures reflected the results obtained by means of the exact test of population differentiation based on HLA-G allele frequencies.
Signatures of natural selection on HLA-G
To investigate whether natural selection may exert significant pressure on the allele and genotype frequencies, the Ewens–Watterson test of neutrality was performed (Table 3). Regarding the HLA-G 14-bp insertion/deletion polymorphism, only the Pano linguistic group presented a statistically significant negative normalized F value (observed homozygosity lower than expected homozygosity). The 11 remaining normalized F values were also negative, with low P-values (0.0504 ≤ P ≤ 0.3207). Regarding the HLA-G alleles, 11 of the 12 normalized F values were negative and 3 of them were statistically significant. Moreover, F values estimated for the whole sample for both the 14-bp polymorphism (P=0.0669) and HLA-G alleles (P=0.0203) were negative. The second neutrality test employed, the Tajima’s D test, which takes into account sequence diversity, identified positive D values for all 12 tests, with 4 of them being statistically significant (Table 4). Moreover, the D value estimated for the whole sample was also significantly positive. Taken together, the negative Ewens–Watterson’s normalized F values and the positive Tajima’s D values reflected an excess of diversity with respect to neutral expectations, which may be due to balancing selection.
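The sign convention of the normalized F value can be illustrated with a minimal sketch. It assumes the usual definitions: observed homozygosity F as the sum of squared allele frequencies, and a normalized deviate equal to the observed F minus its neutral expectation, divided by the standard deviation of that expectation. The function names and all numbers are hypothetical; in a real analysis the neutral expectation and its variance come from simulation for the same sample size and number of alleles, and here they are simply passed in.

```python
import math


def observed_homozygosity(allele_counts):
    """Sum of squared allele frequencies (homozygosity under HW proportions)."""
    total = sum(allele_counts)
    return sum((c / total) ** 2 for c in allele_counts)


def normalized_deviate(f_obs, f_exp, var_exp):
    """(observed F - expected F) / sd of expected F; negative means excess diversity."""
    return (f_obs - f_exp) / math.sqrt(var_exp)


# Hypothetical allele counts: four alleles at equal frequency.
f_obs = observed_homozygosity([10, 10, 10, 10])
# Hypothetical neutral expectation and variance (would come from simulation).
fnd = normalized_deviate(f_obs, f_exp=0.40, var_exp=0.01)
```

With evenly distributed alleles the observed homozygosity falls below the assumed neutral expectation, so the deviate is negative, which is the direction interpreted in the text as compatible with balancing selection.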
The third neutrality test applied, McDonald and Kreitman’s test, compares the ratios of fixed to polymorphic substitutions of nonsynonymous and synonymous substitutions between species (Table 5). Although no signatures of natural selection were observed in comparisons between humans and great apes, significant departures from neutrality were detected for exon 2 and exons 2 and 3 in the comparisons between humans and both Macaca mulatta and Macaca fascicularis. In both cases, the ratios of fixed to polymorphic sites were much lower in synonymous than in nonsynonymous sites. These results are consistent with positive selection operating by fixing nonsynonymous mutations since the divergence between macaques and humans. The outcome of the last neutrality test, based on the difference between synonymous and nonsynonymous nucleotide substitution rates in human HLA-G sequences, revealed excess of synonymous changes in all exons from worldwide HLA-G sequences retrieved from IMGT, which is consistent with purifying selection (Table 6). The results of this test when applied to HLA-G sequences from the Amerindian populations analyzed in the present study corroborate those from worldwide HLA-G sequences, although without achieving statistical significance (Table 6), which is probably due to the smaller amount of different sequences.
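The logic of the McDonald and Kreitman comparison can be sketched as a 2 × 2 contingency table of fixed versus polymorphic substitutions at nonsynonymous and synonymous sites. The sketch below assumes a two-sided Fisher exact test as the significance criterion, which is one common choice and not necessarily the statistic used for Table 5. The function names and the counts are hypothetical and are not taken from the study.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact P-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Hypergeometric probability of every table with the same margins.
    probs = {x: comb(row1, x) * comb(row2, col1 - x) / denom
             for x in range(lo, hi + 1)}
    p_obs = probs[a]
    # Sum the tables that are as likely as, or less likely than, the observed one.
    return sum(p for p in probs.values() if p <= p_obs * (1 + 1e-9))


def mk_test(fixed_nonsyn, poly_nonsyn, fixed_syn, poly_syn):
    """Return (nonsyn fixed/poly ratio, syn fixed/poly ratio, Fisher P).

    Assumes both polymorphic counts are nonzero.
    """
    ratio_n = fixed_nonsyn / poly_nonsyn
    ratio_s = fixed_syn / poly_syn
    p = fisher_exact_two_sided(fixed_nonsyn, poly_nonsyn, fixed_syn, poly_syn)
    return ratio_n, ratio_s, p


# Hypothetical counts (illustration only).
ratio_n, ratio_s, p_value = mk_test(10, 2, 3, 9)
```

A fixed-to-polymorphic ratio that is higher at nonsynonymous than at synonymous sites, together with a small P-value, is the pattern the text interprets as positive selection fixing nonsynonymous mutations.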
The difference between the tests should be interpreted in the light of the timescale and type of data analyzed. Whereas the Ewens–Watterson and the Tajima’s D tests are sensitive to relatively recent evolutionary history (tens to hundreds of thousands of years) and interpret allele frequency variation in population samples, the McDonald and Kreitman’s and dN/dS tests are sensitive to long-term selection (acting continuously over long spans of time) and only attain significance if multiple adaptive substitutions take place over the entire locus. In the present case, the results indicate that adaptive substitutions in nonsynonymous sites occurred since the divergence between great apes and old world monkeys, that few adaptive substitutions have occurred at the HLA-G locus within the Hominidae family, and finally that recent evolutionary history has resulted in a distribution of alleles among populations at frequencies that are consistent with balancing selection.
Various alleles from classical class I HLA genes have been found exclusively in Central and South Amerindian populations, which have stimulated new studies regarding the roles that mutation, natural selection and genetic drift have in HLA gene diversity in Amerindian populations. According to Salzano,47 in addition to the limited level of variability when compared with other ethnic groups, the most important features concerning the HLA system in Amerindians is the occurrence of allele turnover and antigen-driven positive selection, probably due to historical changes in population size and to diversified exposure to infectious agents. The allele turnover model48 was proposed to explain the limited number of alleles and the high proportion of exclusive alleles available inside each Amerindian population. This model is based on the joint action of genetic drift and balancing selection, which would preferentially maintain new alleles, usually originated by gene conversion instead of the old ones.48, 49 Changes in the frequency of alleles in the Americas may be interpreted as a selective response to specific pathogens that occur in this region.
Peptide presentation is not the major function of the HLA-G molecule.13 Rather, the primary function of this molecule is to modulate the immune system,2, 11, 12 and a conserved molecular structure probably has a major role in this scenario. Therefore, the genetic variability at HLA-G may be largely driven by purifying selection and genetic drift. Although gene conversion results in a higher probability of generating a functional gene product than random mutation involving single nucleotides, the maintenance of alleles that result from crossing-over events in these populations may reflect either selective pressures favoring specific recombinants or, alternatively, that the recombinants are variants of the parental alleles with equal functional properties.
HLA-G molecules present immunosuppressive properties that facilitate unharmed fetus survival in a genetically foreign host during successful pregnancies,2, 11, 12 and may reduce autoimmune manifestations; on the other hand, it may also contribute to the susceptibility and persistence of viral infections.13 In view of the function of HLA-G molecules, the regulatory control of HLA-G expression levels is apparently the most likely target of natural selection. In fact, evidence of balancing selection driving the diversity pattern of the HLA-G 5′URRs and 3′UTRs has been previously reported.17, 18, 31, 32 The HLA-G regulatory regions consist of very divergent and frequent haplotypes that are shared among diverse human populations, and such lineages may be associated with different expression profiles and may participate in a fine balance between high-expressing and low-expressing HLA-G haplotypes.17, 18 Regarding the 3′UTR, we have proposed that the 14-bp insertion/deletion polymorphism, which is associated with stability and production level of HLA-G messenger RNA,11, 37 exhibits a trend toward balancing selection operating on the alleles at this locus in Amerindians.31 In addition, we also found a trend toward balancing selection in admixed Brazilians concerning both the 3′UTR and promoter sequences.15, 18 Moreover, HLA-G*01:05N, which is theoretically associated with reduced level of HLA-G molecules,23, 50 has been subjected to non-neutral evolutionary forces, increasing the survival of heterozygous fetuses carrying the G*01:05 N allele due to improved intrauterine defense against infections in areas with a historically high pathogen load.29, 30 Particular HLA-G genotypes associated with autoimmune disease development and susceptibility to infections could also result in signatures of natural selection.
Regardless of HLA-G function, the present results are consistent with balancing selection acting over the HLA-G coding region. The fact that the Ewens–Watterson test of neutrality (Table 3) revealed only a single positive normalized F value in 26 tests (probability of observing these by chance alone=3.87 × 10^−7) makes it unlikely that these results are due to random factors, such as changes in population size or genetic drift. Rather, they suggest that balancing selection (possibly in the form of heterozygote advantage) is operating at this locus. The Tajima’s D test evidenced a similar trend (Table 4), which resulted in 13 positive values (probability of observing this by chance alone=1.22 × 10^−4). A negative Tajima’s D value reflects a relative excess of rare variants, whereas a positive Tajima’s D value reveals a relative excess of intermediate frequency alleles as expected for a model of population subdivision or balancing selection.51, 52 A positive Tajima’s D value arises when selection favors the maintenance of different alleles in the population, resulting in a proportionally higher mean pairwise difference (π) as compared with the measure of diversity based on the number of polymorphic sites.53 It is noteworthy that: (a) in a worldwide study, 281 of 313 analyzed genes had negative Tajima’s D values,54 which was interpreted as strong evidence for a recent expansion of the modern human population; and (b) the Tajima’s D test lacks power to detect selection in different scenarios, such as in the presence of recent selective sweeps or negative selection,55, 56 as well as in situations with an excess of intermediate frequency variants, which characterizes population subdivision, bottlenecks or balancing selection.55
However, despite the genome-wide shift of D values toward negative values, as a consequence of the features of the human demographic history,57 Amerindian populations in fact present a shift toward positive D values.58 This is a result of their unique demographic histories, as ancient Amerindians experienced strong bottleneck events, such as the one that took place during their entry into the Americas across the Bering Strait.59 This shift toward positive D values was observed in a data set of 24 noncoding polymorphic DNA sequences, in which 21 of them presented positive Tajima’s D.58 However, only 3 of these 21 values were significant and the average D (0.7841) was much lower than the D values observed here (Table 4). Moreover, Fagundes et al.58 employed a sampling strategy of including a single individual from each sampled tribe, which may have increased the estimated D values. Therefore, we may conclude that the positive Tajima’s D values for the HLA-G coding region indeed suggest balancing selection. Given that the recent demographic histories of the analyzed South American indigenous populations differ considerably,33, 43, 46 the absence of negative Tajima’s D values and the lack of positive normalized F values are truly unexpected events under neutrality.
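The chance probabilities quoted for the sign patterns of the two tests (a single positive normalized F value in 26 tests, and 13 positive D values) can be reproduced with a short sketch. It assumes that each test is independent and equally likely to yield a positive or a negative value under neutrality. That independence is a simplifying assumption, because the tribe-level and pooled tests reuse the village samples. The function name is hypothetical.

```python
from math import comb


def prob_exactly(k, n, p=0.5):
    """Binomial probability of exactly k positive results in n independent tests."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


# Exactly one positive normalized F value among 26 Ewens-Watterson tests.
p_f = prob_exactly(1, 26)
# All 13 Tajima's D values positive.
p_d = prob_exactly(13, 13)
```

Under these assumptions p_f equals 26/2^26 and p_d equals 1/2^13, which round to the figures given in the text; the first figure therefore corresponds to exactly one positive value, not to one or fewer.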
Apart from balancing selection acting over the HLA-G coding region, another plausible explanation is that the observed evidence could be due to a hitchhiking effect, that is, due to natural selection acting on the HLA-G 5′URRs and/or 3′UTRs or even on neighboring HLA genes that are under strong balancing selection.15, 17, 18, 19 In fact, each one of the five more frequent HLA-G alleles observed in the present study can be placed in a different main HLA-G lineage, among the six lineages identified so far.18
Interestingly, those lineages do present several nucleotide differences mainly in the regulatory regions that coincide with (or are very close to) known transcriptional factor binding sites or sites that influence HLA-G mRNA stability and degradation, a fact that is compatible with balancing selection.13, 17, 18, 60, 61 It should be emphasized that more work is needed to elucidate the mechanism(s) underlying the balancing selection acting on this gene.
On the other hand, the occurrence of purifying selection on a worldwide scale was disclosed by the synonymous and nonsynonymous nucleotide substitution test (Table 6). As mentioned earlier, this test is entirely different from the Tajima and Ewens–Watterson neutrality tests, because it does not evaluate the dynamics of the HLA-G variability within a population, but rather assesses natural selection over a set of identified HLA-G alleles and a deeper evolutionary history. Purifying selection implies that most novel DNA variants have a selective disadvantage with respect to preexisting ones, which were established in humans by positive selection, as disclosed by inter-specific analysis involving other primates by means of the McDonald and Kreitman’s test (Table 5). As a consequence, new nonsynonymous mutations remain at low frequencies and are more frequently removed by selection, whereas synonymous mutations persist.53 Therefore, purifying selection also results in increased proportion of low-frequency variants, which may stem from the deleterious effects of the novel nonsynonymous mutations that do not reach high frequencies. In contrast, balancing selection will increase the proportion of variants at intermediate frequencies, because this selection regime favors the maintenance of variation of multiple alleles.53
Knowing that HLA-G biologically affects the immune response, an invariable mechanism of maternal tolerance should be more effective, conferring higher reproductive fitness and leaving a signature of purifying selection. For example, LILRB1, the major HLA-G receptor, is expressed on the surface of several leukocytes and binds to the HLA-G molecule α3 domain. Thus, it is noteworthy that only one frequent worldwide HLA-G allele (G*01:06) presents a nonsynonymous nucleotide exchange at exon 4, which encodes the α3 domain of the molecule. One might expect that polymorphic residues existing in this domain negatively influence LILRB interactions, modulating the inhibitory intracellular signaling. It is interesting to observe that this HLA-G allele (G*01:06) has been associated with preeclampsia in several populations.13
In conclusion, we have presented evidence that natural selection has a complex role in the HLA-G coding region. While balancing selection shapes variability at a local level (Native American populations), we have also shown that purifying selection may occur on a worldwide scale and positive selection may have previously occurred during human evolutionary history. Moreover, the action of balancing selection in the coding region does not seem to be as strong as in the flanking regulatory regions, and such coding signature may actually reflect a hitchhiking effect. Whether signatures of balancing selection over HLA-G represent an isolated fact shaped by evolutionary forces characteristic of South American indigenous populations should be a matter of further studies investigating the entire HLA-G diversity in worldwide autochthonous populations.
Materials and methods
This study was approved in its ethical aspects by the Research Ethics Committee of this Institution (Hospital das Clínicas FMRP, USP) and by the National Research Ethics Committee, according to protocols HCRP n° 10049/2005 and CONEP n° 25000.159649/2005-80, respectively.
The samples analyzed here were obtained from indigenous Brazilian populations during the Alpha Helix expedition that took place in 1976.41, 44, 45 The 142 chosen Amerindians, without direct kinship (siblings or parents/sibs), belong to nine villages pertaining to five tribes (Katukina, Kaxináwa, Marúbo, Tikúna and Yaminawa) that inhabit the Central Amazon. The Katukina, Kaxináwa, Marúbo and Yaminawa tribes are members of the Pano linguistic group. Although they are scattered throughout a large geographic area, the members of the Pano linguistic group present strong ethnic, linguistic and cultural homogeneity, a feature that causes them to be considered as components of a same ‘Pano’ tribe.44, 45, 46 On the other hand, the Tikúna tribe speaks an isolated Macro-Tucanoan language that is not clearly related to any of the languages of the surrounding tribes, a fact that suggests considerable isolation and relatively little admixture with other tribes.41 Admixture levels with non-Amerindians were estimated to be definitely <5%.41, 45 Detailed descriptions of these populations such as geographic distribution, linguistics and demographic information can be found elsewhere.33, 41, 43, 44, 45
Blood samples were collected into vacutainers containing 2 ml of acid–citrate–dextrose as anticoagulant and left unfrozen for up to 12 h after collection. Within three days of collection, the samples were processed for storage; plasma was removed after centrifugation and packed cells, after being washed 3 × in saline solution, were diluted in 30% buffered glycerol and kept at −20 °C until use. DNA was extracted from 300 μl of the cell suspension according to a method described elsewhere.62
HLA-G allele characterization
HLA-G alleles were defined by nucleotide sequence variations at exons 2, 3 and 4 according to a previously described method63 that was modified as described elsewhere.40 Exons 2, 3 and 4 were individually amplified by PCR in products of 361, 457 and 364 bp, respectively. PCR products were directly sequenced, and all the sequences obtained from each sample (homozygous and heterozygous ones) were aligned with the genomic sequences of the official alleles recognized by the IMGT. Each detected SNP was individually recorded. As some samples may present two or more heterozygous SNPs, resulting in more than one possible haplotype combination, two computational methods were used to determine the haplotypes of each subject (which correspond to their HLA-G alleles), without taking any prior information into account: (1) the EM algorithm38 implemented with the HelixTree package version 6.02 (Golden Helix, Bozeman, MT, USA) and (2) a Bayesian method implemented with the PHASE v2 software.39
The methodology used in this study allowed the discrimination of all the official HLA-G alleles determined by exon polymorphism, but not the alleles with intron variation. On this basis, the alleles G*01:01:01:01 to G*01:01:01:06 were here considered as G*01:01:01, the alleles G*01:01:02:01 and G*01:01:02:02 were considered as G*01:01:02, the alleles G*01:01:03:01 to G*01:01:03:03 were considered as G*01:01:03, and the alleles G*01:03:01:01 and G*01:03:01:02 were considered as G*01:03:01.
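The grouping rule above amounts to truncating each allele name to its first three colon-separated fields. A minimal Python sketch, assuming allele names are plain strings in the `G*xx:xx:xx:xx` form used in the text (the helper name `collapse_to_exon_level` is hypothetical):

```python
def collapse_to_exon_level(allele):
    """Drop the fourth field of an allele name, if present.

    Example: 'G*01:01:01:05' becomes 'G*01:01:01'; names with three
    or fewer fields are returned unchanged.
    """
    gene, fields = allele.split("*", 1)
    return gene + "*" + ":".join(fields.split(":")[:3])


if __name__ == "__main__":
    for name in ["G*01:01:01:05", "G*01:01:02:02", "G*01:03:01:01"]:
        print(name, "->", collapse_to_exon_level(name))
```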
Allele frequencies and observed heterozygosity (hO) were computed by direct counting. Adherence of genotypic proportions to Hardy–Weinberg equilibrium expectations was tested by the exact test of Guo and Thompson64 using the GENEPOP 4.1 software.65 The same program was also used to perform the pairwise exact test of population differentiation based on allele frequencies.66 The expected heterozygosity values (hSk) and their standard deviations were estimated with the ARLEQUIN version 3.5 program.67 HLA-G sequence variation was assessed using θW, an estimate of the expected per-site heterozygosity based on the number of segregating sites, and π (nucleotide diversity), which is the average number of nucleotide differences per site between two sequences.52
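For reference, the diversity statistics named above can be computed from a set of aligned sequences as in the following minimal Python sketch. It is not the ARLEQUIN code; the function names and the toy sequences are assumptions for illustration, and the expected heterozygosity shown is the usual unbiased gene-diversity estimator, n/(n−1) × (1 − Σp²), which is assumed here rather than taken from the text.

```python
from itertools import combinations


def segregating_sites(seqs):
    """Number of alignment columns showing more than one nucleotide."""
    return sum(1 for col in zip(*seqs) if len(set(col)) > 1)


def watterson_theta(seqs):
    """Per-site Watterson estimator: S / a1 / L, with a1 = sum(1/i), i < n."""
    n = len(seqs)
    a1 = sum(1.0 / i for i in range(1, n))
    return segregating_sites(seqs) / a1 / len(seqs[0])


def nucleotide_diversity(seqs):
    """Per-site pi: mean number of pairwise differences divided by L."""
    n = len(seqs)
    diffs = sum(sum(a != b for a, b in zip(s1, s2))
                for s1, s2 in combinations(seqs, 2))
    return diffs / (n * (n - 1) / 2) / len(seqs[0])


def expected_heterozygosity(allele_counts):
    """Unbiased gene diversity from a dict of allele -> chromosome count."""
    n = sum(allele_counts.values())
    sum_p2 = sum((c / n) ** 2 for c in allele_counts.values())
    return n / (n - 1) * (1 - sum_p2)


if __name__ == "__main__":
    toy = ["AAAA", "AAAT", "AATT", "AAAA"]   # hypothetical aligned haplotypes
    print(watterson_theta(toy), nucleotide_diversity(toy))
    print(expected_heterozygosity({"a": 2, "b": 2}))
```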
The presence of a significant association between HLA-G alleles and the previously analyzed 14-bp polymorphism31 was evaluated by a likelihood ratio test of linkage disequilibrium68 using the ARLEQUIN version 3.5 program.67 Given the positive association but unknown gametic phase, the EM38 and PHASE39 algorithms, as implemented in the HelixTree and PHASE v2 packages, were used to reconstruct the HLA-G/14-bp haplotypes. The ARLEQUIN software was also used to conduct the analysis of molecular variance69 based on HLA-G alleles.
Departure from selective neutrality was tested by four different methods. The first was the Ewens–Watterson test.70, 71, 72 This test is based on Ewens’ sampling theory under the infinite-alleles model, in which the number of distinct alleles observed in the sample (k) carries the information about the population mutation rate. The observed homozygosity under Hardy–Weinberg proportions is compared with the expected homozygosity computed by simulation under neutrality/equilibrium expectations for the same sample size and number of alleles. This makes it possible to test the alternative hypotheses of directional selection (observed homozygosity greater than expected) or balancing selection (observed homozygosity lower than expected) by estimating the normalized deviate of the homozygosity (Fnd). This test was carried out using the PyPop 0.7.0 software.73 The second method was Tajima’s D test,74 which examines the relationship between the number of segregating sites and nucleotide diversity by comparing the sequence diversity statistics θW and π. Under the standard neutral model, the expectations of θW and π are equal, and therefore the expected value of Tajima’s D is 0.52 A positive Tajima’s D value is consistent with heterozygote advantage (balancing selection), whereas a negative value points to selection of one specific allele over alternative alleles.54 The significance of the D statistic was tested by generating 99 999 random samples under the hypothesis of selective neutrality and population equilibrium, using a coalescent simulation algorithm. This test was performed using the ARLEQUIN version 3.5 program.67 The third method was the synonymous and nonsynonymous nucleotide substitution test, which evaluates the relative abundance of synonymous substitutions (which do not change the amino acid) and nonsynonymous substitutions (which change the amino acid) in the gene sequences.
For data sets containing more than two sequences, this is done by first estimating the average number of synonymous substitutions per synonymous site (dS) and the average number of nonsynonymous substitutions per nonsynonymous site (dN), together with their variances. The null hypothesis of neutrality (dN=dS) can then be evaluated. This test was carried out using the modified Nei-Gojobori method75 implemented in the MEGA 5.1 program.76 The fourth test was the McDonald–Kreitman test,77 which compares the ratio of fixed to polymorphic substitutions at nonsynonymous sites with the same ratio at synonymous sites, using sequences from different species. Under neutrality, the two ratios are expected to be equal; positive selection would increase the rate of fixation at nonsynonymous sites, whereas balancing selection would increase the proportion of nonsynonymous polymorphisms. This test was performed using DnaSP 5.10.01.78 To apply this test, 32 homologous sequences of exons 2 and/or 3 of the HLA-G gene were retrieved from GenBank (http://www.ncbi.nlm.nih.gov/genbank/) for the following primate species: Pan troglodytes (M95737.1, X65744.1, L41255.1, U33290.1 and U33291.1), Gorilla gorilla (M95736.1, X65745.1 and L41256.1), Pongo pygmaeus (U33292.1, U33294.1, U33288.1 and U33293.1), Cercopithecus aethiops (U33310.1, U33311.1, L41265.1 and L41266.1), Macaca mulatta (U33295.1, U33304.1, U33305.1, U33306.1, L41261.1, L41262.1, L41263.1 and L41264.1), and Macaca fascicularis (U33301.1, U33302.1, U33303.1, U33312.1, L41257.1, L41258.1, L41259.1 and L41260.1).
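Two of the statistics described above have simple closed forms, sketched below in Python as an aid to the reader. The code is not the ARLEQUIN or DnaSP implementation: `tajimas_d` uses the standard formula of Tajima's D from the sample size n, the number of segregating sites S and the mean number of pairwise differences k (not per site), without the coalescent simulation used to assess significance, and `mk_test` summarizes a McDonald–Kreitman 2×2 table by the neutrality index and a two-sided Fisher's exact test, which is assumed here as the significance test. Function names and the example counts are hypothetical.

```python
from math import comb, sqrt


def tajimas_d(n, S, k):
    """Tajima's D from sample size n, segregating sites S and mean
    pairwise differences k (whole-sequence values, not per site)."""
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    return (k - S / a1) / sqrt(e1 * S + e2 * S * (S - 1))


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum all tables that are no more probable than the observed one
    return sum(prob(x) for x in range(lo, hi + 1)
               if prob(x) <= p_obs * (1 + 1e-9))


def mk_test(dn, ds, pn, ps):
    """Neutrality index (Pn/Ps)/(Dn/Ds) and Fisher p-value.
    dn, ds: fixed nonsynonymous/synonymous differences between species;
    pn, ps: nonsynonymous/synonymous polymorphisms within species.
    Assumes dn and ps are nonzero."""
    ni = (pn * ds) / (ps * dn)
    return ni, fisher_exact_two_sided(dn, ds, pn, ps)


if __name__ == "__main__":
    print(tajimas_d(n=4, S=2, k=7 / 6))      # toy values
    print(mk_test(dn=7, ds=17, pn=2, ps=42))  # hypothetical counts
```

A neutrality index above 1 indicates an excess of nonsynonymous polymorphism, the pattern expected under balancing selection as described above, whereas a value below 1 indicates an excess of nonsynonymous fixed differences.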
We thank Dr Simone Kashima, Sandra Rodrigues, Rochele Azevedo, Evandra Strazza Rodrigues and Adriana Marques for technical assistance. We thank Cynthia Maria de Campos Prado Manso and Elettra Greene for assistance with the English language. This study was supported by CNPq (Conselho Nacional de Desenvolvimento Científico e Tecnológico—Brazil) grant 475670/2007-8, CAPES (Coordenação de Aperfeiçoamento de Pessoal de Nível Superior), FAPESP (Fundação de Amparo à Pesquisa do Estado de São Paulo) grant 2013/19382-0 and FAEPA (Fundação de Apoio ao Ensino, Pesquisa e Assistência do HCFMRP-USP). CTM-J was supported by post-doctoral fellowships from CNPq/Brazil (150996/2005-5) and from FAPESP/Brazil (07/58391-4), and is currently supported by a Research fellowship from CNPq/Brazil (305493/2011-6). ECC was supported by a post-doctoral fellowship from FAPESP/Brazil (07/58420-4).