Linkage disequilibrium and age of HLA region SNPs in relation to classic HLA gene alleles within Europe

Evseeva, Irina; Nicodemus, Kristin K; Bonilla, Carolina; Tonks, Susan; Bodmer, Walter F

doi:10.1038/ejhg.2010.32


Article
Published: 31 March 2010


European Journal of Human Genetics volume 18, pages 924–932 (2010)


Abstract

The HLA region on chromosome 6 is gene-rich and under selective pressure because of the high proportion of immunity-related genes. Linkage disequilibrium (LD) patterns and allele frequencies in this region are highly differentiated across broad geographical populations, making it a region of interest for population genetics and immunity-related disease studies. We examined LD in this important region of the genome in six European populations using 166 putatively neutral SNPs and the classical HLA-A, -B and -C gene alleles. We found that the pattern of association between classic HLA gene alleles and SNPs implied that most of the SNPs predated the origin of classic HLA gene alleles. The SNPs most strongly associated with HLA gene alleles were in some cases highly predictive of the HLA allele carrier status (misclassification rates ranged from <1 to 27%) in independent populations using five or fewer SNPs, a much smaller number than tagSNP panels previously proposed and often with similar accuracy, showing that our approach may be a viable solution to designing new HLA prediction panels. To describe the LD within this region, we developed a new haplotype clustering method/software based on r², which may be more appropriate for use within regions of strong LD. Haplotype blocks created using this proposed method, as well as classic HLA gene alleles and SNPs, were predictive of a northern versus southern European population membership (misclassification error rates ranged from 0 to 23%, depending on which independent population was used for prediction), indicating that this region may be a rich source of ancestry informative markers.


Main

The HLA region on chromosome 6 is an important region of interest for both population genetics and immunity-related disease studies. Owing to the selective pressure associated with immune functions, linkage disequilibrium patterns and allele frequencies are highly differentiated across populations. As HLA typing of classic HLA gene alleles is expensive and time-consuming, although necessary for transplantation matching and detailed analysis of disease associations, we assessed whether a simple new method could ascertain SNPs that were informative of HLA allele carrier status. One important consideration in finding such SNPs is the age of the SNP in question, and whether it is likely that it arose before or after the origin of classic HLA gene alleles. SNPs that arose on a particular classic HLA gene allele haplotype background may be more informative for the prediction of carrier status for that allele than SNPs that predate the origin of the allele. It is also of interest to examine the linkage disequilibrium (LD) structure of the HLA region, especially within closely related populations. Previous approaches for the creation of haplotype blocks have generally relied on the LD metric D’, which may not be as sensitive within these high LD regions as the alternative LD metric r². We developed a new haplotype blocking strategy based on r², and applied this method to SNP data in the HLA region across six European populations. Finally, we tested whether haplotype blocks, HLA alleles and SNPs were useful for differentiating European populations using logistic regression and unsupervised clustering algorithm approaches. Using these approaches, we identified new SNPs in the HLA region that may be useful as ancestry informative markers (AIMs) for European populations.

Materials and methods

Populations sampled

The project involved genotyping 657 DNA samples from unrelated representatives of nine European populations: English (n=77, mainly from Birmingham), Orcadians from the Orkney Islands, UK (n=88), Catalans (n=66), Italians from Bergamo (n=82), Piedmont (n=59) and Sicily (n=59), French Basques (n=76), Finns (n=71) and unrelated individuals from the CEPH reference families in Utah (n=79). Of these nine populations, we used six for logistic regression and clustering analyses, collapsed into northern European (Orkney and English) and southern European (Bergamo, Piedmont, Sicily and Catalan) groups. This choice was made because the Basques and the Finns are generally considered to be outlier populations, whereas the European CEPH are an unknown mixture, although retrospectively they are mostly northern European. This is a limited sample of European populations, but it serves to illustrate methods of analysis and does reveal major differences between northern and southern European populations.

SNP selection

Two sets of markers were selected:

1. A total of 188 SNPs within the HLA region defined as putatively neutral by the following criteria: their presence in introns, pseudogenes, intergenic regions or by synonymous changes. The SNPs were selected on the basis of their chromosome position to provide, as far as possible, an even distribution across the 3.9 Mb HLA region (chromosome 6, 29587512–33516520, National Center for Biotechnology Information, Build 36.3) at an average density of 1 SNP per 18 kb. In addition, extra SNPs were chosen to cover areas around presumed recombination hotspots.¹ Only SNPs with minor allele frequency reported on NCBI as being greater than 0.05 were included.
2. A total of 78 SNPs in exons 2 and 3 of the HLA-A, B and C genes providing a low-to-medium resolution HLA Class I allele typing with a total of 69 alleles (locus A – 18, B – 31 and C – 20), all with frequencies greater than 0.05 in Caucasians in the Allele*Frequencies in World Populations Database (http://www.allelefrequencies.net).

Genotyping

A total of 35 SNPs were successfully genotyped in house using ARMS-PCR (amplification refractory mutation system) with KCl buffer and 15 ng of DNA in a 6.5 μl PCR reaction. Detection of the product was carried out using AMDI (alkaline-mediated differential interaction). In all, 153 SNPs were typed by the Centre National de Genotypage (Ivry, France), using a customised Illumina Beadarray Platform.² Of these, 131 gave successful results and were included in the analysis, resulting in a total of 166 SNPs. HLA class I allele typing was carried out based on the 12th International Histocompatibility Workshop Protocol.³ The 78 ‘diagnostic’ SNPs were typed by ARMS-PCR with MgCl₂ buffer and 15 ng of DNA in a 6.5 μl PCR reaction, followed by AMDI detection.⁴ All typing results were checked for Hardy–Weinberg equilibrium using a cutoff P-value of 0.05 to exclude aberrant results. Internal quality control with 94 duplicates gave 100% concordance. A total of 14 individuals had five or more missing genotypes and were removed from further analysis; thus, the available N was 643.

Statistical methods

Prediction of classic HLA alleles by SNPs

We tested whether the 166 SNPs predicted individual classic HLA-A, B and C gene alleles by deriving a binary variable for each HLA allele and each SNP. The binary variable represented the presence or absence of the haplotype or minor allele, respectively, in an individual: thus, if an individual carried the haplotype or minor allele, they were assigned a 1, otherwise a 0. We then used Fisher's exact test to test the 2 × 2 associations between each SNP and HLA allele. We corrected for multiple testing using a Bonferroni correction for the total number of tests (166 SNPs × 56 observed HLA alleles = 9296 tests). We ranked pairwise associations between SNPs and HLA alleles by −log₁₀(P-value), and, using all SNPs passing Bonferroni's correction, obtained counts of the number of SNPs in which minor alleles were present in HLA allele carriers and noncarriers. In addition, we formally tested whether these SNPs were able to predict HLA carrier status using a split-half cross-validation approach. We estimated a logistic regression model on the training set containing the five most strongly associated SNPs (with the exception of HLA-B*44, which only had four SNPs in the Bonferroni-corrected set) to predict HLA allele carrier status. We used only the top five most strongly associated SNPs instead of the full set because of multicollinearity. We then used the test set to predict the HLA allele carrier status. We validated the ability of this model built on the training data to predict the HLA allele carrier status using the independent CEPH population. The logistic regression model was of the following form:

$$\log\left(\frac{p}{1-p}\right) = \beta_0 + \sum_{i=1}^{N} \beta_i x_i$$

where p, the probability of carrying a particular HLA allele, was the outcome to be predicted by the linear combination of the N markers x_i in the equation (with intercept β_0 and coefficients β_i), which were binary coded as the presence or absence of the minor frequency allele carried by that individual. We then calculated the sensitivity (number of correctly predicted carriers/number of true carriers) and specificity (number of correctly predicted noncarriers/number of true noncarriers) of the test set; as the association between SNPs and HLA alleles was calculated using the full set of six populations, this may overestimate both sensitivity and specificity. We therefore also calculated the same quantities for the independent CEPH population.
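
The association screen and accuracy measures described in this section can be illustrated with a minimal Python sketch. It is not the authors' code: the function names and the example counts are hypothetical, and the two-sided P-value uses the common convention of summing all tables no more probable than the observed one, which is an assumption about the test variant used.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P-value for the 2 x 2 table [[a, b], [c, d]].

    Rows: HLA allele carrier / noncarrier.
    Columns: SNP minor allele present / absent.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum over all tables that are no more probable than the observed one.
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


def sensitivity_specificity(true_status, predicted_status):
    """Sensitivity and specificity from parallel lists of 0/1 carrier labels."""
    pairs = list(zip(true_status, predicted_status))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    n_pos = sum(t == 1 for t in true_status)
    n_neg = len(true_status) - n_pos
    return tp / n_pos, tn / n_neg


# Hypothetical counts (not from the study): carriers with/without the minor
# allele, then noncarriers with/without the minor allele.
p_value = fisher_exact_two_sided(30, 2, 10, 601)
threshold = bonferroni_threshold(0.05, 166 * 56)  # 9296 tests, as in the text
print(p_value < threshold)
```

Using exact integer binomial coefficients avoids floating-point underflow in the hypergeometric terms for a sample of this size.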

r2blocks: a haplotype blocking algorithm based on r²

We implemented a new algorithm to define correlated clusters or blocks of SNPs based on the LD metric r² (r2blocks) and compared this algorithm with blocks defined by the programme Haploview,⁵ which are based on the evidence for historical recombination using D’.⁶ D’ is less sensitive in tightly correlated LD regions than r². The clustering algorithm of r2blocks accepts genotype-level data or phased haplotypes as input and allows users to impute missing genotypes, set an r² threshold for defining blocks and set a maximum number (M) of SNPs to skip that do not pass the threshold while continuing to build a block. Briefly, starting with the highest r² value for all pairs of SNPs that are separated by at most M SNPs, it calculates pairwise r² measures for all SNPs within M SNPs of the first pair and continues to grow the block in either direction as long as one pairwise r² value within M SNPs of any SNP within the current block is above the threshold, omitting SNPs that do not pass the threshold (Figure 1). It then considers the next SNP within the block and all pairwise r² values for SNPs within M SNPs, growing the block until no additional SNPs remain or until no additional SNPs pass the threshold value. We evaluated two threshold values for the creation of haplotype blocks: r²≥0.70 and r²≥0.5 with M set to 4. The blocking algorithm is implemented in a freely available contributed package r2blocks for the R statistical computing environment (www.r-project.org).
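
The block-growing idea described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the r2blocks implementation: it takes phased 0/1 haplotypes only, does no imputation, treats "within M SNPs" as an index distance of at most M + 1, and the names `r_squared` and `grow_blocks` are hypothetical.

```python
def r_squared(x, y):
    """r^2 between two biallelic SNPs given parallel lists of 0/1 haplotype alleles."""
    n = len(x)
    p_a = sum(x) / n
    p_b = sum(y) / n
    p_ab = sum(a * b for a, b in zip(x, y)) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return 0.0  # a monomorphic SNP carries no LD information
    d = p_ab - p_a * p_b
    return d * d / denom


def grow_blocks(haplotypes, threshold=0.5, max_skip=4):
    """Greedy r^2-based blocks; each haplotype is a list of 0/1 alleles per SNP."""
    n_snps = len(haplotypes[0])
    cols = [[h[j] for h in haplotypes] for j in range(n_snps)]
    reach = max_skip + 1  # assumption: at most max_skip SNPs lie between a pair
    r2 = {}
    for i in range(n_snps):
        for j in range(i + 1, min(n_snps, i + reach + 1)):
            r2[(i, j)] = r_squared(cols[i], cols[j])

    unassigned = set(range(n_snps))
    blocks = []
    while True:
        # Seed a block with the strongest remaining pair above the threshold.
        seeds = [(v, pair) for pair, v in r2.items()
                 if v >= threshold and pair[0] in unassigned and pair[1] in unassigned]
        if not seeds:
            break
        _, (i, j) = max(seeds)
        block = {i, j}
        grew = True
        while grew:
            grew = False
            for k in sorted(unassigned - block):
                # Add SNP k if it is in reach of, and correlated with, any block member.
                if any(r2.get((min(k, b), max(k, b)), 0.0) >= threshold for b in block):
                    block.add(k)
                    grew = True
        blocks.append(sorted(block))
        unassigned -= block
    return sorted(blocks)
```

SNPs that never reach the threshold with any neighbour stay unassigned, which mirrors the idea of skipping SNPs that do not pass the threshold while a block is being built.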

Population differentiation: HWE, F_ST, association, prediction and clustering

As population differentiation can cause departures from the Hardy–Weinberg equilibrium (HWE), we tested for departures from HWE in the pooled population and in the northern and southern populations separately using Fisher's exact test.
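
An exact test of HWE for a biallelic SNP can be sketched in Python as below. The sketch conditions on the observed allele counts and sums the probabilities of all heterozygote counts that are no more likely than the one observed; whether this matches the exact procedure the authors ran is an assumption, and the function name is hypothetical.

```python
from math import exp, lgamma, log


def hwe_exact_p(n_aa, n_ab, n_bb):
    """Exact HWE P-value from genotype counts (AA, AB, BB) at a biallelic SNP."""
    n = n_aa + n_ab + n_bb
    n_a = 2 * n_aa + n_ab  # copies of allele A
    n_b = 2 * n - n_a      # copies of allele B

    def log_prob(het):
        # Probability of `het` heterozygotes given n individuals and n_a A alleles.
        hom_a = (n_a - het) // 2
        hom_b = (n_b - het) // 2
        return (het * log(2)
                + lgamma(n + 1) - lgamma(hom_a + 1) - lgamma(het + 1) - lgamma(hom_b + 1)
                + lgamma(n_a + 1) + lgamma(n_b + 1) - lgamma(2 * n + 1))

    # Heterozygote counts must share the parity of the allele count.
    het_values = range(n_a % 2, min(n_a, n_b) + 1, 2)
    probs = {het: exp(log_prob(het)) for het in het_values}
    p_obs = probs[n_ab]
    return min(1.0, sum(p for p in probs.values() if p <= p_obs * (1 + 1e-9)))
```

Pooling two groups with different allele frequencies lowers the heterozygote count relative to HWE expectations, which is why an excess of small P-values in the pooled sample can point to population structure.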

We used Weir and Cockerham's⁷ estimate of F_ST as implemented in the R package Geneland⁸ to assess genetic differentiation using the 166 SNPs and the top 20 SNPs, individual r2blocks blocks or two-locus HLA haplotypes that were most strongly associated with the north–south status. In addition, we calculated F_ST for individual SNPs, r2blocks haplotype blocks, HLA alleles and HLA haplotypes. Three-locus HLA haplotypes were estimated using PHASE v.2.1.1⁹,¹⁰ with parent-independent mutation; two-locus HLA haplotypes were derived from those estimates.
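
For intuition about what F_ST measures, the sketch below computes a simple heterozygosity-based F_ST, (H_T − H_S)/H_T, for one biallelic marker in two populations. This is a deliberately swapped-in simplification, not the Weir and Cockerham estimator used in the study, which additionally corrects for sample size; the function name and the optional sample-size weights are hypothetical.

```python
def fst_two_populations(p1, p2, n1=1, n2=1):
    """Heterozygosity-based F_ST for one biallelic marker in two populations.

    p1, p2: frequency of the same allele in each population.
    n1, n2: optional sample sizes used as weights (equal weights by default).
    """
    w1 = n1 / (n1 + n2)
    w2 = 1.0 - w1
    p_bar = w1 * p1 + w2 * p2
    h_total = 2.0 * p_bar * (1.0 - p_bar)  # expected heterozygosity, pooled
    if h_total == 0.0:
        return 0.0  # marker is monomorphic overall
    h_within = w1 * 2.0 * p1 * (1.0 - p1) + w2 * 2.0 * p2 * (1.0 - p2)
    return (h_total - h_within) / h_total


# Example with hypothetical allele frequencies of 0.10 and 0.40.
example_fst = fst_two_populations(0.10, 0.40)
```

The value is 0 when both populations share the same allele frequency and 1 when they are fixed for different alleles.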

Allele- and genotype-based associations were tested between individual SNPs, r2blocks-defined haplotypes, HLA alleles and 2- and 3-locus HLA-A, B and C haplotypes, and the north–south status using χ² tests or Fisher's exact test, when appropriate. Allele-based tests tested the association between each allele and the north–south status (sample size = 2N); genotype-based tests tested the number of minor alleles versus the north–south status (sample size = N). We set the P-value threshold to 0.05.
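
The allele-based test on 2N chromosomes can be illustrated with a short Python sketch of a 2 × 2 χ² test with one degree of freedom. The helper names are hypothetical, no continuity correction is applied (an assumption), and the P-value uses the identity P(χ²₁ > x) = erfc(√(x/2)).

```python
from math import erfc, sqrt


def allele_counts(minor_allele_counts):
    """(minor, major) allele totals from per-individual minor-allele counts (0, 1 or 2)."""
    minor = sum(minor_allele_counts)
    return minor, 2 * len(minor_allele_counts) - minor


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic and 1-df P-value for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0, 1.0  # an empty margin makes the test uninformative
    chi2 = n * (a * d - b * c) ** 2 / denom
    return chi2, erfc(sqrt(chi2 / 2.0))


# Hypothetical genotypes coded as minor-allele counts per individual.
north = [0, 1, 1, 2, 0, 1]
south = [0, 0, 0, 1, 0, 0]
north_minor, north_major = allele_counts(north)
south_minor, south_major = allele_counts(south)
chi2, p_value = chi_square_2x2(north_minor, north_major, south_minor, south_major)
```

With expected cell counts as small as in this toy example, Fisher's exact test would be the appropriate choice, as the text notes.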

We ranked association tests between north–south status and SNPs, haplotype blocks and two-locus HLA haplotypes by −log₁₀(P-value) and considered the top 20 most strongly associated predictors in a leave-one-population-out approach to validate predictive ability on an independent southern population. We could not carry out leave-one-out analyses with the northern set because of small sample size; instead, we used the CEPH sample for prediction. The logistic regression model used was as follows:

$$\log\left(\frac{p}{1-p}\right) = \beta_0 + \sum_{i=1}^{N} \beta_i x_i$$

where p, the probability of northern European population membership, is the outcome predicted by the linear combination of the N markers/haplotypes x_i in the equation (with intercept β_0 and coefficients β_i), which are coded as the presence or absence of particular alleles/haplotypes carried by that individual. Prediction of individual population assignment for the CEPH population and for the removed southern population was carried out by calculating the probability of being northern European using each individual's observed genotypes in the model. Misclassification rates were calculated as the number of individuals misclassified, given their ‘true’ north/south label, divided by the total number of individuals in the independent population.
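
Applying a fitted logistic model to an independent population and scoring it can be sketched in Python as follows. The coefficients, the example data and the 0.5 classification cutoff are hypothetical; the source does not state which cutoff was used.

```python
from math import exp


def predict_probability(intercept, coefficients, x):
    """P(northern European) from a logistic model and one individual's 0/1 predictors."""
    eta = intercept + sum(b * xi for b, xi in zip(coefficients, x))
    return 1.0 / (1.0 + exp(-eta))


def misclassification_rate(intercept, coefficients, individuals, labels, cutoff=0.5):
    """Fraction of individuals whose predicted label differs from the true 0/1 label."""
    wrong = 0
    for x, true_label in zip(individuals, labels):
        predicted = 1 if predict_probability(intercept, coefficients, x) >= cutoff else 0
        wrong += predicted != true_label
    return wrong / len(labels)


# Hypothetical model with two binary predictors and four held-out individuals.
held_out = [[1, 0], [0, 1], [0, 0], [1, 1]]
true_labels = [1, 0, 0, 1]  # 1 = northern, 0 = southern
error = misclassification_rate(-1.0, [2.0, -2.0], held_out, true_labels)
```

In a leave-one-population-out scheme the coefficients would be re-estimated each time on the remaining populations before scoring the held-out one.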

Current approaches to population differentiation detection using genome-wide sets of biallelic markers often apply unsupervised clustering algorithms, such as principal components analysis (PCA, eg, EIGENSTRAT,¹¹ KPCA from the R package kernlab¹²) or Bayesian methods such as those implemented in STRUCTURE¹³,¹⁴ or BAPS.¹⁵,¹⁶ We applied EIGENSTRAT, KPCA and BAPS to data from the 166 SNPs.

Results

Association of classic HLA alleles by genotyped SNPs

Nearly all of the classic HLA-A, B and C gene alleles were strongly associated with at least one SNP. Table 1 lists the top 20 most strongly associated SNP–HLA allele pairs and Figure 2 shows the −log₁₀(P-values) for those SNPs passing Bonferroni correction (see Supplementary Table 1 for a full list of all pairs passing Bonferroni correction). As expected, most SNPs in strong LD with one or more alleles at a particular HLA locus are physically proximal to the corresponding locus, with peaks observed close to the physical location of each gene, although LD patterns extended across most of the region (see Figure 2). Only a few very low-frequency HLA alleles (4 HLA-A alleles; 10 HLA-B alleles; and 2 HLA-C alleles) were not observed to show Bonferroni-corrected association with genotyped SNPs. A total of 55 (33.1%) SNPs did not show corrected association with HLA alleles; of the 111 significantly associated SNPs, 72 (64.8%) showed association with ≤3 alleles. One SNP showed strong association with seven HLA alleles: rs1265059 (HLA-A*29, HLA-B*07, HLA-C*0702, *06, *16, *0302 and *0303). The HLA-A*29/C*16 haplotype has a frequency of 2.8% in Northern Ireland and the HLA-B*07/C*0702 haplotype is frequent in the same population (17.0%) (frequencies from allelefrequencies.net), indicating that this SNP may be tagging common haplotypes. Note that the position of HLA-DRA1 is between SNPs 122 and 123 and that of HLA-DRB1, HLA-DQA1 and HLA-DQB1 are between SNPs 129 and 130, possibly explaining the strong association observed on the far right hand side of Figure 2.

Table 1 Counts, odds ratios (ORs) and –log₁₀(p-values) for association tests for the top 20 pairs of Bonferroni-corrected significantly positively associated ‘tagging’ pairs of HLA alleles and SNPs


Histograms of counts of the number of minor alleles carried at SNPs passing Bonferroni's correction in HLA allele carriers and noncarriers clearly showed a bimodal distribution (Figure 3; also see Supplementary Figures S1-S10). Using the top five most strongly associated SNPs and the HLA haplotypes observed with >1% frequency in either northern or southern European populations (Supplementary Table 2), we observed high sensitivity and specificity in predicting whether an individual carried a particular classic HLA allele in both the test set and in the independent CEPH set for most HLA alleles tested (Table 2). In particular, the overall misclassification rate for HLA-A*29 in the independent CEPH set was <0.01, with sensitivity near 1.0 (0.997) and 94.3% specificity. In addition, HLA-A*01 showed <5% misclassification rate in the CEPH population (4.5%) and had 93.5% sensitivity and 99.9% specificity, indicating that genotyping even a small number of SNPs can provide information about the HLA allele carrier status, although not as complete information as direct HLA typing.

Table 2 Sensitivity, specificity and misclassification rates for five SNP logistic regression models predicting HLA allele carrier status


Comparison of haplotype blocking algorithms

Using an r² threshold of 0.70 and a window size (M) of 3, r2blocks creates 12 blocks across the HLA region in the pooled sample of European populations. Reducing the r² threshold to 0.50 leads to an additional 7 blocks and 14 additional SNPs being assigned to blocks (Figure 4; Supplementary Table 3). The Gabriel block method using default parameters (lower bound D’ confidence interval ≥0.7 and upper bound confidence interval ≥0.98) defines 18 blocks. The two additional blocking methods (the four gamete rule (FGR) and solid spine of LD with default D’ threshold of 0.7) both create twice as many blocks over the region compared with r2blocks and the Gabriel method, and assign nearly half of the HLA genomic region to haplotype blocks. Using an r² threshold of 0.50, r2blocks creates 20 blocks over the HLA region in northern European populations and 19 blocks in southern European populations; block regions were generally consistent across the two sets of European populations. Using r2blocks with an r² threshold of 0.50 leads to a similar number of blocks and SNPs assigned to blocks as the Gabriel method, although the block boundaries are often different. Not surprisingly, the use of r2blocks with the higher r² threshold (0.70) yields a percentage of the genome covered that is closer to that of the Gabriel method than does the lower r² threshold. r2blocks with both r² thresholds and the Gabriel method show that LD is slightly lower in southern than in northern European populations, but that the average block size, number of SNPs assigned to blocks and percentage of the genome assigned to blocks are similar to those in northern Europeans. Of the 20 blocks assigned using r2blocks with an r² threshold of 0.50, more than half (11; 55%) are in genic regions.

HWE, F_ST, association tests and clustering of markers/haplotypes, and northern–southern European status

Excess deviations from HWE versus expected numbers of deviations can be induced by population structure. Of the 166 SNPs tested, 15 (9.0%) were out of HWE at the α=0.05 level, almost double the number expected by chance alone (8.3) (Supplementary Table 4 shows HWE, association test results and F_ST for all markers). In separate analyses of northern/southern populations, the number of SNPs out of HWE was much nearer the expected value (8 and 10, respectively), suggesting that the excess when using the combined populations is probably because of population structure.

Overall, F_ST values using the 20 SNPs, haplotypes and classic HLA gene alleles most strongly associated with north–south status were modest, as expected within European populations (Supplementary Table 5; see Supplementary Table 3 for the full list of markers). The largest F_ST value (0.056) was observed between northern and southern populations using this panel of 20 alleles/markers/haplotypes, which were selected to highlight north–south differences; similarly, the second largest F_ST value (0.050) was observed between southern populations and the CEPH sample. The smallest F_ST value (0.0024) was found between northern populations and the CEPH sample. Single SNP F_ST values were strongly negatively correlated with both allelic (r=−0.41, −log₁₀(P-value)=7.42) and genotypic (r=−0.42, −log₁₀(P-value) =7.81) association test P-values for north–south status, indicating that the allelic/genotypic tests are similar measures to F_ST (Supplementary Table 4). We note that single SNP F_ST estimates are approximately distributed as χ²₁ and thus have large variances. The largest single SNP F_ST for differentiation between northern and southern European populations was for rs411136 in SYNGAP1 (F_ST=0.29); interestingly, this same SNP showed the largest pairwise F_ST between any two populations, namely, for southern Europeans versus CEPH (F_ST=0.51). However, the same comparison between northern Europeans and CEPH produced an F_ST value of 0.015, suggesting that rs411136 may be an important AIM within European populations.

Of the 166 SNPs, 74 (44.6%) were significantly associated with north–south status using the uncorrected allelic or genotypic test, and, after Bonferroni correction for the 332 allelic and genotypic tests, 25 (15.1%) still showed significant associations (Table 3; see also Supplementary Table 4). The strongest association was observed with rs411136 in SYNGAP1, with an allelic test −log₁₀(P-value) of 32.42. As expected, the largest 3-locus haplotype frequency difference between northern and southern European populations was for haplotype HLA-A*01-HLA-B*08-HLA-C*0701 with frequencies of 0.097 versus 0.034, respectively (−log₁₀(P-value)=4.16; Supplementary Table 2).¹⁷

Table 3 Association tests for Hardy–Weinberg equilibrium (HWE), minor allele frequencies and F_ST for 20 most strongly associated HLA alleles, blocks and haplotypes with north–south status


Association tests between the 20 haplotype blocks defined using r2blocks with an r² threshold of 0.50 and the north–south status showed that, in nearly all cases, the haplotype block was more strongly associated than the individual SNPs comprising the block. Of the 20 haplotype blocks, 15 were associated with a north–south status. Six of the significantly associated blocks did not contain any individual SNPs that were significantly associated with north–south status. Of the 9 blocks containing at least 1 significantly associated SNP, 6 showed stronger association with the haplotype block containing that SNP than with all individual SNPs (Supplementary Table 4; Supplementary Figure 11). In fact, of the 20 most strongly associated HLA allele haplotypes, SNPs or block-based haplotypes, four were haplotypes created using our new methodology (Table 3).

KPCA and EIGENSTRAT analyses of single SNPs did not reveal tight clustering by north–south designation (Supplementary Figure 12). Clustering of individuals using BAPS on single SNPs resulted in a best-fitting solution of 11 clusters; an inspection of the proportion of individuals from northern and southern European populations in each cluster revealed a mixture of both, and none of the clusters consisted purely of northern or southern European samples (data not shown). However, BAPS clustering of the six population samples, instead of individuals, led to a best-fitting solution of two clusters exactly matching our northern and southern designations, confirming our a priori grouping of the populations.

Population membership prediction using HLA alleles, SNPs and blocks

We used logistic regression to test the predictive ability of the 20 most significantly associated HLA SNPs by using leave-one-population-out validation to predict north–south population membership on the independent population and on the CEPH population. To avoid multicollinearity induced by LD, we removed predictors that were strongly correlated with other predictors, retaining the predictor that was more strongly associated with north–south status, to reduce the model containing 20 variables to a model containing 7 predictors: rs411136, rs1265160, rs3096702, rs2256328, rs2855453, block 8 and HLA-B*08/C*0701. Even using such a limited number of predictors provided a perfect prediction for the Piedmont population and a low prediction error for the Sicily population (8.5%), although the error rates for the Bergamo (13.0%) and Catalan (18.0%) populations were higher (Table 4). The CEPH population prediction error rate was steady across all models at ∼22%; however, given that the genetic background of this population is not clearly defined, this may indicate some evidence for southern European admixture.
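
The pruning step described above, keeping the more strongly associated predictor from each correlated group, can be sketched in Python as a greedy filter. The correlation cutoff of 0.8, the predictor names and the scores are hypothetical; the source does not state the threshold that was used.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def prune_correlated(predictors, scores, max_abs_r=0.8):
    """Keep predictors in decreasing order of association score, dropping any
    predictor that is strongly correlated with one already kept.

    predictors: dict name -> list of 0/1 codes per individual.
    scores: dict name -> association strength, e.g. -log10(P-value).
    """
    ranked = sorted(predictors, key=lambda name: scores[name], reverse=True)
    kept = []
    for name in ranked:
        if all(abs(pearson(predictors[name], predictors[k])) < max_abs_r for k in kept):
            kept.append(name)
    return kept


# Hypothetical predictors: snp_b duplicates snp_a, block_c is uncorrelated with both.
example = {
    "snp_a": [0, 0, 1, 1, 0, 0, 1, 1],
    "snp_b": [0, 0, 1, 1, 0, 0, 1, 1],
    "block_c": [0, 1, 0, 1, 0, 1, 0, 1],
}
kept = prune_correlated(example, {"snp_a": 10.0, "snp_b": 8.0, "block_c": 5.0})
```

Processing predictors in order of association strength guarantees that, within any correlated pair, the one retained is the more strongly associated.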

Table 4 North–south population membership prediction error for logistic regression models using the most associated SNPs, blocks and HLA-A/B and HLA-B/C haplotypes


Discussion

Owing to the time-consuming and expensive process needed to carry out full classic HLA allele typing, we developed a simple strategy to identify inexpensively and easily genotyped SNP combinations that could predict classic HLA gene allele carrier status in a leave-one-population-out cross-validation using logistic regression models. We have also described a new haplotype blocking method and software based on r², which is probably more appropriate than a D’-based method in regions of strong linkage disequilibrium. In addition, we have shown that haplotype blocks created using the new haplotype blocking method, classic HLA gene alleles and neutral HLA region SNPs were useful for the differentiation of northern versus southern European populations, in agreement with previous work,¹⁸ and suggested particular SNPs that may be useful as AIMs.

An examination of the 15 associations in Supplementary Table 1 that show the opposite classic HLA allele-SNP pattern to that observed in the other 304 associations, namely the frequency of individuals carrying the HLA allele but not carrying the minor allele at the SNP is greater than the frequency of individuals not carrying the HLA allele but carrying the minor allele at the SNP, shows that the four SNPs associated with >1 HLA allele are associated with known HLA haplotypes of high frequency in Europeans. rs404240 shows this pattern with HLA-A*01, HLA-B*08 and HLA-C*0701, which is the most frequent 3-locus haplotype in northern Europeans (0.034 in southern Europeans and 0.097 in northern Europeans); two additional SNPs, rs2001009 and rs2249099, also show this pattern with HLA-A*01 and HLA-C*0701. rs1800684 shows this pattern with HLA-B*07 and HLA-C*0702, a high-frequency haplotype in Europe (0.060 in southern Europeans and 0.16 in northern Europeans). rs404240 and rs1800684 are both synonymous, and rs404240 and rs2249099 are physically proximal to the HLA-A gene, whereas rs2001099 is physically proximal to HLA-DRA. The fact that the opposite pattern is much more frequently observed, namely that, for a given SNP, there are many individuals who do not carry the associated classical HLA allele but do carry the minor SNP allele, implies that most of these SNPs are older than the classical HLA alleles. We searched dbSNP build 36.3 (http://www.ncbi.nlm.nih.gov/projects/SNP/) for presumed ancestral (Pan troglodytes) alleles for these polymorphisms and found 26 out of 166 (15.7%) were in regions that could not be aligned with the chimpanzee sequence (indicated in Supplementary Table 4). Most of these SNPs were unassociated with the north–south status, although 3 of the 26 were strongly associated with the north–south status (rs2256328, rs2857205 and rs2747479).
However, of these 26 SNPs, only 6 (23.1%) did not show Bonferroni-corrected association with at least one classic HLA gene allele, a smaller percentage than in the full set of 166 SNPs (33.1%), suggesting the existence of human-specific SNPs that probably arose on a particular HLA allele haplotype background, and which might be more informative of the classic HLA gene allele carrier status. The fact that these data suggest that most SNPs are older than the classic HLA gene alleles, most of which are common to humans and chimpanzees, implies that they are not likely to be good markers for LD-based associations. It seems likely that this is a problem shared by a high proportion of SNPs in the commonly used SNP databases, as the SNPs used in this study were selected only by location. The age of SNPs could account for the emphasis on building haplotype blocks using very high LD thresholds. Only in those cases in which LD is very high and SNPs are very closely linked will such blocking give meaningful results for relatively old SNPs, given that LD between two SNPs decays by a factor of 1−r per generation, where r is the recombination fraction between SNPs. For example, for a distance of 1000 bp, corresponding, on average, to r=10⁻⁵, the LD would decay to a fraction (1−10⁻⁵)^500000 ≈ e⁻⁵ ≈ 0.0067 of its initial value in 500 000 generations, and thus to negligible levels in the separation time between humans and chimpanzees. The SNPs may show associations with more recent variation, as with HLA alleles, but these associations will be incomplete.
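
The decay argument above follows from the standard relation D_t = D_0 (1 − r)^t for LD after t generations. A small Python sketch reproduces the calculation; the function names are hypothetical, and the values of r and t are the ones quoted in the text.

```python
from math import log


def ld_decay_factor(recombination_fraction, generations):
    """Fraction of the initial LD (D_t / D_0) remaining after `generations`."""
    return (1.0 - recombination_fraction) ** generations


def generations_to_reach(fraction_remaining, recombination_fraction):
    """Generations needed for LD to fall to `fraction_remaining` of its initial value."""
    return log(fraction_remaining) / log(1.0 - recombination_fraction)


# Values from the text: r = 1e-5 (about 1000 bp) and 500 000 generations.
# For small r, (1 - r)^t is close to exp(-r * t), here exp(-5).
remaining = ld_decay_factor(1e-5, 500_000)
half_life = generations_to_reach(0.5, 1e-5)  # generations for LD to halve
```

For r = 10⁻⁵ the half-life of LD is roughly 69 000 generations, so several half-lives elapse within the time span considered in the text.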

We show that, even with highly informative AIMs, unsupervised clustering algorithms were unable to detect substructure with our limited number of SNPs. Clustering algorithms should be used with caution when genome-wide data are unavailable, even if the SNPs selected are informative of ancestry.

The bimodal distribution of minor allele carriers of sets of SNPs that are strongly associated with classic HLA gene alleles and the use of logistic regression to predict HLA allele carrier status are computationally efficient and simple methods that do not require particular ‘tag SNPs’¹⁹ or previous database-based information,²⁰ and thus may be preferred when no previous data exist on a particular population. Even though our method and previously described methods show relatively high sensitivity and specificity for the prediction of a classic HLA gene allele carrier status, none of the proposed methods, including ours, is as accurate as HLA allele typing. Even with this caveat, our method may be helpful in preselecting a subset of individuals for full classic HLA gene allele typing in disease association studies, thus reducing genotyping time and costs.

Higher-order associations, such as associations between blocks of SNPs, may more accurately describe genetic diversity and historical recombination patterns of a particular region of the genome, and may be more helpful for the assignment of classic HLA gene allele carrier status than previously reported approaches.¹⁷,¹⁹,²⁰ Future work will develop a new meta-blocking algorithm to carry out higher-level blocking using blocks created by r2blocks, to be used as input in unsupervised and/or supervised clustering algorithms for the detection of population stratification based on the example of the HLA region in closely related populations. This meta-blocking algorithm may also be useful in predicting classic HLA gene allele carrier status.

References

1. Miretti MM, Walsh EC, Ke X et al: A high-resolution linkage-disequilibrium map of the human major histocompatibility complex and first generation of tag single-nucleotide polymorphisms. Am J Hum Genet 2005; 76: 634–646.
2. Shen R, Fan JB, Campbell D et al: High-throughput SNP genotyping on universal bead arrays. Mutat Res 2005; 573: 70–82.
3. Tonks S, Marsh S, Bunce M, Bodmer JG: Molecular typing for HLA class I using ARMS-PCR: further development following the 12th International Histocompatibility Workshop. Tissue Antigens 1999; 53: 175–183.
4. Bartlett S, Straub J, Tonks S, Wells RS, Bodmer JG, Bodmer WF: Alkaline-mediated differential interaction (AMDI): a simple automatable single-nucleotide polymorphism assay. Proc Natl Acad Sci USA 2001; 98: 2694–2697.
5. Barrett JC, Fry B, Maller J, Daly MJ: Haploview: analysis and visualization of LD and haplotype maps. Bioinformatics 2005; 21: 263–265.
6. Gabriel SB, Schaffner SF, Nguyen H et al: The structure of haplotype blocks in the human genome. Science 2002; 296: 2225–2229.
7. Weir BS, Cockerham CC: Estimating F-statistics for the analysis of population structure. Evolution 1984; 38: 1358–1370.
8. Guillot G, Mortier F, Estoup A: Geneland: a program for landscape genetics. Mol Ecol Notes 2005; 5: 1261–1280.
9. Stephens M, Smith N, Donnelly P: A new statistical method for haplotype reconstruction from population data. Am J Hum Genet 2001; 68: 978–989.
10. Stephens M, Donnelly P: A comparison of Bayesian methods for haplotype reconstruction from population genotype data. Am J Hum Genet 2003; 73: 1162–1169.
11. Price AL, Patterson NJ, Plenge RM, Weinblatt ME, Shadick NA, Reich D: Principal components analysis corrects for stratification in genome-wide association studies. Nat Genet 2006; 38: 904–909.
12. Karatzoglou A, Smola A, Hornik K, Zeileis A: kernlab-An S4 Package for Kernel Methods in R. J Stat Soft 2004; 11: 1–20.
13. Pritchard JK, Stephens M, Donnelly P: Inference of population structure using multilocus genotype data. Genetics 2000; 155: 945–959.
14. Falush D, Stephens M, Pritchard JK: Inference of population structure using multilocus genotype data: linked loci and correlated allele frequencies. Genetics 2003; 164: 1567–1587.
15. Corander J, Waldmann P, Sillanpää MJ: Bayesian analysis of genetic differentiation between populations. Genetics 2003; 163: 367–374.
16. Corander J, Marttinen P, Siren J, Tang J: Enhanced Bayesian modeling in BAPS software for learning genetic structures of populations. BMC Bioinformatics 2008; 9: 539.
17. Bodmer JG: The HLA system: The HLA-DR antigens and HLA haplotypes in 2 populations. In Eriksson E (ed): Population Structure and Genetic Disorders. Acad Press: London, 1980, pp 211–238.
18. Cavalli-Sforza LL, Menozzi P, Piazza A: The History and Geography of Human Genes. Princeton: Princeton University Press, 1994.
19. de Bakker PI, McVean G, Sabeti PC et al: A high-resolution HLA and SNP haplotype map for disease association studies in the extended human MHC. Nat Genet 2006; 38: 1166–1172.
20. Leslie S, Donnelly P, McVean G: A statistical method for predicting classical HLA alleles from SNP data. Am J Hum Genet 2008; 82: 48–56.

Acknowledgements

The project was funded by the European Union (Linkage Disequilibrium in European Populations, 2001–2005, CT-2001–00916), the Wellcome Trust (support for KKN) and by Cancer Research UK (support for CB and ST). DNA samples were provided by project partners: Prof Howard Cann (Fondation Jean Dausset-CEPH, Paris, France), Prof Laurent Excoffier (Computational and Molecular Population Genetics Lab, Zoological Institute, University of Bern, Switzerland), Prof Antti Sajantila (Department of Forensic Medicine, Laboratory of Forensic Biology, University of Helsinki, Finland), Prof Alberto Piazza (Dipartimento di Genetica, Biologia e Biochimica, Universita di Torino, Italy), Prof Silvana Santachiara (Department of Genetics and Microbiology, University of Pavia, Italy) and Prof Jaume Bertranpetit (Biologia Evolutiva, CEXs, Universitat Pompeu Fabra, Barcelona, Spain).

Author information

Irina Evseeva and Kristin K Nicodemus: These authors contributed equally to this work

Authors and Affiliations

Department of Clinical Pharmacology, Old Road Campus Research Building, University of Oxford, Oxford, UK
Irina Evseeva, Kristin K Nicodemus, Carolina Bonilla, Susan Tonks & Walter F Bodmer
Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK
Kristin K Nicodemus
Cancer and Immunogenetics Laboratory, Weatherall Institute of Molecular Medicine, John Radcliffe Hospital, Oxford, UK
Walter F Bodmer

Corresponding author

Correspondence to Walter F Bodmer.

Ethics declarations

Competing interests

The authors declare no conflict of interest.

Additional information

Supplementary Information accompanies the paper on the European Journal of Human Genetics website.

Supplementary information

Supplementary Figure 1 (PDF 3 kb)

Supplementary Figure 2 (PDF 6 kb)

Supplementary Figure 3 (PDF 4 kb)

Supplementary Figure 4 (PDF 2 kb)

Supplementary Figure 5 (PDF 4 kb)

Supplementary Figure 6 (PDF 3 kb)

Supplementary Figure 7 (PDF 4 kb)

Supplementary Figure 8 (PDF 6 kb)

Supplementary Figure 9 (PDF 4 kb)

Supplementary Figure 10 (PDF 3 kb)

Supplementary Figure 11 (PDF 108 kb)

Supplementary Figure 12 (JPG 398 kb)

Supplementary Table S1 (XLS 50 kb)

Supplementary Table S2 (DOC 34 kb)

Supplementary Table S3 (DOC 34 kb)

Supplementary Table S4 (DOC 335 kb)

Supplementary Table S5 (DOC 29 kb)

Supplementary Information (DOC 29 kb)

About this article

Cite this article

Evseeva, I., Nicodemus, K., Bonilla, C. et al. Linkage disequilibrium and age of HLA region SNPs in relation to classic HLA gene alleles within Europe. Eur J Hum Genet 18, 924–932 (2010). https://doi.org/10.1038/ejhg.2010.32

Received: 21 October 2009
Revised: 02 February 2010
Accepted: 11 February 2010
Published: 31 March 2010
Issue Date: August 2010
DOI: https://doi.org/10.1038/ejhg.2010.32
