Introduction

The structure of linkage disequilibrium (LD) across the human genome provides valuable information for the identification and localization of genetic variants predisposing to common disease by association studies. Studies of LD have revealed that most of the human genome is composed of regions with strong LD interspersed with boundaries of low LD (Abecasis et al. 2001; Daly et al. 2001; Gabriel et al. 2002). The International Haplotype Mapping Project (HapMap) was launched in 2002 to genotype at least one common single nucleotide polymorphism (SNP) for every 5 kb in four ethnically diverse populations: Yoruba in Ibadan, Nigeria (YRI); Utah residents with ancestry from northern and western Europe (CEU); Han-Chinese individuals in Beijing (CHB); and Japanese individuals in Tokyo (JPT) (The International HapMap Consortium 2003). The LD information provided by the HapMap project is useful for conducting an association study in a population that has a strong genetic affinity to one of the HapMap populations. In such cases, “tagging” SNPs, which uniquely identify common haplotypes within the haplotype block in the population to be studied, can be selected based on the information from the reference HapMap population. It has been suggested that the HapMap population can serve as a reference population for the selection of tagging markers in many populations (de Bakker et al. 2006). However, since the structure of LD varies among populations (Dunning et al. 2000; Reich et al. 2001), it is desirable to examine, prior to an association study, whether tagging SNPs from the reference HapMap population can work in the study population. Although the SNP genotype data have been made available for Northeast Asian populations (i.e., CHB and JPT), it is not clear whether the LD information from CHB and JPT is also useful for other Asian populations, such as Southeast Asians.

The human 5q31-33 region contains many loci related to the immune response (IL3, CSF2, IRF1, IL5, IL13, IL4), and a number of studies have been conducted to assess whether polymorphisms in this region are associated with autoimmune and infectious diseases (Heinzmann et al. 2000; Ohashi et al. 2003). In addition, the SLC22A4-5 (OCTN1–2) genes on 5q31, which code for organic cation transporters, were recently reported to be associated with inflammatory bowel disease (Peltekova et al. 2004) and rheumatoid arthritis (Tokuhiro et al. 2003). Although the structure of LD in the 5q31 region has been studied in several populations (Daly et al. 2001; Sakagami et al. 2004), no systematic survey with dense SNP markers has been performed in Southeast Asian populations.

The aims of this study were (1) to describe the LD and haplotype structures for a 472 kb region on human chromosome 5q31 in a Thai population, and (2) to evaluate the transferability of tagging SNPs selected from CHB and JPT HapMap populations for Thais. The results will provide important information for association studies to find medically relevant variations, not only in Thais, but also in other Southeast Asian populations.

Materials and methods

Population sample

Blood samples were obtained from 96 healthy Thai individuals living in Bangkok, Thailand. Genomic DNA was extracted from peripheral blood leukocytes using the QIAamp Blood Mini Kit (Qiagen, Venlo, The Netherlands) in accordance with the manufacturer’s instructions. This study was approved by the institutional review board of the Faculty of Tropical Medicine, Mahidol University, and the Research Ethics Committee of the Faculty of Medicine, The University of Tokyo. Informed consent was obtained from all participants. The allele and genotype data for the HapMap populations were retrieved from the HapMap database (http://www.hapmap.org).

SNP typing

A total of 77 SNPs within a 522-kb region on human chromosome 5q31-33 were genotyped in 96 Thai subjects using the DigiTag2 Assay (Nishida et al. 2007). Genotype calls were made using the SNPStar software (version 0.0.0.8, Olympus, Tokyo, Japan). The 77 SNPs analyzed in this study are presented in Table 1.

Table 1 Minor allele frequencies of 77 SNPs in Thai and HapMap populations

Statistical analysis

The genotype frequencies for each SNP were checked for consistency between observed and expected values under Hardy–Weinberg equilibrium (HWE) using a chi-square test. A P value of less than 0.05 was considered to be statistically significant in this study. A pairwise F_ST value for each SNP was calculated based on the following formula (Wright 1951): $F_{ST} = (H_{A+B} - [H_A + H_B]/2)/H_{A+B}$, where $H_A = 2p_A(1 - p_A)$, $H_B = 2p_B(1 - p_B)$, and $H_{A+B} = 2([p_A + p_B]/2)(1 - [p_A + p_B]/2)$. Here, $p_A$ and $p_B$ represent the sample frequencies of the derived allele for each SNP in populations A and B, respectively. The population structure of the Thai population together with African (YRI), European (CEU), and Northeast Asian (CHB and JPT) populations was investigated using the STRUCTURE software, version 2.2 (Pritchard et al. 2000). In the STRUCTURE analysis, ten runs were performed at each K (from 3 to 5) under the linkage model with a burn-in of 10,000 iterations and a run length of 10,000 iterations following the burn-in. The LD statistic (r^2) was calculated and visualized using Haploview software version 3.32 (Barrett et al. 2005). The tagging SNPs were selected using an aggressive pairwise tagging approach with the Tagger program available in Haploview software.
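The pairwise F_ST formula above can be illustrated with a short Python sketch. It is only an illustration of the stated formula, not the code used in this study; the function names and the example allele frequencies are hypothetical.

```python
def heterozygosity(p: float) -> float:
    """Expected heterozygosity 2p(1 - p) for a biallelic SNP."""
    return 2.0 * p * (1.0 - p)


def pairwise_fst(p_a: float, p_b: float) -> float:
    """Pairwise F_ST for one SNP from the allele frequencies in populations A and B.

    Implements F_ST = (H_AB - (H_A + H_B) / 2) / H_AB, where H_AB is the
    heterozygosity at the average allele frequency of the two populations.
    """
    h_a = heterozygosity(p_a)
    h_b = heterozygosity(p_b)
    h_ab = heterozygosity((p_a + p_b) / 2.0)
    if h_ab == 0.0:
        # Both populations are fixed for the same allele, so F_ST is undefined.
        return float("nan")
    return (h_ab - (h_a + h_b) / 2.0) / h_ab


# Hypothetical allele frequencies, for illustration only.
example_fst = pairwise_fst(0.2, 0.4)
```

With the hypothetical frequencies 0.2 and 0.4, the function returns about 0.048; identical frequencies give 0, and frequencies of 0 and 1 give 1.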

Results

F_ST between Thai and HapMap populations

Minor allele frequencies (MAFs) of 77 SNPs on 5q31-33 in 96 Thai individuals are presented in Table 1 (the individual genotype data are available upon request). Of these SNPs, genotype information for 50 SNPs was available for all four HapMap populations. Among the 50 SNPs, rs757537 was not in HWE in the Thai population, and thus the remaining 49 SNPs, spanning 472 kb, were used in the subsequent analyses.
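The HWE check used to exclude a SNP can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name and the genotype counts are hypothetical, the test is taken to have one degree of freedom, and the P value is obtained from the identity that the upper tail of a chi-square distribution with one degree of freedom equals erfc(sqrt(x/2)).

```python
import math


def hwe_chi_square(n_aa: int, n_ab: int, n_bb: int) -> tuple[float, float]:
    """Chi-square test of Hardy-Weinberg equilibrium for one biallelic SNP.

    Takes the observed counts of the three genotypes and returns
    (chi-square statistic, P value with 1 degree of freedom).
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("at least one genotyped individual is required")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    # Skip genotype classes with zero expectation (monomorphic SNPs).
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical genotype counts for 96 individuals, for illustration only.
chi2, p_value = hwe_chi_square(30, 36, 30)
in_hwe = p_value >= 0.05  # significance threshold used in this study
```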

Allele frequencies at the 49 SNPs were very similar between Thais and CHB or JPT, but not between Thais and CEU or YRI (Table 1). The pairwise F_ST values between Thais and HapMap populations are shown in Fig. 1. Low pairwise F_ST values were observed between Thais and CHB or JPT, whereas high values were observed between Thais and CEU or YRI. The mean F_ST values for 49 SNPs were 0.0042 between Thais and CHB, and 0.0096 between Thais and JPT. These results imply that the allele frequency in Southeast Asians such as Thais can be roughly inferred from the HapMap data of CHB and JPT, and thus the HapMap database is helpful for selecting SNP markers with high MAF for association studies in Southeast Asians.

Fig. 1 F_ST values for 49 SNPs between Thai and HapMap populations

Population structure

To examine whether the genomic structure of 5q31 in Thais (Southeast Asians) was similar to that in CHB + JPT (Northeast Asians), we further applied the model-based clustering algorithm implemented in the STRUCTURE software. The algorithm assigns the individuals of the predefined populations to K clusters under the linkage model. Each individual has proportional membership in multiple clusters, with the membership coefficients summing to 1 across clusters. Based on the genotype data of 49 SNPs in five predefined populations, the best explanation for the population genetic structure was obtained assuming K = 5 under the linkage model (data not shown). In all five predefined populations, individuals had partial membership in the five estimated clusters. In the three Asian populations (Thai, CHB, and JPT), similar patterns of genetic structure were observed (Fig. 2).

Fig. 2 Estimated population structures across five populations. Each individual is represented by a thin vertical line, which is divided into five segments that represent the individual’s estimated membership fraction in each cluster. A representative result from the ten runs is shown

LD structure of 5q31

Figure 3 shows the LD structure, as measured by r^2, among 49 SNPs in a 472-kb region on 5q31-33. The pattern of LD did not differ markedly between the two Asian population groups (CHB + JPT and Thais). Based on the default definition of confidence intervals (Gabriel et al. 2002) in the Haploview software, four and six haplotype blocks were identified across the 472-kb region in CHB + JPT and Thais, respectively. Block 1, spanning 121 kb, partially encompassed the SLC22A4, SLC22A5 and LOC441108 genes. Block 2, spanning 75 kb, encompassed the RAD50 gene.
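For readers unfamiliar with the r^2 statistic, the following sketch shows how it is defined for a pair of biallelic SNPs. It assumes that phased haplotypes are already available, coded as 0/1 alleles; Haploview instead estimates haplotype frequencies from unphased genotypes, so this is an illustration of the definition rather than of the software's procedure. The function name and the example haplotypes are hypothetical.

```python
def r_squared(hap_a: list[int], hap_b: list[int]) -> float:
    """LD statistic r^2 between two biallelic SNPs.

    hap_a[i] and hap_b[i] are the alleles (0 or 1) carried at SNP A and
    SNP B on the i-th phased chromosome.
    """
    n = len(hap_a)
    if n == 0 or n != len(hap_b):
        raise ValueError("both SNPs need the same non-zero number of haplotypes")
    p_a = sum(hap_a) / n  # frequency of allele 1 at SNP A
    p_b = sum(hap_b) / n  # frequency of allele 1 at SNP B
    p_ab = sum(1 for a, b in zip(hap_a, hap_b) if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b  # disequilibrium coefficient D
    denom = p_a * (1.0 - p_a) * p_b * (1.0 - p_b)
    if denom == 0.0:
        # At least one SNP is monomorphic, so r^2 is undefined.
        return float("nan")
    return d * d / denom


# Hypothetical phased haplotypes, for illustration only.
complete_ld = r_squared([0, 0, 1, 1], [0, 0, 1, 1])
no_ld = r_squared([0, 0, 1, 1], [0, 1, 0, 1])
```

In this toy example the first pair of SNPs is in complete LD (r^2 = 1) and the second pair shows no LD (r^2 = 0), corresponding to the black and white squares in an r^2 plot.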

Fig. 3A–C LD structure of chromosome 5q31-33 in CHB + JPT (Northeast Asians) and Thais (Southeast Asians). (A) Map of the 49 SNPs. (B) Pairwise r^2 in CHB + JPT. (C) Pairwise r^2 in Thais. White, shades of gray, and black squares indicate no LD (r^2 = 0), intermediate LD (0 < r^2 < 1), and strong LD (r^2 = 1), respectively. The haplotype blocks indicated by bold black lines were defined via the default definition of confidence intervals (Gabriel et al. 2002) in Haploview software

In block 1, the same four common haplotypes (population frequency ≥0.05) were observed in Thais and in CHB + JPT. In addition, the same two common haplotypes were observed in block 2. In YRI, four common haplotypes were observed for the 26 SNPs in block 1 of the Asian populations, and only one of these haplotypes was found in the Asian populations. For block 2, containing seven SNPs, five common haplotypes were observed in YRI, including the two haplotypes present in the Asian populations. Interestingly, the structure of LD and the haplotypes in CEU were found to be similar to those of the Asian populations (data not shown), despite the differences in allele frequency at each SNP. These observations imply that few historical recombination events have occurred in these regions since the ancestors of Asian and European populations diverged from the ancestors of Africans.

Transferability of tagging SNPs

We selected tagging SNPs from Northeast Asians (i.e., CHB + JPT) and then tested their transferability to Thais. Since the Thai population showed genetic similarity to both CHB and JPT, the two populations were combined as the reference. Out of 49 SNPs, 13 SNPs were selected as tagging SNPs based on the HapMap data of CHB + JPT. Among the 36 non-tagging SNPs in Thais, 32 SNPs were captured with r^2 ≥ 0.8 by the 13 tagging SNPs. The mean maximum r^2 of these 32 SNPs was 0.979. The remaining four SNPs, which were not captured at this threshold, showed a mean maximum r^2 of 0.679. These results indicate that the tagging SNPs obtained from CHB + JPT are highly transferable to the Thai population.
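The evaluation step described above (the maximum r^2 between each non-tagging SNP and any tagging SNP in the target population, compared against the 0.8 threshold) can be sketched as follows. This covers only the evaluation, not Tagger's aggressive tag selection, and it is not the authors' code; the function name, the input layout (a symmetric r^2 matrix indexed by SNP position) and the toy values are assumptions made for illustration.

```python
def evaluate_tag_transfer(r2_matrix, tag_indices, threshold=0.8):
    """Evaluate how well tagging SNPs capture the remaining SNPs.

    r2_matrix   : symmetric list of lists, r2_matrix[i][j] = r^2 between
                  SNP i and SNP j in the target population.
    tag_indices : indices of the tagging SNPs chosen in the reference population.
    Returns (max_r2, captured): the maximum r^2 of each non-tagging SNP with
    any tag, and the list of non-tagging SNPs reaching the threshold.
    """
    tags = set(tag_indices)
    if not tags:
        raise ValueError("at least one tagging SNP is required")
    max_r2 = {}
    for i in range(len(r2_matrix)):
        if i in tags:
            continue
        max_r2[i] = max(r2_matrix[i][t] for t in tags)
    captured = [i for i, value in max_r2.items() if value >= threshold]
    return max_r2, captured


# Toy r^2 matrix for four hypothetical SNPs (not data from this study).
toy_r2 = [
    [1.0, 0.9, 0.1, 0.2],
    [0.9, 1.0, 0.1, 0.3],
    [0.1, 0.1, 1.0, 0.85],
    [0.2, 0.3, 0.85, 1.0],
]
max_r2, captured = evaluate_tag_transfer(toy_r2, tag_indices=[0, 2])
mean_max_r2 = sum(max_r2.values()) / len(max_r2)
```

In the toy example, SNPs 1 and 3 are both captured by tags 0 and 2. Applied to real data, the fraction of captured SNPs and the mean maximum r^2 summarize transferability in the same way as the figures reported above.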

Discussion

Both pairwise F_ST and STRUCTURE analyses revealed genetic similarities between Southeast Asian (Thais) and Northeast Asian (CHB + JPT) populations. This result is in accordance with that reported in a previous study based on autosomal microsatellite markers (Rolf et al. 1998). In addition, the structure of LD on chromosome 5q31-33 and the haplotypes in the LD blocks were also similar between Thais and CHB + JPT. Since the number of LD blocks is smaller and the average size of LD blocks is larger in CHB + JPT than in Thais, the genetic diversity in Northeast Asians is thought to be smaller than in Southeast Asians. This may reflect the effects of a bottleneck that occurred in the ancestors of Northeast Asians. Compared to the time of divergence between European and Asian ancestors, the divergence between Northeast Asian and Southeast Asian ancestors appears to have occurred more recently.

Our results indicate a high transferability of tagging SNPs on 5q31-33 from CHB + JPT to Thais. High transferability has also been reported for drug-related genes (Mahasirimongkol et al. 2006). We therefore conclude that tagging SNPs selected from CHB + JPT capture common variants well, at least in Thais and probably in Southeast Asians as a whole, and that the genotype information of CHB + JPT is very useful for association studies in Southeast Asians. A number of studies have also attempted to measure the transferability of tagging SNPs in non-HapMap populations, including Korean (Lim et al. 2006; Yoo et al. 2006), Spanish (Ribas et al. 2006), Estonian (Montpetit et al. 2006), and 12 population isolates from Europe (Service et al. 2007). Taken together with the present study, these reports suggest that tagging SNPs selected from the HapMap populations are highly portable to other populations from the same continent.