Introduction

The extent of linkage disequilibrium (LD) in the human genome provides valuable information for identifying functional polymorphisms that predispose to human diseases. Because there are a limited number of functional polymorphisms across the genome, the positional cloning of disease-susceptibility genes depends on polymorphisms that are in LD with the functional polymorphisms. Thus, information on LD in a population is critical for the design of association mapping studies. Previous studies of LD patterns in the human genome have shown that LD varies among populations, among chromosomal regions, and between pairs of markers in close proximity (Taillon-Miller et al. 2000; Abecasis et al. 2001; Reich et al. 2001).

Recent studies of human haplotype structure have demonstrated that the genome is composed of regions of strong intermarker LD, or haplotype blocks, interspersed with presumed recombination hot spots (Taillon-Miller et al. 2000; Daly et al. 2001; Jeffreys et al. 2001; Patil et al. 2001; Dawson et al. 2002; Gabriel et al. 2002; Phillips et al. 2003). Haplotype blocks are characterized by low haplotype diversity, strong associations between alleles, and rare recombination. Analyses of gene-based LD and haplotype patterns showed that the extent of LD and haplotype diversity varied among genes (Nickerson et al. 1998; Johnson et al. 2001; Tiret et al. 2002). Similar studies in extended regions also showed highly variable patterns of LD among regions (Daly et al. 2001; Jeffreys et al. 2001; Gabriel et al. 2002). A few studies have attempted to analyze the heterogeneous distribution of LD along a chromosome (Patil et al. 2001; Dawson et al. 2002; Phillips et al. 2003). Patil et al. (2001) analyzed the haplotype patterns along the entire chromosome 21 using rodent–human hybrid cell lines derived from 20 ethnically diverse individuals. They identified haplotype blocks on chromosome 21 for which >80% of chromosomes were represented by a few common haplotypes. In an analysis of human chromosome 22 with a marker density of one SNP per 15 kb, Dawson et al. (2002) reported a highly variable pattern of LD along the chromosome, in which extensive regions of complete LD up to 804 kb in length were interspersed with regions of no detectable LD.

The aforementioned studies demonstrated the complexity of LD and human haplotype structure. Although differences in LD patterns between populations and ethnic groups have been reported (Zavattari et al. 2000; Abecasis et al. 2001; Reich et al. 2001), little information is available on the haplotype structure in different populations other than the recent study by Gabriel et al. (2002). As a first step toward a better understanding of the haplotype differences between populations with strong genetic affinities, such as Koreans and Japanese, we conducted a comparative study of the haplotype structure and LD for a 418-kb region on human chromosome 1p36.2. We chose this region for two reasons: (1) it has been suggested to be one of the candidate regions for systemic lupus erythematosus (SLE) by genome-wide linkage analyses (Shai et al. 1999; Gaffney et al. 2000), and one of the genes in the region, TNFR2, showed significant association with susceptibility to SLE (Komata et al. 1999); (2) TNFR2 has also been reported to be associated with various diseases, including hypertension and hypercholesterolemia (Glenn et al. 2000), familial combined hyperlipidemia (Geurts et al. 2000; van Greevenbroek et al. 2000), Alzheimer’s disease (Perry et al. 2001), and rheumatoid arthritis (RA) (Barton et al. 2001; Fabris and Tolusso 2002; Kyogoku et al. 2003).

Materials and methods

Samples

Genomic DNAs from 96 healthy Korean individuals were isolated from peripheral blood leukocytes according to standard procedures with proteinase K-RNase digestion followed by phenol–chloroform extraction. Genomic DNAs from 96 unrelated, healthy Japanese individuals were isolated from peripheral blood leukocytes using the QIAamp blood kit (Qiagen, Hilden, Germany). The present study was reviewed and approved by the ethics committees of the University of Ulsan College of Medicine and the University of Tokyo Graduate School of Medicine.

To minimize the required amount of genomic DNA, a whole genome amplification (WGA) method was applied to the Japanese samples in the study. The improved primer extension preamplification (I-PEP)-PCR method of Dietmaier et al. (1999) was used for the WGA. Briefly, 20 ng of genomic DNA was amplified in a reaction mixture consisting of 3.6 U of a mixture of Taq polymerase and Pwo polymerase (Expand High Fidelity PCR System, Boehringer Mannheim, Japan), 16 μM totally degenerated 15-nucleotide-long primer (Sigma Genosys, Japan), 0.1 mM dNTP, and 2.5 mM MgCl₂.

SNP screening

One hundred nineteen SNP sites registered in a 418-kb region of 1p36.2 were selected from the dbSNP (http://www.ncbi.nlm.nih.gov/SNP) and JSNP (http://snp.ims.u-tokyo.ac.jp) databases. The occurrence of polymorphisms was assessed in 16 unrelated healthy Japanese samples by direct sequencing (ABI 3100 sequencer, Applied Biosystems, Japan). Among the 119 sites selected from the public databases, at least one heterozygous sample was observed for 79 SNP sites, among which 58 SNPs were selected for genotyping. A complete list of the 58 SNPs, their reference sequence numbers, positional information, minor allele frequencies in both populations, and the list of assay primers are available upon request.

Typing method

Japanese samples were genotyped using the PCR-SSP-FCS method, as described elsewhere (Bannai et al. 2004). Briefly, a first PCR was performed to amplify a fragment including an SNP site using 1 μl of WGA products as templates. A sequence-specific primer (SSP)-PCR was then performed with allele-specific, seminested primers labeled with a fluorescent dye (TAMRA or Cy5), using the first PCR products as templates. The SSP-PCR products were analyzed by fluorescence correlation spectroscopy (FCS) using a single-molecule fluorescence detection system (MF10, Olympus Corporation, Japan). The primers used for the first PCR and SSP-PCR are available upon request. Korean samples were genotyped by direct sequencing.

Statistical analysis

For each SNP in each population, the observed genotype frequencies were checked for consistency with those expected under Hardy–Weinberg equilibrium using the commercial program SNP Alyze V 3.0 (Dynacom Co., Japan). Haploview version 2.05, as well as the SNP Alyze software package based on the expectation-maximization (EM) method (Excoffier and Slatkin 1995), was used to estimate the haplotype frequencies, Lewontin’s coefficient D′ (Lewontin 1988), and the correlation coefficient r (Hill and Robertson 1968). The block structures and their haplotype frequencies were estimated using Haploview version 2.05. The gene-average LD was calculated as described (Tiret et al. 2002). Briefly, the gene-based median D′ and r² were calculated by averaging the absolute values of the k(k−1)/2 pair-wise disequilibrium coefficients D′ and r², where k is the number of biallelic polymorphisms within a gene. The SNP Alyze program was used to assess the statistical significance of haplotype profile differences and individual haplotype frequency differences between the two populations.
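As an illustration of the EM approach cited above, the following minimal Python sketch estimates the four haplotype frequencies for a single pair of biallelic SNPs from unphased genotypes. It is not the implementation used in SNP Alyze or Haploview; the function name `em_two_locus`, the 0/1/2 genotype coding, and the uniform starting values are assumptions made for this example.

```python
def em_two_locus(genotypes, n_iter=100, tol=1e-10):
    """Estimate two-SNP haplotype frequencies from unphased genotypes.

    genotypes: list of (g1, g2), each the count (0, 1, or 2) of allele "1"
    at the first and second SNP (hypothetical coding for this sketch).
    Returns [f00, f01, f10, f11], indexed by 2 * allele1 + allele2.
    """
    counts = [0.0] * 4   # haplotypes whose phase is unambiguous
    n_double_het = 0     # only double heterozygotes are phase-ambiguous
    for g1, g2 in genotypes:
        if g1 == 1 and g2 == 1:
            n_double_het += 1
            continue
        a_alleles = (0, 1) if g1 == 1 else (g1 // 2, g1 // 2)
        b_alleles = (0, 1) if g2 == 1 else (g2 // 2, g2 // 2)
        for a, b in zip(a_alleles, b_alleles):
            counts[2 * a + b] += 1

    total = 2 * len(genotypes)
    freqs = [0.25] * 4   # assumed starting point
    for _ in range(n_iter):
        # E-step: probability that a double heterozygote carries 00/11
        cis = freqs[0] * freqs[3]
        trans = freqs[1] * freqs[2]
        w = cis / (cis + trans) if cis + trans > 0 else 0.5
        # M-step: re-estimate frequencies from expected haplotype counts
        new = [
            (counts[0] + n_double_het * w) / total,
            (counts[1] + n_double_het * (1 - w)) / total,
            (counts[2] + n_double_het * (1 - w)) / total,
            (counts[3] + n_double_het * w) / total,
        ]
        converged = max(abs(x - y) for x, y in zip(new, freqs)) < tol
        freqs = new
        if converged:
            break
    return freqs


# Hypothetical data: two double heterozygotes and two double homozygotes.
example = em_two_locus([(1, 1), (0, 0), (2, 2), (1, 1)])
```

Extending this to more than two SNPs requires enumerating all haplotype pairs compatible with each multilocus genotype, which is what block-level haplotype inference has to handle.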

Results

Patterns of linkage disequilibrium

We studied the extent of LD in a 418-kb region (12.1–11.7 Mb, build 34) on chromosome 1p36.2. A total of 119 SNPs were chosen from public databases (dbSNP, JSNP), and initial screening was performed in 16 healthy Japanese individuals. Of the 119 SNPs, 79 were polymorphic in these samples. To reduce the typing cost, 60 SNPs (designated C1–C64) were selected from the 79 polymorphic SNPs based on the following criteria: (1) when two markers were in close proximity, only one was retained; (2) the genotype frequency distribution agreed with Hardy–Weinberg equilibrium (P>0.05). The selected 60 SNPs were then genotyped in 96 unrelated individuals from each of the Korean and Japanese populations. Of the 60 SNPs, two were nonpolymorphic in the Korean samples. Of the remaining 58 SNPs, 42 and 40 were polymorphic with minor allele frequency >10% in the Korean and Japanese samples, respectively; the minor allele frequencies of SNPs C19 and C23 were less than 10% in the Japanese only. The 42 markers had an average spacing of 10 kb but were not evenly spaced (Fig. 1). SNPs with minor allele frequency <0.10 were omitted from the LD analysis.
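The two numerical filters mentioned above (agreement with Hardy–Weinberg equilibrium at P>0.05 and a minor allele frequency above 10%) can be sketched as follows. This is an illustrative Python example rather than the procedure implemented in SNP Alyze: the chi-square goodness-of-fit test with one degree of freedom, the function names, and the genotype counts are assumptions, and in the study the two criteria were applied at different stages.

```python
import math


def minor_allele_frequency(n_aa, n_ab, n_bb):
    """Minor allele frequency from the three genotype counts."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    return min(p, 1 - p)


def hwe_chi2_pvalue(n_aa, n_ab, n_bb):
    """P value of a 1-df chi-square test against Hardy-Weinberg proportions."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1 - p
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of the chi-square distribution with 1 df
    return math.erfc(math.sqrt(chi2 / 2))


def passes_filters(n_aa, n_ab, n_bb, maf_min=0.10, hwe_alpha=0.05):
    """True if the SNP meets both the MAF and the HWE criterion."""
    return (minor_allele_frequency(n_aa, n_ab, n_bb) > maf_min
            and hwe_chi2_pvalue(n_aa, n_ab, n_bb) > hwe_alpha)


# Hypothetical genotype counts for 96 individuals
keep = passes_filters(77, 18, 1)   # MAF of about 0.104, close to HWE
drop = passes_filters(90, 6, 0)    # MAF of about 0.031, below the threshold
```

For SNPs with very low expected counts in one genotype class, an exact test would be a more cautious choice than the chi-square approximation used here.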

Fig. 1 The location of SNPs examined and pairwise D′ between SNPs with minor allele frequency of >10%

To investigate patterns of LD in the region, two measures of LD (D′ and r²) were estimated for all pair-wise combinations of markers in each population. Lewontin’s standardized coefficient D′=1 indicates absolute LD when one or two haplotypes are missing (Lewontin 1988), whereas r²=1 indicates absolute LD when two of four haplotypes are missing (Hill and Robertson 1968). The D′ values for pairs of the 42 SNPs in both population samples are shown in Fig. 1. In the pair-wise analysis, LD strength varied considerably among regions.
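The difference between the two measures can be made concrete with a short Python sketch that computes D′ and r² from the four haplotype frequencies of a SNP pair, using the standard definitions D = f(AB) − p(A)p(B), D′ = |D|/Dmax, and r² = D²/[p(A)(1−p(A))p(B)(1−p(B))]. The function name and the example frequencies are hypothetical and are not taken from the study data.

```python
def pairwise_ld(f_AB, f_Ab, f_aB, f_ab):
    """Return (D', r^2) for two biallelic SNPs with alleles A/a and B/b."""
    if abs(f_AB + f_Ab + f_aB + f_ab - 1.0) > 1e-9:
        raise ValueError("haplotype frequencies must sum to 1")
    p_A = f_AB + f_Ab
    p_B = f_AB + f_aB
    d = f_AB - p_A * p_B
    # D is scaled by its largest possible magnitude given the allele frequencies
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    r2 = d * d / denom if denom > 0 else 0.0
    return d_prime, r2


# One of four haplotypes missing (aB absent): D' = 1.0 but r^2 = 1/3
three_haplotypes = pairwise_ld(0.5, 0.25, 0.0, 0.25)

# Two of four haplotypes missing (only AB and ab): D' and r^2 both reach 1
two_haplotypes = pairwise_ld(0.6, 0.0, 0.0, 0.4)
```

The first example shows why D′ can stay at 1 across markers with very different allele frequencies, whereas r² reaches 1 only when each allele at one SNP predicts the allele at the other.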

We assessed the pattern of LD by calculating the average D′ and r² in successive segments averaging 74.6 kb in length (11.9-kb overlap) (Fig. 2). The general patterns of LD appeared similar between the two populations except in the region near TNFR2 (TNFRSF1B). High LD was observed in both populations across the three genes toward the telomeric region. In contrast, in the TNFR2 (TNFRSF1B) region, high LD was observed within the gene and in Koreans only.
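A sliding-window average of this kind could be computed along the following lines. The windowing scheme is described here only by its mean segment length and overlap, so the fixed-width window, the fixed step, the function name, and the toy data in this Python sketch are all assumptions.

```python
from itertools import combinations


def sliding_window_mean_ld(positions, ld, window, step):
    """Average pair-wise LD among the markers falling in each window.

    positions: sorted marker coordinates (for example in kb).
    ld: dict mapping (i, j) with i < j to an LD value such as D' or r^2.
    Returns a list of (window_start, window_end, mean_ld); windows with
    fewer than two markers are skipped.
    """
    if step <= 0:
        raise ValueError("step must be positive")
    results = []
    start = positions[0]
    while start <= positions[-1]:
        end = start + window
        inside = [i for i, pos in enumerate(positions) if start <= pos < end]
        values = [ld[(i, j)] for i, j in combinations(inside, 2)]
        if values:
            results.append((start, end, sum(values) / len(values)))
        start += step
    return results


# Toy example: four markers, window of 25 kb, step of 15 kb (10-kb overlap)
toy_positions = [0, 10, 20, 30]
toy_ld = {(0, 1): 1.0, (0, 2): 0.5, (1, 2): 0.0,
          (0, 3): 0.2, (1, 3): 0.2, (2, 3): 0.8}
profile = sliding_window_mean_ld(toy_positions, toy_ld, window=25, step=15)
# profile == [(0, 25, 0.5), (15, 40, 0.8)]
```

With unevenly spaced markers, windows defined by a fixed number of markers rather than a fixed physical length would give segments of variable size, which may be closer to what the reported mean length implies.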

Fig. 2 Distribution of linkage disequilibrium on chromosome 1p36.2. The solid and dotted lines represent sliding-window plots of average D′ and r², respectively, over a mean interval of 74.6 kb with an 11.9-kb overlap, for markers with minor allele frequency ≥0.10

We adopted the definition of an LD block as any series of two or more markers in a contig for which all pair-wise values of D′ were >0.8 on average, with 95% confidence bounds between 0.7 and 0.98 (Gabriel et al. 2002). Using the Haploview program, two common haplotype blocks were identified in the 418-kb region of chromosome 1p36.2 in both population samples; they spanned intervals of 8.3 kb to 47.1 kb. As shown in Fig. 3, a block of very strong LD (D′>0.8) was observed in the Korean sample, C8–C10–C11–C13–C14–C15–C16, spanning 13.4 kb from intron 7 to intron 1 of the TNFR2 (TNFRSF1B) gene. This block was smaller in the Japanese sample, C10–C11–C13–C14–C15, spanning 8.3 kb from exon 6 to intron 1 of the TNFR2 (TNFRSF1B) gene. The second block of strong LD in the Japanese sample was the longest observed in both populations, C41–C42–C43–C44–C45–C46–C47–C49–C52–C53–C54, ranging from intron 11 of the MFN2 gene to intron 6 of the PLOD gene (47.1 kb). In the Korean sample, this block was divided into two blocks, C40–C41–C42–C43–C44–C45–C46–C47–C49 and C52–C53–C54, of 35.2 kb and 3.2 kb, respectively. Three additional blocks consisting of two markers each were identified in the Korean sample only and spanned intervals of 5.9 kb to 14.7 kb.
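The idea of a block as a run of adjacent markers in mutually strong LD can be illustrated with a deliberately simplified Python sketch. It uses only point estimates of |D′| against a single threshold and ignores the confidence bounds that are part of the definition above, so it is not the algorithm implemented in Haploview; the function name, the greedy extension rule, and the toy values are assumptions.

```python
def find_blocks(dprime, n_markers, threshold=0.8):
    """Greedy scan for runs of adjacent markers in pair-wise strong LD.

    dprime: dict mapping (i, j) with i < j to |D'| for markers indexed
    in chromosomal order.
    Returns a list of (first_index, last_index) for blocks of >= 2 markers.
    """
    blocks = []
    start = 0
    while start < n_markers - 1:
        end = start
        # Extend while the next marker is in strong LD with every marker
        # already in the candidate block
        while end + 1 < n_markers and all(
            dprime[(i, end + 1)] >= threshold for i in range(start, end + 1)
        ):
            end += 1
        if end > start:
            blocks.append((start, end))
            start = end + 1
        else:
            start += 1
    return blocks


# Toy example: markers 0-2 form one block and markers 3-4 another
toy = {(i, j): 0.1 for i in range(5) for j in range(i + 1, 5)}
toy.update({(0, 1): 1.0, (0, 2): 1.0, (1, 2): 1.0, (3, 4): 0.9})
blocks = find_blocks(toy, n_markers=5)
# blocks == [(0, 2), (3, 4)]
```

Because a single weak pair ends a block in this sketch, block boundaries are sensitive to which markers are included, which is consistent with the boundaries shifting when a different minor allele frequency threshold is applied.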

Fig. 3 Haplotype distribution in each LD block. Tag SNPs for Koreans and Japanese are indicated by stars and diamonds, respectively. Differences in the haplotype profile between the populations were assessed by permutation test

On the other hand, slightly different block structures were identified when the 24 and 26 SNPs with minor allele frequency ≥0.2 in Koreans and Japanese, respectively, were used to identify LD blocks (data not shown). Of note was a long LD block consisting of 14 markers, C41–C42–C43–C44–C45–C46–C47–C49–C52–C53–C54–C61–C63–C64, observed in the Japanese sample across the 80-kb region from intron 11 of the MFN2 gene to intron 2 of MGC33867, including three genes toward the telomeric region. In the Korean sample, three blocks, C31–C38, C40–C41–C42–C43–C44–C45–C46–C47–C49, and C52–C53–C54–C61–C63–C64, of 8.3, 35.2, and 35.8 kb, respectively, were identified in the telomeric region.

Genotype and haplotype diversity

To investigate the haplotype diversity within each block, haplotypes were inferred from unphased SNP data using the Haploview program (Fig. 3). The same program was used to identify tag SNPs. To avoid problems associated with the differences in block boundaries, we restricted the two blocks to the markers common to both population samples. In block 1 (C10–C11–C13–C14–C15), the two predicted common haplotypes (frequency ≥5%) accounted for 99.5% and 96.2% of all haplotypes seen in the Korean and Japanese samples, respectively (Fig. 3). One SNP (C10) in the Korean sample and two SNPs (C11 and C14) in the Japanese sample were sufficient to represent the two haplotypes. In block 2 (C41–C42–C43–C44–C45–C46–C47–C49), four common haplotypes accounted for 93.8% and 88.2% of the haplotypes in the Korean and Japanese samples, respectively. The order of the most and second-most frequent haplotypes in the Japanese samples was reversed in the Korean samples. Four SNPs (C41, C42, C43, and C45 for Koreans; C41, C42, C46, and C47 for Japanese) would be sufficient to define the four most common haplotypes. As shown in Fig. 3, block 2 showed a statistically significant difference in the haplotype profile between the two populations based on 1,000 permutations (P=0.015).
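One simple way to think about tag SNP selection is as a search for the smallest set of SNPs whose alleles distinguish all common haplotypes in a block. The Python sketch below does this by exhaustive search. It is not the tagging procedure used by Haploview, and the function name and the three example haplotypes are hypothetical rather than those shown in Fig. 3.

```python
from itertools import combinations


def minimal_tag_snps(haplotypes):
    """Smallest set of SNP positions that distinguishes all haplotypes.

    haplotypes: list of distinct, equal-length allele strings.
    Returns a tuple of 0-based SNP indices, or None if the haplotypes
    cannot be told apart (for example, if the list contains duplicates).
    """
    n_snps = len(haplotypes[0])
    for size in range(1, n_snps + 1):
        for subset in combinations(range(n_snps), size):
            signatures = {tuple(h[i] for i in subset) for h in haplotypes}
            if len(signatures) == len(haplotypes):
                return subset
    return None


# Hypothetical block with four SNPs and three common haplotypes
tags = minimal_tag_snps(["ACGT", "ACGA", "TCGT"])
# tags == (0, 3): the first and fourth SNPs separate all three haplotypes
```

The exhaustive search is practical only for the handful of SNPs found in a single block. Because the result depends on which haplotypes are common in a given population, different tag SNPs can be selected for the same block in different populations.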

Regardless of the criteria used to define LD blocks, genes that lie in regions of poor LD need to be analyzed using gene-based haplotypes. We therefore analyzed gene-based haplotypes of six genes: TNFR2 (TNFRSF1B), CD30 (TNFRSF8), FLJ12438, MFN2, PLOD, and MGC33867. Median D′ and r² for each gene are shown in Table 1. As expected from the analysis of LD patterns (Figs. 1, 2), CD30 showed poor median LD values. For CD30, seven haplotypes composed of nine SNPs were identified, with the most frequent one at about 14% (Table 2). On the other hand, for MFN2, where all eight SNP markers were in strong LD, the most frequent haplotype had an average frequency of 38% in the two populations. The median LD values for TNFR2 (TNFRSF1B) were the second lowest (Table 1). TNFR2 (TNFRSF1B) also showed the largest differences in median LD values between the populations. The SNP Alyze software package was used to assess the statistical significance of haplotype profile differences between the samples. As shown in Table 2, TNFR2 (TNFRSF1B) alone showed a statistically significant difference in the haplotype distribution between the Japanese and Korean samples based on 1,000 permutations (P=0.002). A few haplotypes, including one from TNFR2, two from CD30, and one from FLJ12438, showed statistically significant differences in the comparison of individual haplotypes between the two populations (Table 2).
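The logic of a permutation test for haplotype profile differences can be sketched in Python as follows. This is not the SNP Alyze procedure: it assumes that each chromosome has already been assigned a haplotype, it uses the summed absolute difference in haplotype frequencies as a hypothetical test statistic, and the function names and example data are invented for illustration.

```python
import random
from collections import Counter


def profile_statistic(group_a, group_b):
    """Sum of absolute haplotype frequency differences between two groups."""
    count_a, count_b = Counter(group_a), Counter(group_b)
    return sum(
        abs(count_a[h] / len(group_a) - count_b[h] / len(group_b))
        for h in set(count_a) | set(count_b)
    )


def permutation_test(group_a, group_b, n_perm=1000, seed=0):
    """Permutation P value for a difference in haplotype profiles.

    Population labels are shuffled n_perm times; the P value is the
    fraction of shuffles with a statistic at least as large as observed,
    counting the observed arrangement itself once.
    """
    rng = random.Random(seed)
    observed = profile_statistic(group_a, group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if profile_statistic(pooled[:n_a], pooled[n_a:]) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Hypothetical haplotype labels for 192 chromosomes per population
pop_1 = ["H1"] * 120 + ["H2"] * 60 + ["H3"] * 12
pop_2 = ["H1"] * 70 + ["H2"] * 100 + ["H3"] * 22
p_value = permutation_test(pop_1, pop_2)
```

With 1,000 permutations the smallest attainable P value under this convention is 1/1001, so the resolution of the reported P values is limited by the number of permutations.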

Table 1 Median D′ and r² values for each gene
Table 2 Distribution of gene-based haplotypes

Discussion

Recent studies have shown that the genome is broken into blocks of strong haplotype structure separated by shorter regions of shattered haplotype structure. So far, only a few studies have attempted to analyze haplotype structures in different populations (Gabriel et al. 2002; Shifman et al. 2003; Stenzel et al. 2004). As the International Haplotype Map (HapMap) Project, which includes Japanese and Han Chinese samples, is ongoing, it is of interest whether such a map would be useful for the Korean population as well. One potential problem with our study is ascertainment bias in SNP frequency, because all the SNPs typed were chosen based on 16 Japanese samples. Another is that the accuracy of the haplotype inference method used in this study is not known.
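The ascertainment issue can be made concrete with a small calculation. Assuming Hardy–Weinberg proportions and independent individuals, the probability that a SNP with minor allele frequency p shows at least one heterozygote among n screened individuals is 1 − (1 − 2p(1 − p))^n. The Python sketch below evaluates this expression; the function name is hypothetical, and the calculation is an illustration rather than an analysis performed in the study.

```python
def detection_probability(maf, n_individuals):
    """Chance of seeing at least one heterozygote among n individuals.

    Assumes Hardy-Weinberg proportions, so the heterozygote frequency
    is 2 * maf * (1 - maf), and independent sampling of individuals.
    """
    heterozygote_freq = 2 * maf * (1 - maf)
    return 1 - (1 - heterozygote_freq) ** n_individuals


# Screening panel of 16 individuals, as in the SNP screening step
p_common = detection_probability(0.10, 16)   # about 0.958
p_rare = detection_probability(0.01, 16)     # much lower than for MAF 0.10
```

Under these assumptions, a SNP with a minor allele frequency of 0.10 in Japanese would almost always pass the screen, whereas SNPs that are rare in Japanese but common in Koreans would tend to be missed, which is one way the bias could arise.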

Various analytical approaches are available to define haplotype blocks, and the block definition used in the present study was rather stringent (Gabriel et al. 2002). In the present study of the 418-kb region of chromosome 1p36.2, we were able to identify from two to six blocks, depending on the criteria for an LD block, spanning intervals of 3.2 kb to 47.1 kb. The boundaries of the two major blocks obtained with SNPs meeting a minor allele frequency threshold of 0.10 varied between these closely related populations. Because of the differences in LD strength and block boundaries, different tag SNPs were identified in the two populations for each block. Furthermore, haplotype profiles of the second block showed statistically significant differences between the populations. When SNPs meeting a minor allele frequency threshold of 0.20 instead of 0.10 were used to define LD blocks, we identified only one long block toward the telomeric region in the Japanese samples, whereas three blocks were identified in the Korean samples spanning the same and neighboring regions.

Regardless of the criteria for an LD block, most cores of haplotype blocks in the Korean samples appeared to coincide with those in the Japanese samples. Although it remains to be seen whether the samples used in this study reflect the divergence of the two populations, the considerable similarities in LD strength and in the kinds of haplotypes in our data appear to reflect the strong genetic affinities between the two populations (Tokunaga et al. 1996).

For genes in regions where an LD block structure cannot be identified, it becomes necessary to analyze gene-based haplotypes. The median LD value was the lowest for CD30 (TNFRSF8) in both populations, and the most frequent haplotype had a population frequency of about 14%, compared to 50% for FLJ12438 with 11 markers. The ethnic difference in median LD values was the largest for TNFR2 (TNFRSF1B) (Table 1), with median LD higher in Koreans than in Japanese. This was the region that showed significant differences in LD strength between the populations (Fig. 2). However, this difference in LD was mainly due to the markers C7, C8, C16, C17, and C18. As shown in Fig. 3, in block 1 (from exon 6 to intron 1 of the TNFR2 (TNFRSF1B) gene, C10–C11–C13–C14–C15), there appeared to be no significant difference in haplotype profiles between the two populations. However, when all the markers of the TNFR2 (TNFRSF1B) gene were included, the difference in LD was reflected in the significance level of the difference in the TNFR2 haplotype profile between the populations (P=0.002). These data suggest that results from an association study of a candidate gene such as TNFR2 in Japanese could be hard to reproduce in Koreans.

Taken together, the HapMap might facilitate comprehensive genetic association studies of human diseases for regions where significant similarities in LD and haplotype structure are present between the populations of interest and those used in the construction of the HapMap. Because reports on population differences in LD patterns across the human genome have indicated that chromosome region-specific effects appear more important than population-specific effects in influencing the extent of LD (Zavattari et al. 2000), further studies of various regions across the human genome are needed to address population differences in block boundaries and haplotype frequencies.