Introduction

Microsatellite markers are widely used to search for disease-associated genes by linkage analysis and for diversity studies in human genetics (Schellenberg et al. 1992; Bowcock et al. 1994; Thomson et al. 1999). Because microsatellite markers typically have more than ten alleles, they are more polymorphic and more informative than SNP markers, which are usually biallelic. For use in linkage analyses, a locus should be highly polymorphic so that the alleles inherited from each parent can be distinguished from one another. Multiallelic markers offer greater statistical power, allowing detection of linkage disequilibrium and genetic linkage by parametric analysis of large families or nonparametric analysis of sibling pairs (Ott and Rabinowitz 1997; Tan et al. 2002).

As the use of microsatellite markers for linkage mapping has become more widespread, primer sets for genome-wide screening or chromosome-specific mapping have become commercially available. Markers have been optimized for multiplex PCR and fragment analysis, and technology for analyzing microsatellite polymorphisms has been automated for high-throughput screening. Genotype information, including allelic frequency and heterozygosity, is available through various databases, but these data have been derived primarily from Caucasian samples. Microsatellite markers are generally selected based on their heterozygosity and recombination frequency in Caucasians and thus may not be suitable for genetic analyses of other ethnic groups. Indeed, significant differences have been identified in the allelic distribution of microsatellite markers between Caucasian and Asian populations (Yamane-Tanaka et al. 1998; Ikari et al. 2001; Mizutani et al. 2001; Tan et al. 2002; Lu et al. 2004). Because allelic information differs between ethnic groups and influences the statistical analysis, it is problematic to use information from other populations to study disease susceptibility genes in the Korean population.

In the present study, we report heterozygosity and allelic information for 400 dinucleotide repeat markers in 578 Korean subjects. The data are less informative than those compiled for the Caucasian population, but they should be useful for genetic linkage studies in Koreans and other populations.

Materials and methods

Subjects and DNA preparation

Blood samples were collected from 578 sibs in 249 Korean families. Of these, 171 families had type-2 diabetes and 78 families had hypertension. We obtained written informed consent from all subjects. DNA was isolated from blood samples using the QIAamp DNA blood kit (Qiagen, Valencia, CA) according to the manufacturer’s instructions.

Microsatellite marker genotyping

Genotyping was performed using the ABI PRISM Linkage Mapping Set MD-10 (Applied Biosystems), which comprises 400 microsatellite markers. All markers were dinucleotide repeats. PCR amplification using a fluorescently labeled forward primer and tailing reverse primer for each marker (95°C for 2 min, then 35 cycles of 95°C for 20 s, 58°C for 40 s and 72°C for 30 s, followed by 72°C for 45 min) was performed in a 384-well format in a total volume of 5 µl. Electrophoresis was performed on ABI 3730 Automated Sequencers (Applied Biosystems) using a standard protocol. GeneScan 500 Liz (Applied Biosystems) was used as the internal size standard, which allowed more accurate fragment sizing and allele calling and unambiguous comparison of data across experimental conditions.

Genotypes were initially scored using GENEMAPPER 3.7 software (Applied Biosystems) and reviewed independently to confirm the accuracy of allele calling. All genotyped markers were checked for incompatibilities using MARKERINFO from the SAGE package (version 5.2) (Elston 1992). Genotypes of the CEPH standard 1347-02 were used for quality control.

Statistical analysis

Allelic frequencies for each marker were computed using RECODE (ftp://www.linkage.rockefeller.edu/softwair/linkage) and FREQ from the SAGE package. Heterozygosity of each marker was determined in 249 unrelated individuals and in 578 sib individuals. Heterozygosity of markers was assessed by comparing the observed number of heterozygotes with the expected number using the Fisher exact test (SAS software). The polymorphism information content (PIC) was also calculated using SAS software. Marker information for Caucasians, Japanese and Taiwanese was obtained from the CEPH database (http://www.cephb.fr) and other published data (Ikari et al. 2001; Lu et al. 2004). GENIBD from the SAGE package was used to estimate the mean proportion of alleles shared identical by descent (IBD) among sib pairs at each autosomal marker.
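The quantities named above can be illustrated with a minimal Python sketch. It is not the RECODE, FREQ or SAS code used in this study. It assumes the textbook definitions: allele frequencies by gene counting, expected heterozygosity as 1 − Σp_i², and PIC as 1 − Σp_i² − ΣΣ2p_i²p_j² (sum over i < j), with no small-sample correction. The function names and the example genotypes (allele sizes in base pairs) are hypothetical.

```python
from collections import Counter


def allele_frequencies(genotypes):
    """Estimate allele frequencies by gene counting.

    `genotypes` is a list of (allele, allele) pairs, one pair per individual.
    """
    counts = Counter(allele for pair in genotypes for allele in pair)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


def observed_heterozygosity(genotypes):
    """Fraction of individuals carrying two different alleles."""
    return sum(1 for a, b in genotypes if a != b) / len(genotypes)


def expected_heterozygosity(freqs):
    """Expected heterozygosity under random mating: 1 - sum(p_i^2)."""
    return 1.0 - sum(p * p for p in freqs.values())


def polymorphism_information_content(freqs):
    """PIC = 1 - sum(p_i^2) - sum over i<j of 2 * p_i^2 * p_j^2."""
    p = list(freqs.values())
    homozygous = sum(x * x for x in p)
    cross = sum(
        2.0 * p[i] ** 2 * p[j] ** 2
        for i in range(len(p))
        for j in range(i + 1, len(p))
    )
    return 1.0 - homozygous - cross


# Hypothetical genotypes for one marker in four unrelated individuals.
example = [(150, 152), (150, 150), (152, 154), (150, 154)]
freqs = allele_frequencies(example)
print(freqs)                                    # allele -> frequency
print(observed_heterozygosity(example))         # 3 of 4 are heterozygous
print(expected_heterozygosity(freqs))
print(polymorphism_information_content(freqs))
```

For X chromosome markers the same functions would be applied to females only, as in the study design described here.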

Results

Distribution of allele frequencies in Koreans

We genotyped 400 markers in 578 individuals. All markers could be amplified. Individuals with ambiguous genotypes were not included in the analysis. Hardy–Weinberg equilibrium was evaluated using the Fisher exact test; 23 of the 400 loci deviated from Hardy–Weinberg equilibrium (P < 0.01).

Allele frequencies were calculated for each autosomal and X chromosome marker using 249 unrelated individuals and 188 unrelated females, respectively. Table 1 summarizes the allele size range and the number of alleles for each marker in Koreans. The number of alleles ranged from 4 (D4S1575) to 25 (D4S402). The average number of alleles was 11.26 (SD ± 3.21).

Table 1 Allele information and heterozygosities for 400 microsatellite markers in the Korean population compared with Taiwanese and Caucasians

Heterozygosity varies between populations

Heterozygosities for the 400 marker loci in Koreans (this study), Taiwanese and Caucasians are summarized in Table 1. Heterozygosity ranged from 0.19 (D4S1575) to 0.93 (D15S205), with an average of 0.72 (SD ± 0.12), for Koreans and ranged from 0.21 (DXS8055) to 0.93 (D7S636), with an average of 0.73 (SD ± 0.12), for Taiwanese. Fifty-two markers (13%) had a heterozygosity of <0.6 in the Korean population, and 48 markers (12.03%) had a heterozygosity of <0.6 in the Taiwanese population. Heterozygosity in the Caucasians ranged from 0.58 (D22S539) to 0.91 (D14S68), with an average of 0.79 (SD ± 0.06). Most markers (99.5%) had a heterozygosity of ≥0.6 in the Caucasian population. Overall, heterozygosity in the Korean population shared greater similarity with the Taiwanese population than with the Caucasian population.

A detailed comparison of heterozygosity in the four populations is shown in Fig. 1. Four hundred ABI MD-10 markers were analyzed for Koreans (this study), Japanese (Ikari et al. 2001) and Caucasians (Ikari et al. 2001; http://www.cephb.fr); 399 ABI MD-10 markers (all except D3S1311) were analyzed for Taiwanese (Lu et al. 2004). At a level of heterozygosity ranging from 0.7 to 0.8, frequencies of the markers in the four populations were similar. At other levels of heterozygosity, however, frequencies varied among the different populations. Heterozygosities of markers from Korean, Taiwanese and Japanese populations differed considerably from Caucasians. The Asian populations had similar frequencies at heterozygosity levels of <0.5, 0.5–0.6 and 0.7–0.8, but Koreans differed slightly from Taiwanese and Japanese at heterozygosity levels between 0.6 and 0.7 and ≥0.8.

Fig. 1 Distribution of heterozygosities for 400 microsatellite markers in Korean, Taiwanese (Lu et al. 2004, 399 markers), Japanese (Ikari et al. 2001) and Caucasian (Ikari et al. 2001; http://www.cephb.fr) populations
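The grouping of markers into heterozygosity classes (<0.5, 0.5–0.6, 0.6–0.7, 0.7–0.8 and ≥0.8) can be sketched as follows. This is an illustrative Python example, not the procedure used to produce the figure. The class boundaries are taken from the text, but treating each lower bound as inclusive is an assumption, and the function names and example values are hypothetical.

```python
from bisect import bisect_right
from collections import Counter

# Class boundaries named in the text; lower bounds are assumed inclusive.
EDGES = [0.5, 0.6, 0.7, 0.8]
LABELS = ["<0.5", "0.5-0.6", "0.6-0.7", "0.7-0.8", ">=0.8"]


def heterozygosity_class(h):
    """Return the label of the class that heterozygosity `h` falls into."""
    return LABELS[bisect_right(EDGES, h)]


def class_percentages(values):
    """Percentage of markers in each heterozygosity class."""
    counts = Counter(heterozygosity_class(h) for h in values)
    n = len(values)
    return {label: 100.0 * counts[label] / n for label in LABELS}


# Hypothetical heterozygosities for five markers.
print(class_percentages([0.19, 0.55, 0.72, 0.78, 0.93]))
```

Applying the same function to each population's marker list gives per-class percentages that can be compared directly between populations.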

Distribution of heterozygosities relative to mean IBD values

To confirm the frequency of each level of heterozygosity with a maximum number of individuals, we analyzed the distribution of heterozygosity with 578 sib individuals from 249 families. For autosomal markers, 578 individuals were used, and for X chromosome markers, 326 females were used. The heterozygosity distribution of 249 unrelated individuals and 578 sib individuals is shown in Fig. 2. There were no differences at any level of heterozygosity (P = 0.93).

Fig. 2 Distribution of heterozygosities for 400 microsatellite markers in 249 unrelated individuals and 578 sib individuals from 249 Korean families

The estimated mean IBD was calculated for each of the 382 autosomal markers using 429 sib pairs from 249 families. Under the null hypothesis of no linkage, the test statistic follows a standard normal distribution, and a mean IBD value >0.5 favors the alternative hypothesis of linkage. Of the 382 markers, 44 showed positive linkage with a mean IBD >0.5 (Table 1). A comparison of heterozygosity distributions for all markers and the 44 markers with a mean IBD >0.5 is shown in Fig. 3. The distribution for markers with a mean IBD >0.5 was similar to the distribution for all 382 autosomal markers (P = 0.97).
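The logic of a mean IBD test can be illustrated with a short Python sketch. It is not the GENIBD or SAGE implementation and may differ from the statistic actually used in this study. It assumes that each sib pair contributes one estimated proportion of alleles shared IBD, that siblings share half of their alleles IBD on average in the absence of linkage, and that the standard error can be taken from the sample standard deviation. The function name and the example values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_ibd_test(ibd_proportions):
    """One-sided z-test of the mean IBD proportion against the null value 0.5.

    Returns (mean IBD, z statistic, one-sided P value). Needs at least two
    sib pairs with non-identical values, otherwise the standard error is zero.
    """
    n = len(ibd_proportions)
    m = mean(ibd_proportions)
    standard_error = stdev(ibd_proportions) / sqrt(n)
    z = (m - 0.5) / standard_error
    # Only excess sharing (mean IBD > 0.5) counts as evidence for linkage.
    p_value = 1.0 - NormalDist().cdf(z)
    return m, z, p_value


# Hypothetical IBD proportions for eight sib pairs at one marker.
m, z, p = mean_ibd_test([0.5, 0.75, 0.5, 1.0, 0.25, 0.75, 0.5, 0.75])
print(m, z, p)
```

A one-sided test is used here because only sharing above 0.5 is expected under linkage.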

Fig. 3 Distribution of heterozygosities for all autosomal markers and 44 markers with a mean IBD > 0.5

Discussion

We constructed a comprehensive dataset of allelic frequencies and heterozygosities of 400 microsatellite markers in the Korean population, information that will be useful for mapping disease-associated genes in Koreans. Overall, the levels of heterozygosity were similar between the Korean, Taiwanese and Japanese populations. However, there were slight differences in several levels of heterozygosity between these three datasets, suggesting that the Korean dataset will be a powerful resource for genetic studies in Koreans. Heterozygosity for most markers in the Asian populations was lower than that in the Caucasian population. Ten to 13% of the markers examined showed heterozygosities lower than 0.6 in the Asian populations. This level of heterozygosity was almost never observed in the Caucasian population. Thus, the Caucasian database may not be suitable for other ethnic groups. Due to significant differences in heterozygosities and allelic frequencies between populations, markers should be optimally selected for each study population in order to maximize information content and power (Ikari et al. 2001; Tan et al. 2002).

We used 249 unrelated individuals to evaluate the level of heterozygosity for each marker. We also used 578 sib individuals from 249 families to confirm the frequency of each level of heterozygosity using the maximum possible number of individuals. Computer simulation studies using sample sizes of 10–200 individuals and allele numbers ranging from 2 to 27 revealed that the number of individuals genotyped has a minimal influence on heterozygosity (Tan et al. 2002). In our study, we found no differences in the heterozygosity distributions, and thus 249 subjects were sufficient to evaluate the levels of heterozygosity.

A large number of markers (23 of 400 loci) deviated from Hardy–Weinberg equilibrium in our study. This may reflect sampling errors, population substructure, original genetic variations of the subjects, or the presence of low-frequency alleles (Mizutani et al. 2001; Lu et al. 2004). We used 249 unrelated Korean samples to analyze population stratification with the STRUCTURE program. Fst values at K = 2 were 0.003 and 0.0015, and at K = 3 were 0.0028, 0.0036 and 0.0040 (data not shown), suggesting that the markers we used were not sufficient to detect substructure in the Korean population or that the 249 subjects we used were from one local region.

Heterozygosity and PIC are commonly used to indicate how informative a marker is. Highly polymorphic markers with heterozygosities ranging from 80 to 90% have been developed for linkage analysis (Ott and Rabinowitz 1997). In this study, 29% of the markers had heterozygosities of 80–90% in the Korean population, compared with 52% in the Caucasian population. In addition, only 60% of the markers had PIC values of 0.7 or higher (Table 1). Although some markers displayed low heterozygosities and PIC values, we found that the estimated IBDs from less informative autosomal markers could also be used to detect positive disease linkage. The distribution of heterozygosities for the 44 markers with a mean IBD >0.5 was similar to that for all 382 autosomal markers (Fig. 3). We found that it was unnecessary to exclude markers with low heterozygosities, at least when using the IBD sharing method of model-free linkage analysis.

In conclusion, heterozygosity information gleaned from this study will be a useful reference for genome-wide screens of Koreans and comparative studies of microsatellite markers among different ethnic populations. Furthermore, the present study may contribute to linkage studies involving the IBD sharing method of model-free analysis.