Introduction

Genetic variations have been used to identify disease-related genes. Among them, single nucleotide polymorphisms (SNPs) are the most common form of variation between individuals, occurring at a frequency of approximately 1 in every 300–1000 bases in the human genome (Brookes 1999; Cargill et al. 1999). SNPs can therefore be used to facilitate genetic mapping studies that may lead to a better understanding of the genetic basis of complex diseases such as high blood pressure, diabetes, asthma and inflammatory diseases (Johnson et al. 2000; Horikawa et al. 2000; Hugot et al. 2001).

A large number of SNPs are deposited in the public SNP database, dbSNP (Sherry et al. 2001), at the U.S. National Center for Biotechnology Information (NCBI). Almost 5 million of the estimated 10 million SNPs have been identified to date, and over 3 million have been assigned as "rsSNPs" at dbSNP (NCBI dbSNP build 110). However, SNP discovery is subject to many sequencing errors, and SNP allele frequency patterns differ among ethnic groups. In addition, most SNPs are located in intergenic regions. Although any SNP can serve as a genetic marker for identifying disease loci, functional SNPs in the regulatory or coding regions of genes may ultimately be more important. Therefore, to test the accuracy of the SNP information in the public SNP database and to compare ethnic differences in SNP allele frequency, 458 SNP sites (mainly cSNPs) in 161 disease candidate genes were collected from the public SNP database, and their allele frequencies were tested in the Korean population.

Materials and methods

Selection of candidate SNPs and DNA samples

Disease candidate genes, mainly associated with immune responses, were selected for the following diseases: atopic dermatitis, asthma, cardiovascular diseases, gastritis–hepatitis, and cancers. The candidate genes were chosen on the basis of their potential relevance to the selected common diseases. A large number of SNPs in coding regions (cSNPs) and some in untranslated regions of the selected disease candidate genes were collected from the publicly available dbSNP database (http://www.ncbi.nlm.nih.gov/SNPs). A total of 43 healthy Korean women aged 34–62 years (53.2 ± 6.3, mean ± standard deviation), who did not have any pathological symptoms at the time of interview and blood test, were randomly selected. Informed written consent for participation was obtained from each individual. Blood was drawn into an ACD-A tube, and the lymphocytes were isolated and transformed with EB virus. Genomic DNA was isolated from the EB virus-transformed lymphocytes by a standard method.

SNP genotyping for allele frequency

SNP genotyping was performed by SNP-IT™ assays using the SNPstream 25K™ System (Orchid Biosciences, New Jersey, USA). Briefly, the genomic DNA region spanning the polymorphic site was PCR-amplified using one phosphothiolated primer and one regular PCR primer. The 5′ phosphothiolates protect one strand of the PCR product from exonuclease digestion, so digesting the amplified products with exonuclease yields a single-stranded template. This single-stranded template was overlaid onto a 384-well plate that had been covalently pre-coated with the primer extension primers (SNP-IT™ primers). These SNP-IT™ primers were designed to hybridize immediately adjacent to the polymorphic site. After hybridization of the template strands, the SNP-IT™ primers were extended by a single base at the polymorphic site of interest with DNA polymerase. The extension mixtures contained two labeled terminating nucleotides (one FITC, one biotin) and two unlabeled terminating nucleotides. The incorporated base was identified by serial colorimetric reactions with anti-FITC-AP and streptavidin-HRP, respectively. The resulting blue and/or yellow color development was read with an ELISA reader, and the final genotype (allele) calls were made with the QCReview™ program.
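The two-color readout described above maps naturally onto a three-way genotype call: one color alone indicates a homozygote, and both colors together indicate a heterozygote. The following is only an illustrative sketch of that logic, not the algorithm used by the QCReview™ program. The function name `call_genotype`, the allele labels "A" and "B", and the absorbance threshold are all hypothetical assumptions introduced here.

```python
def call_genotype(signal_a: float, signal_b: float, threshold: float = 0.5) -> str:
    """Call a genotype from two colorimetric readings for one well.

    signal_a and signal_b are absorbance readings for the two labeled
    terminators (hypothetical allele labels "A" and "B"). The threshold
    of 0.5 is an arbitrary assumption, not a value from the study.
    """
    a_present = signal_a >= threshold
    b_present = signal_b >= threshold
    if a_present and b_present:
        return "AB"       # both colors developed: heterozygote
    if a_present:
        return "AA"       # only the first color developed
    if b_present:
        return "BB"       # only the second color developed
    return "no call"      # neither signal reached the threshold


# Hypothetical readings for four wells
calls = [call_genotype(1.2, 0.1), call_genotype(0.9, 1.1),
         call_genotype(0.05, 1.4), call_genotype(0.1, 0.2)]
```

In practice a calling program would likely also handle ambiguous intermediate signals, which this sketch ignores.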

Korean-SNP database

The Korean-SNP database was constructed at the National Genome Research Institute (National Institute of Health, Korea). All SNP allele frequency data described in this study are currently available at the Korean-SNP database (http://152.99.72.69/~SNP/ksnp.html).

Results and discussion

Selection of SNPs in disease candidate genes from the public SNP database

Before using SNPs in association studies to search for complex disease genes, we first collected disease candidate genes relevant to asthma, atopic dermatitis, hepatitis, and cancers. The lists of disease candidate genes (161 genes in total) were obtained from several disease genome centers in Korea, and a total of 458 SNPs in these genes were selected from the public dbSNP database. All SNPs and the names of the selected disease candidate genes used in this study are available at the Korean-SNP database web site, as described in Materials and methods.

Distribution of SNP allele frequencies of disease candidate genes in the Korean population

To investigate the distribution of SNP allele frequency in the Korean population, a total of 458 SNP sites, selected from 161 disease candidate genes, were genotyped in 43 unrelated female Korean individuals using the SNP-IT™ methodology. As shown in Fig. 1(A), 201 of the 458 SNP sites tested (43.9%) were polymorphic in the Korean population. The distribution of the 201 polymorphic SNPs with respect to minor allele frequency is shown in Fig. 1(B). The allele frequencies were not uniformly distributed: the largest class (33.8%) consisted of rare SNPs with a minor allele frequency of less than 10% (Fig. 1B), perhaps reflecting the true distribution of SNPs in nature. The average minor allele frequency of 21.3% in the Korean population was similar to that of a previous study in the Japanese population, which reported an average minor allele frequency of 24% (Haga et al. 2002). In addition, approximately two thirds (66%) of the 201 polymorphic SNPs had a minor allele frequency greater than 10% in the Korean population, indicating that these SNP sites can serve as useful genetic markers in association studies searching for complex disease genes.
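For readers who want to reproduce this kind of summary, a minor allele frequency can be computed from genotype counts as sketched below. This is a minimal illustration assuming biallelic SNPs. The function name and the genotype counts in the example are hypothetical, and only the totals 201 and 458 come from the text.

```python
def minor_allele_frequency(n_aa: int, n_ab: int, n_bb: int) -> float:
    """Minor allele frequency of a biallelic SNP from genotype counts.

    Each individual carries two alleles, so the allele total is twice
    the number of genotyped individuals.
    """
    total_alleles = 2 * (n_aa + n_ab + n_bb)
    if total_alleles == 0:
        raise ValueError("no genotyped individuals")
    freq_a = (2 * n_aa + n_ab) / total_alleles
    return min(freq_a, 1.0 - freq_a)


# Hypothetical genotype counts for one SNP in 43 individuals (86 alleles)
maf = minor_allele_frequency(n_aa=30, n_ab=11, n_bb=2)

# A site counts as polymorphic in the sample if the minor allele appears at all
is_polymorphic = maf > 0

# Fraction of tested sites that were polymorphic, using the totals from the text
polymorphic_fraction = 201 / 458
print(f"{polymorphic_fraction:.1%}")  # 43.9%
```

With 43 individuals, the smallest nonzero frequency that can be observed is 1/86, so alleles rarer than about 1% may appear monomorphic in a sample of this size.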

Fig. 1A, B.

Summary of SNP genotyping in the Korean population. A total of 458 SNPs were selected from the public SNP database, and their presence was determined in the Korean population using the SNP-IT™ system. Among the 458 SNPs tested in this study, 201 were polymorphic in the Korean population (A). The distribution of minor allele frequency of the 201 verified SNPs is shown (B).

Ethnic differences in SNP allele frequency

At present, public SNP databases are a very useful means of checking SNP allele frequencies in a particular population, especially in major ethnic groups such as Caucasians, Africans and Asians. To compare the allele frequencies of those ethnic groups with those of the Korean population, both The SNP Consortium (TSC; http://snp.cshl.org) database and the JSNP database (http://snp.ims.u-tokyo.ac.jp) were used. Among the 458 SNPs tested in this study, only 7 and 32 SNPs matched entries in the TSC and JSNP data, respectively, that carry allele frequency data for other ethnic groups. This result indicates that allele frequency data for particular populations are available for only a very limited number of SNPs in the public SNP databases. Furthermore, this result, together with previously available data showing only limited SNP characterization in the Korean population (Lee et al. 2001), suggests that publicly available SNPs should be extensively characterized in the Korean population before those candidate SNPs are used in disease association studies.

Interestingly, the 7 SNPs overlapping with the TSC data showed striking differences in allele frequency among ethnic groups (Table 1). For example, the SNP of the EGF-R gene (rs#884225) was not polymorphic in Caucasians and Africans, whereas the same site was polymorphic with frequencies of 0.2 and 0.37 in Asians and Koreans, respectively. In contrast, the SNP of the NAT2 gene (rs#1208) was nearly monomorphic in Asians and Koreans, whereas it was polymorphic with frequencies of 0.29 and 0.33 in Caucasians and Africans. For most of the 7 SNPs, similar allele frequency patterns were found between Africans and Caucasians, with one exception (IL-12Rβ2, rs#1495963). This IL-12Rβ2 SNP was virtually monomorphic in Caucasians, whereas it was polymorphic in the other three ethnic groups, with similar allele frequencies. Furthermore, the "C" allele of CD70 (rs#1862511), which is the major allele in Caucasians and Africans, was the minor allele in Asians and Koreans. As seen for the EGF-R gene (rs#884225) and the IL-12Rβ2 gene (rs#1495963), considerable differences in allele frequency were found between Koreans and Asians. These differences may be due to sampling variability, since the Asian samples of the TSC data included only Japanese and Chinese, but not Koreans.
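The comparison for a single SNP can be made concrete with a small sketch that tabulates absolute allele frequency differences between every pair of populations. The EGF-R (rs#884225) frequencies below are those quoted in the text, taking "not polymorphic" to mean a frequency of 0. The function name and data layout are assumptions made for illustration only.

```python
from itertools import combinations
from typing import Dict, Tuple


def pairwise_differences(freqs: Dict[str, float]) -> Dict[Tuple[str, str], float]:
    """Absolute allele frequency difference for every pair of populations."""
    return {
        (pop_a, pop_b): abs(freqs[pop_a] - freqs[pop_b])
        for pop_a, pop_b in combinations(freqs, 2)
    }


# EGF-R (rs#884225) frequencies as quoted in the text
egfr = {"Caucasian": 0.0, "African": 0.0, "Asian": 0.2, "Korean": 0.37}

diffs = pairwise_differences(egfr)
largest_pair = max(diffs, key=diffs.get)  # first pair with the largest difference

for pair, diff in sorted(diffs.items(), key=lambda item: -item[1]):
    print(f"{pair[0]} vs {pair[1]}: {diff:.2f}")
```

A table of this kind shows at a glance which population pairs diverge most for a given marker, although it says nothing about whether a difference exceeds sampling noise.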

Table 1. Comparison of allele frequency of overlapping Korean SNP data with TSC data

Although significant differences in SNP allele frequency were detected among different ethnic groups (Table 1), the comparison between Koreans and Japanese showed highly similar SNP allele frequency patterns, with allele frequency differences of less than 10% for all SNPs tested except EGF (rs#2302135) (Table 2). Recently, it has been reported that the Asian population has the smallest number of distinct SNP haplotypes. Furthermore, the allele frequency patterns between Koreans and Japanese in our data were comparable with previous data within the Japanese population (Okuda et al. 2002), suggesting that the Korean and Japanese populations may share a common ancestry, as expected from the close geographical location of the two countries. Our study has, however, one important limitation: the comparison of allele frequencies between Koreans and Japanese is based on only a small sample size. Since the genotype distribution in a small sample sometimes differs from the true distribution in a large population, further studies with a larger sample size are needed for an accurate evaluation.
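The sample size limitation can be quantified with a confidence interval for an observed allele frequency. The sketch below uses the Wilson score interval, which is our choice for illustration and not a method used in the study. It assumes the 86 alleles from 43 individuals behave as independent draws, and the example frequency of 0.20 is hypothetical.

```python
import math
from typing import Tuple


def wilson_interval(p_hat: float, n_alleles: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a proportion (z = 1.96 gives about 95%)."""
    denom = 1 + z ** 2 / n_alleles
    center = (p_hat + z ** 2 / (2 * n_alleles)) / denom
    half_width = z * math.sqrt(
        p_hat * (1 - p_hat) / n_alleles + z ** 2 / (4 * n_alleles ** 2)
    ) / denom
    return center - half_width, center + half_width


# 43 diploid individuals contribute 86 alleles per SNP
n_alleles = 2 * 43

# Hypothetical observed allele frequency of 0.20
low, high = wilson_interval(0.20, n_alleles)
print(f"95% interval: {low:.2f} to {high:.2f}")

# Ten times as many individuals gives a much narrower interval
low_large, high_large = wilson_interval(0.20, 2 * 430)
```

Under these assumptions the interval for an observed frequency of 0.20 spans roughly 0.13 to 0.30, which is wider than the 10% difference used above to describe Korean and Japanese frequencies as similar.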

Table 2. Comparison of allele frequency of verified Korean SNPs and JSNP database data

In summary, a total of 458 SNPs selected from the 161 disease candidate genes were characterized in the Korean population. Our results will be further utilized to determine the experimental strategies for studying complex diseases using SNPs selected from publicly available SNP databases.