Introduction

The cytochrome P450 19A1 (CYP19A1) gene encodes aromatase, the enzyme responsible for the conversion of androgens to estrogens. CYP19A1 is located on chromosome 15q21 and spans 123 kb, including nine protein-coding exons and a large 5′-untranslated region of 93 kb with alternative tissue-specific promoters.1 Several untranslated first exons are each expressed under the control of a tissue-specific promoter and joined to the second exon through a common splice acceptor site.2 Aromatase expression in humans has been demonstrated in the gonads, adipose tissue, placenta, skin, brain and bone. Given that estrogens have an important role in cell proliferation,3 many studies have suggested that variation in estrogen levels contributes to the risk of estrogen-dependent diseases, such as breast cancer,4 endometrial cancer5 and prostate cancer.6 In addition to cancer association studies, CYP19A1 has been implicated in the risk of obesity in Chinese women,7 in the risk of essential hypertension in males8 and in bone mass and fracture risk.9 The clinical importance of aromatase is underscored by the increasing use of selective aromatase inhibitors to treat postmenopausal women with estrogen-responsive breast cancer.10 CYP19A1 has therefore been an attractive target for the identification of common genetic polymorphisms that may account for differences in estrogen levels in humans. However, the CYP19A1 gene has not previously been resequenced in a Korean population. To determine the distribution of CYP19A1 alleles and haplotypes, we performed direct DNA sequencing of the CYP19A1 gene in a Korean population for the first time and analyzed the allele distribution and the haplotype structure together with linkage disequilibrium (LD).

Materials and methods

Study subjects

DNA samples from 50 unrelated, healthy Koreans were obtained from the DNA repository bank at INJE Pharmacogenomics Research Center for direct DNA sequencing (Inje University College of Medicine, Busan, Korea).11 The Institutional Review Board (IRB) of Busan Paik Hospital (Busan, Korea) approved the research protocol for the use of human DNA from blood samples.

Sequencing and identification of SNPs

Genomic DNA was prepared from peripheral whole blood using the QIAamp blood kit (Qiagen, Valencia, CA, USA). Primers for PCR amplification were identical to those used previously.12 All exons, exon–intron boundaries and relevant promoter regions were PCR amplified. Each PCR reaction contained primers at 0.2 μmol l−1 each, 2 U of r-Taq polymerase (Takara Bio, Shiga, Japan) and each dNTP at 0.2 mmol l−1. PCR cycling consisted of an initial denaturation (94 °C, 5 min, 1 cycle), 35 cycles of denaturation/annealing/extension (94 °C, 30 s/60 °C, 30 s/72 °C, 30 s) and a final extension at 72 °C for 10 min. PCR was performed on a GeneAmp PCR 9700 (Applied Biosystems, Foster City, CA, USA). Amplified products were purified using a PCR purification kit (NucleoGen, Ansan, Korea), and sequencing was performed using an ABI Prism 3700XL Genetic Analyzer (Applied Biosystems). The PC Gene software package (Oxford Molecular, Campbell, CA, USA) was used to identify heterozygous and homozygous single-nucleotide substitutions.

Linkage disequilibrium and haplotype analysis

Hardy–Weinberg equilibrium and LD were analyzed using SNPAlyze software (version 4.1; Dynacom, Yokohama, Japan). D′ and r2 values were used to assess pairwise LD between single-nucleotide polymorphisms (SNPs), as described previously.11, 13
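As an illustration of the Hardy–Weinberg check, the following minimal sketch computes a χ2 goodness-of-fit statistic (1 degree of freedom) from the genotype counts at a single biallelic SNP. It is not the SNPAlyze implementation; the function name hwe_chi_square and the genotype counts in the example are hypothetical and are not taken from this study.

```python
from math import erfc, sqrt


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square test (1 d.f.) for Hardy-Weinberg equilibrium at one SNP.

    n_aa, n_ab, n_bb: observed counts of the three genotypes
    (hypothetical inputs; at least one subject is assumed).
    Returns (chi-square statistic, P-value).
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p                      # frequency of allele B
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    # Skip classes with zero expectation (monomorphic site).
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper-tail probability of a chi-square variable with 1 d.f.
    p_value = erfc(sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical genotype counts for 50 subjects.
chi2, p_value = hwe_chi_square(25, 20, 5)
print(f"chi2 = {chi2:.3f}, P = {p_value:.3f}")
```

With these hypothetical counts the statistic is about 0.11 and P is about 0.74, so the site would not be flagged as deviating from equilibrium at the 0.05 level. With only 50 subjects, expected counts for rare homozygotes can be small, in which case an exact test may be preferable to the χ2 approximation.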

Tag SNP selection

The 19 variations identified by direct DNA sequencing were used as input for tag SNP selection. Tag SNPs were selected with the program Tagger, which combines the simplicity of pairwise methods with the potential efficiency of multimarker approaches (http://www.broad.mit.edu/mpg/tagger/). How efficiently a SNP is tagged depends on the LD between that SNP and the tag SNP, as measured by the pairwise correlation coefficient (rp2). We selected a set of tag SNPs in which all known common variants (minor allele frequency >0.05) had an estimated rp2>0.8 with at least one tag SNP. Some SNPs are only weakly correlated with any other single SNP but may be efficiently tagged by a haplotype defined by multiple SNPs, which reduces the total number of tag SNPs needed. Therefore, for multimarker tagging, we set the threshold for the correlation coefficient between each SNP and a haplotype of tag SNPs at rs2>0.8.
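The pairwise part of this selection criterion can be sketched as a greedy set-cover procedure: repeatedly pick the SNP that captures the most still-untagged SNPs at the chosen r2 threshold. This is a simplified illustration under stated assumptions, not the Tagger algorithm itself, and it omits the multimarker (haplotype) step. The function name greedy_pairwise_tags, the SNP labels and the r2 values are hypothetical.

```python
def greedy_pairwise_tags(snps, pair_r2, threshold=0.8):
    """Greedy pairwise tag SNP selection (simplified sketch).

    snps: list of SNP labels.
    pair_r2: dict mapping (snp_a, snp_b) to their pairwise r2;
             missing pairs are treated as r2 = 0.
    Returns a list of tags such that every SNP has r2 >= threshold
    with at least one tag (each SNP trivially tags itself).
    """
    def r2(a, b):
        if a == b:
            return 1.0
        return pair_r2.get((a, b), pair_r2.get((b, a), 0.0))

    # SNPs that each candidate would capture if chosen as a tag.
    captures = {a: {b for b in snps if r2(a, b) >= threshold} for a in snps}

    untagged = set(snps)
    tags = []
    while untagged:
        # Most newly captured SNPs first; ties broken by label order.
        best = min(snps, key=lambda s: (-len(captures[s] & untagged), s))
        tags.append(best)
        untagged -= captures[best]
    return tags


# Hypothetical example: SNP1-SNP3 are in strong LD, SNP4 stands alone.
example_r2 = {
    ("SNP1", "SNP2"): 0.95,
    ("SNP1", "SNP3"): 0.90,
    ("SNP2", "SNP3"): 0.85,
    ("SNP1", "SNP4"): 0.10,
}
print(greedy_pairwise_tags(["SNP1", "SNP2", "SNP3", "SNP4"], example_r2))
```

In this toy input, one tag covers the three correlated SNPs and a second tag is needed for the isolated one. A greedy choice is not guaranteed to give the smallest possible tag set, which is one reason dedicated tools are used in practice.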

Results and Discussion

Genetic polymorphisms in CYP19A1 may change the function of the aromatase enzyme, resulting in altered levels of estrogen biosynthesis. Thus, such polymorphisms may be implicated in the risk of estrogen-related diseases. Although CYP19A1 encodes a key enzyme in estrogen synthesis, no resequencing efforts have yet been made in people of Korean descent. In this study, direct DNA sequencing analysis of the CYP19A1 gene in 50 Koreans revealed a total of 19 variations. Their frequencies are summarized in Table 1. The locations of the SNPs in relation to the genomic structure of CYP19A1 are shown in Figure 1. χ2-tests were used to compare observed genotype frequencies with those expected in the study population. No deviation from Hardy–Weinberg equilibrium was observed for the SNPs identified (P>0.05). Frequencies of functionally important and relatively well-studied CYP19A1 variants in different ethnic groups are summarized in Table 2. Several studies have evaluated the CYP19A1 (TTTA)n polymorphism as a genetic marker of breast cancer risk.14, 15 We observed that the most frequent CYP19A1 (TTTA) allele was (TTTA)7 (66%), followed by (TTTA)11 (30%), (TTTA)12 (3%) and (TTTA)13 (1%) (Table 3). Kim et al.16 reported that the frequency of the low type (TTTA copy number under 10) was 55.5% and that of the high type (TTTA copy number over 10) was 44.5% in 102 Korean control subjects. A similar report in a Korean population17 is included in Table 3. Although (TTTA)8 was not detected in our study, this allele was the third most frequent allele in Caucasian and African-American populations, occurring at 12.5 and 2.5%, respectively.12 In the Korean population, the frequency of (TTTA)12 was 3%; however, this allele was detected in 10% of Japanese subjects,14 indicating that an allele frequency measured in one population cannot be assumed to apply equally to a similarly defined population.
The promoter SNP −278C>T, known as a very rare allele in Caucasian and African-American populations, had a frequency of 31% in our study, which is similar to that observed in other Asian subjects.12 The −278C>T allele may therefore be one of the alleles that predominate in Asian populations. The 790C>T variant (Arg264Cys), reported at a lower frequency in Caucasians, was found at a frequency of 16% in our study. This variant has been reported to be associated with increased breast cancer risk in Asians.18 However, another study did not support this association and instead suggested that the allele is associated with a lower risk of breast cancer.4 The 1531C>T (rs10046) allele has been studied extensively, including its potential gender-specific association with essential hypertension,8 increased estrogen levels,19 breast cancer risk in Japanese populations20 and endometrial cancer risk in Chinese women.21 This variant, observed at a frequency of 56% in this study, formed a haplotype structure with IVS7-79A>G, IVS6-106T>G, IVS6+36A>T and IVS5-16T>G (D′=1, r2=0.92–0.96) in haplotype block 3 (Figures 2a and b). Nineteen variations were used to characterize LD structures at the CYP19A1 locus, resulting in three LD blocks (Figure 2a). A set of tag SNPs, which will be useful for association mapping studies, was determined (Figure 2b); eight tag SNPs would be needed to track all important haplotype blocks in the CYP19A1 gene in Koreans. The LD blocks identified were compared with those for the same SNPs reported in the HapMap database. Two SNPs (−77G>A and −196A>C), located in the 1.6 promoter, are in strong LD in Chinese and Japanese populations, as well as in the Korean group examined here (data not shown). These two SNPs were not in strong LD with 1531C>T (rs10046) in Koreans (D′=0.48, 0.48; r2=0.17, 0.17). 
However, relatively strong LD between these two SNPs and 1531C>T (rs10046) was observed in Japanese (D′=1, 0.94; r2=0.65, 0.64) and Chinese (D′=0.87, 0.87; r2=0.5, 0.5) populations. It may be of interest to determine whether these variants in the 1.6 promoter contribute to population or environmental differences in the expression of the aromatase protein in the given populations.
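The D′ and r2 values quoted above are the standard pairwise LD statistics for two biallelic loci. The sketch below shows how both are derived from one haplotype frequency and the two allele frequencies, and why D′ can equal 1 while r2 remains well below 1 when the allele frequencies at the two loci differ. The function name pairwise_ld and the input frequencies are hypothetical and are not data from this study; with unphased genotypes, the haplotype frequency would first have to be estimated (for example by an EM algorithm), which is not shown.

```python
def pairwise_ld(p_ab, p_a, p_b):
    """Pairwise LD statistics for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A (locus 1)
          and allele B (locus 2).
    p_a, p_b: frequencies of alleles A and B, each strictly
              between 0 and 1 (hypothetical inputs).
    Returns (D, |D'|, r2).
    """
    d = p_ab - p_a * p_b
    # Largest |D| attainable given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2


# Hypothetical case: every haplotype carrying A also carries B,
# but B (frequency 0.4) is more common than A (frequency 0.3).
d, d_prime, r2 = pairwise_ld(p_ab=0.3, p_a=0.3, p_b=0.4)
print(f"D = {d:.3f}, D' = {d_prime:.2f}, r2 = {r2:.2f}")
```

In this hypothetical case D′ is 1 (one of the four possible haplotypes is absent) while r2 is about 0.64, because r2 also penalizes the mismatch in allele frequencies. This is why r2, rather than D′, is the usual criterion for deciding whether one SNP can stand in for another.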

Table 1 Distribution of CYP19A1 single-nucleotide polymorphisms in a Korean population
Figure 1

Genomic organization of CYP19A1 and the location of SNPs. Nine coding exons are indicated by black boxes. The first noncoding exons and 3′ and 5′-untranslated regions are indicated by white boxes. The values of D′ and r2 between SNPs are described. Position numbers refer to the +1 ATG start codon of GenBank accession number NC_000015.

Table 2 Distribution of functionally important CYP19A1 variants in different populations
Table 3 Allele frequencies of the CYP19A1 (TTTA)n polymorphism among controls in different populations
Figure 2

Linkage disequilibrium (LD) map of CYP19A1 single-nucleotide polymorphisms obtained from normal healthy subjects (n=50). The 19 CYP19A1 variations identified were included in the LD analysis using the D′ and r2 statistics.13 (a) Haploview plot of CYP19A1 SNPs along with their locations in the CYP19A1 gene. Red depicts significant linkage between a pair of SNPs. Numbers inside the squares refer to the D′ value multiplied by 100. (b) CYP19A1 SNPs and their occurrences in common haplotype structures in three blocks. Eight tag SNPs are marked by star symbols. The frequency of each haplotype is shown at the edge. Thick lines between haplotypes indicate the most common crossings between haplotypes and thinner lines signify less common crossings.

In summary, we resequenced the CYP19A1 gene for the first time in a Korean population and identified 19 variations. Frequencies, haplotypes, LD structures and tagging SNPs were determined. Genetic information with respect to the degree of LD in Koreans might be useful for future genotype–phenotype association studies, especially in terms of breast cancer and hormone-related diseases.