Introduction

Cytochrome P450 3A4 (CYP3A4, MIM 124010), CYP3A5 (MIM 605325), CYP3A7 (MIM 605340) and CYP3A43 (MIM 606534) are members of the CYP3A subfamily and form a gene cluster on chromosome 7q22. CYP3A4 is the most abundantly expressed P450 isoenzyme in human liver. It has a broad substrate specificity and contributes to the oxidative metabolism of approximately 50% of clinically used drugs. For instance, CYP3A4 is involved in the oxidation of certain antibiotics, calcium channel blockers, antidepressants, immunosuppressants, HMG-CoA reductase inhibitors (HMGs), antihistamines and protease inhibitors, as well as some endogenous steroids, such as cortisol, testosterone and estradiol.

The enzyme activity of CYP3A4 shows wide interindividual variability (up to 60-fold),1 which can result in therapeutic failure, unpredictable adverse effects or severe drug toxicity. This variation in drug metabolism arises mainly from the combined effects of genetic polymorphisms, regulation of gene expression and interactions with drugs or environmental chemicals.2 Genetic polymorphisms, accounting for as much as 90% of the interindividual variability in CYP3A4 activity,3 are therefore of clinical value in predicting an individual's response to certain therapeutic agents. CYP3A4 polymorphisms can also help predict disease predisposition.4 The CYP3A4*1B allele (CYP3A4-V, rs2740574), a −392A>G transition in the promoter region, has been reported to be significantly associated with HIV infection,5 and with an increased risk of estrogen receptor-negative breast cancer6 and prostate cancer.4, 7, 8 The CYP3A4*4 allele, which carries an Ile118Val substitution, is significantly associated with the prevalence of type 2 diabetes mellitus.9 The CYP3A4*18 allele, which carries a Leu293Pro substitution, has also been associated with low bone mineral density and increased prostate cancer risk.4, 10

Given these clinical implications, genetic polymorphisms of CYP3A4 have been studied extensively in different ethnic populations and show marked interethnic differences in distribution and frequency spectrum. To date, 20 allelic variants have been identified and characterized (http://www.imm.ki.se/CYPalleles/cyp3a4.htm), eight of which lead to reduced catalytic activity or loss of function in vitro. The most prevalent variant is CYP3A4*1B, with allele frequencies ranging from 0–4% in Asians and Europeans to 82% in Africans.11, 12 The near-absence of CYP3A4*1B in non-African populations has been attributed to natural selection favoring the alternative allele.13 This variant might moderately enhance CYP3A4 gene expression through reduced binding of a transcriptional repressor.14 CYP3A4*18 is the most frequent variant in Asian populations (about 1% in Chinese, 1.3% in Japanese and 2.1% in Malaysians)15, 16, 17, 18 but is absent in both Caucasians and Africans. One recently discovered rare variant, CYP3A4*20, carries a premature stop codon that yields a truncated protein completely devoid of functional activity; it occurs at a frequency below 0.06% in Caucasians19 and has not been reported in Asians or Africans.

The Han Chinese are the largest single ethnic group in the world, constituting over 92% of the population of mainland China and about 19% of the global population. The Han Chinese show substantial internal genetic differentiation as a result of ongoing demographic expansion accompanied by assimilation of various regional ethnicities and tribes,20, 21, 22, 23 and can be divided into two genetically differentiated groups, northern Han and southern Han, separated by the Yangtze River.20, 21, 22 So far, only a few studies have analyzed the prevalence of single-nucleotide polymorphisms (SNPs) of CYP3A4 in Han Chinese. Hsieh et al.24 first screened SNPs in 102 Taiwan Chinese by PCR-single-strand conformation polymorphism and restriction fragment length polymorphism analysis. However, Taiwan Chinese may not be representative of Han Chinese in mainland China. Subsequently, Wen et al. and Liu et al.11, 25 surveyed CYP3A4 polymorphisms in Chinese by genotyping, a method that cannot detect novel alleles that might be specific to Han Chinese. More recently, Du et al.26 used direct sequencing, but with a small sample (20 Han Chinese). The genetic polymorphisms of the CYP3A4 gene in Han Chinese are therefore far from thoroughly characterized, and further investigation is needed.

Considering the ethnicity-specific distribution of genetic polymorphisms and their important roles in optimizing therapeutic efficacy and modifying clinical presentation, we screened CYP3A4 variants in 100 Han Chinese individuals by direct sequencing. All samples were collected in Beijing and are representative of the northern Han. We also estimated allele and genotype frequencies, reconstructed haplotypes and assessed the potential functional impact of novel polymorphisms.

Materials and methods

Subjects and genomic DNA extraction

A total of 100 cell lines from unrelated healthy individuals (51 male and 49 female) of Han Chinese ancestry were obtained from two sources: the Han Chinese in Beijing samples of the HapMap Project27 (42 cell lines) and blood samples from unrelated Han Chinese individuals in Beijing (the remaining 58 cell lines).28 The 58 participants (30 male and 28 female; mean age 26 years) were judged to be healthy on the basis of a physical examination. All participants signed informed consent forms before blood samples and information were collected. To confirm their Han Chinese origin, they were interviewed about their ancestry for up to three previous generations. The project was approved by the Ethics Committee of the Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China.

We used the salting-out method to extract genomic DNA from Epstein–Barr virus-transformed lymphoblastoid cell lines established from peripheral blood.29

PCR amplification and sequencing

To screen for SNPs in CYP3A4, we sequenced the promoter, exons, flanking intronic regions and 3′ untranslated region (3′UTR) of the gene. PCR was performed in a volume of 15 μl containing 30 ng of genomic DNA, 1 × PCR buffer (Mg2+ plus; TaKaRa), 2.5 pmol of each primer, 25 pmol dNTPs and 1 unit of Taq DNA polymerase (TaKaRa Bio, Dalian, China). The cycling program consisted of initial denaturation at 95 °C for 5 min, followed by 30 cycles of 94 °C for 30 s, 55–60 °C for 30 s and 72 °C for 1 min, and a final extension at 72 °C for 10 min, on a GeneAmp PCR System 9700 thermal cycler (Applied Biosystems, Foster City, CA, USA). The PCR products were purified enzymatically by incubation with shrimp alkaline phosphatase (SAP; Amersham Biosciences, NJ, USA) at 37 °C for 60 min, after which SAP was inactivated at 85 °C for 15 min. The purified PCR products were then sequenced directly with the DYEnamic ET terminator cycle sequencing kit (GE Healthcare, Chalfont St Giles, UK) on an ABI Prism 3730xl DNA Analyzer (Applied Biosystems). All PCR and sequencing primers are listed in Supplementary Table S1. Singleton variants (observed only once among the 100 samples) were verified by an independent PCR amplification followed by bidirectional sequencing across the SNP.

Data analysis

We used the Phred, Phrap, Consed and PolyPhred programs (University of California; http://elcapitan.ucsd.edu/hyper/polyphred.usage.html) for base calling, quality assessment and polymorphism detection from DNA sequencing data.30, 31, 32, 33 CYP3A4 polymorphisms were named according to the genomic reference sequence AF280107.1.
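Phred assigns each base call a quality score Q that is related to the estimated error probability P by Q = −10 log10 P. The quality threshold applied in this study is not stated, so the Python sketch below only illustrates the conversion; the Q20 and Q30 values in it are conventional examples, not settings from this work.

```python
import math


def phred_to_error(q: float) -> float:
    """Base-call error probability implied by a Phred quality score (P = 10^(-Q/10))."""
    return 10 ** (-q / 10)


def error_to_phred(p: float) -> float:
    """Phred quality score for a given base-call error probability (Q = -10 log10 P)."""
    return -10 * math.log10(p)


if __name__ == "__main__":
    # Illustrative values only: Q20 is a 1-in-100 error rate, Q30 a 1-in-1000 error rate.
    for q in (20, 30):
        print(f"Q{q}: error probability {phred_to_error(q):.4f}")
```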

We used Haploview 4.1 (Broad Institute; http://www.broad.mit.edu/mpg/haploview/) to estimate allele frequencies, test for Hardy–Weinberg equilibrium and analyze the linkage disequilibrium (LD) structure.34 LD structure was displayed using the GOLD heatmap color scheme,35 and LD blocks were defined by the ‘solid spine of LD’ algorithm34 with a minimum D′ value of 0.9.
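A Hardy–Weinberg test compares the observed genotype counts at a SNP with the counts expected from its allele frequencies (p², 2pq, q²). Haploview's own P values come from an exact test, which behaves better for the rare alleles that dominate this data set. The Python sketch below uses the simpler asymptotic Pearson chi-square test with 1 degree of freedom, only to make the observed-versus-expected comparison explicit; it is not Haploview's implementation.

```python
import math


def hwe_chi_square(n_aa: int, n_ab: int, n_bb: int) -> tuple[float, float]:
    """Pearson chi-square test (1 degree of freedom) for Hardy-Weinberg equilibrium.

    Takes genotype counts for a biallelic SNP and returns (chi2, p_value).
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square variable with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value
```

For example, 25/50/25 genotype counts fit equilibrium exactly (chi-square 0, P = 1), whereas 0/100/0 (all heterozygotes) gives a chi-square of 100. With allele counts as low as those of most CYP3A4 variants here, the exact test should be preferred.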

We used PHASE2.1 (University of Washington; http://www.stat.washington.edu/stephens/phase.html) for haplotype reconstruction and haplotype frequency estimation.36, 37

We used PITA,38 a miRNA target prediction algorithm based on hybridization energy and target-site accessibility, to predict miRNAs that target CYP3A4 and whose binding sites harbor a given SNP.
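PITA scores a candidate site by weighing the free energy gained by forming the miRNA:target duplex against the energy needed to open the target site (ΔΔG = ΔGduplex − ΔGopen). The Python sketch below mirrors that logic and the −5 kcal/mol cutoff used in the Results. It is not PITA's code. The accessibility term is represented here as a positive cost, which is our own sign convention, and the miRNA names and energies in the example are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CandidateSite:
    """Energetics of one predicted miRNA:target site (all values in kcal/mol)."""

    mirna: str
    dg_duplex: float     # free energy of duplex formation; more negative = more stable
    open_penalty: float  # positive energetic cost of unpairing the target site

    @property
    def ddg(self) -> float:
        # Net binding energy once the cost of making the site accessible is paid.
        return self.dg_duplex + self.open_penalty


def passes_cutoff(site: CandidateSite, cutoff: float = -5.0) -> bool:
    """Keep sites whose net energy is at or below the cutoff (-5 kcal/mol in this study)."""
    return site.ddg <= cutoff
```

Comparing the ΔΔG of the same site computed with the reference and the variant allele then indicates whether a SNP is predicted to strengthen or weaken binding.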

We combined two prediction programs, PANTHER (the National Institute of General Medical Sciences; http://www.pantherdb.org/)39, 40 and PolyPhen (Brigham and Women's Hospital, Harvard University; http://genetics.bwh.harvard.edu/pph/),41 to predict the possible impact of novel nonsynonymous SNPs on protein function, taking into account both sequence similarity and structural information. A PANTHER subPSEC score < −3.0 (Pdeleterious > 0.5) and a PolyPhen PSIC score difference > 2.0 indicate that a mutation is probably damaging to protein function.39, 40, 41 PyMOL 0.99rc6 (DeLano Scientific LLC; http://www.pymol.org/) was used to visualize the three-dimensional structure of CYP3A4 (PDB ID: 2J0D),42 which was downloaded from the RCSB Protein Data Bank (PDB, http://www.rcsb.org/pdb/home/home.do). CASTp (Computed Atlas of Surface Topography of proteins, http://sts-fw.bioengr.uic.edu/castp/calculation.php) was used to identify protein pockets and to measure the change in pocket volume caused by the novel amino acid substitution.43
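The two cutoffs above can be combined into a single call. The Python sketch below applies them literally. The consensus rule, which calls a variant damaging only when both tools agree and flags disagreement as discordant, is our assumption about how the predictions might be combined, not a procedure stated in this study.

```python
def panther_deleterious(subpsec: float) -> bool:
    # Cutoff quoted in the Methods: subPSEC < -3.0, i.e. Pdeleterious > 0.5.
    return subpsec < -3.0


def polyphen_damaging(psic_difference: float) -> bool:
    # Cutoff quoted in the Methods: PSIC score difference > 2.0 means "probably damaging".
    return psic_difference > 2.0


def combined_call(subpsec: float, psic_difference: float) -> str:
    """Hypothetical consensus rule: call a variant damaging only if both tools agree."""
    panther = panther_deleterious(subpsec)
    polyphen = polyphen_damaging(psic_difference)
    if panther and polyphen:
        return "probably damaging"
    if panther or polyphen:
        return "discordant"
    return "probably benign"


if __name__ == "__main__":
    # Scores reported for Y319C in the Results (subPSEC -6.90091, PSIC difference 3.453).
    print(combined_call(-6.90091, 3.453))
```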

We performed multiple neutrality tests using DnaSP v5 (http://www.ub.edu/dnasp/)44 to detect evidence of positive selection on the CYP3A4 gene. Tajima's D (TD),45 Fu and Li's F (FL-F)46 (with a chimpanzee sequence as the outgroup) and Fay and Wu's H (FW-H)47 (also with the outgroup) were calculated in sliding windows of 25 bp to test for deviations from the neutral equilibrium frequency distribution along the gene. We additionally investigated evidence for natural selection by examining the integrated haplotype score (iHS),48 obtained from the program Haplotter (http://haplotter.uchicago.edu/).
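A sliding-window analysis computes a statistic on successive stretches of the alignment. The window length (25 bp) is given above, but the step size is not. The Python sketch below assumes non-overlapping windows (step equal to window length) and, as the simplest per-window quantity, counts segregating sites. A neutrality statistic such as TD would be computed on the sites of each window in the same way. This is an illustration, not DnaSP's implementation.

```python
from collections.abc import Iterator


def sliding_windows(length: int, window: int = 25, step: int = 25) -> Iterator[tuple[int, int]]:
    """Yield half-open [start, end) windows over an alignment of `length` columns.

    The default step equal to the window length (non-overlapping) is an assumption.
    """
    start = 0
    while start + window <= length:
        yield start, start + window
        start += step


def segregating_sites(seqs: list[str], start: int, end: int) -> int:
    """Count alignment columns in [start, end) with more than one nucleotide (gaps ignored)."""
    count = 0
    for i in range(start, end):
        bases = {s[i] for s in seqs if s[i] in "ACGT"}
        if len(bases) > 1:
            count += 1
    return count
```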

Results and discussion

Allele and genotype frequencies of CYP3A4 in Han Chinese

We detected 11 genetic variants in Han Chinese, all with genotype frequencies consistent with Hardy–Weinberg equilibrium (P > 0.05): two in the promoter, four in exons, three in introns and two in the 3′UTR (Table 1). Three of the 11 variants are novel: 4231A>C in intron 2, 20148A>G in exon 10 and 26908G>A in the 3′UTR. Most CYP3A4 polymorphisms in Han Chinese are relatively rare: nine variants have frequencies below 5%, and only two are common, rs28969391 (32.5%) and rs2242480 (25%). rs28969391, a deletion of T at position 26707 in the 3′UTR, is unlikely to be involved in post-transcriptional regulation, as it lies in a region not targeted by any known human miRNA according to PITA prediction. Indeed, SNPs in miRNA target sites rarely reach such high frequencies, as they are usually subject to stronger functional constraints and tend to be removed by purifying selection.49, 50 rs2242480 is located in intron 10, 12 nt downstream of the donor site (IVS10+12). This polymorphism has been reported to increase the lipid-lowering efficacy of atorvastatin,51 although it remains unclear whether this G>A change adjacent to the 5′ splice site affects splicing efficiency.

Table 1 Genetic polymorphisms of CYP3A4 in Han Chinese

Five alleles were identified in Han Chinese: CYP3A4*1, *5, *6, *18 and *21, with frequencies of 97, 0.5, 1, 1 and 0.5%, respectively (Table 2). A comparison of CYP3A4 allele frequencies among ethnic groups revealed marked interpopulation differences in allelic distribution (Table 3). CYP3A4*1B, the most frequent allele in Africans (82%),12 is much less common in Caucasians (4%)12, 52 and nearly absent in Asians.53, 54 It occurs at about 0.5% in our Han Chinese sample, a higher frequency than in Malays18 and Japanese.54 The two most frequent alleles reported in Chinese by Du et al., CYP3A4*15 (16%) and *14 (7%), were not detected in our samples. This discrepancy between the two studies may result from the inclusion of ethnic minorities in Du's sample (20 Han, 30 She and 10 Dong subjects) and from substantial genetic diversity within Han Chinese. In our Han Chinese population, CYP3A4*18 remains the most common coding variant, in agreement with previous investigations in Asian populations.15, 16, 17, 18, 54

Table 2 Frequencies of CYP3A4 alleles and genotypes in Han Chinese
Table 3 Ethnic differences in CYP3A4 allele frequencies

Among the seven genotypes observed (Table 2), the wild type predominates in Han Chinese, accounting for 94% of individuals; each of the other six genotypes (*1/*1B, *1/*5, *1/*6, *1/*18, *1/*21 and *6/*18, all heterozygous) accounts for only 1%. The low prevalence of variant genotypes suggests that CYP3A4 poor metabolizers are relatively infrequent in Han Chinese, in line with the previously reported low frequency of CYP3A4 poor metabolizers in Asians.55
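The allele frequencies in Table 2 can be cross-checked against the genotype frequencies. The Python sketch below treats each 1% as one of the 100 individuals. It also assumes that *1B chromosomes are counted within *1, which the reported allele frequencies suggest because they already sum to 100% (97 + 0.5 + 1 + 1 + 0.5). Under these assumptions it recovers the reported allele frequencies. The Hardy–Weinberg expectation for *1/*1 homozygotes (0.97² ≈ 94.1%) also agrees with the observed 94%.

```python
from collections import Counter

# Genotype counts implied by the Table 2 percentages for 100 individuals (1% = 1 person).
GENOTYPES = {
    ("*1", "*1"): 94,
    ("*1", "*1B"): 1,
    ("*1", "*5"): 1,
    ("*1", "*6"): 1,
    ("*1", "*18"): 1,
    ("*1", "*21"): 1,
    ("*6", "*18"): 1,
}


def allele_frequencies(genotypes, merge=None):
    """Allele frequencies over all chromosomes; `merge` maps sub-alleles onto a parent allele."""
    merge = merge or {}
    counts = Counter()
    for (a, b), n in genotypes.items():
        counts[merge.get(a, a)] += n
        counts[merge.get(b, b)] += n
    total = sum(counts.values())
    return {allele: c / total for allele, c in counts.items()}


if __name__ == "__main__":
    freqs = allele_frequencies(GENOTYPES, merge={"*1B": "*1"})
    for allele, f in sorted(freqs.items(), key=lambda kv: -kv[1]):
        print(f"{allele}: {100 * f:.1f}%")
    # Hardy-Weinberg expectation for *1/*1 homozygotes is p^2.
    print(f"expected *1/*1: {100 * freqs['*1'] ** 2:.2f}%")
```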

Haplotypes of CYP3A4 in Han Chinese

Haplotypes were inferred from the 11 detected SNPs with PHASE. The reconstructed haplotypes support previous findings that a few common CYP3A4 haplotypes account for a large proportion of the chromosomes examined,54 whereas a large subset of haplotypes carries low levels of variation.56 Of the 14 phased haplotypes (Figure 1), the two major haplotypes, CYP3A4*1A and *1v, account for 59% and 19.5% of all chromosomes, respectively, whereas 11 haplotypes occur at frequencies below 5% and carry no more than three variants each.

Figure 1

Haplotypes of CYP3A4 in Han Chinese. AF280107.1 is used as the genomic reference sequence of the CYP3A4 wild-type allele. Blue and yellow cells indicate common and rare alleles, respectively.

Seven haplotypes, labeled with lowercase letters, were newly assigned (Figure 1). Haplotype inference placed the novel polymorphisms in CYP3A4*1u, *1w and *21: *1u contains the novel A>C substitution at nucleotide position 4231; *1w is composed of 20230G>A, 26707T>- and the new variant 26908G>A; and the novel allele *21 harbors the nonsynonymous SNP 20148A>G in exon 10, which replaces Tyr with Cys at amino acid 319 (Y319C).

Based on haplotype combinations in Japanese, CYP3A4 haplotypes containing the IVS10+12G allele (such as *1A) show strong linkage with CYP3A5*3, whereas 88% of CYP3A4 haplotypes with the IVS10+12A allele (such as *1G) are tightly linked to CYP3A5*1.54 Close linkage between CYP3A4 and CYP3A5 haplotypes has also been demonstrated in Chinese populations,57 suggesting that genotyping the IVS10+12 position in CYP3A4 could be an easy way to predict the presence of CYP3A5*3.54 We observed that about 75% of CYP3A4 haplotypes carry the IVS10+12G allele, implying a high frequency of CYP3A5*3 in the Han Chinese population.

A novel nonsynonymous SNP predicted to alter enzyme function

The substitution of Cys for Tyr at codon 319 (Y319C) lies in helix I,42 a domain that flanks the heme group and contains substrate recognition site 4 (SRS4).58 As shown in Supplementary Table S2, Tyr319 is highly conserved across the cytochrome P450 family throughout eukaryotic evolution, from nematodes to humans. To predict the functional impact of this amino acid change, we used two in silico prediction algorithms, PolyPhen and PANTHER, which together draw on protein structure and sequence evolution for a more comprehensive assessment. According to PANTHER, the Y319C variant is likely to impair protein function, with a low subPSEC score (−6.90091) and a high Pdeleterious value (0.98018). Likewise, PolyPhen classified the variant as probably damaging because it is predicted to enlarge a cavity (PSIC score difference = 3.453). CASTp analysis gave a clearer picture of the drastic change in the cavity caused by this nonsynonymous SNP (Figures 2a and b). The wild-type cavity (Pocket ID = 144) has a surface area of 89.4 Å² and a volume of 135.08 Å³ and is lined by atoms from 10 residues: Tyr319, Glu320, Leu366, Phe367, Tyr399, Arg403, Glu412, Pro474, Leu475 and Leu477 (Figure 2a). Strikingly, replacing Tyr319 with Cys increases the surface area by 270.1 Å² and expands the volume nearly three-fold (about 2.8-fold; Pocket ID = 158, surface area = 359.5 Å², volume = 376.86 Å³). This dramatic change in the mutant cavity results from the recruitment of atoms from 11 additional residues: Thr323, Pro467, Glu470, Thr471, Gln472, Ile473, Lys487, Pro488, Val489, Val490 and Leu491 (Figure 2b).
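The pocket figures above can be checked with simple arithmetic. The Python sketch below uses only the CASTp values and residue lists quoted in the text and does not rerun CASTp. The surface-area gain is 359.5 − 89.4 = 270.1 Å² and the volume ratio is 376.86/135.08 ≈ 2.79. The set difference of residue numbers recovers the 11 newly recruited residues.

```python
WILD_TYPE = {
    "area": 89.4, "volume": 135.08,  # Pocket ID 144 (Å², Å³)
    "residues": ["Tyr319", "Glu320", "Leu366", "Phe367", "Tyr399",
                 "Arg403", "Glu412", "Pro474", "Leu475", "Leu477"],
}
MUTANT = {
    "area": 359.5, "volume": 376.86,  # Pocket ID 158 (Å², Å³)
    "residues": ["Cys319", "Glu320", "Thr323", "Leu366", "Phe367", "Tyr399", "Arg403",
                 "Glu412", "Pro467", "Glu470", "Thr471", "Gln472", "Ile473", "Pro474",
                 "Leu475", "Leu477", "Lys487", "Pro488", "Val489", "Val490", "Leu491"],
}


def position(residue: str) -> int:
    """Residue number without its three-letter code, so Tyr319 and Cys319 match."""
    return int(residue[3:])


def compare_pockets(wt: dict, mut: dict) -> tuple[float, float, list[int]]:
    """Surface-area gain, volume ratio and residue positions recruited into the mutant pocket."""
    area_gain = mut["area"] - wt["area"]
    volume_ratio = mut["volume"] / wt["volume"]
    added = sorted({position(r) for r in mut["residues"]} - {position(r) for r in wt["residues"]})
    return area_gain, volume_ratio, added


if __name__ == "__main__":
    gain, ratio, added = compare_pockets(WILD_TYPE, MUTANT)
    print(f"surface area +{gain:.1f} Å², volume x{ratio:.2f}, {len(added)} new residues: {added}")
```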

Figure 2

The Tyr319Cys substitution enlarges the cavity (a, b) and disrupts two hydrogen bonds with Gln472 and Leu475 on the loop between β3-3 and β4-1 (c, d). (a) The pocket in white is formed by atoms from 10 residues: Tyr319, Glu320, Leu366, Phe367, Tyr399, Arg403, Glu412, Pro474, Leu475 and Leu477 (Pocket ID = 144, surface area = 89.4 Å², volume = 135.08 Å³). Atoms from Tyr319 are shown in yellow. (b) The pocket in green is formed by atoms from 21 residues: Cys319, Glu320, Thr323, Leu366, Phe367, Tyr399, Arg403, Glu412, Pro467, Glu470, Thr471, Gln472, Ile473, Pro474, Leu475, Leu477, Lys487, Pro488, Val489, Val490 and Leu491 (Pocket ID = 158, surface area = 359.5 Å², volume = 376.86 Å³). Atoms from Cys319 are shown in blue. (c) Tyr319 is shown in green. (d) Cys319 is shown in yellow.

Furthermore, Tyr319 plays a critical role in maintaining the position of the loop between β3-3 and β4-1. This β-β loop is held in place by four hydrogen bonds to residues outside the loop (Figure 2c). Two of them link Tyr319 to the oxygen atom of Gln472 and the nitrogen atom of Leu475, with bond lengths of 3.01 and 3.05 Å, respectively; the other two form between Ile369 and Leu483 and between Leu172 and Val489. The three-dimensional structure of the β-β loop in Figure 2c shows that the center of the loop (L477–P485) is accessible to approaching ligands; this segment has been termed a ligand-accessible region.59 Both X-ray structure analysis and molecular modeling of CYP3A4 have revealed ligand-induced changes in the heme pocket and extreme flexibility of the ligand-accessible regions, which allow diverse substrates to be accommodated in the active site through multiple ligand-bound conformations.42, 59 From this point of view, Tyr319, together with Leu172 and Ile369, might act as an anchor of the β-β loop. These anchors would preserve enough flexibility for the loop to help regulate heme pocket size when interacting with various drugs, while restricting its freedom so that the ligand-accessible region does not lose contact with ligands. Once Tyr319 is replaced by Cys, its interactions with Gln472 and Leu475 would be abolished (Figure 2d), leaving only two hydrogen bonds, which may be unable to adjust and stabilize the β-β loop.
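The argument above rests on interatomic distances (3.01 and 3.05 Å for the two Tyr319 bonds). A common first screen for hydrogen bonds is a heavy-atom donor/acceptor distance cutoff of about 3.5 Å. The Python sketch below applies such a cutoff, which is our assumption rather than a criterion reported in this study. Real coordinates would be read from the 2J0D entry, for example with Biopython's Bio.PDB module. The atom names and coordinates in the test are hypothetical. Cys lacks the tyrosine ring and its hydroxyl oxygen, so pairs involving the Tyr319 hydroxyl have no counterpart in the mutant.

```python
import math

Coord = tuple[float, float, float]


def candidate_hbonds(donors: dict[str, Coord], acceptors: dict[str, Coord],
                     cutoff: float = 3.5) -> list[tuple[str, str, float]]:
    """Donor/acceptor heavy-atom pairs no farther apart than `cutoff` angstroms.

    A distance-only screen (the 3.5 Å default is an assumed convention); a full
    hydrogen-bond assignment would also check donor-hydrogen-acceptor angles.
    """
    pairs = []
    for d_name, d_xyz in donors.items():
        for a_name, a_xyz in acceptors.items():
            dist = math.dist(d_xyz, a_xyz)
            if dist <= cutoff:
                pairs.append((d_name, a_name, round(dist, 2)))
    return pairs
```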

Taken together, the high conservation of this residue in an important domain and the striking change in cavity configuration caused by this single amino acid substitution raise the possibility that the new variant produces a malfunctioning protein.

A novel 3′UTR SNP that may influence miRNA-mediated post-transcriptional regulation

We detected a new polymorphism, a G-to-A transition at nucleotide position 26908 in the 3′UTR. Accumulating evidence indicates that 3′UTR variants can impair function if they are located in miRNA-binding sites,49, 60 because such SNPs can interfere with miRNA function and lead to loss of miRNA-mediated regulation of drug-metabolism genes. We therefore used PITA to predict liver-expressed miRNAs61, 62, 63, 64, 65 whose binding sites harbor this polymorphism. A total of 14 miRNAs whose seed sequences are complementary to the SNP site met the interaction-energy threshold for miRNA:target pairing (ΔΔG ≤ −5 kcal/mol) (Table 4, Supplementary Figure S1). Strengthened or weakened miRNA binding caused by the SNP may lead to corresponding downregulation or upregulation of mRNA stability and translation.66, 67 The novel polymorphism in predicted miRNA target sites may therefore be a candidate causal variant that alters CYP3A4 protein expression.

Table 4 Predicted influence of the novel SNP on miRNA targeting

Footprint of positive selection on CYP3A4 in Han Chinese

In non-African populations, the CYP3A locus has been reported to be under strong positive selection,13, 56 which leads to a rapid loss of neutral genetic variation near a beneficial allele as it rises in frequency. The near-fixation of the derived alleles at both CYP3A4*1B and CYP3A5*3, together with their unusual geographic patterns correlating with latitude, has been attributed to selective pressures related to salt homeostasis and rickets in Europeans and Asians.13, 56 In Han Chinese, we also observed several characteristic features of sequence variation: low polymorphism levels with a fraction of high-frequency derived alleles, a marked skew toward rare variants and dramatic differences in allele frequency between Han Chinese and Africans. Together, these features point to a selective sweep at the CYP3A4 locus in Han Chinese.

We therefore used both the TD45 and FL-F46 tests to compare the observed mutation frequency distribution with neutral expectations. Under the neutral equilibrium model, the expected values of TD and FL-F are zero. Balancing selection or population subdivision may cause an excess of intermediate-frequency variants and positive TD and FL-F values, whereas positive selection or population growth may lead to an excess of low-frequency variants and negative TD and FL-F values.45, 46, 68 For the Han Chinese population, the negative TD value did not reach statistical significance (TD = −1.44935, P > 0.10; Figure 3a), but the skew in the allele frequency spectrum detected by the FL-F test was significant (FL-F = −2.43436, P < 0.05; Figure 3b), consistent with positive selection at the CYP3A4 locus in Han Chinese.
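Tajima's D compares two estimators of the population mutation parameter: π, the mean number of pairwise differences, and S/a1, where S is the number of segregating sites. An excess of rare variants lowers π relative to S/a1 and makes D negative. The Python sketch below implements Tajima's standard formula from per-site allele counts at biallelic sites. For the 100 diploid individuals studied here, n would be 200 chromosomes. It is not DnaSP's implementation and does not compute P values. Fu and Li's F is not shown, because it additionally requires outgroup-polarized counts of mutations on external branches. The allele counts in the test are hypothetical.

```python
import math


def tajimas_d(n: int, allele_counts: list[int]) -> float:
    """Tajima's D for `n` sampled sequences.

    `allele_counts` holds, for each biallelic segregating site, the number of
    sequences carrying one of the two alleles (0 < k < n).
    """
    S = len(allele_counts)
    if S == 0:
        raise ValueError("no segregating sites")
    # pi: differing pairs summed over sites, divided by the number of sequence pairs.
    pi = sum(k * (n - k) for k in allele_counts) / (n * (n - 1) / 2)
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    return (pi - S / a1) / math.sqrt(e1 * S + e2 * S * (S - 1))
```

Ten singleton sites in 200 chromosomes give a negative D, whereas ten sites at 50% frequency give a positive D, matching the interpretation given above.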

Figure 3

The distribution of Tajima’s D (TD) (a), Fu and Li’s F (FL-F) (b), and Fay and Wu's H (FW-H) (c) along the CYP3A4 gene.

To obtain further evidence of positive selection, we used the FW-H test,47 which detects the excess of high-frequency derived alleles expected immediately after a selective sweep. Under a sweep, the FW-H statistic is expected to be negative, and the target of selection typically lies within the deepest valley of the statistic.47 Indeed, the FW-H statistic is negative across the CYP3A4 locus (FW-H = −2.65457, normalized FW-H = −2.19284), further supporting a recent episode of positive selection on CYP3A4. Moreover, we observed two prominent valleys in the FW-H statistic, the deeper one at the CYP3A4*1B locus (Figure 3c). This implies that the derived allele at this position, which promotes interaction with transcriptional repressors and decreases CYP3A4 gene expression, has been favored by natural selection in the Han Chinese population. Our results are consistent with previous findings of selection against the CYP3A4*1B allele.13
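Fay and Wu's H contrasts two estimators of θ computed from derived-allele counts. θπ weights a site whose derived allele appears i times by i(n − i). θH weights it by i² and is therefore inflated by high-frequency derived alleles, so H = θπ − θH. The Python sketch below computes this unnormalized H from counts polarized with an outgroup. The normalized value also reported above additionally divides by a variance term, which is omitted here. The counts in the test are hypothetical.

```python
def fay_wu_h(n: int, derived_counts: list[int]) -> float:
    """Unnormalized Fay and Wu's H = theta_pi - theta_H.

    `derived_counts` holds, for each segregating site, the number of the `n`
    sequences carrying the derived allele (polarized against an outgroup).
    """
    denom = n * (n - 1)
    theta_pi = sum(2 * i * (n - i) for i in derived_counts) / denom
    theta_h = sum(2 * i * i for i in derived_counts) / denom
    return theta_pi - theta_h
```

Sites where the derived allele is near fixation drive H strongly negative, which is the signal the test looks for after a sweep.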

Notably, the minor valley of the FW-H statistic falls on the SNP rs2242480 (Figure 3c). To assess the evolutionary characteristics of this polymorphism, we examined its integrated haplotype score (iHS),48 based on Phase II HapMap data.69 This statistic is designed to detect extended haplotypes resulting from a selective sweep. In principle, an extreme iHS value (|iHS| > 2) for an SNP allele can be taken as evidence of a strong selection signal.48 However, the iHS values of rs2242480 are 0.443 in Africans (YRI), −0.756 in East Asians (ASN) and −1.591 in Europeans (CEU). The decay of extended haplotype homozygosity in YRI, ASN and CEU is shown in Figure 4; none of these populations shows a deviation from neutral expectations. Hence, rs2242480 is possibly neutral or may be influenced by genetic hitchhiking.
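EHH for a core allele is the probability that two randomly chosen chromosomes carrying that allele are identical over the whole interval from the core SNP to a given marker. Integrating EHH over distance gives iHH, and the unstandardized iHS is ln(iHH_ancestral/iHH_derived). Negative values therefore indicate that the derived allele sits on unusually long haplotypes. The Python sketch below implements these definitions on 0/1 haplotype strings. It is not Haplotter's code: Haplotter reports iHS standardized within allele-frequency bins, which is omitted here. The toy haplotypes and positions in the test are hypothetical.

```python
import math
from collections import Counter


def ehh(haplotypes: list[str], core: int, j: int) -> float:
    """Extended haplotype homozygosity between the core SNP and marker j.

    All haplotypes passed in should carry the same allele at the core SNP.
    """
    n = len(haplotypes)
    if n < 2:
        return 0.0
    lo, hi = min(core, j), max(core, j)
    groups = Counter(h[lo:hi + 1] for h in haplotypes)
    # Fraction of chromosome pairs that are identical over the whole interval.
    return sum(c * (c - 1) for c in groups.values()) / (n * (n - 1))


def ihh(haplotypes: list[str], core: int, positions: list[float], cutoff: float = 0.05) -> float:
    """Integrate EHH (trapezoid rule) downstream and upstream of the core until it drops below cutoff."""
    total = 0.0
    for step in (1, -1):
        prev, j = 1.0, core + step  # EHH equals 1 at the core itself
        while 0 <= j < len(positions):
            cur = ehh(haplotypes, core, j)
            total += (prev + cur) / 2 * abs(positions[j] - positions[j - step])
            if cur < cutoff:
                break
            prev, j = cur, j + step
    return total


def unstandardized_ihs(haplotypes: list[str], core: int, positions: list[float], ancestral: str) -> float:
    """ln(iHH_ancestral / iHH_derived); negative when the derived allele has longer haplotypes."""
    anc = [h for h in haplotypes if h[core] == ancestral]
    der = [h for h in haplotypes if h[core] != ancestral]
    return math.log(ihh(anc, core, positions) / ihh(der, core, positions))
```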

Figure 4

Decay of haplotypes (upper panels) and of haplotype homozygosity (lower panels) in Africans (YRI) (a), East Asians (ASN) (b) and Europeans (CEU) (c), using rs2242480 as the core single-nucleotide polymorphism (SNP) in a 1-Mb region. Ancestral alleles are marked in blue and derived alleles in red. In the haplotype-decay plots, horizontal lines represent haplotypes and SNP positions are marked below the plot.

In addition, according to the ‘solid spine of LD’ method,34 the 11 SNPs formed a single LD block (D′ > 0.9) spanning 27 kb of the CYP3A4 locus (Figure 5). Because extensive LD is a signature of a recent selective sweep,70 this finding also supports a role for positive selection on CYP3A4 in the Han Chinese population.
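D′ scales the disequilibrium coefficient D = pAB − pApB by its maximum attainable value given the allele frequencies. The Python sketch below computes |D′| and then applies a simplified greedy reading of the ‘solid spine of LD’ rule as described in the Haploview documentation. Under that rule, the first and last markers of a block must be in strong LD (here D′ ≥ 0.9) with every intermediate marker, although intermediate markers need not be in strong LD with each other. It is not Haploview's implementation and may differ from it in edge cases. The test matrices are hypothetical.

```python
def d_prime(p_ab: float, p_a: float, p_b: float) -> float:
    """|D'| for two biallelic loci from haplotype frequency p_AB and allele frequencies."""
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    return 0.0 if d_max == 0 else abs(d) / d_max


def solid_spine_blocks(dp: list[list[float]], threshold: float = 0.9) -> list[tuple[int, int]]:
    """Greedy partition of markers into blocks whose end markers are in strong LD
    (|D'| >= threshold) with every marker between them. dp is a symmetric |D'| matrix."""
    m = len(dp)
    blocks, i = [], 0
    while i < m:
        end = i
        for j in range(i + 1, m):
            first_ok = all(dp[i][k] >= threshold for k in range(i + 1, j + 1))
            last_ok = all(dp[k][j] >= threshold for k in range(i, j))
            if first_ok and last_ok:
                end = j
        if end > i:
            blocks.append((i, end))
        i = end + 1
    return blocks
```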

Figure 5

Linkage disequilibrium (LD) displayed in the GOLD heatmap color scheme35 by Haploview 4.1. The LD block was defined by the solid spine of LD method (D′ > 0.9).

Taken together, our results suggest that the unusually low polymorphism levels and the skew towards rare variants of the CYP3A4 gene in Han Chinese could be caused by positive selection.