Introduction

Dihydropyrimidine dehydrogenase (DPD) is an inactivating and rate-limiting enzyme for 5-fluorouracil (5-FU), which is used in various therapeutic regimens for gastrointestinal, breast and head/neck cancers (Grem 1996). While the antitumor effect of 5-FU is exerted via anabolic pathways responsible for its intracellular conversion into anti-proliferative nucleotides, DPD affects 5-FU availability by rapidly degrading it to 5,6-dihydrofluorouracil (DHFU) (Heggie et al. 1987). The importance of DPD in 5-FU metabolism was also highlighted by a lethal drug interaction between 5-FU and the antiviral agent sorivudine: inhibition of DPD by a sorivudine metabolite led to excessive systemic exposure to 5-FU, which caused several acute deaths in Japan (Nishiyama et al. 2000).

5-FU catabolism occurs in various tissues, including tumors, but is highest in the liver (Naguib et al. 1985; Lu et al. 1993). Wide variations in DPD activity (8- to 21-fold) were shown in Caucasians, and 3–5% of Caucasians had reduced DPD activity (Etienne et al. 1994; Lu et al. 1998). This variability, which is partially attributed to genetic defects of the DPD gene (DPYD), leads to differential responses of cancer patients, namely resistance to or increased toxicity of 5-FU (van Kuilenburg 2004). Complete DPD deficiency is also associated with the inherited metabolic disorder thymine-uraciluria, which is characterized by neurological problems in pediatric patients (Bakkeren et al. 1984).

To date, at least 30 variant DPYD alleles have been published, with or without deleterious impact upon DPD activity (Gross et al. 2003; Ogura et al. 2005; Seck et al. 2005; van Kuilenburg 2004; Zhu et al. 2004). Of these variations, a splice site polymorphism, IVS14 + 1G>A, which causes skipping of exon 14, is occasionally detected in North Europeans with allele frequencies of 0.01–0.02 (van Kuilenburg 2004). Screening for IVS14 + 1G>A in patients suffering from 5-FU-associated grade 3 or 4 toxicity revealed that 24–28% of them were heterozygous or homozygous for this single nucleotide polymorphism (SNP) (van Kuilenburg 2004). However, this SNP has not been reported in Japanese or African-Americans. Recently, Ogura et al. (2005) have shown that a Japanese population exhibits a large degree of interindividual variation in the DPD activity of peripheral blood mononuclear cells. They also identified a novel variation, 1097G>C (Gly366Ala), in a healthy volunteer with the lowest DPD activity and demonstrated that the 366Ala variant has reduced activity towards 5-FU in vitro. At present, however, information on variant alleles with clinical relevance in Japanese is limited and cannot fully explain polymorphic DPD activity.

In this study, we searched for genetic variations in DPYD by sequencing 5′ regulatory regions, all exons and surrounding introns from 341 Japanese subjects. Fifty-five variations including nine novel nonsynonymous ones were identified. Then, linkage disequilibrium (LD) and haplotype analyses were performed to clarify the DPYD haplotype structures in Japanese.

Materials and methods

Human DNA samples

Three hundred and forty-one Japanese subjects in this study included 263 cancer patients and 78 healthy volunteers. All 263 patients were administered 5-FU or tegafur for treatment of various cancers (mainly stomach and colon) at the National Cancer Center, and blood samples were collected prior to the fluoropyrimidine chemotherapy. The healthy volunteers were recruited at the Tokyo Women’s Medical University. DNA was extracted from the blood of cancer patients and Epstein-Barr virus-transformed lymphoblastoid cells derived from healthy volunteers. Written informed consent was obtained from all participating subjects. The ethical review boards of the National Cancer Center, the Tokyo Women’s Medical University and the National Institute of Health Sciences approved this study.

PCR conditions for DNA sequencing

To amplify 22 exons (exons 2–23) of DPYD, multiplex PCRs were performed by using four sets of mixed primers (mix 1 to mix 4 of “first PCR” in Table 1). Namely, five exonic fragments were simultaneously amplified from 50 ng of genomic DNA using 0.625 units of Ex-Taq (Takara Bio. Inc., Shiga, Japan) with 0.20 μM primers. Because of the high GC content in exon 1 of DPYD, this region was separately amplified from 50 ng of genomic DNA with 2.5 units of LA-Taq and 0.2 μM primers (listed in Table 1) in GC buffer I (Takara Bio. Inc.). The first PCR conditions were 94°C for 5 min, followed by 30 cycles of 94°C for 30 s, 58°C for 1 min, and 72°C for 2 min; and then a final extension for 7 min at 72°C. Next, each exon was amplified separately from the first PCR products by nested PCR (2nd PCR) using the primer sets (0.2 μM) listed in “second PCR” of Table 1. The second PCR conditions were the same as those of the first PCR, and LA-Taq (2.5 units) for exon 1 and Ex-Taq (0.625 units) for exons 2–23 were used. All PCR primers were designed in the flanking intronic sites to analyze the exon-intron splice junctions. The PCR products were treated with a PCR Product Pre-Sequencing Kit (USB Co., Cleveland, OH) and sequenced directly on both strands using an ABI BigDye Terminator Cycle Sequencing Kit (Applied Biosystems, Foster City, CA) with the primers listed in “sequencing” of Table 1. Excess dye was removed with a DyeEx96 kit (Qiagen, Hilden, Germany). The eluates were analyzed on an ABI Prism 3700 DNA Analyzer (Applied Biosystems). All novel SNPs were confirmed by sequencing of PCR products generated from new genomic DNA amplifications. The genomic and cDNA sequences of DPYD obtained from GenBank (NT_032977.7 and NM_000110.2, respectively) were used as reference sequences. SNP positions were numbered based on the cDNA sequence, and adenine of the translational initiation site in exon 1 was numbered +1. For intronic polymorphisms, the position was numbered from the nearest exon.
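The numbering convention described above (cDNA positions counted from the A of the translational start codon, intronic positions counted from the nearest exon) can be illustrated with a small parser. This is only a sketch: the function name `parse_variant` and the returned dictionary layout are hypothetical, only simple substitutions are handled (not insertions or deletions), and the reading of "+" as an offset from the preceding exon and "-" as an offset from the following exon is the usual IVS convention, assumed here.

```python
import re

# "IVS14 + 1G>A" or "IVS14-123C>A": intron number, signed offset, ref>alt
_INTRONIC = re.compile(
    r"^IVS(\d+)\s*([+\-\u2013\u2212])\s*(\d+)([ACGT])>([ACGT])$"
)
# "1627A>G" or "-477T>G": cDNA position relative to the start-codon A (+1)
_CDNA = re.compile(r"^([\-\u2013\u2212]?)(\d+)([ACGT])>([ACGT])$")


def parse_variant(name):
    """Split a substitution name into its positional parts (illustrative only)."""
    text = name.strip()
    m = _INTRONIC.match(text)
    if m:
        intron, sign, offset, ref, alt = m.groups()
        # '+' counts into the intron from the preceding exon,
        # any dash variant counts back from the following exon (assumed convention)
        signed = int(offset) if sign == "+" else -int(offset)
        return {"region": "intron", "intron": int(intron),
                "offset": signed, "ref": ref, "alt": alt}
    m = _CDNA.match(text)
    if m:
        sign, pos, ref, alt = m.groups()
        position = -int(pos) if sign else int(pos)
        region = "5' flanking" if position < 0 else "cDNA"
        return {"region": region, "position": position, "ref": ref, "alt": alt}
    raise ValueError(f"unsupported variant name: {name!r}")
```

For instance, `parse_variant("IVS14 + 1G>A")` yields intron 14 with offset +1, whereas `parse_variant("-477T>G")` yields position -477 in the 5' flanking region.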

Table 1 Primer sequences for human DPYD

Linkage disequilibrium (LD) and haplotype analyses

Hardy-Weinberg equilibrium and LD analyses were performed by SNPAlyze software (Dynacom Co., Yokohama, Japan), and pairwise LD parameters between variations were obtained as the |D′| and r² values. Some haplotypes were unambiguously identified from subjects with homozygous variations at all sites or a heterozygous variation at only one site. Diplotype configurations were inferred by LDSUPPORT software, which determines the posterior probability distribution of the diplotype for each subject based on the estimated haplotype frequencies (Kitamura et al. 2002). Although the nomenclature for nonsynonymous DPYD alleles (DPYD*1 to DPYD*13) has already been published (McLeod et al. 1998; Collie-Duguid et al. 2000; Johnson et al. 2002), several reported alleles remain unassigned. To avoid confusion with the previous DPYD allele nomenclature, our block haplotypes in this study were tentatively defined by using ‘#’ instead of ‘*’. A group of haplotypes without any amino acid change is designated as #1, and the haplotype groups bearing already defined alleles, DPYD*5 (Ile543Val), DPYD*6 (Val732Ile), DPYD*9 (Cys29Arg) and DPYD*11 (Val335Leu), were numbered by using the corresponding Arabic numerals, #5, #6, #9, and #11, respectively. Other haplotypes with known nonsynonymous SNPs such as 496A>G (Met166Val) or with a novel nonsynonymous SNP were represented by ‘#’ plus the amino acid position followed by the variant residue (for example, #166V). Subtypes within each haplotype group were consecutively named with lowercase letters in order of their frequencies. Haplotypes ambiguously inferred in only one patient were indicated in the Fig. 3 legend. Combinations of block haplotypes were analyzed by Haploview software (http://www.broad.mit.edu/mpg/haploview/index.php) (Barrett et al. 2005), and the long-range (whole gene) haplotypes spanning all blocks were inferred by Hapblock software (www.cmb.usc.edu/msms/HapBlock/) (Zhang et al. 2005).
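The idea behind diplotype inference from estimated haplotype frequencies can be sketched as follows. This is not the LDSUPPORT algorithm itself, whose internals are not described here; it is a minimal illustration that assumes Hardy-Weinberg proportions, biallelic sites coded 0 (reference) or 1 (variant), and haplotype frequencies that are already known. The function name and the example frequencies are hypothetical.

```python
from itertools import combinations_with_replacement


def diplotype_posteriors(genotype, hap_freqs):
    """Posterior probability of each haplotype pair given an unphased genotype.

    genotype  : tuple with the variant-allele count (0, 1 or 2) at each site
    hap_freqs : dict mapping a haplotype (tuple of 0/1 per site) to its frequency
    """
    weights = {}
    for h1, h2 in combinations_with_replacement(sorted(hap_freqs), 2):
        # keep only pairs whose allele counts reproduce the observed genotype
        if all(a + b == g for a, b, g in zip(h1, h2, genotype)):
            w = hap_freqs[h1] * hap_freqs[h2]
            if h1 != h2:
                w *= 2  # the two haplotypes can be inherited in either order
            weights[(h1, h2)] = w
    total = sum(weights.values())
    return {pair: w / total for pair, w in weights.items()} if total else {}


# Hypothetical two-site example: a double heterozygote has two possible phasings
freqs = {(0, 0): 0.80, (1, 1): 0.15, (1, 0): 0.03, (0, 1): 0.02}
posteriors = diplotype_posteriors((1, 1), freqs)
```

A subject who is homozygous at all sites, or heterozygous at only one site, is compatible with a single haplotype pair, so the posterior probability is 1; this is why such subjects allow haplotypes to be identified unambiguously.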

Typing data on DPYD from 44 unrelated Japanese and 30 Caucasian trios were also obtained from the HapMap project (HapMap release 19: http://www.hapmap.org/). The LD profiles and haplotypes of the HapMap data were obtained by Marker beta in Gmap Net (http://www.gmap.net/marker) using its four (1254711, 1254712, 1254713, and 1254714) and six (1166276, 1166277, 1166278, 1166279, 1166280, and 1166281) datasets covering DPYD genomic regions for Japanese and Caucasians, respectively.

Drawing of protein structures

The coordinate data (1gth) of the crystal structure of pig DPD (Dobritzsch et al. 2002) was obtained from the Protein Data Bank. Protein Explorer (http://proteinexplorer.org) (Martz 2002) was used to display the structural features of pig DPD and depict three-dimensional views.

Results

DPYD variations found in a Japanese population

We identified 55 variations, including 38 novel ones, by sequencing the promoter region (up to 613 bp upstream from the translational initiation site), all 23 exons and their flanking regions of DPYD from 341 Japanese subjects (Table 2). The variations comprised 4 in the 5′ flanking region, 21 (5 synonymous and 16 nonsynonymous ones) in the coding exons (Fig. 1) and 30 in the introns. Since we did not find any significant differences in allele frequencies between healthy volunteers and cancer patients (P > 0.05 by χ² test or Fisher’s exact test) except for one variation, IVS14 + 19C>A (P = 0.027 by Fisher’s exact test), the data for all subjects were analyzed as one group. All detected variations except for 451A>G (Asn151Asp) and IVS13 + 40G>A were in Hardy-Weinberg equilibrium (P ≥ 0.24).
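For readers unfamiliar with the Hardy-Weinberg check, the following sketch shows a standard one-degree-of-freedom chi-square version of the test. It is an assumption that a test of this kind is comparable to what SNPAlyze computes; for rare variants an exact test is generally preferable. The function name and the genotype counts in the example are hypothetical, not data from this study.

```python
from math import erfc, sqrt


def hwe_chi_square(n_ref_hom, n_het, n_var_hom):
    """Chi-square test (1 d.f.) of genotype counts against Hardy-Weinberg proportions."""
    n = n_ref_hom + n_het + n_var_hom
    p = (2 * n_ref_hom + n_het) / (2 * n)  # reference allele frequency
    q = 1 - p                              # variant allele frequency
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_ref_hom, n_het, n_var_hom)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # survival function of the chi-square distribution with 1 degree of freedom
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: a heterozygote deficit relative to 25/50/25
chi2, p_value = hwe_chi_square(30, 40, 30)
```

A small P value indicates a departure from Hardy-Weinberg proportions, for example an excess of homozygotes for a rare allele.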

Table 2 Summary of DPYD SNPs detected in a Japanese population
Fig. 1

Twenty-one variations detected in the coding exons are depicted in the schematic diagram of the DPYD gene. Thirteen novel variations are enclosed by squares. The recombination spots were estimated based on the LD profiles obtained from Japanese data in the HapMap project and are indicated by arrows. The borders (between introns 8 and 18 of DPYD) and core region (between introns 12 and 16) of FRA1E identified by Hormozian et al. (2007) are indicated as an open and a closed box, respectively.

Thirteen novel variations in the coding region (enclosed by a square in Fig. 1) contain four synonymous SNPs, 474T>C (Phe158Phe), 639C>T (Asp213Asp), 1752A>G (Thr584Thr), and 2424T>C (Ser808Ser) and nine nonsynonymous SNPs, 29C>A (Ala10Glu), 325T>A (Tyr109Asn), 451A>G (Asn151Asp), 733A>T (Ile245Phe), 793G>A (Glu265Lys), 1543G>A (Val515Ile), 1572T>G (Phe524Leu), 1666A>C (Ser556Arg), and 2678A>G (Asn893Ser). 451A>G (Asn151Asp), 325T>A (Tyr109Asn), and 2678A>G (Asn893Ser) were found at frequencies of 0.009, 0.003 and 0.003, respectively. The others were detected as single heterozygotes (allele frequencies = 0.0015).
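The allele frequencies quoted above follow directly from allele counts over 2 × 341 = 682 chromosomes. The short sketch below makes the arithmetic explicit. The function name is hypothetical, and only the single-heterozygote count (one variant allele) is stated in the text; the counts of 6 and 2 variant alleles are inferred from the rounded frequencies and should be treated as assumptions.

```python
def allele_frequency(variant_allele_count, n_subjects):
    """Variant allele frequency in a diploid sample (two alleles per subject)."""
    return variant_allele_count / (2 * n_subjects)


N_SUBJECTS = 341  # Japanese subjects sequenced in this study

# One heterozygote among 341 subjects: 1 / 682
single_heterozygote = round(allele_frequency(1, N_SUBJECTS), 4)

# Assumed counts consistent with the reported 0.009 and 0.003
assumed_six_alleles = round(allele_frequency(6, N_SUBJECTS), 3)
assumed_two_alleles = round(allele_frequency(2, N_SUBJECTS), 3)
```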

In the 5′ flanking region, all four detected SNPs (-609C>T, -477T>G, -266C>A, -243G>A) were newly found at relatively high allele frequencies (0.006–0.05). However, these SNPs were not located near the proposed cis-regulatory promoter elements (Shestopal et al. 2000). The remaining 21 novel variations were found in intronic regions. Of these SNPs, IVS5-115G>A, IVS12-11G>A, and IVS14-123C>A were detected with allele frequencies of 0.021, 0.038, and 0.155, respectively, but others were rare (<0.01). They were not located in the exon-intron splicing junctions or branch sites.

Seventeen variations were already reported. The ID numbers in the dbSNP databases or references for these SNPs are described in Table 2. The well-known nonsynonymous SNPs, 1627A>G (*5, Ile543Val), 2194G>A (*6, Val732Ile), 85T>C (*9, Cys29Arg), and 1003G>T (*11, Val335Leu), were found in this study at allele frequencies of 0.283, 0.015, 0.029, and 0.0015, respectively. The allele frequencies of two reported SNPs, 496A>G (Met166Val) and 2303C>A (Thr768Lys), were 0.022 and 0.028, respectively. Recently, 1774C>T (Arg592Trp) was reported from a Korean population (Cho et al. 2007), and its allele frequency was 0.0015 in this study. Nine intronic variations, IVS10-15T>C, IVS13 + 39C>T, IVS13 + 40G>A, IVS15 + 75A>G, IVS16-94G>T, IVS18-39G>A, IVS21 + 136G>C, IVS22-58G>C, and IVS22-69G>A, and one synonymous variation, 1896T>C (Phe632Phe), were found with various allele frequencies (0.003–0.378, Table 2). The variations previously detected in Japanese (Kouwaki et al. 1998; Yamaguchi et al. 2001; Ogura et al. 2005), 62G>A (Arg21Gln, *12), 74G>A (His25Arg), 812delT (Leu271X), 1097G>C (Gly366Ala), 1156G>T (Glu386X, *12), and 1714C>G (Leu572Val), were not found in our study. This might be due to their low frequencies.

Linkage disequilibrium (LD) analysis and haplotype block partition

LD analysis was performed by r² and |D′| using 18 SNPs (allele frequency ≥0.01) (Fig. 2). Strong linkages were observed in four pairs of SNPs: between -477T>G and 85T>C (Cys29Arg) (r² = 0.7025), between 496A>G (Met166Val) and IVS10-15T>C (r² = 0.7964), between 1627A>G (Ile543Val) and IVS13 + 39C>T (r² = 1.0), and between IVS14-123C>A and IVS15 + 75A>G (r² = 1.0). In addition, two known rare SNPs, IVS22-69G>A (rs290855) and IVS22-58G>C (rs17116357), were perfectly linked (r² = 1.0) (data not shown). As for |D′| values, only 43 pairs (28%) out of 153 pairs gave |D′| = 1.0, indicating that a number of recombinations had occurred within this gene. This is not surprising because DPYD is a huge gene of at least 950 kb in length with 3 kb of coding sequences. However, it was difficult to estimate past recombination events in DPYD from our data alone because our variations were mostly limited to exons and surrounding introns.
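The two LD measures used above can be computed from haplotype and allele frequencies with the standard textbook definitions, sketched below. This is not the SNPAlyze implementation; the function name and the example frequencies are hypothetical and chosen only to show that |D′| = 1 does not imply r² = 1.

```python
from math import comb


def ld_stats(p_ab, p_a, p_b):
    """Return (|D'|, r^2) for two biallelic sites.

    p_ab : frequency of the haplotype carrying the variant allele at both sites
    p_a  : variant allele frequency at site A (0 < p_a < 1)
    p_b  : variant allele frequency at site B (0 < p_b < 1)
    """
    if not (0 < p_a < 1 and 0 < p_b < 1):
        raise ValueError("both sites must be polymorphic")
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d_prime, r2


# Number of SNP pairs among 18 common SNPs
n_pairs = comb(18, 2)

# Two variants that always occur together: |D'| and r^2 both equal 1
perfect = ld_stats(0.155, 0.155, 0.155)

# A rarer variant found only on haplotypes carrying a more common one:
# |D'| is 1 but r^2 is well below 1 (hypothetical frequencies)
nested = ld_stats(0.029, 0.029, 0.05)
```

Because |D′| reaches 1 whenever at least one of the four possible two-site haplotypes is absent, it is the more sensitive indicator of historical recombination, whereas r² also reflects how well one SNP predicts the other.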

Fig. 2

Linkage disequilibrium (LD) analysis of DPYD. Pairwise LD between 18 common SNPs (>0.01 in allele frequencies) is expressed as r² (upper) and |D′| (lower) by a 10-graded blue color. The denser color indicates higher linkage. The haplotype block partition based on the LD measure |D′| of HapMap data in Japanese is also indicated.

To define haplotype blocks, we utilized the HapMap data because SNPs were comprehensively genotyped with an average density of 1 SNP per 1.8 kb. Of 1,002 variations of DPYD genotyped by the HapMap project, 474 SNPs were polymorphic in 44 unrelated Japanese subjects. When the LD profiles for Japanese were obtained by Marker using the HapMap data, strong LD (|D′| > 0.75) clearly decayed within introns 11, 12, 13, 14, 16, 18, and 20 (data not shown), suggesting that recombination had occurred in these regions. Based on these findings, the SNPs detected in our study were divided into six haplotype blocks (Figs. 1, 2). Block 1, the largest block, ranges from the 5′-untranslated region (5′-UTR) to intron 10 (347 kb), and includes 22 variations. Block 2 includes eight variations from IVS12-11G>A in intron 12 to IVS13 + 40G>A in intron 13. Block 3 includes six variations from IVS13-47_48insTA in intron 13 to IVS14 + 100T>G in intron 14. Block 4 contains only three SNPs, IVS14-123C>A, IVS14-21C>A and IVS15 + 75A>G, and ranges from intron 14 to intron 15. Block 5 consists of IVS16-94G>T and four rare variations from intron 16 to exon 18. Although the HapMap data showed a decline in LD in intron 20, we defined a single block ranging from intron 18 to intron 22 as block 6 because only rare variations (allele frequencies <0.01) were detected downstream of intron 20 (exon 21, intron 21, and intron 22). The block partitioning based on the HapMap data fitted our SNPs well: more than 70% of SNP pairs in each block (blocks 1–6) gave pairwise |D′| values greater than 0.8 (Fig. 2).

Haplotype estimation

Using 22, 8, 6, 3, 5, and 11 variations in blocks 1 to 6, 23 (block 1), 8 (block 2), 7 (block 3), 3 (block 4), 6 (block 5), and 11 (block 6) haplotypes were identified or inferred (Fig. 3). Probabilities of diplotype configurations in all six blocks were 100% for over 97% of the subjects. To discriminate our block haplotypes from the previously assigned alleles or haplotypes (DPYD*1 to *13), the mark, #, was used to indicate block haplotypes.

Fig. 3

Block haplotypes in DPYD of block 1 (a), block 2 (b), block 3 (c), block 4 (d), block 5 (e), and block 6 (f) in a Japanese population. The nucleotide positions were numbered based on the cDNA sequence (A of the translational start codon is +1) or from the nearest exon. White cells, wild type; gray cells, nucleotide alteration. §The haplotypes were inferred in only one patient and are ambiguous except for marker SNPs.

In block 1, the predominant haplotype, which carried no variation, was #1a (0.818 in frequency), followed by #1b (0.045), #9c (0.022), and #1c (0.021). As suggested by LD (Fig. 2), #9c, the major subtype of the #9 group bearing 85T>C (Cys29Arg), also harbored -477T>G in the 5′-UTR. The known nonsynonymous SNP, 496A>G (Met166Val), was assigned to three haplotypes, #9d, #166Va, and #166Vb.

In block 2, four haplotypes, #1a (0.529), #5a (0.245), #1b (0.176), and #5b (0.038), were major in Japanese and accounted for 99% of all inferred haplotypes. Two subtypes of the #5 group, #5a and #5b, both of which harbored Ile543Val (*5) and IVS13 + 39C>T, were distinguished by a novel intronic SNP, IVS12-11G>A.

As for block 3, in addition to #1a (0.848), #1b harboring the synonymous SNP, 1896T>C (Phe632Phe), was found at a relatively high frequency (0.138).

Block 4 is simple and comprises only three haplotypes, #1a (0.845), #1b (0.154) and #1c (0.0015). The second most frequent haplotype, #1b, harbored the perfectly linked SNPs IVS14-123C>A and IVS15 + 75A>G.

Block 5 contained IVS16-94G>T, the most frequent SNP among the 55 SNPs found in this study, which was assigned to #1b with a frequency of 0.374. This block also contained the known nonsynonymous SNP, 2194G>A (Val732Ile, *6), which was assigned to #6a (0.015).

In block 6, the predominant haplotype was #1a (0.915). It was followed by #1b (0.032) with IVS18-39G>A and #768K (0.028) with 2303C>A (Thr768Lys).

The HapMap data include nine SNPs that we detected (Table 2). Of them, six, 85T>C (rs1801265), 496A>G (rs2297595), 1627A>G (rs1801159), 1896T>C (rs17376848), IVS16-94G>T (rs7556439) and IVS18-39G>A (rs12137711), were suitable as haplotype-tagging SNPs (htSNPs) to capture the block haplotypes block 1 #9, block 1 #166V, block 2 #5, block 3 #1b, block 5 #1b, and block 6 #1b, respectively. IVS21 + 136G>C (rs11165777) and IVS22-69G>A (rs290855)/IVS22-58G>C (rs17116357) were the marker SNPs for block 6 #1e and #1f, respectively, but were very rare (allele frequencies = 0.003) in Japanese. The six SNPs, especially 85T>C (rs1801265) and 496A>G (rs2297595), were in strong LD (r² > 0.8) with other HapMap SNPs in Japanese (Table 3), indicating that many HapMap SNPs were concurrently linked on the same haplotypes.

Table 3 Linkages of haplotype-tagging SNPs with HapMap SNPs for DPYD

Next, the combinations of block haplotypes (inter-block haplotypes) were analyzed focusing on the haplotypes with frequencies of >0.01 in each block (Fig. 4). Between blocks 1 and 2, both #1a and #1b in block 1 were associated with multiple haplotypes in block 2 without a clear one-to-one relationship. It should be noted that #9c in block 1 was linked either with block 2 #1b (0.016 in absolute frequency) or with block 2 #5a (0.006, not shown in Fig. 4). #1c in block 1 was completely linked with block 2 #1a. #151D in block 1 (not shown in Fig. 4), which was a rare haplotype (0.009) harboring 451A>G (Asn151Asp), was completely linked with #5a in block 2.

Fig. 4

The combinations of block haplotypes in Japanese. Thick lines represent combinations with frequencies over 10%, and thin lines represent combinations with frequencies of 1.0–9.9%

Between blocks 2 and 3, both #5b and #1b in block 2 were mostly linked with #1a in block 3, whereas both #1a and #5a in block 2 were linked with several block 3 haplotypes, namely #1a, #1b, or other rare haplotypes such as #1c (not shown in Fig. 4). Between blocks 3 and 4 and between blocks 4 and 5, no strong associations of block haplotypes were observed except for the linkage of block 5 #6a to block 4 #1a. Between blocks 5 and 6, most of #1b and all of #6a in block 5 were linked with #1a in block 6. Although #1a in block 6 was associated with various haplotypes in block 5, #1b in block 6 was completely linked with #1a in block 5.

Among the six blocks, the following combinations were major: #1a (block 1)–#1a (block 2) –#1a (block 3)–#1a (block 4)–#1a (block 5)–#1a (block 6) (0.239 in frequency), #1a#5a#1a#1a#1b#1a (0.081), #1a#1a#1a#1a#1b#1a (0.075), #1a#5a#1a#1a#1a#1a (0.070), #1a#1b#1a#1a#1a#1a (0.060) and #1a#1a#1b#1a#1a#1a (0.051).

Ethnic differences in distributions of DPYD SNPs and haplotypes

We compared SNP and haplotype distributions in Japanese with those in other ethnic groups reported in the literature or HapMap project. Notably, IVS14 + 1G>A (*2), 1897delC (Pro633GlnfsX5, *3), 1601G>A (Ser534Asn, *4), 295_298delTCAT (Phe100SerfsX15, *7), 703C>T (Arg235Trp, *8), 2983G>T (Val995Phe, *10), 62G>A (Arg21Gln, *12), 1156G>T (Glu386X, *12), and 1679T>G (Ile560Ser, *13) were not found in this study. Furthermore, several SNPs showed marked differences in allele frequencies among Japanese and other ethnic groups (Table 4).

Table 4 Allele frequencies of common DPYD SNPs in different populations

The allele frequency of 85T>C (Cys29Arg, *9), the tagging SNP for block 1 #9, was quite different between Asians and Caucasians. Its allele frequency in Japanese (0.029 in this study) and Taiwanese (0.022) (Hsiao et al. 2004) was much lower than that in Caucasians (0.185–0.194) (Seck et al. 2005; Morel et al. 2006).

The SNP 496A>G (Met166Val) in block 1 is found at a lower allele frequency in Japanese (0.022) than in Caucasians (0.080) (Seck et al. 2005). Seck et al. (2005) inferred two haplotypes harboring 496A>G (Met166Val) from 157 Caucasians: hap5 (#9d in this study) harboring additional 85T>C (Cys29Arg) and IVS10-15T>C and hap11 concurrently harboring IVS10-15T>C alone with frequencies of 0.040 and 0.014, respectively. In our haplotype analysis, #166Va (0.012) corresponding to hap11 (0.014) was found with a similar frequency in Japanese, whereas the frequency of #9d (0.006) was much lower than that of the corresponding haplotype, hap5 (0.040) in Caucasians.

1627A>G (Ile543Val, *5) in block 2 was found with comparable allele frequencies among Japanese (0.283 in this study), Caucasians (0.14–0.275) (Seck et al. 2005; Ridge et al. 1998a), African-Americans (0.227) (Wei et al. 1998), and Taiwanese (0.210–0.283) (Wei et al. 1998; Hsiao et al. 2004).

The allele frequency (0.015) of 2194G>A (Val732Ile, *6) in block 5 in our Japanese population is slightly lower than that previously reported in Caucasians (0.022–0.058) (Seck et al. 2005; Ridge et al. 1998a) and Finnish subjects (0.067) (Wei et al. 1998), but is comparable to that in Taiwanese (0.012–0.014) (Wei et al. 1998; Hsiao et al. 2004) and African-Americans (0.019) (Wei et al. 1998).

Ethnic differences in the allele frequencies were also observed with synonymous and intronic variations (Table 4). The allele frequency of 1896T>C (Phe632Phe), which tags block 3 #1b, was higher in Japanese (0.139 in this study) than in Caucasians (0.035) (Seck et al. 2005). Hap13 assigned in 157 Caucasians by Seck et al. (2005) is the counterpart of block 3 #1b, and its frequency (0.012) was much lower than that in Japanese (0.138).

In contrast, IVS10-15T>C linked to 85T>C (*9) or 496A>G (#166V) within block 1 showed a lower allele frequency in Japanese (0.018) than in Caucasians (0.127). Seck et al. (2005) assigned hap7 as the haplotype containing IVS10-15T>C alone with a haplotype frequency of 0.03 in Caucasians. In Japanese, however, the corresponding haplotype was not found.

Allele frequencies of IVS18-39G>A and IVS22-69G>A, which are tagging SNPs for block 6 #1b and #1f, respectively, are lower in Japanese (0.032 and 0.003, respectively) than in Caucasians (0.105 and 0.183, respectively).

Taken together, our data demonstrated considerable differences in the haplotype distributions in blocks 1, 3 and 6 between Japanese and Caucasians.

Discussion

This study provides Japanese data on the genetic variations of DPYD, a gene encoding a key enzyme catalyzing degradation of the well-known anticancer drug 5-FU. Nine novel (Ala10Glu, Tyr109Asn, Asn151Asp, Ile245Phe, Glu265Lys, Val515Ile, Phe524Leu, Ser556Arg, and Asn893Ser) and seven known nonsynonymous variations (Cys29Arg, Met166Val, Val335Leu, Ile543Val, Arg592Trp, Val732Ile, and Thr768Lys) were found in our Japanese population (Table 2 and Fig. 1). The association analysis between the genotypes and 5-FU pharmacodynamics is now on-going.

Uneven distributions of coding SNPs over 23 DPYD exons were pointed out in the previous review by van Kuilenburg (2004). The author indicated that 81% of all reported variations were confined to exons 2–14, representing 61% of the coding sequences, and typical hotspots of variation were localized in exons 2, 6, and 13. Our Japanese data also revealed that 17 out of 21 coding variations (81%) were localized in exons 1–14, and that more than three variations were detected in exons 5, 13, and 14 (Fig. 1). Recently, Hormozian et al. (2007) have reported that the common chromosomal fragile site on 1p21.2, FRA1E, spans 370 kb of genomic sequence between introns 8 and 18 of DPYD, and that its core region with the highest fragility is located between introns 12 and 16. The instability at the core of FRA1E might be associated with the high mutational rates and recombinogenic nature from intron 12 to 14 of DPYD (Fig. 1).

To estimate potential functional consequences of the amino acid substitutions, we examined whether the positions of amino acid changes are located in highly conserved areas or potentially critical regions of the molecule (for example, substrate recognition sites or binding regions of prosthetic groups). We also considered the locations of the residues in a three-dimensional (3D) framework provided by the crystal structures of pig DPD, which have recently been determined in complexes with NADPH and substrate (5-FU) (Dobritzsch et al. 2001) or inhibitors (Dobritzsch et al. 2002). The amino acid sequences of pig and human DPD are 93% identical (Mattison et al. 2002), and the substituted residues and their neighboring residues are conserved between both enzymes. From these points of view, it is speculated that at least two substitutions (Glu265Lys and Arg592Trp) might impact the structure and function of DPD as discussed below.

Glu265 is located on the loop following the third β sheet (IIβ3) in the FAD binding domain II (Dobritzsch et al. 2001). Glu265 is conserved among four mammalian species (human, mouse, rat, and pig), although it is replaced with aspartic acid in bovine and Drosophila melanogaster DPDs (Mattison et al. 2002). In the 3D structure of pig DPD (Fig. 5a), Glu265 is in close proximity to Lys259. The substitution Lys259Glu was detected in a patient exhibiting severe mucositis during cyclophosphamide/methotrexate/5-FU chemotherapy (Gross et al. 2003). Furthermore, the adjacent Leu261 interacts via its main chain atoms with the N6, N1, and N3 atoms of the adenine of FAD, and has an important role in the proper orientation of the adenine moiety in the FAD-binding pocket (Dobritzsch et al. 2001). Moreover, the carboxyl group (Glu265-Oε) might form hydrogen bonds with the main chain nitrogen of Ser260 next to Leu261. Thus, the change in charge from negative to positive caused by the novel Glu265Lys substitution is likely to induce structural changes affecting proper binding of FAD.

Fig. 5

Stereo view of the variation sites in pig DPD (accession code of the Protein Data Bank: 1gth). Glu265 (a), Arg592 (b) and their adjacent residues are shown as ball-and-stick models with oxygens in red, nitrogens in blue, carbons in gray and sulfur in yellow. The adenosine moiety of the cofactor FAD is also shown in pink (a).

Arg592 is located on one strand (IVβc) of the additional four-stranded antiparallel β sheet (IVβc-βf) inserted at the top of a typical (α/β)8 barrel fold in the FMN-binding domain IV (Dobritzsch et al. 2001). Arg592 is completely conserved among the above-mentioned six species (Mattison et al. 2002), suggesting its functional importance. Arg592 closely contacts Met599 (2.9 Å) and Gln604 (2.8 Å) in the same subunit and Ser994 (2.9 Å) in another subunit (Fig. 5b). The substitution of tryptophan for Arg592 is likely to weaken these interactions due to altered hydrophobicity and electrostatic changes. Arg592Trp was recently reported from a Korean population with an allele frequency of 0.004, although its functional significance remains to be confirmed (Cho et al. 2007).

As for known DPYD alleles, recent reports have made their distributions in several populations more evident. For example, IVS14 + 1G>A (*2) (van Kuilenburg 2004), 295_298delTCAT (Phe100SerfsX15, *7) (Seck et al. 2005), 1679T>G (Ile560Ser, *13) (Collie-Duguid et al. 2000; Morel et al. 2006), and 2846A>T (Asp949Val) (Seck et al. 2005; Morel et al. 2006), all of which are associated with decreased DPD activities, are detected in Caucasians with allele frequencies of 0.01–0.02, 0.003, 0.001 and 0.006–0.008, respectively. However, none of them was detected in our Japanese samples, while 1003G>T (Val335Leu, *11) and 2303C>A (Thr768Lys) have been found only in Japanese, indicating that variations with clinical relevance do not overlap between Caucasians and Japanese.

2303C>A (Thr768Lys), which was originally found in a Japanese female volunteer with very low DPD activity (Ogura et al. 2005), is relatively frequent in Japanese (allele frequency = 0.0279). Functional characterization in vitro revealed that 768Lys caused thermal instability of the variant protein without changing its affinity for NADPH or kinetic parameters toward 5-FU. Therefore, this variant might cause 5-FU-related toxicities in Japanese.

1003G>T (Val335Leu, *11) was found in a Japanese family with decreased DPD activity by Kouwaki et al. (1998). By in vitro expression in E. coli, they demonstrated that the variant protein with Leu335 showed a significant loss of activity (about 17% of the wild-type protein). Dobritzsch et al. (2001) suggested from the 3D structure of pig DPD that Val335Leu, in spite of being a conservative change, disturbs packing interactions in the hydrophobic core formed by IIIβ3 and IIIα3 within the Rossmann motif, thereby affecting NADPH binding. In our study, heterozygous 1003G>T (Val335Leu) was found in a patient administered 5-FU (allele frequency = 0.0015), who also had seven other variations: IVS12-11G>A, 1896T>C (Phe632Phe), and IVS16-94G>T were heterozygous, and 1627A>G (Ile543Val), IVS13 + 39C>T, IVS14-123C>A, and IVS15 + 75A>G were homozygous, indicating that at least Val335Leu is linked to Ile543Val (*5).

On the other hand, Caucasians and Japanese share four variations: *5 (Ile543Val), *9 (Cys29Arg), Met166Val, and *6 (Val732Ile), although their allele frequencies were different, especially for *9 (Table 4). Because they have not necessarily been correlated with phenotypic changes (e.g., differences in DPD enzyme activity, 5-FU pharmacokinetics and pharmacodynamics) (Collie-Duguid et al. 2000; Johnson et al. 2002; Zhu et al. 2004; Seck et al. 2005; Ridge et al. 1998a, 1998b; Hsiao et al. 2004), all of these variations are generally accepted as common polymorphisms that result in unaltered function. Consistent with this, van Kuilenburg et al. (2002) suggested that the substitution Cys29Arg on the protein surface was unlikely to alter DPD activity. However, conflicting results were reported regarding *9 (Vreken et al. 1997; van Kuilenburg et al. 2000), *6 (van Kuilenburg et al. 2000), and Met166Val (van Kuilenburg et al. 2000; Gross et al. 2003). To interpret these inconsistencies, haplotype analysis of DPYD might be helpful. Especially for *9 and Met166Val in Japanese, functional involvement of −477T>G (block 1 #9c and #9e), −243G>A (block 1 #9d), IVS10-15T>C (block 1 #9d and #166Va) and many other HapMap SNPs linked to *9 and Met166Val (Table 3) needs clarification.

The HapMap project provides genotype data of more than 1,000 sites located mostly in the intronic regions of DPYD for four different populations (Nigerian, Chinese, Japanese and Caucasians). HapMap data on 44 unrelated Japanese subjects showed that 476 variations are polymorphic, whereas 529 are monomorphic, and the average density of polymorphic markers is 1 SNP per 1,772 bp. In contrast, our study focused on exons and surrounding introns to detect variations, and only nine variations overlapped with the HapMap data. Therefore, we could not utilize the HapMap data to further identify common subtypes of #1 to be discriminated by many intronic HapMap SNPs in each block. However, most of the frequent SNPs are unlikely to be associated with substantially decreased DPD activity because DPD activity in the healthy Japanese population (N = 150) showed a unimodal Gaussian distribution (Ogura et al. 2005).

On the other hand, in 60 unrelated Caucasian subjects in the HapMap project, 617 variations are polymorphic, whereas 383 are monomorphic. LD profiles of these polymorphisms were compared between Caucasians and Japanese by using the program Marker (http://www.gmap.net/marker). Strong LD (|D′| > 0.75) clearly decays within introns 11, 12, 13, 14, 16, 18, and 20 in Japanese, whereas in Caucasians similar decays are observed within introns 13, 14, 18, and 20 but are not obvious within introns 11, 12, and 16 (data not shown). Moreover, strong LD decays within intron 3 in Caucasians. Therefore, the LD blocks are considerably different between Japanese and Caucasians. Along with the marked differences in allele frequencies of several variations (Table 4), these results suggest that the haplotype structures in DPYD are quite different between the two populations.

In conclusion, we found 55 variations, including 38 novel ones, in DPYD from 341 Japanese subjects. Nine novel nonsynonymous SNPs were found, some of which were assumed to have an impact on the structure and function of DPD. As for known variations, we obtained their allele frequencies in a large Japanese population and showed that variations with clinical relevance do not overlap between Caucasians and Japanese. In Japanese, 2303C>A (Thr768Lys) and 1003G>T (Val335Leu) might play important roles in 5-FU-related toxicity. Along with differences in haplotype structures between Japanese and Caucasians, these findings suggest that ethnic-specific tagging SNPs should be considered when genotyping DPYD. Thus, the present information would be useful for pharmacogenetic studies evaluating the efficacy and toxicity of 5-FU in Japanese and probably in East Asians.