Introduction

The anticoagulant warfarin is administered for the prevention and treatment of thromboembolic disorders that have the potential of causing pulmonary embolisms and strokes, and for reducing the risk of death from heart attacks. With a 1.45-fold increase in prescriptions between 1998 and 2004,1 warfarin has become the most broadly used anticoagulant worldwide. Although warfarin therapy has the potential to be highly effective, there are risks of incorrect dosing that are lethal in some cases. Among patients over 65 years of age, warfarin is one of three drugs responsible for approximately one-third of adverse drug events treated in the emergency room.2 In addition, US death certificates reviewed by Wysowski et al.1 indicate that between 2003 and 2004, anticoagulants were the leading cause of death among drugs that cause adverse effects. These statistics can be attributed to the high interindividual variability in warfarin dosing requirements and the combination of genetic and non-genetic factors that contribute to an individual's sensitivity to the medication.3

Warfarin dosing aims to maintain the international normalized ratio of prothrombin time between 2.0 and 3.0, the range within which the risk of over- or under-anticoagulation is minimal.4, 5 Warfarin has a narrow therapeutic window and large interindividual variability in dose requirements. Clinical factors such as age, height, sex, diet and drug–drug interactions6, 7, 8 explain 12%6 to 15%9 of the observed variability in warfarin dosing. Genetic factors have also been shown to affect warfarin maintenance.3, 10 Previous studies have demonstrated that two genes, VKORC1 and CYP2C9, which are involved in the vitamin K-dependent clotting pathway, explain an additional 30–54% of variance observed in warfarin dosing.6, 9, 11

Warfarin exists as a mixture of two stereoisomers, R and S.8, 12 S-warfarin is approximately three times as active as the R-enantiomer12 and works by antagonizing the vitamin K-dependent clotting pathway. Vitamin K epoxide reductase subunit 1 (VKORC1), which has a major role in the vitamin K pathway, is the target protein of warfarin.13 Within the VKORC1 gene, the single nucleotide polymorphism (SNP) rs9923231 has been extensively studied among European and Asian populations. Original research by Wadelius10 showed that VKORC1 predicts 30% of the variance observed in warfarin dose. Another critical gene is CYP2C9, which codes for the isoenzyme responsible for metabolizing S-warfarin. The gene CYP2C9 has been cited as accounting for 80–85% of the elimination of warfarin.5 Specifically, two variants of this highly polymorphic gene, CYP2C9*2 (rs1799853) and CYP2C9*3 (rs1057910), have been shown to reduce warfarin metabolism and lower the mean warfarin dose requirements,14, 15, 16 thereby significantly increasing an individual's risk of having complications related to bleeding.17 A review by Schelleman et al.18 illustrates the large number of CYP2C9*2 and *3 allele frequency studies that have focused on populations of European ancestry, in which these genetic variants are common. Together they have been found to account for 12%10 to 18%6 of the variation in warfarin dose.
Recent research has identified a third gene, CYP4F2 as influencing an individual's sensitivity to warfarin.9, 19, 20 CYP4F2 codes for a vitamin K1 oxidase that causes a decrease in the metabolism of vitamin K1.21 Several studies have shown that carriers of the derived T allele at the non-synonymous SNP rs2108622 require higher warfarin doses than subjects with the ancestral C allele.22 Studies on European populations have indicated that CYP4F2 rs2108622 may account for an additional 1–7% of observed variation in warfarin dosing.9, 22, 23 The effect of this CYP4F2 polymorphism seems to be substantially smaller than the effect of the VKORC1 and CYP2C9 variants, and the clinical relevance of CYP4F2 has recently been questioned.24

The traditional approach for warfarin dosing is based on an initial fixed dose, followed by stabilization through trial and error.25 Typically, the initial dose is 3.5 mg in individuals of Asian ancestry and 5 mg or 10 mg in individuals of European ancestry.25, 26 In this respect, it is important to mention that several studies have characterized individuals of African ancestry as requiring higher average warfarin doses and Asians as requiring lower doses than Europeans.3, 26 Research has recognized the influence that genetics has on the metabolism of warfarin and led to a revision of warfarin labels by the US Food and Drug Administration in 2007. New package inserts have been designed to include information on the effects of VKORC1 and CYP2C9 polymorphisms. Replacing the fixed dose approach, multiple regression models and various dosing algorithms, which incorporate VKORC1 and CYP2C9 variants along with clinical and environmental information, have been developed and successfully explain 53–65% of variation in warfarin dose among Europeans and Asians.7, 10, 19, 25, 27 In general, there has been reduced success in groups such as African Americans, in which dosing algorithms have only explained up to 36% of the variation in warfarin dose.28 This has been attributed to greater genetic diversity among people of African ancestry and the possible influence of other variants in the VKORC1 and CYP2C9 genes.18 In contrast, a study examining the predictive capability of a new algorithm for warfarin dosing in Brazil has had much more success.
When applied to Brazilian patients of self-identified ‘race/color’, researchers observed that it explained 51% of the total variance in warfarin dose.29 This compares well with the power of other dosing algorithms developed for European and Asian populations, and works equally well among white and black Brazilians.29 The development of a universally useful algorithm will, however, rely on further research into the frequency of VKORC1, CYP2C9 and CYP4F2 variants in a greater number of populations and characterization of other genetic variants associated with warfarin dosing.

The majority of warfarin association studies have focused on individuals of European, Asian and, to a lesser extent, African ancestry. Improvement in the safety of warfarin therapy requires a comprehensive description of the occurrence of the key alleles known to influence the metabolism and dosing of warfarin. In this study, we present a more complete picture of the worldwide frequency distribution of four genetic polymorphisms known to influence warfarin dosing: VKORC1 (rs9923231), CYP2C9 (rs1799853 and rs1057910) and CYP4F2 (rs2108622). We characterized these SNPs in DNA samples from the Centre Etude Polymorphism Humain (CEPH) Human Genome Diversity Project (HGDP), which includes 963 individuals from seven geographic regions: Africa, the Middle East, Europe, South/Central Asia, East Asia, the Americas and Oceania. We also analyzed these polymorphisms in 316 DNA samples of individuals of European, East Asian and South Asian ancestry living in Canada. We show that the VKORC1 rs9923231 polymorphism has a unique allele frequency distribution, with the derived T allele reaching very high frequencies in East Asian populations. Given this unusual distribution, we carried out three tests of selection in the HapMap East Asian sample, and report that the VKORC1 gene is an outlier for two of the tests, probably as a result of positive selection in East Asian populations.

Materials and methods

DNA samples

We obtained DNA samples of 963 individuals from 52 world populations from the Foundation Jean Dausset-CEPH in Paris. Each individual's geographic origin was available through a database containing information on HGDP-CEPH participants. Individuals were grouped into seven geographic regions: Africa (N=108), Middle East (N=164), Europe (N=159), South/Central Asia (N=204), East Asia (N=233), America (N=64) and Oceania (N=31). Information on sample sizes of individual populations is available in Supplementary Table 1. All samples met the ethical requirements set forth by the HGDP and the Morrison Institute for Population Resource Studies at Stanford University.

The second set of samples used in this study comprises 316 subjects who were recruited between 2007 and 2009 at the Molecular Anthropology Laboratory, University of Toronto at Mississauga (Mississauga, Canada). Geographic origin was determined through a questionnaire that recorded the places of birth of the participants, their parents and their grandparents. The information provided was used to classify individuals into three broad geographic regions: 121 from Europe, 100 from East Asia and 95 from South Asia. The study was approved by the University of Toronto Health Sciences Research Ethics Board.

Genotyping

Four SNPs known to influence the metabolism of warfarin (rs9923231, rs1799853, rs1057910 and rs2108622) were genotyped at KBiosciences (http://www.kbioscience.co.uk/) using a KASPar assay based on fluorescent genotyping and allele-specific PCR. To verify genotyping quality, 170 samples were genotyped as blind duplicates, and the concordance rate between the samples and duplicates was 100%.

Statistical analysis

Tests evaluating deviations from Hardy–Weinberg proportions and hierarchical analyses of molecular variance were carried out with the program Arlequin.31 Tests of allele frequency differences between pairs of populations were done with the program Arlequin, or alternatively, with the program Genepop (http://kimura.univ-montp2.fr/~rousset/Genepop.htm). Three different tests of selection were used to evaluate evidence of positive selection in the HapMap East Asian sample for the gene VKORC1: the locus-specific branch length (LSBL) test, the log of the ratio of heterozygosities (lnRH) test and Tajima's D test.32, 33, 34 The results reported here are based on a genome-wide analysis of the HapMap data, which includes approximately 4 million SNPs (http://hapmap.ncbi.nlm.nih.gov/). The LSBL test evaluates whether genetic markers within a genomic region show unusual levels of differentiation with respect to the genome average. This test apportions the genetic variation observed in East Asian, European and West African populations for each SNP, and identifies markers with high levels of genetic differentiation in the East Asian sample. The lnRH test highlights genomic regions with low levels of genetic diversity in the population of interest in comparison with other population groups. This statistic was calculated for a two-way population comparison between East Asians and Europeans, and East Asians and West Africans, using an overlapping sliding window size of 100 000 base pairs (bp) and moving in 25 000-bp increments along a chromosome. Regions of the genome with negative Tajima's D values (excess of rare alleles with respect to neutral expectations) are also a hallmark of positive selection. However, negative values of D can also result from demographic events, such as population expansions. For this reason, it is important to compare local values of Tajima's D with the empirical levels observed in the genome. 
As for the lnRH analysis, Tajima's D was calculated for each population using an overlapping sliding window size of 100 000 bp with a 25 000-bp offset. The statistical significance for each of the LSBL, lnRH and Tajima's D statistics was based on the genome-wide empirical distribution, using the formula P_E(x) = (number of loci > x)/(total number of loci). The three statistics used in this analysis have been described in more detail in the paper by Bigham et al.35
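The windowing scheme and the empirical P-value formula described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names `sliding_windows` and `empirical_p` are hypothetical, and how partial windows at the chromosome end were handled is not stated in the text (here they are simply truncated).

```python
from typing import Iterator, Sequence, Tuple


def sliding_windows(
    chrom_length: int, size: int = 100_000, step: int = 25_000
) -> Iterator[Tuple[int, int]]:
    """Yield half-open (start, end) windows of `size` bp moved in `step` bp increments.

    Assumption: windows that run past the chromosome end are truncated.
    """
    start = 0
    while start < chrom_length:
        yield start, min(start + size, chrom_length)
        start += step


def empirical_p(genome_values: Sequence[float], x: float) -> float:
    """P_E(x) = (number of loci > x) / (total number of loci)."""
    if not genome_values:
        raise ValueError("genome_values must not be empty")
    n_greater = sum(1 for v in genome_values if v > x)
    return n_greater / len(genome_values)


if __name__ == "__main__":
    # Toy genome-wide distribution of a statistic (hypothetical values).
    toy_distribution = [0.01, 0.02, 0.05, 0.10, 0.40]
    print(empirical_p(toy_distribution, 0.08))
    print(list(sliding_windows(150_000)))
```

The formula as written counts loci with values greater than x, which suits a statistic such as LSBL where large values are extreme. For statistics where strongly negative values are the ones of interest, the same function could be applied to negated values; whether this was done in the original analysis is not stated.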

Results

No significant deviations from the Hardy–Weinberg proportions were observed in any of the HGDP-CEPH samples for the CYP2C9 and CYP4F2 polymorphisms. For the VKORC1 rs9923231 SNP, the She from East Asia (P=0.047), Sardinians from Europe (P=0.049) and Papuan from Oceania (P=0.014) showed departures from Hardy–Weinberg proportions, but the P-values were not significant when multiple testing was considered. For the samples collected in Canada, only the individuals of European ancestry showed a nominally significant deviation from Hardy–Weinberg proportions (P=0.001).
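For readers unfamiliar with tests of Hardy–Weinberg proportions, the idea can be sketched with a chi-square goodness-of-fit test on genotype counts. This is a simplified stand-in and not the procedure implemented in Arlequin, which is not reproduced here; the function names and the Bonferroni helper are assumptions introduced for illustration, and the text does not state which multiple-testing correction was applied.

```python
import math
from typing import Tuple


def hwe_chi_square(n_aa: int, n_ab: int, n_bb: int) -> Tuple[float, float]:
    """Chi-square test (1 df) of Hardy-Weinberg proportions for a biallelic SNP.

    Returns (chi-square statistic, P-value).
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("at least one genotype is required")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


def bonferroni_significant(p_value: float, n_tests: int, alpha: float = 0.05) -> bool:
    """True if p_value passes a Bonferroni-corrected threshold (assumed correction)."""
    return p_value < alpha / n_tests


if __name__ == "__main__":
    # Hypothetical genotype counts, not taken from the study.
    print(hwe_chi_square(30, 40, 30))
    # Nominal P-value reported for the Papuan sample, tested against 52 populations.
    print(bonferroni_significant(0.014, 52))
```

Under a Bonferroni correction across the 52 HGDP-CEPH populations, the threshold would be 0.05/52, or roughly 0.001, which none of the three nominal P-values for VKORC1 rs9923231 reaches.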

Four polymorphisms were genotyped in the HGDP-CEPH DNA samples. The allele frequencies of each marker are shown in Figures 1, 2, 3 and 4 for all 52 populations. Table 1 reports the average allele frequency of each marker for each geographical region. Tables reporting the P-values for allele frequency comparisons between major geographic groups and between individual populations are provided as Supplementary information (Supplementary Tables S2–S9). The average allele frequencies are in general agreement with the information available for the HapMap samples (http://hapmap.ncbi.nlm.nih.gov/). The derived T allele of the VKORC1 rs9923231 SNP has been associated with lower required warfarin dosing and higher risks of bleeding in multiple studies. The derived T allele has very low frequencies in African populations (lower than 10%), with the notable exception of the San (33%). The frequencies of the T allele are intermediate in Europe (30–65%), the Middle East (41–51%), Central/South Asia (17–61%), Oceania (23–39%) and the Americas (14–75%). Finally, the frequencies of the derived allele are very high in East Asian populations, with frequencies ranging from 75 (She) to 100% (Han and Oroqen). The CYP2C9*2 (rs1799853 T) and particularly the CYP2C9*3 (rs1057910 C) alleles lead to reduced warfarin metabolism, and therefore increased risk of bleeding. The distribution of the CYP2C9 alleles is very different from that observed for the VKORC1 polymorphism. The CYP2C9*2 allele is primarily restricted to European (2–29%), Middle Eastern (11–20%) and Central/South Asian populations (2–16%), and it is mostly absent in other population groups. Exceptions are the North Eastern Bantu from Africa (4%), the Yakut from East Asia (2%) and the Maya (2%). Similarly, the highest frequencies of the CYP2C9*3 allele are also observed in Europe (4–21%), the Middle East (3–11%) and Central/South Asia (5–15%).
The allele is not observed in Africa or most populations from the Americas, except the Pima (7%). In Oceania, the allele is not present in Melanesians, but in Papua New Guinea the frequency is 12%. In East Asia, the allele is absent in many populations, but reaches frequencies of 10% or higher in some populations, such as the Tu, Tujia and Xibo. Finally, the non-synonymous CYP4F2 rs2108622 polymorphism has been associated with warfarin dosing. Homozygotes for the C allele (coding for valine) require less warfarin than homozygotes for the T allele (coding for methionine), with heterozygotes requiring intermediate doses. The frequency of the minor T allele is on average low in Africa and the Americas (7 and 10%, respectively). In these geographic areas, the allele is absent in some populations (Southern Bantu, Mandenka and Biaka Pygmy in Africa, and Colombians in the Americas), and it is present at frequencies lower than 23% in other populations. In contrast, the T allele is present at higher frequencies in other regions: Europe (20–44%), Middle East (27–45%), Central/South Asia (26–40%), East Asia (6–57%), with the highest average frequencies found in Oceania (61%).
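The population frequencies quoted above are allele frequencies, which for a biallelic SNP follow directly from genotype counts. A minimal sketch is given below; the function name and the example counts are hypothetical and are not taken from the study.

```python
def allele_frequency(n_hom_ref: int, n_het: int, n_hom_alt: int) -> float:
    """Frequency of the alternative (for example, derived) allele.

    Each individual carries two alleles: homozygotes for the alternative
    allele contribute two copies and heterozygotes contribute one.
    """
    n_individuals = n_hom_ref + n_het + n_hom_alt
    if n_individuals == 0:
        raise ValueError("at least one genotyped individual is required")
    return (2 * n_hom_alt + n_het) / (2 * n_individuals)


if __name__ == "__main__":
    # Hypothetical sample: 10 C/C, 20 C/T and 20 T/T individuals.
    print(allele_frequency(10, 20, 20))
```

With small samples per population, as for some HGDP-CEPH groups, a single genotype changes the estimate noticeably, which is worth keeping in mind when comparing the ranges reported across populations.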

Figure 1 Frequency of VKORC1 rs9923231 T and C alleles in 52 Human Genome Diversity Project-Centre Etude Polymorphism Humain (HGDP-CEPH) populations.

Figure 2 Frequency of CYP2C9 rs1799853 T and C alleles in 52 Human Genome Diversity Project-Centre Etude Polymorphism Humain (HGDP-CEPH) populations.

Figure 3 Frequency of CYP2C9 rs1057910 C and A alleles in 52 Human Genome Diversity Project-Centre Etude Polymorphism Humain (HGDP-CEPH) populations.

Figure 4 Frequency of CYP4F2 rs2108622 T and C alleles in 52 Human Genome Diversity Project-Centre Etude Polymorphism Humain (HGDP-CEPH) populations.

Table 1 Average allele frequencies for four SNPs tested among CEPH samples

The same four polymorphisms were analyzed in a sample of Canadians of European, East Asian and South Asian ancestry (allele frequencies are reported in Table 2). In general, the frequencies in the Canadian sample are in agreement with those observed in the relevant HGDP-CEPH geographic groups. For VKORC1 rs9923231, the lowest frequencies of the derived T allele are observed in the Canadian South Asian sample (17% vs an average of 28% in the HGDP-CEPH Central/South Asian sample), followed by the European sample (38% vs an average of 51% in the European HGDP-CEPH sample). The highest frequencies for the T allele were observed in the East Asian sample (86 vs 90% in the HGDP-CEPH sample). For the CYP2C9*2 allele (rs1799853 T allele), there is excellent agreement between the Canadian and HGDP-CEPH frequencies. The frequency of the T allele is less than 1% in the East Asian Canadian sample (same as in the HGDP-CEPH sample), 5% in the South Asian Canadian sample (vs 7% in the HGDP-CEPH Central/South Asian sample) and 17% in the European sample (vs 13% in the HGDP-CEPH sample). For the CYP2C9*3 allele (rs1057910 C allele), the highest frequency is observed in the Canadian South Asian sample (12 vs 10% in the HGDP-CEPH sample), followed by the European Canadian sample (4 vs 9% in the HGDP-CEPH sample) and the East Asian Canadian sample (2 vs 4% in the HGDP-CEPH sample). Finally, similar to what is observed in the HGDP-CEPH samples, the highest frequencies for the CYP4F2 rs2108622 minor allele are found in the South Asian Canadian sample (48 vs 35%), and they are lower in the European and East Asian Canadian samples (24 vs 30%, and 22 vs 29%, respectively). Tables reporting the P-values for allele frequency comparisons between the Canadian samples and the relevant HGDP-CEPH populations are provided as Supplementary information (Supplementary Tables S10–S13).
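A comparison of allele frequencies between two samples, such as a Canadian sample and the corresponding HGDP-CEPH group, can be illustrated with a two-sided Fisher's exact test on a 2×2 table of allele counts. This is a sketch under stated assumptions: it is not claimed to be the exact procedure run in Arlequin or Genepop, the function name is hypothetical, and the counts in the example are invented for illustration.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows are samples, columns are counts of the two alleles. The P-value is
    the total probability of all tables (with the same margins) that are no
    more likely than the observed one.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2
    if n == 0:
        raise ValueError("table must contain at least one observation")
    denom = comb(n, col1)

    def prob(k: int) -> float:
        # Hypergeometric probability of k copies of allele 1 in sample 1.
        return comb(row1, k) * comb(row2, col1 - k) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    total = sum(prob(k) for k in range(lo, hi + 1) if prob(k) <= p_obs * (1 + 1e-9))
    return min(1.0, total)


if __name__ == "__main__":
    # Hypothetical allele counts: sample 1 has 41 T and 201 C alleles,
    # sample 2 has 40 T and 278 C alleles.
    print(fisher_exact_two_sided(41, 201, 40, 278))
```

Working with allele counts rather than frequencies keeps the sample sizes in the test, so the same difference in frequency is weaker evidence when it comes from fewer chromosomes.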

Table 2 Allele Frequency for four SNPs tested among Canadian samples

A hierarchical analysis of molecular variance is presented in Table 3. This analysis provides information on the percentage of total variation due to genetic differences between the seven geographic groups (America, East Asia, Central/South Asia, Europe, the Middle East, Oceania and Africa), differences between populations within each of the geographic groups and genetic variation within populations. Calculations were made for all four polymorphisms, and we also report an average over all loci (Table 3). Overall, variation within populations accounted for the largest percentage of variation (80.41%), followed by variation due to differences among the major geographic groups (17.67%). The genetic differences among populations within groups explained a very small percentage of the total variation (1.92%). We observed differences in the patterns of genetic differentiation for the four markers. In particular, the variant rs9923231 within VKORC1 shows much higher genetic differentiation between geographic groups (31.63%) than the other markers. The percentage of variation among groups was around 7% for CYP2C9 rs1799853 and CYP4F2 rs2108622, and lower than 2% for the CYP2C9 rs1057910 polymorphism.

Table 3 Hierarchical analysis of molecular variance for the four loci analyzed in the study

Given the unusual pattern of allele frequencies observed for the VKORC1 rs9923231-derived T allele, which shows very high allele frequencies in East Asian populations, three tests of positive selection were applied to the VKORC1 gene using genome-wide data available for the HapMap East Asian, European and West African samples (approximately 4 million SNPs). The results of these tests are shown in Table 4. The VKORC1 gene showed extreme values for each of these population genetic parameters with respect to the genome average. The LSBL test shows that this gene has many markers with extreme levels of differentiation in the East Asian sample. Similarly, the lnRH test indicates that the genetic variation in East Asia for VKORC1 is reduced when compared with that observed in the European sample. The VKORC1 gene is also an outlier for Tajima's D, with negative values that depart from the genome-wide distribution. The implications of these results are addressed in the discussion.
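To make the LSBL idea concrete, the sketch below computes a locus-specific branch length for one population from three pairwise genetic distances, using the usual three-population form (d_AB + d_AC − d_BC)/2. The pairwise distance used here is a simple two-population F_ST based on heterozygosities with equal weighting, which is an assumption chosen for brevity; the estimator used in the original analysis may differ. Function names and example frequencies are hypothetical.

```python
def pairwise_fst(p1: float, p2: float) -> float:
    """Simple two-population F_ST for a biallelic SNP: (H_T - H_S) / H_T.

    Assumes equal weighting of the two populations; returns 0.0 when the
    SNP is monomorphic in the pooled sample.
    """
    p_bar = (p1 + p2) / 2.0
    h_total = 2.0 * p_bar * (1.0 - p_bar)
    if h_total == 0.0:
        return 0.0
    h_within = (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0
    return (h_total - h_within) / h_total


def lsbl(p_focal: float, p_other1: float, p_other2: float) -> float:
    """Locus-specific branch length for the focal population.

    LSBL_focal = (d(focal, other1) + d(focal, other2) - d(other1, other2)) / 2
    """
    d_f1 = pairwise_fst(p_focal, p_other1)
    d_f2 = pairwise_fst(p_focal, p_other2)
    d_12 = pairwise_fst(p_other1, p_other2)
    return (d_f1 + d_f2 - d_12) / 2.0


if __name__ == "__main__":
    # Hypothetical allele frequencies in East Asian, European and West
    # African samples, in that order.
    print(lsbl(0.90, 0.45, 0.05))
```

A SNP with a large LSBL in the focal population has changed in frequency mainly along the branch leading to that population, which is why the statistic is compared with its genome-wide empirical distribution rather than interpreted in isolation.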

Table 4 Results of three tests of positive selection in the East Asian HapMap sample

Discussion

This study examined the allele frequencies of four genetic polymorphisms known to influence warfarin dosing (VKORC1 rs9923231, CYP2C9 rs1799853, CYP2C9 rs1057910 and CYP4F2 rs2108622) in a total of 963 individuals from the HGDP-CEPH samples and 316 Canadians of diverse geographic ancestry. The ultimate goal was to improve our understanding of the worldwide allele frequency distribution of these polymorphisms and to discuss the results in light of the known effect of these markers on warfarin dosing and the differences in average warfarin dosing that have been reported in the literature.3, 26, 36, 37 For example, Dang et al.26 reported mean weekly warfarin doses of 24 mg for Asian Americans, 31 mg for Hispanics, 36 mg for whites and 43 mg for African Americans, and a recent study by the International Warfarin Pharmacogenetics Consortium36 reported average stable therapeutic warfarin doses of 21 mg per week in Asians, 31.5 mg per week in whites and 40 mg per week in blacks. Most of the previous studies on the polymorphisms analyzed in our study have focused on European and East Asian populations, and data for other geographical regions such as South/Central Asia, Africa, the Middle East, Oceania and the Americas are quite limited. Unnecessary bleeding complications during initial warfarin dosing and knowledge that genetics influences an individual's reaction to warfarin highlight the need for more extensive studies in these areas.

The VKORC1 gene is believed to be the most important individual predictor of warfarin dose.29, 38 Previous research has shown that the derived T allele of the rs9923231 polymorphism, which is located in the promoter region of the gene, is associated with lower warfarin dose requirements. Moreover, there is evidence indicating that the rs9923231 T allele is associated with a reduction of VKORC1 mRNA levels of up to 70%, in comparison to the wild-type allele.13 This variation in mRNA levels may therefore be causing some of the variability observed in warfarin dosing among individuals. The significance of this polymorphism is further evidenced by studies showing that it may be accountable for all of the variability in warfarin metabolism that is due to the VKORC1 haplotype.30 Among the HGDP-CEPH samples, the derived T allele at the VKORC1 polymorphism was present in most world populations, except the Southern Bantu and the Pygmies from Africa. The low frequency of the derived T allele observed among African populations (<10% with the exception of the San who had a frequency of 33%) is consistent with previous research indicating that individuals of African ancestry require a higher average warfarin dose than those of European or Asian descent.3, 26 In contrast, the East Asian populations included in the HGDP-CEPH panel had very high frequencies for the derived T allele, ranging from 75 (among the She) to 100% (Han and Oroqen). Canadians of East Asian ancestry showed a similar frequency for the T allele (86%). These findings are also in agreement with studies indicating that, on average, warfarin dose requirements among East Asian individuals are lower than in African and European subjects.3, 37 The average frequency of the derived T allele in European populations was 51% in the HGDP-CEPH samples and 38% in the sample of Canadians of European ancestry. 
These values are in the middle of those observed among the African and East Asian populations and also reflect current population-based trends in warfarin dosing. Therefore, a correlation is observed between average warfarin dose requirements and the observed frequency of rs9923231 alleles in African, European and East Asian populations. With respect to other population groups, the frequencies of the derived T allele in the Middle East are similar to those observed in Europe. The average frequency in the HGDP-CEPH Central/South Asian populations is lower than in Europe, and we also observed that the frequency of the derived T allele in the Canadian sample of South Asian ancestry (17%) is lower than the frequency in the sample of European ancestry (38%). Finally, it is important to mention that the frequency of the derived T allele in Native American populations shows considerable variation (14–75%). Perini et al.29 recently analyzed two Amerindian populations, the Guarani and Kaingang from Brazil, and they estimated that 60% of Guarani and 40% of Kaingang individuals have the variant T allele.29 Together with our findings, this suggests that a high percentage of individuals of Native American ancestry may require reduced warfarin doses.29 Our hierarchical analysis of molecular variance indicates that the VKORC1 rs9923231 SNP shows an extremely high differentiation between geographic groups, with more than 31% of the total variation due to differences between the seven groups included in the analysis. This geographic differentiation is much more pronounced than that observed for the other three SNPs analyzed in this study (Table 3).

The CYP2C9 gene was the first gene recognized as influential in the metabolism of warfarin. Two non-synonymous polymorphisms within this gene, CYP2C9*2 (430C>T causing the substitution R144C) and CYP2C9*3 (1075A>C leading to an I359L substitution), have been the focus of most studies. Past research has indicated that CYP2C9*3 (derived C allele) is more influential on warfarin dose requirements than the CYP2C9*2 (derived T allele) variant.15, 16, 17 The results from genotyping the HGDP-CEPH and Canadian samples are in accordance with literature on the allele frequency of the CYP2C9*2 and *3 variants among European and East Asian populations. The CYP2C9*2 allele has been most extensively studied among European populations in which it appears at a frequency of approximately 10%,37 which is comparable to our findings of frequencies of 13% and 17% in the HGDP-CEPH and Canadian European samples, respectively. Previous research has indicated that the CYP2C9*2 variant is extremely rare if not absent in East Asian populations.37 With the exception of the Yakut (2%), we observed that the derived T allele was absent in all East Asian populations from the HGDP-CEPH sample and appeared at a frequency of 0.5% among the Canadian East Asians. With regard to the CYP2C9*3 allele in these regions, the frequency of the derived C allele was higher in the European populations (average HGDP-CEPH=9% and Canadian samples=4%) than in East Asians (HGDP-CEPH=4% and Canadian samples=1%). These figures are in general agreement with those reported by Kim et al.,37 who found that the frequency of the CYP2C9*3 variant was 8% in Europeans and 2% in East Asians.
With respect to the distribution of these polymorphisms in other population groups, it is interesting to note that, whereas the frequency of the CYP2C9*2 allele is lower in Central/South Asian populations than in Europe (average HGDP-CEPH in Central/South Asia=7%, and Canadian South Asian sample=5%), the opposite is true for the CYP2C9*3 allele (average HGDP-CEPH in Central/South Asia=10%, and Canadian South Asian sample=12%). In African, Melanesian and Native American populations CYP2C9*2 and CYP2C9*3 tend to be absent or at very low frequencies, with the exception of CYP2C9*3 in Papuans (12%).

The most recent polymorphism reported to influence warfarin dosing has been the non-synonymous rs2108622 SNP (C/T polymorphism) located within the CYP4F2 gene. Caldwell et al.19 studied three independent white US cohorts and found a 4–12% increase in warfarin dose per T allele. On the basis of the frequency distributions of the derived T allele (30% among Europeans and Asians vs approximately 7% in Africans), these authors suggested that the expected contribution of this SNP to stable warfarin dose in Africans will be lower than in Europeans and Asians. After correcting for the effects of VKORC1 and CYP2C9, a recent genome-wide association study on Swedish patients has confirmed the association of rs2108622 with warfarin dose.9 However, another study by Zhang et al.24 in the United Kingdom concluded that there was no clear association between this polymorphism and stable warfarin dose. Our study indicates that the derived T allele is present at relatively high frequencies in most worldwide populations. The main exceptions are African and Native American populations, in which the derived T allele has relatively low frequencies or is absent. The highest frequencies observed for the T allele are found in Melanesian populations (>60%). If further research confirms the effect of rs2108622 on warfarin dose, it has the potential to be useful in future dosing algorithms.

As described above, the VKORC1 rs9923231 SNP has a very unusual allele frequency distribution. This polymorphism shows very high levels of geographic differentiation. In particular, the derived T allele is almost absent in Africa and has extremely high frequencies in East Asian populations. We applied three tests of positive selection to the East Asian HapMap sample to explore whether this unusual pattern of polymorphism could be the result of positive selection in East Asia. The three tests of selection are based on different characteristics of the data: genetic differentiation (LSBL test), genetic diversity (lnRH test) and allele frequency distribution (Tajima's D). Interestingly, we observed significant results for the VKORC1 gene in the three tests (Table 4). These tests are based on a comparison of the values of the aforementioned statistics for the VKORC1 gene with the genome-wide empirical distribution. These analyses show that VKORC1 has an extreme genetic differentiation in the East Asian sample, with eight markers in the top 5% of the empirical distribution. Furthermore, when comparing the genetic diversity (heterozygosity) of the East Asian and the European HapMap samples, the VKORC1 gene shows a relative reduction of genetic diversity in East Asia with respect to what is observed in the rest of the genome. Finally, the analysis of the Tajima's D statistic shows an excess of rare alleles (negative Tajima's D) in the VKORC1 gene with respect to the genome-wide average. It is important to note that the average Tajima's D value in the HapMap East Asian sample is positive (average value for the empirical data=1.74) because the HapMap data set is biased toward common polymorphisms. It is well known that negative Tajima's D values may be the result of positive selection or demographic processes such as population expansions. 
However, demographic processes such as population expansions would be expected to affect patterns of variation at all loci in the genome, whereas natural selection will act only on specific loci. Therefore, conditioning on the genome-wide distribution helps to differentiate between these two possibilities. Interestingly, Carlson et al.39 also identified a genomic region including the VKORC1 gene as a potential target of selection in East Asians using a sliding window analysis of Tajima's D based on a different data set (Perlegen, Mountain View, CA, USA). Although we cannot entirely eliminate the possibility that our results are due to stochastic processes, the extreme distributions observed for three different parameters point to positive selection as a potential explanation. It would be extremely interesting to explore which potential selective factors could have driven the T allele to such high frequencies in East Asian populations. The protein encoded by VKORC1 has a critical role in the vitamin K cycle and the activation of vitamin K-dependent proteins. In addition to the well-known effect of these proteins in hemostasis, recent evidence indicates that vitamin K-dependent proteins are present in numerous tissues and may be important in bone mineralization, regulation of calcification and other biological processes.40, 41, 42
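Because the argument above rests on the sign of Tajima's D, a compact sketch of the statistic may help. It follows the standard formulation, in which D contrasts the mean number of pairwise differences (π) with the number of segregating sites (S) scaled by a harmonic-number constant. The function name and the example inputs are hypothetical, and the sketch does not reproduce the windowed genome-wide computation used in the study.

```python
import math


def tajimas_d(n: int, seg_sites: int, pi: float) -> float:
    """Tajima's D for n sampled chromosomes.

    n         : number of sampled chromosomes (must be at least 4)
    seg_sites : number of segregating sites S in the region (at least 1)
    pi        : mean number of pairwise differences in the region
    """
    if n < 4:
        raise ValueError("n must be at least 4 for a non-zero variance term")
    if seg_sites < 1:
        raise ValueError("at least one segregating site is required")
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / (i * i) for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / (a1 * a1)
    e1 = c1 / a1
    e2 = c2 / (a1 * a1 + a2)
    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    return (pi - seg_sites / a1) / math.sqrt(variance)


if __name__ == "__main__":
    # Hypothetical window: 90 chromosomes, 12 segregating sites, low
    # pairwise diversity (an excess of rare alleles gives a negative D).
    print(tajimas_d(90, 12, 1.2))
```

When most segregating sites carry rare alleles, π is small relative to S and D is negative; a region-wide excess of intermediate-frequency alleles gives a positive D. As noted in the text, ascertainment toward common polymorphisms shifts the genome-wide average upward, so local values are best read against the empirical distribution.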

In conclusion, we report the allele frequency distribution of four SNPs known to affect warfarin metabolism in a large sample representing individuals from around the world. To date, most of the studies have been carried out in European and East Asian populations. Therefore, it is important to provide information on samples from other geographic regions to better understand the potential impact of these polymorphisms at the population level. Our data also have interesting implications in terms of the application of warfarin-dosing algorithms in different populations. Differences in allele frequency distribution between groups can have an impact on the amount of variance in required warfarin dosing explained by the algorithms. Given the relative lack of polymorphism of the VKORC1 rs9923231 and the CYP2C9*2 and *3 SNPs in African populations, it is not surprising that the application of algorithms including these markers in cohorts of individuals of African ancestry explains less variation in warfarin dose than in cohorts of individuals of European ancestry, as has been reported in recent studies.18, 30 Unfortunately, the VKORC1, CYP2C9 and CYP4F2 genes remain poorly characterized in many population groups, and there may be other variants within these genes affecting warfarin dose that would be relevant to include in the pharmacogenetic-based dosing algorithms. For example, Scott et al.43 recently described a variant (CYP2C9*8) present in an African-American male with a lower than predicted warfarin dose (based on prior genotyping of CYP2C9 and VKORC1 alleles). On further characterization of this variant in an African-American sample, the authors reported that CYP2C9*8 is the most frequent variant in this sample, and predicted that incorporation of this polymorphism in pharmacogenetic-based dosing algorithms could potentially reclassify the predicted metabolic phenotypes of almost 10% of African Americans.
The main goal of our study was to provide worldwide allele frequency information for the most commonly studied variants influencing warfarin dosing (VKORC1 rs9923231, CYP2C9 rs1799853 and rs1057910, and CYP4F2 rs2108622). However, it is important to note that, in addition to the variants analyzed in this study, recent pharmacogenetic-based dosing algorithms now include other polymorphisms such as CYP2C9*5 and *6. Future studies will be required to characterize the frequency of these and other polymorphisms (for example, CYP2C9*8) in worldwide populations. Ideally, existing algorithms should be validated in different populations, and efforts should be directed toward identifying new functional variants that could be included in pharmacogenetic-based dosing algorithms. The dramatic advances in high-density microarray and next generation sequencing technologies, and the consequent reduction in costs, will make it possible to obtain a much better understanding of the genetic factors influencing warfarin dosing in different populations and to develop pharmacogenetic-based algorithms that will reduce adverse reactions to warfarin.