Pharmacogenomically relevant markers of drug response and adverse drug reactions are known to vary in frequency across populations. We examined minor allele frequencies (MAFs), genetic differentiation (FST) and population structure of 1156 genetic variants (including 42 clinically actionable variants) in 212 genes involved in drug absorption, distribution, metabolism and excretion (ADME) in 19 populations (n=1478). There was wide population differentiation in these ADME variants, reflected in the range of MAF (ΔMAF) and FST. The largest mean ΔMAF was observed in African ancestry populations (0.10) and the smallest mean ΔMAF in East Asian ancestry populations (0.04). MAFs ranged widely, for example, from 0.93 for single-nucleotide polymorphism (SNP) rs9923231, which influences warfarin dosing, to 0.01 for SNP rs3918290, which is associated with capecitabine metabolism. ADME genetic variants show marked variation between and within continental groupings of populations. Enlarging the scope of pharmacogenomics research to include multiple global populations can improve the evidence base for clinical translation to benefit all peoples.
Rapid advances in genomic science have led to identification by regulatory agencies of a growing list of clinically important biomarkers for drug response and toxicity. For example, the US Food and Drug Administration (FDA) now maintains a table of pharmacogenomic biomarkers in drug labels,1 while the UK Medicines and Healthcare products Regulatory Agency and the European Medicines Agency both have mechanisms for considering such biomarkers for targeted therapy and drug safety warnings. Clinically validated pharmacogenomic biomarkers can help physicians optimize drug selection, dose and treatment duration while averting adverse drug reactions.2 However, the drive to position pharmacogenomics as a core element in personalized medicine still suffers from limited data. For example, it is estimated that over 90% of drugs currently used in clinical practice lack valid and predictive biomarkers for therapeutic effects and/or avoiding severe side effects.3 Another limitation is that our understanding of the distribution of human pharmacogenomic variation remains incomplete, due in part to the poor representation of ethnically diverse samples from various parts of the world in such studies. A more comprehensive understanding of the genetic landscape of the absorption, distribution, metabolism and excretion (ADME) genes across global populations with different ancestral backgrounds can facilitate the translation of pharmacogenomics data to clinical practice and public health policy. Achieving this task is now more feasible with increasing access to genotyping and sequencing technologies, as well as the availability of gene chips specifically designed to assess polymorphic alleles of drug metabolizing enzymes and other genes involved in the ADME of drugs.4, 5 In this study, we provide a comprehensive analysis of human genetic variation in ∼200 ADME genes in 19 global populations, including the largest set of African ancestry populations studied for pharmacogenomics.
We describe the range of variation observed at multiple layers spanning from continental groups to the individual. We discuss the impact of the observed variation on clinical decision making, as well as the utility of such data for regulatory purposes, including testing recommendations.
Materials and methods
A total of 1478 individuals from 19 populations with ancestry from different parts of the world were included in this study (Supplementary Table 1). Fifteen of these populations were from the 1000 Genomes Project (http://www.1000genomes.org/) sample collection. The populations (and their designated labels) were: Yoruba in Ibadan, Nigeria (YRI); Luhya in Webuye, Kenya (LWK); Maasai in Kinyawa, Kenya (MKK); African ancestry in Southwest USA (ASW); Utah residents with Northern and Western European ancestry from the Centre d'Etude du Polymorphisme Humain (CEPH) collection (CEU); Toscani in Italy (TSI); British from England and Scotland (GBR); Finnish from Finland (FIN); Iberian populations in Spain (IBS); Han Chinese in Beijing, China (CHB); Han Chinese South (CHS); Japanese in Tokyo, Japan (JPT); Mexican ancestry in Los Angeles, California (MXL); Puerto Rican in Puerto Rico (PUR); and Colombians in Medellín, Colombia (CLM; Figure 1). The remaining four populations were obtained from ongoing studies in West Africa and the United States as follows: three groups, Igbo from Nigeria (IGBO), Akan from Ghana (AKAN) and Gaa-Adangbe from Ghana (GAA), were obtained from participants in the Africa America Diabetes Mellitus study,6 and the fourth group comprised African Americans from the metropolitan Washington, DC area who participated in the Howard University Family Study (HUFS).7
Samples from five population groups (IGBO, AKAN, GAA, MKK and HUFS) were directly genotyped using the Affymetrix DMET Plus platform (Santa Clara, CA, USA) at the National Human Genome Research Institute Microarray Core laboratory at the National Institutes of Health (Bethesda, MD, USA) as described in the Supplementary Methods. For the other 14 groups, DMET Plus markers were extracted from the 1000 Genomes as described (Supplementary Methods). To facilitate continental level and other comparisons, populations were grouped as follows: continental African samples (YRI, IGBO, GAA, AKAN, LWK and MKK) were designated as AFR; continental African samples plus the African-American populations (ASW and HUFS) were designated AFR+AA; continental European and Centre d'Etude du Polymorphisme Humain samples (CEU, TSI, GBR, FIN and IBS) were designated as EUR; continental East Asian samples (CHB, CHS and JPT) were designated as EAS; and Latin American samples (MXL, PUR and CLM) were designated as AMR. Data management is further described (Supplementary Methods).
The intersection of the DMET assay markers with the 1000 Genomes phase 1 data set yielded a final analytic data set comprising 1156 markers from 212 ADME-related genes shared across all 19 populations. In addition, we analyzed an ‘actionable’ subset of 42 variants identified by the FDA or the Pharmacogenetics for Every Nation Initiative (www.pgeni.org) as important pharmacogenomic biomarkers. Markers in this subset are listed in various drug labels1 and have supporting evidence of clinical utility in the pharmacogenomics database: PharmGKB (www.pharmgkb.org). Additional information describing the actionable pharmacogenomics markers included in this study is available in the Supplementary Methods.
Minor allele frequencies (MAFs) were computed using the variant allele of pharmacogenomic effect as the reference. The range of difference (ΔMAF) for a given allele within a specified group of populations was defined as the maximum MAF minus the minimum MAF. Density plots for comparing ΔMAF were drawn using the R software package (www.r-project.org). Pairwise FST was used as a measure of population differentiation for a given marker between populations. FST values range from zero to one: a value of one means that the two populations being compared are completely separated, and a value of zero means no divergence (that is, the populations are freely sharing genetic material through interbreeding). More information on FST estimates is provided in the Supplementary Methods.
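As an illustration of the two summary statistics just defined, the following minimal Python sketch computes ΔMAF for one marker and a pairwise FST from two allele frequencies. The FST estimator actually used in this study is described only in the Supplementary Methods; the formula here (Wright's FST from expected heterozygosities, with the two populations weighted equally) is an assumption chosen for simplicity, and the function names are hypothetical. The example frequencies are the mean MAFs for rs9923231 reported in this paper for the EAS and AFR groups.

```python
from typing import Dict


def delta_maf(maf_by_pop: Dict[str, float]) -> float:
    """Range of allele frequency (maximum minus minimum) across populations."""
    values = list(maf_by_pop.values())
    return max(values) - min(values)


def pairwise_fst(p1: float, p2: float) -> float:
    """Wright's FST for one biallelic marker in two equally weighted populations.

    FST = (H_T - H_S) / H_T, where H_S is the mean within-population expected
    heterozygosity and H_T is the expected heterozygosity of the pooled
    population. This is an assumed, simplified estimator (no sample-size
    correction), not necessarily the one used in the study.
    """
    p_bar = (p1 + p2) / 2.0
    h_t = 2.0 * p_bar * (1.0 - p_bar)
    if h_t == 0.0:
        return 0.0  # marker is monomorphic in both populations
    h_s = (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0
    return (h_t - h_s) / h_t


if __name__ == "__main__":
    # Mean MAFs for rs9923231 as reported in the text (EAS 0.92, AFR 0.06).
    mafs = {"EAS": 0.92, "AFR": 0.06}
    print(f"delta MAF = {delta_maf(mafs):.2f}")
    print(f"FST       = {pairwise_fst(mafs['EAS'], mafs['AFR']):.2f}")
```

With identical frequencies in the two populations the function returns zero, and with alternate alleles fixed in each population it returns one, matching the interpretation of the two extremes given above.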
Principal components of ancestry were computed by decomposing the centered genotype matrix of the entire data set (1478 individuals and 1156 markers). The number of significant principal components was estimated using the minimum average partial test.8 ADME variants are a subset of overall human genetic variation and there is abundant evidence that they are under selection.9 Therefore, we evaluated the similarity between the distribution of the studied variants and a random set of non-ADME markers across the genome. Procrustes analysis,10 a method for comparing spatial maps of human population genetic variation, was used to conduct a statistical comparison of the shape of the distribution of the ADME markers against that of ∼13 000 randomly sampled single-nucleotide polymorphisms (SNPs; Supplementary Methods).
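The centering-and-decomposition step can be sketched as follows. This is a toy, standard-library-only illustration and not the pipeline used in the study: it recovers only the first principal component by power iteration, whereas a real analysis would use a full singular value decomposition from a numerical library, followed by the minimum average partial test and the Procrustes comparison described above. The function name and the six-individual genotype matrix are hypothetical.

```python
import math
import random
from typing import List


def first_pc_scores(genotypes: List[List[float]], iterations: int = 200) -> List[float]:
    """Scores of each individual on the first principal component.

    Rows are individuals, columns are markers coded as allele counts (0/1/2).
    Each column is centered on its mean, then the leading eigenvector of
    X^T X is found by power iteration.
    """
    n, m = len(genotypes), len(genotypes[0])
    means = [sum(row[j] for row in genotypes) / n for j in range(m)]
    centered = [[row[j] - means[j] for j in range(m)] for row in genotypes]

    rng = random.Random(0)  # fixed seed so the sketch is reproducible
    v = [rng.uniform(-1.0, 1.0) for _ in range(m)]
    for _ in range(iterations):
        scores = [sum(x * w for x, w in zip(row, v)) for row in centered]  # X v
        v = [sum(centered[i][j] * scores[i] for i in range(n)) for j in range(m)]  # X^T (X v)
        norm = math.sqrt(sum(w * w for w in v))
        if norm == 0.0:
            return [0.0] * n  # no variation in the data
        v = [w / norm for w in v]
    return [sum(x * w for x, w in zip(row, v)) for row in centered]


if __name__ == "__main__":
    # Hypothetical toy data: three individuals from each of two groups, four markers.
    toy = [
        [2, 2, 0, 0], [2, 1, 0, 0], [1, 2, 0, 1],  # group A
        [0, 0, 2, 2], [0, 1, 2, 2], [0, 0, 1, 2],  # group B
    ]
    for label, score in zip("AAABBB", first_pc_scores(toy)):
        print(label, round(score, 2))
```

In the toy data the two groups fall on opposite sides of zero on the first component; the sign of a principal component is arbitrary, so only the separation is meaningful.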
Global variation of ADME SNPs
The mean ΔMAF for all 1156 SNPs tested was 0.25 across all populations; similarly, the mean ΔMAF was 0.29 for the selected 42 ‘actionable’ variants across all samples (Table 1). About 80% of the 1156 SNPs studied showed a ΔMAF of ≥0.05 across all populations, illustrating the diversity in allele frequencies across these pharmacogenomically relevant variants. Notably, frequencies for the reference alleles (that is, MAFs) of the selected 42 clinically ‘actionable’ markers varied widely (Figure 2 and Supplementary Figure 1). Some markers covered nearly the entire frequency range. A good example of such a marker is SNP rs9923231, with a global MAF that ranged from 0.02 to 0.95; this SNP has been shown to influence dosing of warfarin,11 an anticoagulant prescribed to prevent blood clots. In contrast, some variants (for example, SNP rs3918290, with a global MAF range of 0–0.01) displayed similar allele frequencies across all population groups regardless of geography, ancestry or ethnicity; SNP rs3918290 is located in the dihydropyrimidine dehydrogenase gene (DPYD) and is associated with adverse effects from the chemotherapeutic agent capecitabine.12 Capecitabine is converted to 5-fluorouracil, which inhibits DNA synthesis in the targeted tumor.
Inter- and intra-continental variation
Comparison of all variants or just the clinically ‘actionable’ variants revealed that the AFR+AA populations displayed the greatest difference in ΔMAF, as expected from the well-documented rich genetic diversity of African ancestry populations. Although 90% of all ADME markers had similar frequency differences (a ΔMAF of less than 0.2) within continental ancestral groups (Supplementary Table 2 and Supplementary Figure 2), the distribution of MAF differed when comparing the number of markers for a given ΔMAF interval between continental ancestral groups. For example, 361 (31%) variants had less than a 0.05 difference in frequency across AFR+AA populations compared with 553 (48%) and 799 (69%) for EUR and EAS populations, respectively (Supplementary Table 2). These differences in proportions were statistically significant (P<1 × 10−4 for all pairwise comparisons: AFR+AA versus EUR, AFR+AA versus EAS, EUR versus EAS). For variants with a ΔMAF ≥0.20, the majority were observed in AFR+AA populations (129 SNPs, 11%) compared with EUR, EAS or AMR populations, which had only 63 (5%), 7 (<1%) or 43 (4%) SNPs within that range, respectively. Moreover, for a given ΔMAF interval, markers were not necessarily shared across ancestral groups. Of the 63 SNPs in EUR populations with a ΔMAF ≥0.20, only 19 SNPs were also observed with the same frequency range in AFR+AA samples, 5 SNPs in AMR samples and none in EAS samples. This implies that for a given continental ancestry grouping, different sets of ADME markers were responsible for a given range of allele frequency differences between ethnic groups. Furthermore, there was no correlation between inter-continental allele frequency distribution and intra-continental allele frequency distribution among the ADME markers studied.
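The comparison of proportions reported above can be checked approximately with a two-sided two-proportion z-test on the published counts. The paper does not state which test was used, so the choice of test here is an assumption; it also treats the marker sets as independent samples, although the same 1156 markers were scored in every group, so the result is only a rough check. The function name is hypothetical.

```python
import math
from typing import Tuple


def two_proportion_z_test(x1: int, n1: int, x2: int, n2: int) -> Tuple[float, float]:
    """Two-sided z-test for a difference between two proportions.

    Returns (z, p_value) using the pooled-variance normal approximation.
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    if se == 0.0:
        return 0.0, 1.0  # both proportions are exactly 0 or exactly 1
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail area
    return z, p_value


if __name__ == "__main__":
    n_markers = 1156
    # Number of markers with a frequency difference below 0.05, from the text.
    low_delta = {"AFR+AA": 361, "EUR": 553, "EAS": 799}
    pairs = [("AFR+AA", "EUR"), ("AFR+AA", "EAS"), ("EUR", "EAS")]
    for a, b in pairs:
        z, p = two_proportion_z_test(low_delta[a], n_markers, low_delta[b], n_markers)
        print(f"{a} vs {b}: z = {z:.2f}, P = {p:.2e}")
```

Under these assumptions, all three pairwise comparisons give P values well below 1 × 10−4, in line with the significance level reported in the text.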
Population structure and pharmacogenomic variants
Global populations are known to show genetic population structure.13 We investigated the hypothesis that pharmacogenomic variants recapitulate this population structure. We computed principal components (PCs) of the genotypes for the set of ADME markers across all population samples (Supplementary Figure 3) and compared these with PCs computed for a random set of markers of equivalent size for the same number of chromosomes from each population. As expected,14, 15 all AFR populations clustered tightly together, with the notable exception of the Maasai ethnic group (MKK). The African-American samples (HUFS and ASW) were anchored by the AFR populations as well as the EUR populations, consistent with the history of African and European admixture in African Americans. The AMR populations showed some separation from the EUR samples in the direction of the AFR samples, and the EAS populations constituted another major cluster.
Procrustes analysis10 verified concordance between population and genetic data from ADME markers and nearly 13 000 randomly sampled genotypes for each population. The principal components analysis plot illustrated separation between ancestral groups but also indicated genetic diversity within a given ancestry. Highlighting this point were the differences in MAF observed within closely related groups (Supplementary Figure 1). For example, the ΔMAF for rs3211371 was 0.20 for AFR+AA populations; however, much of this frequency range was attributable to MAF differences within a given country as opposed to population samples between countries. This was evident within the two Nigerian samples; YRI had a MAF approaching monomorphic compared with IGBO, which had a MAF of 0.16 (FST=0.07). The two Ghanaian samples, GAA and AKAN, also showed a large ΔMAF for that allele (MAF of 0.21 and 0.07, respectively) with a slightly smaller FST of 0.04. The Kenyan samples, MKK and LWK, did not differ at that position (both were monomorphic). Interestingly, the African-American samples, ASW and HUFS, also showed similar frequency differences (that is, large ΔMAF) with MAF of 0.02 and 0.15, respectively (FST=0.05). Given the variability observed in the AFR samples, which represent a large component of parental ancestry for these admixed groups, differences between two African-American groups sampled from different parts of the United States may not be surprising, although there is also the potential for assay artifact.16 Population differentiation was also observed within the other continental samples. Over 90 SNPs had allele frequencies that generated FST values ≥0.05 for at least one EUR sample pair and 49 SNPs for AMR sample pairs. In summary, the principal components analysis showed the clustering of individuals relative to continental ancestry (geography) and ethnic grouping.
Individual-level variation in burden of pharmacogenomic risk variants
Group-level data are useful for understanding population frequencies. However, the individual is the subject at the clinical level. To illustrate the spectrum of variation of individual burden of risk for ADME variants, we examine the set of clinically ‘actionable’ SNPs in EAS populations (the ancestral group with the smallest ΔMAF for all actionable SNPs). Among EAS populations (n=286), the average MAF across all 42 actionable SNPs for these three populations was just 0.13. We focused on individuals homozygous for a risk allele given that these persons are likely to experience more severe phenotype than heterozygotes. We identified at least one individual homozygous for the risk allele in 20 of the 42 actionable SNPs. In total, 632 homozygous genotypes were detected from 277 individuals (an average of 2.2 homozygote genotypes per individual) indicating some individuals were homozygous for more than one risk allele. Ten individuals were homozygous for at least five actionable risk alleles and over 200 individuals were homozygous for at least two risk variants demonstrating the limitation of extrapolating from population to the individual level.
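The individual-level tally described above amounts to counting, for each person, the actionable SNPs at which the genotype is homozygous for the risk allele. A minimal Python sketch follows, assuming genotypes are coded as risk-allele counts (0, 1 or 2); the coding, the function names and the three-person toy data are hypothetical and are not the study data.

```python
from typing import Dict, List


def homozygous_burden(risk_allele_counts: Dict[str, List[int]]) -> Dict[str, int]:
    """Number of SNPs at which each individual carries two risk alleles."""
    return {
        person: sum(1 for count in counts if count == 2)
        for person, counts in risk_allele_counts.items()
    }


def summarize(burden: Dict[str, int]) -> Dict[str, float]:
    """Group-level summary of the per-individual burden."""
    total = sum(burden.values())
    return {
        "total_homozygous_genotypes": total,
        "individuals_with_at_least_one": sum(1 for b in burden.values() if b >= 1),
        "mean_per_individual": total / len(burden),
    }


if __name__ == "__main__":
    # Hypothetical toy data: three individuals scored at four actionable SNPs.
    toy = {
        "ind1": [2, 0, 2, 1],
        "ind2": [0, 1, 0, 0],
        "ind3": [2, 2, 2, 0],
    }
    burden = homozygous_burden(toy)
    print(burden)
    print(summarize(burden))
```

Even in this toy example the per-individual counts differ widely around the group mean, which is the point made above about extrapolating from population frequencies to a single patient.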
Knowledge of drug target genes and genes involved in drug ADME remains critical in predicting therapeutic effect and/or adverse drug response. Here, we present data on one of the largest sets of pharmacogenomic variants so far studied on 19 population groups from around the globe. Previous studies have either focused on single genes or a handful of genes and/or utilized samples with small numbers from each population group (for example, the Human Genome Diversity Project panel in which some populations have fewer than 30 individuals). Notably, we conducted de novo genotyping to increase the representation of Sub-Saharan Africa populations and African Americans given the paucity of comparative genetic variation data17 currently available from these groups in public databases despite the fact that they display the highest degree of genetic diversity compared with other human population groups. We have focused on analyzing the spectrum of diversity across individuals, ethnic groups and continental ancestry to provide new insights into some of the potential challenges that lie in the ongoing global effort to move from group labels such as ancestry, ‘race’ and ethnicity to drug prescription tailored to an individual’s genetic background (‘personalized medicine’). Overall, our data demonstrate that ADME genetic variants show considerable differences in allele frequency among global populations in general (Figure 2) as well as among populations that are often grouped together by continental origin, ancestry or ‘race’ (Figure 3).
At one level, our findings illustrate the utility of population data for guiding clinical decision making in the absence of individual-level genetic data. Actionable variants that are monomorphic (that is, no variation) across all samples are a good example. Three of the 42 actionable SNPs we investigated were either monomorphic or showed a MAF of <0.01 for all global populations tested. These SNPs have implications for toxicity of thiopurines (rs1800462),18 toxicity of capecitabine and other cancer drugs (rs3918290),19 and clopidogrel responsiveness (rs28399504).20 We also saw examples at the continental level, such as the AFR populations, which showed seven SNPs that were monomorphic (African Americans (ASW and HUFS) had MAF <0.02), each having direct clinical actionability. Similarly, 11 actionable SNPs were monomorphic across EUR populations, with the exceptions of GBR and FIN for just one SNP each, and 17 SNPs were monomorphic in EAS populations, again with nominal exceptions (that is, MAF<0.01) for a handful of SNPs. Pending the routine use of individualized genetic testing at point-of-care, group data remain very useful in a number of ways. Regulatory bodies can use such data (in combination with clinical and functional data) to formulate guidelines for genetic testing of pharmacogenomic variants (see the FDA’s Table of Pharmacogenomic Biomarkers in Drug Labels21).
In addition, guidelines and drug labels can be made more fine-grained using such group data, for example: testing certain polymorphisms only in people of specific ancestry (for example, dermatologic reactions from carbamazepine for individuals carrying the HLA-B*1502 allele).21 Second, such data can be used to guide specific pharmacogenetic-based dosing guidelines, for example: warfarin dosing guidelines based on VKORC1 rs9923231, CYP2C9*2 and CYP2C9*3 alleles do not explain much of the variation of the dose in African ancestry populations because these variants are monomorphic or near-monomorphic.22, 23 Third, national regulatory bodies can use such data to guide their policies tailored to their specific populations. A good example of this is how the Singapore Health Services Authority used group genetic data on the country’s main ethnic groups to request revision of the package insert for irinotecan (to include the pharmacogenetic association with severe neutropenia) and to publicize the association and availability of a genotyping test.24 Thus, these kinds of data are of high public health and clinical relevance, serving as part of the necessary evidence base for translation of pharmacogenomics findings into routine clinical care (Figure 4).
The findings of this study also serve as a powerful reminder of the stark differences in allele frequencies among population groups and the direct clinical relevance of those differences. For example, three core variants (rs1799853, rs1057910 and rs9923231) are typically used for estimating warfarin sensitivity. In the case of rs9923231, the mean MAF for EAS populations is 0.92 compared with 0.06 for AFR populations, indicating strong population differentiation (FST 0.51–0.87).11, 25 Using current recommended dosing algorithms, the dosing range for AFR+AA populations will be 5.0–7.0 mg of warfarin per day, with only 3% of individuals from these groups deviating from the recommended range.26 In sharp contrast, nearly 25% of the EAS individuals sampled in this study differ from the majority recommended dosing of 3.0–4.0 mg of warfarin per day based on their genotype data, with some individuals expected to respond better to lower doses (0.5–2.0 mg per day) or higher doses (5.0–7.0 mg per day). In addition, two variants (rs2256871 and rs28371685 in CYP2C9*9 and CYP2C9*11, respectively) with enzymatic activities, observed in AFR+AA populations at a high frequency of 0.15, are monomorphic in the EUR, EAS and AMR populations. Many of the extreme examples of population differentiation observed have been shown to be driven by recent positive selection in ADME genes.9
Finally, findings from this study illustrate the potential pitfalls in the use of demographic labels such as ‘black’ or ‘white’ in the practice of medicine, as we have alluded to in previous reports.27 For example, CYP2D6 enzymatic activity has been linked to tamoxifen efficacy in breast cancer patients. The CYP2D6*2A haplotype contains the SNP rs16947,19, 28 which has a MAF ranging from 0.43 in GAA to 0.76 in MKK samples (range=33%; FST=0.11), two populations generally referred to as ‘black’. In addition, acetylator phenotypes of NAT2 are predicted in part by rs1801280;29 this SNP has a MAF of 0.71 in IBS samples but is observed to be as low as 0.42 in CEU (FST=0.09). Interestingly, the AMR samples all fell below a MAF of 0.40 for this SNP, making ‘white’ or even ‘Latino’ or ‘Hispanic’ poor proxies for peoples of Spanish Iberia for an allele that is found in less than half of the population of some European and Latin American populations but nearly three-quarters of the IBS population. Traditional labels of race and ethnicity are often used in research studies, clinical medicine and public health as proxies (albeit imperfect ones) for unmeasured environmental and social covariates. When used to guide drug choice and dosage, they are also imperfect proxies for the unmeasured genotype. Until genotyping for pharmacogenomic biomarkers becomes universal and incorporated into routine clinical practice, traditional classifications of race/ethnicity will continue to be used for categorization of individuals. We anticipate that improvements in technology, falling costs and better guidelines for use of pharmacogenomic biomarkers in clinical decision-making will gradually lead to replacement of the race/ethnicity label (a blunt tool) with the more precise genotype.
As we take steps toward the integration of genomic medicine into day-to-day clinical care, the practice of medicine will benefit from studies that incorporate pharmacogenomics data from individuals sampled from multiple ancestral backgrounds across the world.23, 30 Expanding the evidence base to include multiple global populations will facilitate clinical decision making and provide useful data for regulatory bodies to utilize in policy recommendations about drug labels and genetic testing recommendations.
Panel: research in context
We did a PubMed search for ‘pharmacogenomics’ or ‘pharmacogenetics’ and ‘global’ and ‘populations’, which yielded 49 citations. However, most were either of a single gene, genetic variants for a single drug or a limited number of populations. One study did include a large number of pharmacogenes as well as a wide range of global populations.9 However, many of the included populations in that study had few individuals and the study focused primarily on population genetic parameters and signals of natural selection.
We examined 1156 genetic variants in 212 genes involved in drug ADME in 19 populations (n=1478). These pharmacogenomic variants showed marked variation between and within continental groupings of the populations studied. Group labels often used in clinical settings (that is, race labels) did not accurately portray the underlying genetics of an individual. This implies that individual genotype data are the best way of evaluating a patient’s pharmacogenetic risk profile. However, group data remain essential for developing recommendations for genetic testing, targeted therapeutics and drug labeling. Enlarging the scope of pharmacogenomics research to include multiple global populations can improve the evidence base for clinical translation and provide a starting point for studies that relate genotype to drug efficacy, toxicity and dosage guidelines.
The informatics expertise of Kevin Long, University of North Carolina, Chapel Hill is greatly appreciated. The study was supported by National Institutes of Health grants S06GM008016-320107 to CNR and S06GM008016-380111 to AA. HUFS participants were enrolled at the Howard University General Clinical Research Center, which is supported by grant 2M01RR010284 from the former National Center for Research Resources, National Institutes of Health. This research was supported in part by the Intramural Research Program of the Center for Research on Genomics and Global Health. The Center for Research on Genomics and Global Health is supported by the National Human Genome Research Institute, the National Institute of Diabetes and Digestive and Kidney Diseases, the Center for Information Technology and the Office of the Director at the National Institutes of Health (Z01HG200362).
About this article
Supplementary Information accompanies the paper on The Pharmacogenomics Journal website (http://www.nature.com/tpj).
The Pharmacogenomics Journal (2018)