Introduction

Interindividual variability in drug responses, caused by a combination of drug–drug interactions and physiopathologic, environmental, genetic, and epigenetic factors, constitutes a major challenge for clinical practice.1 In particular, genetic variability in the genes that encode proteins involved in drug absorption, distribution, metabolism, and excretion (ADME) has been shown to affect drug pharmacokinetics, efficacy, and safety, highlighting the decisive role of genetic variation for treatment success.2, 3 Based on such genetic variability, the US Food and Drug Administration and the European Medicines Agency provide guidance in the drug Summaries of Product Characteristics to improve clinical pharmacotherapy.

While common single-nucleotide variants (SNVs) have been extensively studied, the vast extent of rare SNVs and small insertions–deletions (indels) has only recently received attention, driven in part by technological advancements.4, 5 Furthermore, there is growing evidence that copy-number variants (CNVs), defined as duplications or deletions of DNA segments ranging from ~1 kb to 3 Mb, contribute substantially to phenotypic diversity and disease.6, 7 Structural variants in which all or part of the open reading frame of a gene is deleted commonly abrogate gene function, while whole-gene duplications can increase gene dosage and functionality. Functionally relevant CNVs have been described in the ADME genes CYP2A6, CYP2D6, GSTM1, GSTT1, and SULT1A1, with frequencies that differ substantially across human populations and that contribute to interethnic pharmacokinetic differences.8 Particularly for CYP2D6, not only deletions but also gene duplications and higher-level amplifications have been described that strongly affect human drug response and whose interethnic differences in frequency are still not fully characterized.9

In recent years, next-generation sequencing (NGS) techniques have provided powerful approaches for CNV detection, with whole-exome sequencing being able to detect CNVs in exonic sequences with a resolution similar to that of medium-resolution genomic microarrays.10 The Exome Aggregation Consortium (ExAC) recently released CNV data derived from whole-exome sequencing of 59,898 individuals distributed across six major human populations.11 However, no systematic analysis of CNVs in ADME genes has been presented so far, and the impact of these variants on drug response remains unknown.

By leveraging these novel NGS data sets, we here provide the first panorama of CNVs across 208 important ADME genes and estimate the contribution of these newly described structural variants to the variability in drug response. Furthermore, we experimentally identified the genomic breakpoints of three novel deletions in CYP2C19, CYP4F2, and SLCO1B3, and assessed the allele frequencies of these deletions of potential clinical relevance in 1,080 Spanish, 465 Finnish, and 590 Japanese individuals.

Materials and methods

Data collection

Bioinformatic analyses of CNVs were performed on 208 ADME genes with importance for drug pharmacokinetics (Novel deletions and duplications in 208 pharmacogenes across six populations; Supplementary Table S1 online). CNV allele counts were analyzed in six major populations (non-Finnish Europeans, Finns, Africans, South Asians, East Asians, and admixed Americans) by integrating CNV data from ExAC containing exome sequences from 59,898 unrelated individuals11 with whole-genome sequencing data from 2,504 individuals provided by the 1000 Genomes Project phase 3.12 The minor allele frequency (MAF) of rare CNVs (MAF ≤0.5%) was extracted from ExAC, while frequencies of common CNVs were obtained from the 1000 Genomes Project.

CNV analyses

Novel CNVs were defined as those not previously reported in the literature. To calculate CNV MAF from carrier counts, Hardy–Weinberg equilibrium was assumed. For functional predictions of CNV effects, we assumed that the deletion of one or more exons of a gene results in a nonfunctional protein product. To estimate the contribution of deletions to the total number of loss-of-function (LOF) alleles, CNV data was related to LOF alleles derived from SNVs and indels obtained from ExAC as previously described.13 We used a conservative definition of LOF and only considered those variants that resulted in frameshifts, premature stop-codons, loss of start-codons, or altered canonical splice sites. Furthermore, we included well-described LOF variants in CYP2C19 (CYP2C19*2, rs4244285), CYP2D6 (CYP2D6*41, rs28371725), and CYP3A5 (CYP3A5*3, rs776746) that are either misclassified as nondefective variants or not covered by exome sequencing.
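The conversion from carrier counts to allele frequency under Hardy–Weinberg equilibrium can be sketched as follows. This is a minimal illustration rather than the pipeline used in the study; the function name and the example counts are hypothetical. If q denotes the CNV allele frequency, the expected fraction of carriers (heterozygous or homozygous) is 1 − (1 − q)², so q = 1 − sqrt(1 − carriers/N).

```python
import math


def maf_from_carriers(carriers: int, individuals: int) -> float:
    """Estimate the CNV allele frequency q from the number of carriers.

    Assumes Hardy-Weinberg equilibrium, under which the expected
    carrier fraction is 1 - (1 - q)**2.
    """
    if individuals <= 0 or not 0 <= carriers <= individuals:
        raise ValueError("need individuals > 0 and 0 <= carriers <= individuals")
    carrier_fraction = carriers / individuals
    return 1.0 - math.sqrt(1.0 - carrier_fraction)


# Hypothetical example (not study data): 20 carriers among 1,000 individuals.
q = maf_from_carriers(20, 1000)
print(f"estimated MAF = {q:.4%}")
```

For rare variants the result is close to carriers/(2N), because homozygous carriers are then expected to be very infrequent.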

Breakpoint determination

DNA from three individuals (HG00268, HG01485, and NA19010) was acquired from the Coriell Biorepository (Coriell Cell Repositories, Camden, NJ) to identify deletion breakpoints corresponding to CYP2C19-esv3624259, SLCO1B3-esv3628797, and CYP4F2-esv3643780. Primers were designed to bind the flanking region of each deletion (CYP2C19-Fw: 5’-ATTAGCAATGTTGCCCGAAG-3’; CYP2C19-Rv: 5’-AGAAGAGCAACCCCAAGACA-3’; SLCO1B3-Fw: 5’-TCCAAACCCACTTTGTTTCC-3’; SLCO1B3-Rv: 5’-TGCTGTGGGTGAATTGAAAG-3’; CYP4F2-Fw: 5’-AACCACTCATCCCACCACTC-3’; CYP4F2-Rv: 5’-TGACGGCAAGGAAATAAAGC-3’) and to amplify by polymerase chain reaction (PCR) the region containing the deletion breakpoints. PCR products were purified using the ExoSAP-IT for PCR Product Clean-Up (Thermo Fisher Scientific, Waltham, MA, USA) and subjected to Sanger sequencing using an ABI PRISM 3700 DNA Analyzer capillary sequencer (Thermo Fisher Scientific). DNA sequencing chromatograms were aligned to the reference human genome (GRCh38) to define the genomic coordinates of the deletions.

Samples and genotyping

Germ-line DNA was collected from unrelated individuals from Spain (n = 1,080), Finland (n = 465), and Japan (n = 590). Individuals were over 18 years old, and the collection of samples was approved by local ethical review committees. CNV genotyping was performed using Kompetitive Allele Specific PCR (KASP) genotyping assays (LGC Genomics, Hoddesdon, UK) designed specifically for the CYP2C19, SLCO1B3, and CYP4F2 deletion breakpoints detected by Sanger sequencing. The KASP assay consists of two allele-specific forward primers, one labeled with FAM dye and the other with HEX dye, and one common reverse primer. KASP reactions were carried out according to the manufacturer’s instructions. Briefly, reactions were run in a 5 μl final reaction volume containing 2.5 μl of KASP 2X reaction mix, 0.07 μl of assay primer mix, and 15 ng of genomic DNA. The following thermal cycling conditions were used: 94 °C for 15 min, followed by 10 touchdown cycles of 94 °C for 20 s and 61–55 °C for 60 s (dropping 0.6 °C per cycle), and then 26 cycles of 94 °C for 20 s and 55 °C for 60 s. All assays included positive control samples with known genotypes and negative controls. All deletions identified by genotyping were confirmed by PCR using the breakpoint-specific primers described above.
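As an illustration of the thermal cycling conditions described above, the annealing temperature of each cycle can be enumerated as in the following sketch. It assumes that the first touchdown cycle anneals at 61 °C and that each subsequent touchdown cycle is 0.6 °C lower, so that the tenth cycle anneals at 55.6 °C before the 26 constant cycles at 55 °C; the function name and this interpretation of the schedule are assumptions, not details taken from the manufacturer's protocol.

```python
def touchdown_annealing_temps(start_tenths=610, step_tenths=6, cycles=10):
    """Return the annealing temperature (in deg C) for each touchdown cycle.

    Temperatures are tracked in tenths of a degree as integers so that
    repeated subtraction of 0.6 does not accumulate floating-point error.
    """
    return [(start_tenths - i * step_tenths) / 10 for i in range(cycles)]


touchdown = touchdown_annealing_temps()  # 10 touchdown cycles
constant = [55.0] * 26                   # 26 cycles at a fixed 55 deg C
schedule = touchdown + constant

print("touchdown annealing temperatures:", touchdown)
print("total number of cycles:", len(schedule))
```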

Results

The landscape of ADME gene CNVs

To provide a comprehensive overview of CNVs within clinically relevant pharmacogenes, we collected data from ExAC and the 1000 Genomes Project and identified deletions and duplications. Of the 208 ADME genes analyzed, 201 harbored novel CNVs (97%) and we identified a total of 5,589 novel CNVs, of which 2,611 (47%) were deletions and 2,978 (53%) were duplications (Figure 1a).

Figure 1

Overview of newly described gene deletions and duplications in 208 pharmacogenes across six human populations. (a) In total we found 5,589 novel copy-number variants (CNVs) in 59,898 individuals of which 2,611 were deletions (blue) and 2,978 duplications (red). (b) Dot plot depicting allele frequencies for novel deletions across non-Finnish Europeans (NFE), Finnish (FIN), East Asians (EAS), South Asians (SAS), Africans (AFR), and admixed Americans (AMR). Deletions of the CYP2A7 pseudogene as well as previously described deletions are not included in this representation and are depicted in Supplementary Figure S1 and Table 1, respectively. The highest deletion frequencies were observed for CYP2C19 in Finns and CYP2B6 in Africans. (c) Allele frequencies of novel duplications are shown. Duplications of CBR3 and CYP2B6 in Finnish and African individuals, respectively, were the most common. (d) Aggregated CNV frequencies were highest in Africans (7.2%) and lowest in admixed Americans (3.4%). (e) The size distributions of deletions and duplications are shown. Duplications were slightly larger affecting median genomic intervals of 67.1 kb compared to 24.7 kb for deletions. ADME, absorption, distribution, metabolism, and excretion.

Deletions were detected in 175 out of 208 genes (84%; Figure 1b and Supplementary Table S1). Deletions affecting CYP2C19 in Finns, CYP2B6 in Africans, and CYP4F2 in East Asians were the most frequent, with MAFs of 1.1%, 0.9%, and 0.4%, respectively. Most of these newly described deletions were highly population-specific, with interpopulation differences in allele frequencies >10-fold, while others, such as those affecting CYP2F1 and SLCO1B3, were present at comparable levels in all populations studied. Of all CNVs identified, deletions affecting the pseudogene CYP2A7, which may act as a microRNA decoy for miR-126 and affect CYP2A6 levels,14 were most abundant, with frequencies between 1.7% and 11.5% in the different populations (Supplementary Figure S1). Furthermore, we refined the population-specific data of previously well-characterized deletions in GSTM1, GSTT1, UGT2B17, UGT2B28, CYP2D6, CYP2A6, SULT1A1, and CYP2B6 using the 1000 Genomes Project data (Table 1). Overall, the aggregated frequency of the novel ADME gene deletions ranged from 4.2% in Africans to 1.3% in admixed Americans (Figure 1d).

Table 1 Allele frequencies of previously established gene deletions in GSTM1, GSTT1, UGT2B17, UGT2B28, CYP2D6, CYP2A6, SULT1A1, and CYP2B6 genes

Novel full- or partial-gene duplications were detected in 190 of 208 (91%) pharmacogenes studied. The most frequent were exonic duplications that affected CBR3 and CYP2B6 with 0.7% and 0.4% MAF in Finnish and African individuals, respectively (Figure 1c). Population-specific duplications included those in SULT1A2 and SLC13A1, mainly present in South Asians and Africans (0.3% and 0.2% MAF, respectively), whereas duplications in ABCC1 and ABCC6 were detected in all populations with MAFs ranging between 0.1% and 0.2% (Supplementary Table S1). The aggregated frequency of the novel ADME gene duplications varied from 3.6% in South Asians to 2.1% in Finns (Figure 1d).

The newly described deletions encompassed genomic intervals with a median size of 24.7 kb, whereas duplications were slightly larger with a median size of 67.1 kb (Figure 1e). Overall, 87% of deletions and 93% of duplications exceeded sizes of 3 kb.

Gene deletions comprise a substantial fraction of ADME LOF alleles

Next, we analyzed the relative contributions of novel CNVs to the overall ADME LOF alleles. The contribution of novel deletions to the LOF alleles varied widely among the populations studied for many of the genes (Figure 2). While deletions of CYP2C19 comprised 5.7% of all CYP2C19 LOF alleles in Finns, the contribution was substantially lower (0.05–1%) in the other populations. Deletions of CYP1A2 were identified exclusively in Africans with a low MAF (0.04%); however, they comprised around 57% of all CYP1A2 LOF alleles in this population. In total, novel deletions accounted for > 5% of LOF alleles in a substantial number of genes (87, 25, 49, 48, 59 and 51 genes in non-Finnish Europeans, Finnish, East Asians, South Asians, Africans, and admixed Americans, respectively; Figure 2), emphasizing the overall importance of ADME CNVs.
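The relative contribution of deletions to the pool of LOF alleles can be illustrated with the short sketch below. It assumes that the total LOF allele frequency of a gene is simply the sum of the deletion frequency and the frequency of SNV- and indel-derived LOF alleles, that is, that the two classes do not co-occur on the same haplotype. The function name is hypothetical, and the SNV/indel frequency in the example is an illustrative value chosen so that a deletion MAF of 0.04% yields roughly the 57% reported for CYP1A2 in Africans; it is not taken from the study data.

```python
def deletion_lof_fraction(deletion_freq: float, snv_indel_lof_freq: float) -> float:
    """Fraction of all LOF alleles of a gene that are deletions.

    Both arguments are allele frequencies in the same population.
    Assumes the two classes of LOF alleles are mutually exclusive,
    so the total LOF frequency is their sum.
    """
    if deletion_freq < 0 or snv_indel_lof_freq < 0:
        raise ValueError("allele frequencies must be non-negative")
    total = deletion_freq + snv_indel_lof_freq
    if total == 0:
        raise ValueError("no LOF alleles observed")
    return deletion_freq / total


# Deletion MAF of 0.04% with an illustrative SNV/indel LOF frequency of 0.03%.
fraction = deletion_lof_fraction(0.0004, 0.0003)
print(f"deletions account for {fraction:.0%} of LOF alleles")

# A gene would be shown in Figure 2 for a population if this fraction exceeds 5%.
print("exceeds 5% threshold:", fraction > 0.05)
```

The example shows why a very rare deletion can still dominate the LOF pool of a gene: the fraction depends on how rare other LOF alleles are, not on the absolute deletion frequency.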

Figure 2

Fraction of loss-of-function (LOF) alleles attributable to novel deletions in six human populations. The relative contribution of novel deletions to the overall pool of LOF alleles is depicted for non-Finnish Europeans (blue), Finnish (red), East Asians (green), South Asians (purple), Africans (turquoise), and admixed Americans (orange). Only absorption, distribution, metabolism, and excretion (ADME) genes for which the contribution of novel deletions to LOF alleles exceeds 5% in the respective population are shown. Single-nucleotide variant (SNV) and indel LOF alleles were defined conservatively and include only those leading to frameshifts in the coding sequence, start-loss, stop-gain, or splice variants. Deletions of the CYP2A7 pseudogene as well as previously described deletions are not included in this representation.

In genes with previously described common CNVs, deletions constituted the majority of LOF alleles (Supplementary Figure S2). Yet, pronounced interethnic differences were evident for some of these genes, as illustrated by CYP2D6. The deletion allele CYP2D6*5 accounted for 80% of all LOF alleles in East Asians, whereas it only accounted for 1–27% in the other populations studied.

Experimental validations of computational CNV predictions

To confirm the predictive quality of NGS-based CNV calls, we determined the breakpoints of three novel deletions overlapping CYP2C19, CYP4F2, and SLCO1B3 (Table 2). The most frequent deletion in each of the genes was validated by PCR using primers predicted to amplify the deletion junctions. The precise breakpoints of the CYP2C19 deletion, extending from the promoter region to intron 5, of the SLCO1B3 deletion, extending from intron 8 to intron 13, and of the CYP4F2 full-gene deletion, affecting 13 exons, were located close to the computationally predicted deletion sites (Figure 3).

Table 2 Gene deletions in CYP2C19, CYP4F2 and SLCO1B3 described in ExAC (MAF > 0.05%)

Figure 3

Experimental validation of newly identified CYP2C19, SLCO1B3, and CYP4F2 deletions. Linear representation of the genomic reference sequence and the deletion breakpoints detected by Sanger sequencing in (a) CYP2C19, (b) SLCO1B3, and (c) CYP4F2. Sequences surrounding the deletion are shown in black and the deleted sequence is shown in gray. Coordinates refer to the GRCh38 reference genome assembly. IR, intergenic region.

The CYP2C19 exon 1–5 deletion was detected by genotyping with a MAF of 0.43% in a Finnish series of 465 subjects (0.8% MAF in this population according to ExAC; Table 2). The SLCO1B3 exon 9–13 deletion was detected with a MAF of 0.53% in our cohort of 1,080 Spanish individuals (0.1% MAF in non-Finnish Europeans according to ExAC), and the CYP4F2 full-gene deletion was detected with a MAF of 1.61% in 590 Japanese individuals, including one subject homozygous for the gene deletion (0.4% MAF in East Asians according to ExAC).
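For genotyped cohorts, the allele frequency follows directly from the genotype counts, without the Hardy–Weinberg assumption used for the NGS carrier counts. The following sketch illustrates the calculation. The function name is hypothetical, and the heterozygote count in the example is an assumption chosen to be consistent with the reported frequency in the Japanese cohort (one homozygote among 590 individuals and a MAF of 1.61%); the individual genotype counts are not stated in the text.

```python
def allele_frequency(n_het: int, n_hom: int, n_individuals: int) -> float:
    """Deletion allele frequency from diploid genotype counts.

    Each heterozygote carries one deletion allele and each homozygote two;
    the denominator is the total number of chromosomes (2 per individual).
    """
    if n_individuals <= 0 or n_het < 0 or n_hom < 0:
        raise ValueError("counts must be non-negative and cohort size positive")
    if n_het + n_hom > n_individuals:
        raise ValueError("more carriers than individuals")
    return (n_het + 2 * n_hom) / (2 * n_individuals)


# Illustrative counts (assumed): 17 heterozygotes and 1 homozygote in 590 subjects.
maf = allele_frequency(n_het=17, n_hom=1, n_individuals=590)
print(f"MAF = {maf:.2%}")
```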

Discussion

The interindividual variability in drug pharmacokinetics and pharmacodynamics causes lack of drug efficacy and adverse drug reactions (ADRs) in a large fraction of patients.15 In total, ADRs cause around 6.5% of admissions to hospitals and can have severe or fatal outcomes, especially in pediatric and geriatric patients.16, 17 Furthermore, they account for 5–10% of annual hospital costs, posing an important economic burden on health-care services.18 Importantly, genetic factors are estimated to be responsible for 20–30% of observed ADRs, which could thus be prevented by appropriate genetic tests.3 Accordingly, more than 100 drugs currently have pharmacogenomic labels to identify patients at risk for ADRs or lack of efficacy.19, 20 However, these identified pharmacogenomic biomarkers only relate to frequent genetic variations, whereas recent large-scale sequencing projects have also revealed that rare variants are of relevance to drug pharmacokinetics.13, 21, 22 Besides being enriched in functional effects, rare alleles can be highly population-specific. One important example has been described for CYP3A4. While the LOF variant CYP3A4*20 is globally rare, it is present at a high frequency in specific regions in Spain, where it significantly contributes to ADRs during paclitaxel therapy.23, 24

In recent years NGS techniques have burst into the field of genetics, providing novel and cost-effective tools to detect not only SNVs but also CNVs. NGS techniques suitable for the detection of CNVs have already been used in large research projects. Furthermore, the data presented here incentivizes the implementation of more comprehensive CNV testing also in the clinical setting once additional methodological requirements, particularly regarding analytic and clinical validity as well as the clinical utility, are met. Recently, CNV data from 59,898 human exomes was released, providing the largest and most powerful resource for the identification of this type of variation.11 By leveraging this extensive NGS data set, we provide the first global panorama of the structural genetic diversity in pharmacogenes. Frequencies of most of the identified novel deletions varied widely across populations and were estimated to substantially contribute to the total pool of LOF variants.

The gene with the highest number of novel deletions was CYP2C19, with a MAF in Finns of 1.1%. CYP2C19 is involved in the metabolism of a multitude of drugs, and CYP2C19 genotype-guided dosing recommendations are currently included in 21 Food and Drug Administration drug labels, including citalopram, clobazam, clopidogrel, escitalopram, and flibanserin (https://www.fda.gov/). We validated the most common deletion in CYP2C19, which we identified to affect exons 1–5, in a cohort of 465 Finns and found a frequency that was lower in the analyzed cohort than in computational predictions (predicted MAF = 0.8%, experimental MAF = 0.4%). This difference could be caused by the genotyping assay, which is specific for the breakpoints identified and would thus miss additional deletions bounded by alternative breakpoints but spanning the same exons. Regarding the contribution of CYP2C19 deletions to the LOF phenotype, several frequent LOF single-nucleotide polymorphisms are known to affect this gene (e.g., CYP2C19*2 and CYP2C19*3 alleles); despite this, the deletion was estimated to represent 5.7% of all LOF alleles in the Finnish population.

In non-Finnish Europeans, deletions in SLCO1B3 were the most common novel CNVs detected in the NGS data, accounting for 17% of all LOF alleles (Figure 2). The exon 9–13 deletion, predicted to have a 0.1% MAF in the general non-Finnish European population, occurred with a frequency of 0.5% in Spanish individuals. This 5-fold higher frequency in Spain might be caused by population-specific differences. OATP1B3, the transporter encoded by SLCO1B3, is important for the clearance of bilirubin25 and has been implicated in the transport of multiple drugs, with substantially overlapping specificity with OATP1B1.26 Importantly, however, some substrates including the angiotensin receptor blocker telmisartan and the oncology drug docetaxel are transported primarily by OATP1B3.27, 28, 29

Another relevant novel deletion spanned the CYP4F2 gene, encoding a vitamin K oxidase that acts as an important counterpart of VKORC1. Genetic polymorphisms in VKORC1, CYP2C9, and CYP4F2 are clinically relevant for the dosing of anticoagulants, explaining up to 45% of warfarin dose requirements.30, 31 The benefits of genotype-guided warfarin dosing on clinical outcomes have been studied in prospective clinical trials, albeit with sometimes conflicting results.32, 33, 34 CYP4F2 deletions were found mainly in East Asians, and in this population they accounted for 69% of all LOF alleles. Genotyping for the novel CYP4F2 full-gene deletion in Japanese individuals revealed a subject homozygous for the deletion and an overall MAF of 1.6%. The allele frequency found in this population was 4-fold higher than that predicted for East Asians (MAF = 0.36%) and might be explained by a population-specific distribution of the deletion. Thus, it can be suggested that genotyping for structural variants might further improve the selection of optimal anticoagulant starting doses and contribute to improved patient outcomes in East Asians.

In the era of precision medicine it is important to accurately characterize the genotypes of the specific patients subjected to pharmacotherapy. We here show that novel CNVs significantly contribute to the functionality of relevant pharmacogenes, adding a further layer of pharmacogenetic complexity with important implications for the prediction of drug response and toxicity. We thus recommend the incorporation of CNV detection assays for relevant genes and populations. In combination with the important role of rare SNVs, our results suggest that the quality of preemptive pharmacogenetic advice, which is typically based on the interrogation of a few candidate variants, can be improved by comprehensive NGS-based genotype identification of relevant pharmacogenes.