Genetic landscape of 125 pharmacogenes in Chinese from the Chinese Millionome Database

Inter-individual differences of drug responses could be attributed to genetic variants of pharmacogenes such as cytochrome P450 (CYP), phase 2 enzymes, and transporters. In contrast to extensive studies on the genetic polymorphisms of CYP gene, genetic mutation spectrum of other pharmacogenes was under-representative in the pharmacogenetics investigations. Here we studied the genetic variations of 125 pharmacogenes including drug transporters, non-CYP phase 1 enzymes, phase 2 enzymes, nuclear receptors and others in Chinese from the Chinese Millionome Database (CMDB), of which 38,188 variants were identified. Computational analyses of the 2554 exonic variants found 617 deleterious missense variants, 91.1% of which were rare, and of the 54 loss-of-function (splice acceptor, splice donor, start lost, and stop gained) variants, 53 (98.1%) were rare. These results suggested an enrichment of rare variants in functional ones for pharmacogenes. Certain common functional variants including NUDT15 13:48611934 G/A (rs186364861), UGT1A1 2:234676872 C/T (rs34946978), and ALDH2 12:112241766 G/A (rs671) were population-specific for CMDB Chinese because they were absent (with a zero of variant allele frequency) or very rare in other gnomAD populations. These findings might be useful for the further pharmacogenomics research and clinical application in Chinese.


Scientific Reports
| (2021) 11:19222 | https://doi.org/10.1038/s41598-021-98877-x www.nature.com/scientificreports/ better understand drug efficacy in participants of diverse ancestral backgrounds. However, these whole genome sequences database mainly comprised European populations while Chinese population were underrepresented. A large-scale Chinese population sequencing project named the Chinese Millionome Database (CMDB) recently investigated 141,431 whole genome sequences from Chinese women who took a non-invasive prenatal testing 12 . Our previous study of 57 CYP genes and POR (cytochrome P450 oxidoreductase) in CMDB provided a comprehensive data set of P450 genes covering the whole mainland China provinces and 37 ethnicities 13 . In the present study we investigated the genetic variations of 125 pharmacogenes including non-CYP phase 1 enzymes, phase 2 enzymes, drug transporters, nuclear receptors and others in Chinese from the CMDB.

Materials and methods
Data sources. One-hundred and twenty-five pharmacogenes, including 16 ABC transporters, 50 SLC transporters, 17 phase 1 enzymes, 26 phase 2 enzymes, 9 nuclear receptors and 7 others (including ACE, CDA, POR, NUDT15, CACNA1S, G6PD, IFNL3), were obtained through the pipeline in Fig. 1a. Briefly, the automated annotation text-mining system in PharmGKB (https:// www. pharm gkb. org) was performed to scan sentences in literatures from PubMed. When the sentences included information linking a chemical and a variation, the textmining system would annotate the sentence as pharmacogenomics information. And 1055 automated annotation pharmacogenes were identified (accessed 1 March, 2020), which were compared to pharmacogenes studied in reference (Ingelman-Sundberg et al.) 14 . Whole-genome sequencing data from 141,431 Chinese including 37 ethnicities living in 31 provinces of mainland China in CMDB were used to collect genetic variants information across the above 125 pharmacogenes (accessed 21 March, 2020). In brief, raw variants were gained using a P value less than 10 -6 by the maximum likelihood model in the accessible genomic regions. Then, a Bayesian Gaussian mixture model was applied to assign every variant candidate a Phred-scaled probabilistic score (VQSR score) demonstrating the probability that this variant was a genuine polymorphic variant. The higher VQSR score indicated the higher probability that the variant candidate was a true polymorphic variation. High transition versus transversion (Ti/Tv) ratio is found for the raw call set (maximum 8.9 for novel variants and 3.4 for known variants) but it would decrease when the filtration threshold for VQSR score increases. A 35 of filtration threshold of VQSR score was applied indicating a Ti/Tv ratio of 2.4 for novel variants and of 2.2 for known variants 12 . Novel variant (without one rs number) was defined relative to dbSNP release 135. Variants with a variation allele frequency (VAF) less than 1% were defined as rare and variants with a VAF more than 1% were defined as common. The VAF in different gnomAD populations were collected from gnomAD browser (http:// www. gnomad-sg. org/) in version 3.1. Statistical analysis. Kruskal-Wallis test was utilized to compare the differences between variants number across pharmacogene subfamilies (GraphPad Prism version 8) (www. graph pad. com). Linear regression was used to analyze the relationship between missense, deleterious and total variants numbers and the corresponding gene length in different pharmacogenes groups. The chi-square test or Fisher's exact test when needed was used for comparative analysis of the variant allele frequencies for the ABCC4, SLCO1B1, ALDH2, TPMT, UGT1A1, VDR and NUDT15 polymorphisms between different populations in gnomAD and CMDB Chinese using R version 4.1.0 (https:// www.r-proje ct. org/). P < 0.05 was recognized as statistically significant, while the Bonferroni Correction was used to adjust the significance threshold (P < 0.002 (0.05/18)) for the multiple testing of the linear regression analysis between the number of total variants, missense variants and deleterious variants and the corresponding gene length among the 6 pharmacogenes groups, and to adjust the significance threshold (P < 0.0006 (0.05/72)) for the multiple testing of the comparison for variants allele frequencies between CMDB Chinese and diverse gnomAD populations.
The majority of variants with putative clinical relevance were rare. Neutral missense variants were comprised of 446 while deleterious missense and loss-of-function variants were comprised of 671 (see the "Methods" section for details; Fig. 3a). Of the 617 deleterious missense variants, 562 (91.1%) were rare while  , non-CYP phase 1 enzymes (n = 3), phase 2 enzymes (n = 7), nuclear receptors (n = 6), and others (n = 2) were observed ( Fig. 3c; P = 0.0047). Totally, the average distribution of functional variants (deleterious missense plus loss-of-function variants) in ABC transporters (mean ± SD: 11.2 ± 9.03) was significantly more than those in SLC transporters (4.2 ± 3.22), non-CYP phase 1 enzymes (4.1 ± 5.09), phase 2 enzymes (3.7 ± 3.15), nuclear receptors (4.1 ± 2.86), and others (5.9 ± 5.60) ( Fig. 3d; P = 0.0048). However, we re-test the relationship between pharmacogenes family and loss-of-function burden and functional variants adjusting for gene size and no significant difference was found among the pharmacogenes families (P = 0.0916 for LoF variants and P = 0.1212 for functional variants). The significantly larger average distribution of functional variants in ABC transporters than those in other groups could be attributed to the gene size and the large number of exons across the ABC superfamily relative to the other genes/gene families tested. To evaluate the functional importance of rare variants in individual pharmacogene, we calculated the percentage of rare and common functional variants (deleterious missense plus loss-of-function variants) and aggregated frequencies of splice acceptor, splice donor, start lost, stop gained and putatively deleterious missense variants in each of the 125 studied pharmacogenes (Fig. 4).
In total, we could observe a substantially different distribution and pattern of genetic diversity among the 125 pharmacogenes studied. Different correlation patterns between number of variants and gene length among pharmacogene groups. Through linear regression analysis, we investigated the relationship between the number of total variants, missense variants and deleterious variants and the corresponding gene length among the 6 pharmacogenes groups. After the Bonferroni Correction for the multiple test, no significant correlation between the number of total variants (P = 0.0083), missense variants (P = 0.0234), and deleterious variants (P = 0.0081) and the corresponding gene length was found in phase 2 enzymes group (Fig. 5). Moreover, no correlation between the number of total variants (P = 0.9141), missense variants (P = 0.8818), and deleterious variants (P = 0.7784) and the corresponding gene length was found in nuclear receptors group. The trend upwards between the nuclear receptor total variant category and gene length was damaged by a single outlier (AHR, aryl hydrocarbon www.nature.com/scientificreports/ receptor). AHR, a member of the basic helix-loop-helix-period-aryl hydrocarbon receptor nuclear translocatorsingle-minded (bHLH-PAS) family of transcription factors, has a gene length of 429.79 kb but only 114 variants in total, which might be attributed to its conservative role as the biological sensor in initiating gene expression procedures in responses to exogenous and endogenous signals. In addition, the relationship between the number of total variants and gene length was found to be significant in SLC transporters (P < 0.0001), and non-CYP phase 1 enzymes (P = 0.0007) respectively, but not significant in ABC transporters (P = 0.0038). Furthermore, no significant correlation between the number of missense and deleterious variants and gene length was found in the three pharmacogenes groups (all P > 0.05). The others pharmacogenes group (totally 7 genes) had the significant correlation between the number of total variants (P < 0.0001) and gene length, while the relationship between the number of deleterious variants (P = 0.0559) and missense variants (P = 0.0299) and the corresponding gene length in this group were not significant.

Discussion
The present study provided a comprehensive and systematic genetic overview of 125 pharmacogenes including 16 ABC transporters, 50 SLC transporters, 17 non-CYP phase 1 enzymes, 26 phase 2 enzymes, 9 nuclear receptors and 7 others in Chinese from 141,431 whole genome sequences preserved in CMDB. To the best of our knowledge, this is the first study on the genetic variation of 125 pharmacogenes in such a large-scale population in mainland Chinese. Europeans were over-representative in genomes sequencing projects such as gnomAD worldwide, whereas other populations including Chinese were under-representative in these projects. Therefore, it was not appropriate to use the results from these projects directly on other populations. Moreover, drug development trials neglected to study participants of diverse ancestries would result in poor generalizability of marketed drug efficacy information. Additionally, the withdrawal of marketed drugs due to adverse drug reactions might be attributed to the patients with specific genetic variation, while in fact they would be efficacious for other patients without such certain genetic variant. For example, functional alleles SLCO1A2*2 and SLCO1A2*3 involved in drug uptake of opioid receptor agonist deltorphin II and methotrexate were prevalent in Ashkenazim (VAF = 0.145 and 0.034, respectively) and Europeans (VAF = 0.136 and 0.062, respectively) while absent in East Asians (both VAF < 0.001) 9 . And our present study found some common functional pharmacogene variants such as NUDT15 rs186364861, UGT1A1 rs34946978 and ALDH2 rs671 in CMDB Chinese were population-specific but absent in other populations (Table 1). These results were of key importance in designing , and before the whole genome sequencing becomes the routine testing in clinics, a cost-effective pharmacogenomics testing panel covering an individualized pharmacogenes variants testing items would be needed 15 .
The limitation of our present study is that lacking of haplotype information across pharmacogenes which is a main source of actionable variation for translating these data to clinically meaningful conclusions. According to a review by the Human Genetic Resources Administration of China (HGRAC), the individual genetic data from CMDB is unavailable 12 . We could obtain detailed summaries of the data such as allele frequencies and GWAS summary statistics. Unfortunately, ethnicity-specific allele frequencies across pharmacogenes are unavailable. However, the minorities in mainland China such as Mongolian, Tibetan and Uyghur populations owned different alleles and genotypes frequencies of pharmacogenes from those in Han Chinese 16 . Therefore, more detailed pharmacogenomic studies in these minorities would be needed to perform in future.
Genetic variability of the well-known pharmacogenes alleles could explain only part of the inter-individual differences in drug responses, and rare genetic variants in pharmacogenes such as SLC30A8 could modulate antidepressant (desipramine or fluoxetine) treatment 17 . The potential role on the disease development and drug efficacy of the rare variants in these genetically polymorphic pharmacogenes would be interesting and needed to be studied further 18,19 . The present study demonstrated that 91.1% of the 617 deleterious missense variants in the 125 pharmacogenes were rare (Fig. 3b), which suggested the necessity to focus on the rare pharmacogenes variants in studying inter-individual variability in drug response. Analysis of 25 clinically relevant pharmacogenes in 291 genomes of the Thai population identified 121 putatively functional variants, majority of which were rare and specific to the Thais but absent from gnomAD database 20 . As more and more rare pharmacogene variants were found, the method to interpret their clinical implication were needed to be developed urgently 21,22 . Therefore, the database comprised of both the pharmacogenes genotype and drug response phenotype information should be constructed across the world.
Most of the drugs are metabolized by several enzymes, and often a combination of CYPs and non-CYPs. Although the majority of drug biotransformation is performed by the CYP enzymes, other pharmacogenes such as drug transporters and nuclear receptors might participate in many clinically important drugs 23,24 . Constitutive androstane receptor (CAR) rs2502815 polymorphism and the carbamazepine response in epilepsy patients was potentially relevant. However, our present study found the rare variants in CAR (NR1I3) formed the 100% fraction of the functional variation and the aggregated putatively functional variants frequency of CAR was 0.00305. These results might explain the minor role played by the non-CYP pharmacogenes in the pharmacogenetics studies, where functional alleles in these pharmacogenes were not considered because of the very low VAF in the studied populations. Twin studies on the pharmacokinetics of metoprolol and torsemide suggested that up to 90% of the variation in their pharmacokinetic parameters could be allotted to the subjects' genetic makeup, whereas the known genetic variations of CYP2D6, CYP2C9, and SLCO1B1 explained only 39%, 2%, and 39% of the pharmacokinetics variability, respectively 25 . These findings combined with our present results indicated that a substantial fraction of the heritable variation in the drug responses of clinically important medicines remained to be elucidated.
In conclusion, we comprehensively mapped the genetic landscape of 125 pharmacogenes in mainland Chinese from CMDB and identified 38,188 variants. Computational analyses of the 2554 exonic variants identified 617 deleterious missense variants, 91.1% of which were rare, and of the 54 loss-of-function (splice acceptor, www.nature.com/scientificreports/ splice donor, start lost, and stop gained) variants, 53 (98.1%) were rare. These results suggested an enrichment of rare variants in functional ones for pharmacogenes. Certain common functional variants including NUDT15 13:48611934 G/A (rs186364861), UGT1A1 2:234676872 C/T (rs34946978), and ALDH2 12:112241766 G/A (rs671) were population-specific for CMDB Chinese because they were absent (with a zero of VAF) or very rare in other gnomAD populations. These findings might be useful for the further pharmacogenomics research and clinical application in Chinese.