Introduction

Adverse drug reactions (ADRs) constitute a significant contributor to morbidity and mortality, ranking among the top 10 leading causes of death and disease in developed nations1,2. The characteristics of ADRs exhibit variability contingent upon factors such as genotype, age, gender, population, pathology, drug type, route of administration, and drug interaction3,4. According to Ingelman-Sundberg, genetic factors may account for approximately 10% to 20% of the occurrence of ADRs5. Genetic factors have been found to significantly influence pharmacokinetics, pharmacodynamics, and susceptibility to allergic reactions, resulting in changes in both local and systemic drug exposure and/or drug target functionality, ultimately impeding drug responses6. Recent investigations have elucidated the genetic underpinnings of ADRs7, thereby highlighting the close association between genetic factors and drug response.

Pharmacogenomics, as a key area of precision medicine, is the use of genomic and other “omic” information to personalize drugs selection and administration to avoid ADRs and maximize drugs therapeutic efficacy8,9. Pharmacogenomics accounted for 80% of the variations in drug treatment and safety. More than 400 genes were found to be involved in drug metabolism, and around 200 drug genes were linked to ADRs. It has been shown that substantial differences in distribution and frequencies of single nucleotide variants (SNVs) worldwide affect the key genes involved in drug absorption, distribution, metabolism, and elimination of abnormalities10. SNVs were a vast resource of genetic variation in humans, resulting in phenotypic differences among individuals11,12. The Pharmacogenetics and Pharmacogenomics Knowledge Base (PharmGKB; http://www.pharmgkb.org) that collects, organizes, and disseminates information on the impact of genetic variations in humans on drug responses. It provides free clinical-related information, including dosing guidelines, annotated drug labels, potentially viable gene-drug associations, and genotype–phenotype relationships13.

In recent years, numerous researchers have investigated very important pharmacogene (VIP) variants in ethnic minorities in China, such as the Tibetans14 and Lahu15. According to the 7th National Census, the Zhuang population totaled 15,721,956, ranking second only to the Han Chinese among the 56 populations. They are widespread in China’s Yunnan and Guizhou provinces, mainly in the Guangxi Zhuang Autonomous Region. Over a long period of time, they have developed customs and cultures with their own ethnic characteristics. However, we still have limited information on pharmacogenetic variants in the Zhuang population.

In this study, the VIP variants selected were derived from the PharmGKB, the SNP database of National Center for Biotechnology Information (NCBI, http://www.ncbi.nlm.nih.gov/SNP/), and the International HapMap Project (http://www.hapmap.org/), in addition to relevant pharmacogenomics literature. Then, the allele and genotype frequencies of the Zhuang population were compared with those of 26 other populations to obtain significant differences in SNVs after genotyping 200 unrelated Zhuang subjects from Yunnan province. The results of this study may complement current pharmacogenomics data of the Zhuang population, providing a theoretical basis for the safe use of drugs and predicting certain diseases in the Zhuang population.

Materials and methods

Study subjects

In total, 200 unrelated Zhuang subjects (110 females and 90 males) were recruited from Wen Shan in the Yunnan Province of China. The sample size and the proportion were determined using G*Power 3.1.9.2 software16. The participants were healthy based on their medical history and physical examination. Additionally, they had at least three generations of Zhuang ancestry, while none of the other populations had any known ancestral background. Subjects with chronic diseases, infectious diseases, drug or alcohol abuse, severe heart, liver or kidney dysfunction, immune disorders, pregnancy, and lactation were excluded. The informed consent forms have been signed by all subjects. According to the study protocol approved by the Clinical Research Ethics Committee of Northwest University, 5 mL of peripheral blood was collected from each subject and stored at 4 °C for 24 h.

Variants selection and genotyping

Through an extensive literature review on drug metabolism and toxicity, we identified 24 genes associated with these phenomena. By utilizing resources such as the PharmGKB database, the SNP database of NCBI, and the International HapMap Project, in addition to relevant pharmacogenomics literature, we selected variants linked to drug therapy responsiveness. A preliminary screening identified 59 variants. However, only homozygous genotypes were observed for 11 of these variants, making it impossible to compare the distribution and differences in genotype frequency. Consequently, these 11 variants were excluded from our analysis, leaving 48 variants for further investigation.

Genomic DNA was extracted from participants’ peripheral blood using GoldMag-Mini Whole Blood Genomic DNA Purification Kit (GoldMag Ltd., Xi’an, China). The concentration of genomic DNA was measured using NanoDrop 2000C spectrophotometer (Thermo Scientific, Waltham, MA, USA). Subsequently, multiplexed SNV MassEXTEND assays were designed using Agena MassARRAY Assay Design 4.0 software (San Diego, California, USA), which allowed for the design of PCR primers for the selected VIP variants. Agena MassARRAY RS1000 (San Diego, California, USA) was able to genotype the 48 VIP variants according to the manufacturer's instructions17. Finally, the data of SNV genotypes were collected and managed using Agena Typer 4.0 software18, as mentioned in previous studies.

Populations variation data

We downloaded the genotype data from the 1000 Genomes website (https://www.internationalgenome.org/). The 26 populations included: (1) Chinese Dai in Xishuangbanna, China (CDX); (2) Han Chinese in Beijing, China (CHB); (3) Southern Han Chinese, China (CHS); (4) Japanese in Tokyo, Japan (JPT); (5) Kinh in Ho Chi Minh City, Vietnam (KHV); (6) African Caribbeans in Barbados (ACB); (7) African Ancestry in Southwest USA (ASW); (8) Esan in Nigeria (ESN); (9) Gambian in Western Division, The Gambia (GWD); (10) Luhya in Webuye, Kenya (LWK); (11) Mende in Sierra Leone (MSL); (12) Yoruba in Ibadan, Nigeria (YRI); (13) Colombian in Medellin, Colombia (CLM); (14) Mexican Ancestry in Los Angeles, California (MXL); (15) Peruvian in Lima, Peru (PEL); (16) Puerto Rican in Puerto Rico (PUR); (17) Utah residents with Northern and Western European ancestry (CEU); (18) Finnish in Finland (FIN); (19) British in England and Scotland (GBR); (20) Iberian populations in Spain (IBS); (21) Toscani in Italy (TSI); (22) Bengali in Bangladesh (BEB); (23) Gujarati Indian in Houston, Texas (GIH); (24) Indian Telugu in the UK (ITU); (25) Punjabi in Lahore, Pakistan (PJL) and (26) Sri Lankan Tamil in the UK (STU).

Structure analysis and Fst analysis

The Structure 2.3.4 software was used to analyze the structure of 27 populations, and Arlequin3.1 software was used to evaluate pairwise Fst values for assessing the relationship between 27 population groups. In addition, MEGA11 software was utilized to plot phylogenetic tree.

Protein hazard prediction

We performed a functional analysis of missense variants using online tools such as Polyphen2 (http://genetics.bwh.harvard.edu/pph2/), SNAP2 (https://rostlab.org/services/snap/), Mutationassessor (http://mutationassessor.org/r3/), FATHMM (http://fathmm.biocompute.org.uk/index.html), and Mutationtaster (https://www.mutationtaster.org/) to assess the impact of SNVs mutations to predict protein function.

Mutant protein structure prediction

A single amino acid change has the potential to significantly affect protein activity and function. We downloaded the protein structures of CYP2D6 and KCNH2 from the PDB database (https://www.rcsb.org/) and utilized the Chimera v1.16 software to predict and visualize the mutant protein structures.

Statistical analysis

The data were compiled, ordered, and analyzed using Microsoft Excel 2019 (Microsoft, Redmond, WA, USA) and SPSS 26.0 (SPSS, Chicago, IL, USA). The χ2 test was utilized to estimate the Hardy–Weinberg equilibrium (HWE) and compare the divergences in genotype frequencies of 48 VIP variants between the Zhuang population and the other 26 populations. All statistical tests were two-tailed (p < 0.05). Bonferroni corrections were performed to determine the significance level. After the Bonferroni’s multiple tests, p < 4.01 × 10−5 was recognized as statistically significant.

Ethics approval

This study was conducted by the World Medical Association Declaration of Helsinki and was approved by the Northwestern University Clinical Research Ethics Committee (Approval number of Ethics Committee: 230,413,002). All subjects signed an informed consent form.

Results

Basic characteristics of candidate VIP variants

The 48 VIP variants on 24 genes that satisfied the HWE equation (p > 0.05) were collected in this study. Table 1 summarizes the fundamental characteristics of these variants, including gene name, SNVs ID, position, functional consequence, genotype frequency, and minor allele frequency (MAF) in the Zhuang population. Additionally, Table S1 shows the PCR primers for the gathered VIP variants.

Table 1 Basic information of 48 selected VIP variants in the Zhuang population.

SNVs with significant differences in genotype frequencies between the Zhuang population and the other 26 populations

We compared the discrepancies in the genotype frequency distribution of the selected VIP variants between the Zhuang population and 26 other populations based on Chi-square tests. After the Bonferroni correction, the results were considered significant when p < 4.01 × 10–5. The number of SNVs with significant differences in genotype frequencies between the Zhuang population and 26 populations is shown in Fig. 1. The investigation demonstrated that the Zhuang population exhibited significant differences in four SNVs when compared to CDX, CHS, and KHV, and 31 SNVs when compared to ESN, GWD, and YRI. The Zhuang population exhibited differences in a number of SNVs when compared to other populations, including JPT (8), KHV (5), ACB (27), ASN (24), LWK (29), MSL (28), CLM (22), MXL (17), PEL (18), PUR (18), CEU (21), FIN (22), GBR (22), IBS (20), TSI (23), BEB (15), GIH (23), ITH (20), PJL (22), and STU (22). Furthermore, the Zhuang population showed significant differences in rs776746 (CYP3A5), rs4291 (ACE), and rs1805123 (KCNH2) compared to 26 other populations. Moreover, the Zhuang population exhibited significant differences in rs1065852 (CYP2D6) compared to 21 other populations (refer to Table 2 and Table S2).

Figure 1
figure 1

The amount of difference variants between the Zhuang population and 26 populations. The size of the rectangle indicates the number of different variants between the Zhuang population and the other 26 populations from five regions.

Table 2 Genotype frequency distribution differences of 26 populations compared with the Zhuang population after Bonferroni’s multiple adjustment.

Genetic structure analysis of 27 populations

A model-based clustering approach was used to analyze the genetic structure of the 27 populations distributed in Africa, America, East Asia, Europe and South Asia to further analyze their relationship. Based on the Structure 2.3.1 Software, different K values ranging from 5 to 8 were hypothetically considered in structure analysis. When K = 5, the groups were divided into 5 subgroups based on the relative majority probability of assigning individuals to subgroups (subgroup 1: GWD and LWK; Subgroup 2: BEB, CEU, FIN, GBR, IBS, TSI, CLM, MXL and PUR; Subgroup 3: Zhuang, CDX, CHB, CHS, JPT, KHV and PUR; Subgroup 4: GIH, ITU, PJL and STU; Subgroup 5: ACB, ASW, ESN, MSL and YRI). It can be observed from Fig. 2 that Zhuang population have a stronger affinity with CDX, CHB, CHS, JPT, KHV and PUR. This is consistent with the results in Table 2.

Figure 2
figure 2

Structure analysis of the genetic relationship between the Zhuang population and the other 26 populations. K denotes the possible numbers of parental population clusters. Each vertical bar represents a sample, dividing into color sections. K = 5 were utilized to evaluate the relationship between Zhuang and 26 populations.

The pairwise Fst values were used to assess relationships among 27 populations, as shown in Table 3 and Fig. 3A. The Fst values between the Zhuang population and the East Asian population (CDX, CHB, CHS, JPT and KHV) were small, which were 0.065, 0.068, 0.066, 0.073 and 0.067, respectively (Table 3). Smaller Fst values indicate closer relationships between the two groups and suggest that they share similar genetic backgrounds. The result is confirmed by the phylogenetic trees of 27 populations shown in Fig. 3B.

Table 3 Pairwise Fst values among the Zhuang and 26 populations.
Figure 3
figure 3

Fst value heamap and phylogenetic tree among 27 populations. (A) Heatmap based on the pairwise Fst values between 27 populations. (B) The phylogenetic tree was constructed by the neighboring-joining method among 27 populations.

Genotype frequencies of four significantly different SNVs

Moreover, the genotype frequency distribution of rs776746 (CYP3A5), rs4291 (ACE), rs1805123 (KCNH2), and rs1065852 (CYP2D6) in 26 populations are shown in Fig. 4. The genotype frequency of rs4291-AT in the Zhuang population is remarkably higher than that of the other 26 populations. The CC genotype frequency of rs776746 is similar to that of EUR and significantly higher than that of AFR. In the Zhuang population, the frequency of the rs1805123-GG genotype is notably higher compared to that observed in the other 26 populations. The frequency of rs1065852-GG is similar to that of KHV, CHS, CHB, and CDX, and lower than that of other populations.

Figure 4
figure 4

The distribution of genotype frequencies for significantly different SNVs in 27 populations at the rs776746, rs4291, rs1805123 and rs1065852.

MAF distribution of four significantly different SNVs

Based on the allele frequencies calculated in this study, we plotted a map of the MAF distribution of VIP variants that substantially differed from the other 26 populations. According to Fig. 5, the allele frequency of rs776746 at CYP3A5 in the Zhuang population was similar to that in the European population, despite their close genetic affinity with East Asians. The G allele of rs1805123 at KCNH2 was nearly fixed in the Zhuang population and had a low frequency in other global populations. The MAF of rs1065852 (CYP2D6) was similar to that of East Asians and higher than other populations. However, there were no significant differences in T allele frequency for rs4291 at ACE among different populations.

Figure 5
figure 5

The map of the allele frequency distribution for significantly different SNVs in 27 populations at the rs1065852, rs4291, rs776746 and rs1805123.

Clinical relevance of significant variants

The Table 4 presents the clinical annotation information of the VIP variants in PharmGKB. The genotype frequency of rs776746 (CYP3A5) has been shown to have an impact on the dose, toxicity and metabolism of tacrolimus19,20,21. Specific mutations in rs4291 (ACE) have been implicated in the metabolism of anti-hypertensive drugs such as amlodipine, sodium chlorthalidone and lisinopril22. Furthermore, they also influenced the risk of aspirin intolerance in asthmatics exposed to aspirin23. The efficacy of metoclopramide in patients with gastric disease was found to be associated with rs1805123 (KCNH2)24. Rs1065852 played an indispensable role in the regulation of the α-hydroxymetoprolol metabolism in patients with non-small cell lung cancer25.

Table 4 Clinical annotation of very important pharmacogenomic variants with significant differences.

Prediction of functional damage in proteins

Subsequently, we used the PolyPhen-2, SNAP2, FATHMM, Mutationtaster, and Mutationassessor online databases to predicte whether the four SNVs would affect protein structure and function (Table 5). The results indicated that rs1805123 would cause a mutation from K to T at position 897 of KCNH2, however, this mutation was considered benign and less harmful to the protein in most databases. In contrast, rs1065852 caused a mutation from P to A at the 34th position of CYP2D6. The database predicted that this change would severely impair the protein's function and potentially contribute to certain diseases. Additionally, Chimera v1.16 was utilized for predicting the structure of point mutations of CYP2D6 and KCNH2, as shown in Fig. 6.

Table 5 The functional analysis of missense variants using PolyPhen-2, SNAP2, Mutationassessor, FATHMM, and Mutationtaster.
Figure 6
figure 6

Structural prediction of point mutated proteins. (A) 3D structure of the CYP2D6 protein, with the yellow part being a SNV. (B) rs1065852 mutated local structure. (C) 3D structure of the KCNH2 protein, with the yellow part being a SNV mutation. (D) rs1805123 mutated local structure.

Discussion

During the development of biological sciences, it has gradually been realized that genetic differences between populations have an essential influence on drug metabolism, dosages and ADRs. This can potentially affect the efficacy of certain medications in specific populations. Pharmacogenomics research is gradually illuminating the genetic factors responsible for variations in drug utilization among diverse populations. For instance, an important study conducted by Wen et al. demonstrated that there were significant differences in allele frequencies of key genetic variants affecting drug selection and dosing between Hmong and East Asian populations26. Furthermore, pharmacogenomics studies have been reported on Mongolian27, Tibetan28, and Blang29, among others. However, there have been few pharmacogenomics studies conducted on the Zhuang population.

In this study, 200 Zhuang subjects in Yunnan Province were recruited and genotyped for 48 VIP variants on 24 candidate genes. The genotypic distribution was compared to that of 26 populations from the 1000G dataset. The results revealed significant differences in CYP3A5 (rs776746), ACE (rs4291), KCNH2 (rs1805123), and CYP2D6 (rs1065852) between Zhuang population and the other 26 populations. We also used the PharmGKB database to annotate significantly different SNVs. Our study on VIP polymorphism in the Zhuang population may provide tailored therapy for the Zhuang population.

The cytochrome P450 (CYP) superfamily is an ancient enzyme family found in hundreds of eukaryotic and prokaryotic organisms30. The human genome encodes 57 putative functional CYP genes, as well as 58 pseudogenes. Among these 57 functional human CYPs, 12 are involved in the metabolism of 70–80% commonly used drugs, including CYP2D6 and CYP3A531. The human CYP2D6 gene is relatively short, spanning only about 4.3 Kbps on the long arm of chromosome 22 (22q13.2). The CYP2D6 gene is composed of 9 exons and encodes the CYP2D6 protein, which is localized in the endoplasmic reticulum. This protein exhibited highly expressed levels in the liver, brain, intestinal tissues, and lymphocytes32. There were large population differences in the distribution of CYP2D6 alleles, which could lead to variations in drug utilization among different populations33. rs1065852 has been reported to result in reduced protein stability and a poor response to drugs such as iloperidone, atorvastatin, antidepressants, and antipsychotics34,35,36,37. Moreover, allele A was associated with decreased clearance of alpha-hydroxymetoprolol in healthy individuals and a higher plasma concentration of S-didesmethyl-citalopram when treated with citalopram or escitalopram in people with Depressive Disorder compared to allele G25,38. In this study, the MAF of rs1065852-A (54.80%) was higher than that in SAS (16.70%), EUR (20.30%), AFR (11.60%), and AMR (14.60%). In addition, the frequency of rs1065852-GG was lower than that in other populations except for EAS. Therefore, the differences in drug efficacy and safety caused by CYP2D6 rs1065852 should be taken into consideration in the Zhuang population.

CYP3A5, which is located in chromosome 7q21.1, is involved in the metabolism of many drugs. Tacrolimus, one of the substrates of CYP3A5, is widely used as an immunosuppressive agent for organ transplantation39. The expression of CYP3A5 varied among different populations, which may have an impact on drug metabolism in those populations21. One study identified that genotype CT was associated with a higher tacrolimus dose in renal transplant patients compared to genotype CC21. The results of Flores-Pérez et al. revealed that critically ill Mexican pediatric patients with the CYP3A5*3 allele variant (rs776746) had increased plasma levels of midazolam and higher drug clearance 3 h after the end of the infusion compared to carriers with the normal allele40. The study by Liang et al. pointed out that individuals with the rs776746-CC had an increased risk of amlodipine-induced peripheral edema in a dominant model among Chinese Han hypertensive patients41. In our study, the frequencies of CT, TT and CC of rs776746 were 0.5%, 9.5% and 90.0%, separately. The frequency of CT genotype in the Zhuang was lower than that in the other 26 populations, highlighting the importance of considering metabolism and absorption of specific drugs in the Zhuang population.

KCNH2 is a gene that encodes a component of voltage-activated potassium channel found in cardiac muscle, neuronal cells, and microglia. Four copies of this protein interact with a copy of the KCNE2 protein to form a functional potassium channel. Mutations in this gene can lead to long QT syndrome type 2 (LQT2)42. A recent study has identified KCNH2 p.Gly262AlafsTer98 as a novel pathogenic variant associated with long QT syndrome in a Spanish population43. In a separate study, it was found that KCNH2 mutations cause fetal biventricular densified cardiomyopathy with pulmonary stenosis and bradycardia44. The efficacy of metoclopramide in patients with gastric disease was found to be correlated with the polymorphism of KCNH2 gene (rs1805123, p = 0.020)24. A Marjamaa et al. found that allele G of rs1805123 was associated with a shorter QT interval in a Finnish population compared to the TT genotype45. In our study, the frequency of rs1805123-G was significantly higher in the Zhuang population than in the other 26 populations (88.70%). The rs1805123 causes a K-T mutation at site 897 of KCNH2. Although most databases predict this mutation to be benign, attention should be paid to the shorter/long QT interval and the dose of metoclopramide in the Zhuang population.

ACE, which encodes an enzyme, is known to participate in the regulation of blood pressure and electrolyte balance. Numerous studies have shown that ACE is closely associated with nervous system diseases46,47, cardiovascular diseases48, and hypertension49,50. In a previously published study, we found that the rs4291 genotype influenced drug dosing in the treatment of the disease. De Oliveira et al. found a correlation between the use of brain-penetrating angiotensin converting enzyme inhibitors (ACEIs) (such as captopril or perindopril) in antihypertensive therapy and rs429146. Another study has confirmed that the AA genotype of rs4291, compared to the genotype AT + TT, is associated with a reduced severity of renal failure in patients with Alzheimer's disease treated with captopril51. Furthermore, rs1800764 and rs4291 also formed haplotypes. A study discovered that ACEIs decelerated cognitive decline in individuals carrying the ACE haplotype with rs1800764-T and rs4291-A, as well as those carrying the APOE4 haplotype with either rs1800764-T or rs4291-T, regardless of changes in blood pressure52. Our study demonstrated that the allele frequency of rs4291 (ACE) differed significantly between the Zhuang population and the other 26 populations, which has been found to be associated with drug metabolism for various diseases, such as captopril, aspirin, and amlodipine. It may provide guidance for precision drug administration in the Zhuang population.

Our results are likely to complement the pharmacogenomic information of the Zhuang population and refine the study on the differences between the Zhuang population and the other 26 populations. More importantly, this study may provide certain theoretical support for drug use in the Zhuang population. Nonetheless, there are some limitations to our study. Our sample size was relatively small in this study. To design a comprehensive, systematic, disease-specific treatment protocol for the Zhuang population, we need to further expand the sample size for more in-depth studies. In addition, only Agena MassARRAY was used for genotyping in this study, and no other orthogonal method was employed to validate the sequencing, which will be utilized in subsequent studies.

Conclusion

In short, the genotype frequencies of CYP3A5 (rs776746), ACE (rs4291), CYP2D6 (rs1065852), and KCNH2 (rs1805123) showed significant disparities between the Zhuang population and 26 other populations. Our study can not only enrich the pharmacogenomics database of the Zhuang population but also provide a theoretical basis for tailored therapy in this population and ensure safe drug use for patients.