The human cytochrome P450 2B6 enzyme (CYP2B6) plays a pivotal role in the metabolism of different drugs used for malaria treatment (artemisinin derivatives such as artesunate, β-artemether and artemether) and for HIV life-long therapy (non-nucleoside reverse-transcriptase inhibitors such as efavirenz and nevirapine). CYP2B6 is a highly polymorphic enzyme that affects the therapeutic response including drug interactions in individuals1,2. Importantly, African populations show a high degree of variation in the CYP2B6 gene3. In Botswana, interventions towards the elimination of Plasmodium falciparum malaria have been intensified and HIV has been reported at a frequency of 18.5%4. So, a deeper knowledge of the genetic variability of CYP2B6 in the population of Botswana is necessary to improve the efficacy of ongoing fight against malaria and HIV.

Currently, the first line treatment for uncomplicated malaria in Botswana is a fixed-dose combination of artemether (AM) and lumefantrine (LU)5. After oral administration, AM is rapidly absorbed and metabolized to dihydroartemisinin (DHA) through demethylation, mainly via CYP2B6 enzymes6,7,8. DHA has a higher antimalarial activity than AM, and it is inactivated primarily by glucuronidation7,9.

Efavirenz (EFV) and nevirapine (NVP) are the most prescribed drugs in anti-HIV combination treatment in resource-limited countries10. There is a considerable literature on EFV and NVP pharmacogenetics and currently it has been proposed that personalizing the dosage of these anti-HIV drugs according to the CYP2B6 genotype of the patient will be of therapeutic benefit11.

The known single nucleotide polymorphisms (SNPs) within the CYP2B6 locus influencing AM and EFV/NVP plasma exposure are: 516 G > T (rs3745274), 785 A > G (rs2279343) and 983 T > C (rs28399499). The polymorphisms at position 516 and 983 confer a slow metabolic phenotype, leading to higher drug plasma exposure and increased toxicity risk. The variation at position 785 confers rapid drug metabolism, leading to lower drug plasma level and potentially poorer therapeutic outcome10. The above-mentioned SNPs, in combination or individually, may lead to the following CYP2B6 alleles: *4 (785 A > G only), *6 (516 G > T and 785 A > G), *9 (516 G > T only), *16 (785 A > G and 983 T > C), *18 (983 T > C only) ( The metabolic status for a given drug is defined as slow, intermediate, extensive or ultra-rapid as a result of the individual’s genetic make up12. Furthermore, in the anti-infectious therapy context, the therapeutic outcome based on the individual metabolic status for a given drug depends also on the length and frequency of therapy. For example, in the context of the short course uncomplicated malaria treatment, a slow AM metabolism induces a lower biotransformation rate to the most active metabolite DHA7 leading to a lower efficacy and possibly a longer tail during which malaria parasites are exposed to sub-inhibitory drug concentrations, inducing an increased permanence in the selective window for drug resistance13,14. Conversely, in the context of chronic EFV- or NVP-based antiretroviral combination therapy (ARTc), a slow metabolic profile reduces drug clearance leading to higher drug plasma exposure and consequent higher toxicity risk. This may cause poorer compliance by the patient with resultant emergence of viral resistance possibly related to suboptimal drug exposure10,15. Concerning the influence of rapid metabolism phenotypes on antimalarial and antiretroviral therapy, information is lacking10.

Another important aspect is that malaria treatment in sub-Saharan Africa may frequently overlap EFV- or NVP-based ARTc leading to possible drug-drug interactions. The co-administration of EFV or NVP and AM-LU was associated with low plasma concentration of AM and DHA, whereas LU concentration was reduced by EFV co-administration16,17. A recent study, looking at HIV-infected patients affected by malaria in Tanzania, has shown that the slow metaboliser genotype CYP2B6 *6/*6 leads to high EFV plasma concentration which are significantly correlated with low LU plasma concentrations and a high rate of recurrent parasitemia18, increasing the risk of the appearance of drug-resistant malaria parasites.

Finally, Botswana (together with other Southern African countries) is also home to individuals with unique southern African KhoeSan ancestry. The KhoeSan populations are the earliest known indigenous inhabitants of southern Africa and are distinguished by their unique phenotype(s), genetic divergence, click languages, and hunter-gatherer subsistence strategy compared to other African populations19,20,21. During evolution, hunter-gatherer practices exposed populations to several xenobiotics which sometimes imposed serious health and environmental risks, leading to the selection of specific mutations linked to an efficient detoxification. For example, significant differences in prevalence of acetylation phenotypes are found between hunter-gatherer and food-producing populations, both in sub-Saharan Africa and worldwide, and between agriculturalists and pastoralists in Central Asia22. In Africa, comparative studies on N-acetyl transferase (NAT2) haplotype frequencies and acetylation status inference revealed that the hunter-gatherer populations (San, but also Pygmies) are mainly composed of fast and intermediate acetylators23, in clear contrast with most agriculturalist populations (such as Bantu). A recent report showed differences in the allele frequency for CYP2C8*2 among San and Bantu-related communities in Botswana24. These general observations highlight the need for a detailed pharmacogenetic characterization of populations of pure or admixed KhoeSan ancestry in Botswana to accommodate individual/population genetic make-up and expedite optimum treatment strategies against important infections such as malaria and HIV infection.

In this study we aim to: i) define the CYP2B6 haplotype status in Botswana via the three most important functional SNPs known to affect the enzyme activity; ii) examine the relationship, if any, of the haplotypes within ethnic groups; iii) apply the “metabolic score” in the study of inferred metabolic phenotypes among the populations. We selected three sites where the ethnic composition is different (San-related population in the Ghanzi health district and Bantu-related populations in the Chobe and Serowe/Palapye health districts, respectively) (see Fig. 1). This work is part of the ongoing screening activities for antimalarial drugs efficacy and safety study in Botswana unto malaria elimination by 2020, but it also provides useful information on antiretroviral drug metabolism in the population, making this research of public health relevance for the country.

Figure 1
figure 1

Map of Botswana.


Out of 609 samples we obtained genotypic data for all the three polymorphisms from 570 samples (93.60%). The genotype and allele frequency for 516 G > T, 785 A > G and 983 T > C in the three selected districts are shown in Table 1 with statistical comparisons.

Table 1 Genotype and allele frequencies for CYP2B6 516-785-983 SNPs and comparisons among the groups.

Hardy-Weinberg Equilibrium

Hardy-Weinberg equilibrium analysis showed that CYP2B6-516 displayed significant deviations from Hardy–Weinberg equilibrium in samples from Serowe/Palapye district due to a significant heterozygous defect (Wright’s F = 0.13), with the same sample having a significant CYP2B6-983 heterozygous excess (Wright’s F = −0.17). Genotypes in samples from Chobe and Ghanzi districts were in Hardy-Weinberg equilibrium. CYP2B6-785 genotypes were in equilibrium in all three districts analysed. Regarding the only two samples showing a rare CC genotype for 983 T > C polymorphism25,26, we sequenced them and confirmed the result.

Linkage Disequilibrium analysis

A significant linkage disequilibrium (LD) was found with Arlequin between all pairwise comparisons of the three polymorphic loci when samples were considered as a single population (n = 570). Chi-square test values ranged from 36.07 (P < 0.00001, 1 df) for CYP2B6-516 vs CYP2B6-983, to 443.55 (P < 0.00001, 1 df) for CYP2B6-516 vs CYP2B6-785, to 43.93 (P < 0.00001, 1 df) for CYP2B6-983 vs CYP2B6-785. When analysing the samples separately by district of origin, we found a significant LD in all districts for all pairwise comparisons between the three loci (Table 2). In general, LD between CYP2B6-516 and CYP2B6-785 was strongest, while LD between CYP2B6-983 and the other two loci was weakest but still highly significant.

Table 2 Pairwise LD analysis for the three polymorphic loci. Chi-square values and P-value for LD analysis were obtained using Arlequin. Significance is assumed for P < 0.05.

Haplotype frequency estimation

Haplotype frequencies were estimated using Arlequin (Table 3). For the combined samples, the GAT haplotype was the most common and the GGC haplotype the rarest. Among the identified haplotypes, two not yet categorized were found, TGC and TAC, indicated as *6 + *18 and *9 + *18, respectively (Table 3). When districts were considered separately, GAT remained the dominant haplotype while GGC remained the rarest. However, differences in haplotype frequencies could be observed between districts. While there was no statistical difference in the population structure between the Serowe/Palapye and Chobe districts (Population pairwise FSTs test: P = 0.06 +/− 0.024, 110 permutations), there was a significant difference when both districts were separately compared to the Ghanzi district (Population pairwise FSTs test: P < 0.00001, 110 permutations for both comparisons). When the Serowe/Palapye and Chobe districts were combined and then compared to the Ghanzi district, the populations were still significantly different (Population pairwise FSTs test: P < 0.00001, 110 permutations). In particular the Ghanzi district displayed in particular a higher frequency (as estimated by ML) of the GAT (65.55%), TAT (10.98%) and TAC (1.53%) haplotypes and a lower frequency of the TGT (12.79%) and TGC (3.18%) haplotypes (Table 3). These data indicated a distinct population structure in terms of the CYP2B6 alleles in the Ghanzi district compared to both the Serowe/Palapye and Chobe districts.

Table 3 Haplotype frequencies by district and for all the samples combined. Maximum-likelihood (ML) haplotype frequencies were calculated using the EM algorithm in Arlequin.

Neutrality tests

For each sample, the most likely gametic phases (as determined by Arlequin) were selected to perform neutrality tests. The results of the Tajima’s D tests indicated positive values above 2 for the Serowe and Chobe populations with significant P-values and 1.26 for the Ghanzi samples with a non-significant P-value. The Chobe and Serowe combined population dataset (i.e. Bantu) also yielded a statistically significant D value above two (Table 4). The significantly positive Tajima’s D scores among the Bantu groups could either indicate the presence of balancing selection, migration or a sudden population contraction, or even indicate positive selection on pre-existing genetic variation27.

Table 4 Results of the Tajima’s D neutrality test. Tests were performed with DnaSP v6. P-values are provided in brackets.

Metabolic score

The expected metabolic scores were analysed among the three populations (Table 5 and Fig. 2).

Table 5 Distribution of the inferred metabolic phenotype and scores by district, and Kolmogorv-Smirnov (K-S) D statistics with associated P values to test the normal distribution of data.
Figure 2
figure 2

Distribution of the inferred metabolic scores by districts. PM = poor metabolisers; I = intermediate metabolisers (either with delayed or increased metabolism); EM = extensive or ‘normal’ metabolisers; UR = ultra-rapid metabolisers.

The Kolmogorov-Smirnov test did not find any statistically significant divergence from normal distribution for the metabolic scores in each of the three sub-samples (Table 5). The Bartlett test for homogeneity of variances showed that the three variances were not statistically different from each other (Bartlett’s chi square = 3.32, df = 2, P = 0.19). The chi-square statistic in the comparison of the absolute frequencies distribution of the metabolic scores among all districts was 9.43 with the associated P-value being 0.49, indicating that the distributions of metabolic scores in the three groups were comparable. The CYP2B6 inferred phenotypes associated with a reduced metabolism were found in 36.67% (n = 209/570) of the overall population studied, mainly intermediate (94.25%, n = 197/209) rather than poor metabolizer (5.75%, n = 12/209). Phenotypes associated with increased metabolism were found in 4.21% (n = 24/570) of the population studied, with only 2 individuals showing an inferred ultra-rapid metabolic phenotype.


African populations represent the most genetically diverse populations in the world and this complicates the already inadequate treatment strategies developed for several communicable and non-communicable diseases. Our study in Botswana showed that haplotype composition is diverse between the Bantu-related communities from Serowe/Palapye and Chobe districts, and the San-related communities of the Ghanzi area. This probably reflects the different genetic background and evolutionary history of hunter-gatherers and food producing populations (farming and pastoralists) in Southern Africa28. We pooled together the haplotype data from Serowe/Palapye and Chobe districts considering them as a single population since there was no statistical difference in the population structure between the two districts. The frequency for CYP2B6 516 G > T is in line with few other studies from Botswana29,30. It should be pointed out that looking at the single SNP CYP2B6 516 G > T, Serowe/Palapye and Chobe districts do show statistical difference in genotypes distribution (see Table 1). However, haplotypes better predict the population structure than single SNPs31, and this observation is also true in CYP2B6 because of the known linkage among the sites2, as confirmed in this study. We identified two haplotypes not yet categorized into alleles that however were already described in other studies32,33.

A possible explanation of the difference observed for genotype distribution at 516 polymorphism between the two districts inhabited by Bantu could be due to the HWE of genotypes that was present only in Chobe district. The deviation from the HWE in Serowe/Palapye district was also found for the 983 polymorphism. Deviation from HWE could be due to lack of CC genotypes for CYP2B6-983, LD between SNPs at position 516 and 983. In addition, Serowe/Palapye has a wider sample size that increases the likelihood of deviation from the HWE when one of the genotypes (CC) is absent or very rare. Other studies in Africa found absence of 983-CC genotypes which prevented testing for HWE34,35. Another possible factor affecting both CYP2B6-516 comparisons and HWE in Serowe/Palapye district could be ethnic admixture since it is know that the Bantu-related population of Botswana carries a variable proportion of KhoeSan ancestry36,37.

An important result of our study is that the metabolic inferred phenotypes are similar among the three investigated sites. The main reason for this phenomenon could be due to the homeostatic effect of the mutations when taken together. Thus, despite clear and high levels of haplotype diversity among the sites, we observe an inferred phenotypic convergence, with the result that, globally, drug metabolism remains consistent and comparable among the populations studied. It is worth noting that convergent phenotypic evolution is a known phenomenon in biology38,39 and also relevant in human populations for several characters included skin pigmentation, lactose tolerance and immune responses40,41,42,43. The CYP2B6 enzyme plays an important primary role in bioactivating and detoxifying a certain number of procarcinogens and environmental agents44,45 as well as processing the arsenal of plant chemical defences introduced with diets46. The homogeneous phenotypes among the different sub-samples of this study are probably due to adaptation to the environmental and/or toxicological conditions of the Kalahari and surrounding areas. Based on the Tajima’s D neutrality test results, the two different patterns between Bantu and San could either indicate the presence of balancing selection or alternatively migration or a sudden population contraction in the Bantu population but not in the San. The results could also be possible evidence for positive selection on pre-existing genetic variation. A similar trend was already found by Fuselli et al.47 for the CYP2D6 gene and by Podgorná et al.22 for the NAT2 gene comparing hunter gatherers with food producing populations in Africa. There is thus some evidence that the current haplotypes might have evolved independently among the different ethnic groups and may thus represent a spectrum in terms of historical and potential adaptations to different ecological and toxicological niches. However, analysis of more genes and control groups will be required to exclude non-evolutionary events (e.g. migration into the Bantu populations) as the underlying cause of the observed results.

According to the metabolic score, expected metabolic phenotypes were obtained: 59.12% of the population study showed a CYP2B6 extensive metabolic phenotype, whereas 36.67% and 4.21% had a reduced and increased metabolic phenotype respectively (see Table 5). The poor and the ultra-rapid metabolic phenotype were observed in 2.1% (n = 12/570) and 0.35% (n = 2/570) of individuals respectively, whereas 34.56% (n = 197/570) showed an intermediate reduced metabolic phenotype, and 3.86% (n = 22/570) had an intermediate increased metabolic phenotype. Individuals with a CYP2B6 reduced metabolic phenotype may have an increased risk of malaria treatment failure when treated with artemisinin derivatives48, as well as an impaired outcome of EFV or NVP-based ARTc10,29. This effect on malaria therapy efficacy could constitute an important obstacle in the malaria elimination phase, suggesting the need to strengthen the surveillance of AM-LU drug efficacy in Botswana. It is important to note that Chobe is a seasonal malarious area bordering with Namibia, Zambia and Zimbabwe, whereas Serowe/Palapye is more prone to unstable epidemics based on weather patterns49,50. Differently, in Ghanzi, which does not usually receive high amounts of rainfall, malaria outbreaks do occur at times, as it is currently the case (

Concerning individuals having an inferred ultra-rapid metabolic phenotype, they are rare and the effect of their metabolic pattern seems to affect mainly EFV or NVP-based ARTc by leading to sub-therapeutic drug exposure with a potential increase of risk of selection for viral resistance10. Further studies should focus on this fast metaboliser fraction of the population almost neglected in the scientific literature.

Finally, it is worth noticing that drug-drug interaction in malaria and HIV co-infections can lead to therapeutic consequences. For examples, EFV was shown to reduce AM AUC by 80%, tripling the dose of AM needed to compensate EFV-inductive effect51. In another study, EFV was shown to reduce LU bioavailability52. To ensure antimalarial treatment success in HIV/malaria co-infected patients on EFV-based ART, an increase of dosage or an extension of the duration of AM-LU treatment using the current dose was proposed from several authors17,53. Studying how this is affected by drug metabolism phenotype is another important topic that will deserve future attention.

Moreover, malaria infection stimulates HIV replication, causing transient elevation in viral load54 that can hamper the management of HIV infected patients, possibly amplifying HIV prevalence55 and this could be relevant for Botswana.

Our work has some limitations one being the absence of metabolic phenotypes, since this is a pure genetics study. Furthermore, we did not measure the extent of KhoeSan ancestry, instead we based our definition of San-related ethnicity on family names and sites. However, based on the results from our current study, we can conclude that although ethnically and genetically different, populations in Botswana display convergent evolution in their drug metabolism. The presence of significant numbers of slow and fast metabolisers can significantly impact the emergence and spread of drug resistance in malaria and HIV, either by exposing pathogens to sub-lethal drug doses or inducing non-compliance in patients, with similar consequences. The high frequency of co-infections and the negative interaction between anti-HIV and anti-malaria drugs further exacerbates the risk of resistance emerging. This warrants constant monitoring in the population to identify potential patients with abnormal drug metabolism and adapt treatments accordingly.


Sample collection

The survey was performed from March to May 2012 in Botswana in the broader context of a Malaria Indicator Survey56,57. A total of 609 unrelated children aged 2–12 years asymptomatic for malaria were enrolled from primary schools and child welfare clinics: 288 from the Serowe/Palapye district, 159 from the Chobe district and 162 from the Ghanzi (or Gantsi) district. These districts have peculiar ethnic compositions, as shown in previous studies24,58. The same protocol for enrollment was followed in all sites. Written informed consent for multiple genetic and epidemiological surveys was obtained from all the subjects’ parents/caregivers before enrollment in the study. This study was conducted in accordance with the guidelines of the Helsinki Declaration of 2000 revised 2013, with the approval from Human Research and Development Division of the Botswana Ministry of Health [PPME-13/18 V (380)] and the Institutional Review Board of the University of Pennsylvania [protocol number 820378]. Three ml of whole blood was collected into EDTA-containing tubes by qualified health care workers.

DNA extraction and PCR-RFLP conditions

DNA was extracted with Qiagen kits according to the manufacturer’s protocol. CYP2B6 516 G > T (rs3745274) detection was carried out using a PCR-RFLP technique according to the protocol of Lavandera et al.59 with minor modifications. For CYP2B6 983 T > C (rs28399499) detection we applied a touchdown PCR-RFLP assay developed in our laboratory34. Finally for the purpose of this study we adopted a new in-house protocol for the analysis of the CYP2B6 785 A > G (rs2279343) polymorphism. Briefly, we designed two primers that amplify a 223 bp fragment of the CYP2B6 gene (forward primer: 5′-AACCTGCAGGAAATCAATGC-3′; reverse primer: 5′-CCTTCTTCCCTCCCCATCTTC-3′). For PCR cycling, after 5 min of denaturation at 94 C, the PCR mixture was subjected to the following conditions: 30 s at 94 C, 30 s at 65 C and 30 s at 72 C for 35 cycles, with a final step of 10 min at 72 C. The PCR product was then incubated with NlaIV restriction enzyme that cuts the mutant allele (G) in three fragments of 144 bp, 62 bp and 17 bp; while the wild-type allele (A) is digested in two fragments of 161 bp and 62 bp. The digested fragments were visualized on a 4% agarose gel. Negative control comprised of PCR reaction without a DNA template and controls for human genotyping were utilized after sequencing the different genotypes.

Hardy-Weinberg Equilibrium evaluation

Evaluation of Hardy-Weinberg equilibrium was performed using the HWSIM software (freely available at and Monte-Carlo permutation test performed when genotypic classes had an expected cell size of less than five. Wright’s F statistics was applied to evaluate the expected level of heterozygosity.

Linkage Disequilibrium, haplotype frequency analysis and Neutrality test

Arlequin v3.560 was used to test for linkage disequilibrium between the three loci and for haplotype reconstruction. Several other studies were able to evaluate haplotype structure and population differentiation looking at functional SNPs in the same gene61,62,63,64. The input consisted of diploid genotypic DNA data with unknown gametic phase and assuming co-dominance. The expectation-maximization (EM) algorithm used to test for Linkage Disequilibrium (LD) was run for 20,000 permutations and 3 initial conditions, based on recommended criteria. For haplotype reconstruction the EM algorithm was run on the same input file used for the LD test at the haplotype level, with 50 starting points and 1,000 iterations. Other parameters were set to default. The Excoffier–Laval–Balding (ELB) algorithm (with default settings) was used to generate haplotype counts. Population haplotype frequencies were determined both by haplotype counts and estimated by Maximum Likelihood (ML) and were compared using pairwise FSTs calculated using default values.

After determining the gametic phase of the samples using the ELB algorithm in Arlequin, the DnaSP (v6) software65 was used to calculate and evaluate the Tajima’s D neutrality test using the total number of mutations.

Metabolic score

We elaborated and adopted a “metabolic score” to take into account the global effect of the three polymorphisms together, according to an extensive literature linking genetic polymorphisms to functional impact10,66,67,68. For the metabolic score we translated the genotype information into a measure of phenotype using an ‘activity score’ system already adopted for CYP2D647,69, for CYP2A670 and for CYP2C1971. The metabolic score adopted is based on the algebraic sum of the individual allele values, according to an additive model for CYP2B672,73. The scores were set conferring a −1 value for each slow metabolism allele and +1 for rapid metabolism allele, while an extensive metabolism allele was scored 0. Accordingly, we obtained: 516 GG = 0; 516 GT = −1; 516 TT = −2; 983 TT = 0; 983 TC = −1; 983 CC = −2; 785 AA = 0; 785 AG =  +1; 785 GG =  +2. Each composite genotype was attributed a final metabolic score by summing the single score for each SNP.

To test possible deviation from the normal distribution of the scores the Kolmogorov-Smirnov test was used and the statistic D values calculated for each distribution. A related P-value greater than 0.05 indicates normal distribution of data74. Bartlett’s test was performed to assess if the assumption of equal variances among the three populations was valid75. Comparisons among metabolic scores were calculated using the chi-square statistic.

All the statistical calculations were performed using Statistica 13.0 software (StatSoft).