Human cytochrome P450 2B6 genetic variability in Botswana: a case of haplotype diversity and convergent phenotypes

Identification of inter-individual variability for drug metabolism through cytochrome P450 2B6 (CYP2B6) enzyme is important for understanding the differences in clinical responses to malaria and HIV. This study evaluates the distribution of CYP2B6 alleles, haplotypes and inferred metabolic phenotypes among subjects with different ethnicity in Botswana. A total of 570 subjects were analyzed for CYP2B6 polymorphisms at position 516 G > T (rs3745274), 785 A > G (rs2279343) and 983 T > C (rs28399499). Samples were collected in three districts of Botswana where the population belongs to Bantu (Serowe/Palapye and Chobe) and San-related (Ghanzi) ethnicity. The three districts showed different haplotype composition according to the ethnic background but similar metabolic inferred phenotypes, with 59.12%, 34.56%, 2.10% and 4.21% of the subjects having, respectively, an extensive, intermediate, slow and rapid metabolic profile. The results hint at the possibility of a convergent adaptation of detoxifying metabolic phenotypes despite a different haplotype structure due to the different genetic background. The main implication is that, while there is substantial homogeneity of metabolic inferred phenotypes among the country, the response to drugs metabolized via CYP2B6 could be individually associated to an increased risk of treatment failure and toxicity. These are important facts since Botswana is facing malaria elimination and a very high HIV prevalence.


Results
Out of 609 samples we obtained genotypic data for all the three polymorphisms from 570 samples (93.60%). The genotype and allele frequency for 516 G > T, 785 A > G and 983 T > C in the three selected districts are shown in Table 1 with statistical comparisons.
Hardy-Weinberg Equilibrium. Hardy-Weinberg equilibrium analysis showed that CYP2B6-516 displayed significant deviations from Hardy-Weinberg equilibrium in samples from Serowe/Palapye district due to a significant heterozygous defect (Wright's F = 0.13), with the same sample having a significant CYP2B6-983 heterozygous excess (Wright's F = −0.17). Genotypes in samples from Chobe and Ghanzi districts were in Hardy-Weinberg equilibrium. CYP2B6-785 genotypes were in equilibrium in all three districts analysed. Regarding the only two samples showing a rare CC genotype for 983 T > C polymorphism 25,26 , we sequenced them and confirmed the result. Linkage Disequilibrium analysis. A significant linkage disequilibrium (LD) was found with Arlequin between all pairwise comparisons of the three polymorphic loci when samples were considered as a single population (n = 570). Chi-square test values ranged from 36.07 (P < 0.00001, 1 df) for CYP2B6-516 vs CYP2B6-983, to 443.55 (P < 0.00001, 1 df) for CYP2B6-516 vs CYP2B6-785, to 43.93 (P < 0.00001, 1 df) for CYP2B6-983 vs CYP2B6-785. When analysing the samples separately by district of origin, we found a significant LD in all districts for all pairwise comparisons between the three loci ( Table 2). In general, LD between CYP2B6-516 and CYP2B6-785 was strongest, while LD between CYP2B6-983 and the other two loci was weakest but still highly significant.
Haplotype frequency estimation. Haplotype frequencies were estimated using Arlequin (Table 3). For the combined samples, the GAT haplotype was the most common and the GGC haplotype the rarest. Among the identified haplotypes, two not yet categorized were found, TGC and TAC, indicated as *6 + *18 and *9 + *18, respectively (Table 3). When districts were considered separately, GAT remained the dominant haplotype while GGC remained the rarest. However, differences in haplotype frequencies could be observed between districts. While there was no statistical difference in the population structure between the Serowe/Palapye and Chobe districts (Population pairwise FSTs test: P = 0.06 +/− 0.024, 110 permutations), there was a significant difference when both districts were separately compared to the Ghanzi district (Population pairwise FSTs test: P < 0.00001, 110 permutations for both comparisons). When the Serowe/Palapye and Chobe districts were combined and then compared to the Ghanzi district, the populations were still significantly different (Population pairwise FSTs test: P < 0.00001, 110 permutations). In particular the Ghanzi district displayed in particular a higher frequency (as estimated by ML) of the GAT (65.55%), TAT (10.98%) and TAC (1.53%) haplotypes and a lower frequency of the TGT (12.79%) and TGC (3.18%) haplotypes (Table 3). These data indicated a distinct population structure in terms of the CYP2B6 alleles in the Ghanzi district compared to both the Serowe/Palapye and Chobe districts.
Neutrality tests. For each sample, the most likely gametic phases (as determined by Arlequin) were selected to perform neutrality tests. The results of the Tajima's D tests indicated positive values above 2 for the Serowe and Chobe populations with significant P-values and 1.26 for the Ghanzi samples with a non-significant P-value. The Chobe and Serowe combined population dataset (i.e. Bantu) also yielded a statistically significant D value above two ( Table 4). The significantly positive Tajima's D scores among the Bantu groups could either indicate the presence of balancing selection, migration or a sudden population contraction, or even indicate positive selection on pre-existing genetic variation 27 .
Metabolic score. The expected metabolic scores were analysed among the three populations (Table 5 and The Kolmogorov-Smirnov test did not find any statistically significant divergence from normal distribution for the metabolic scores in each of the three sub-samples ( Table 5). The Bartlett test for homogeneity of variances showed that the three variances were not statistically different from each other (Bartlett's chi square = 3.32, df = 2, P = 0.19). The chi-square statistic in the comparison of the absolute frequencies distribution of the metabolic scores among all districts was 9.43 with the associated P-value being 0.49, indicating that the distributions of metabolic scores in the three groups were comparable. The CYP2B6 inferred phenotypes associated with a reduced metabolism were found in 36.67% (n = 209/570) of the overall population studied, mainly intermediate (94.25%, n = 197/209) rather than poor metabolizer (5.75%, n = 12/209). Phenotypes associated with increased metabolism were found in 4.21% (n = 24/570) of the population studied, with only 2 individuals showing an inferred ultra-rapid metabolic phenotype.

Discussion
African populations represent the most genetically diverse populations in the world and this complicates the already inadequate treatment strategies developed for several communicable and non-communicable diseases. Our study in Botswana showed that haplotype composition is diverse between the Bantu-related communities from Serowe/Palapye and Chobe districts, and the San-related communities of the Ghanzi area. This probably reflects the different genetic background and evolutionary history of hunter-gatherers and food producing populations (farming and pastoralists) in Southern Africa 28 . We pooled together the haplotype data from Serowe/   Palapye and Chobe districts considering them as a single population since there was no statistical difference in the population structure between the two districts. The frequency for CYP2B6 516 G > T is in line with few other studies from Botswana 29,30 . It should be pointed out that looking at the single SNP CYP2B6 516 G > T, Serowe/ Palapye and Chobe districts do show statistical difference in genotypes distribution (see Table 1). However, haplotypes better predict the population structure than single SNPs 31 , and this observation is also true in CYP2B6 because of the known linkage among the sites 2 , as confirmed in this study. We identified two haplotypes not yet categorized into alleles that however were already described in other studies 32,33 . A possible explanation of the difference observed for genotype distribution at 516 polymorphism between the two districts inhabited by Bantu could be due to the HWE of genotypes that was present only in Chobe district. The deviation from the HWE in Serowe/Palapye district was also found for the 983 polymorphism. Deviation from HWE could be due to lack of CC genotypes for CYP2B6-983, LD between SNPs at position 516 and 983. In addition, Serowe/Palapye has a wider sample size that increases the likelihood of deviation from the HWE when one of the genotypes (CC) is absent or very rare. Other studies in Africa found absence of 983-CC genotypes which prevented testing for HWE 34,35 . Another possible factor affecting both CYP2B6-516 comparisons and HWE in Serowe/Palapye district could be ethnic admixture since it is know that the Bantu-related population of Botswana carries a variable proportion of KhoeSan ancestry 36,37 .
An important result of our study is that the metabolic inferred phenotypes are similar among the three investigated sites. The main reason for this phenomenon could be due to the homeostatic effect of the mutations when taken together. Thus, despite clear and high levels of haplotype diversity among the sites, we observe an inferred phenotypic convergence, with the result that, globally, drug metabolism remains consistent and comparable among the populations studied. It is worth noting that convergent phenotypic evolution is a known phenomenon in biology 38,39 and also relevant in human populations for several characters included skin pigmentation, lactose     tolerance and immune responses [40][41][42][43] . The CYP2B6 enzyme plays an important primary role in bioactivating and detoxifying a certain number of procarcinogens and environmental agents 44,45 as well as processing the arsenal of plant chemical defences introduced with diets 46 . The homogeneous phenotypes among the different sub-samples of this study are probably due to adaptation to the environmental and/or toxicological conditions of the Kalahari and surrounding areas. Based on the Tajima's D neutrality test results, the two different patterns between Bantu and San could either indicate the presence of balancing selection or alternatively migration or a sudden population contraction in the Bantu population but not in the San. The results could also be possible evidence for positive selection on pre-existing genetic variation. A similar trend was already found by Fuselli et al. 47 for the CYP2D6 gene and by Podgorná et al. 22 for the NAT2 gene comparing hunter gatherers with food producing populations in Africa. There is thus some evidence that the current haplotypes might have evolved independently among the different ethnic groups and may thus represent a spectrum in terms of historical and potential adaptations to different ecological and toxicological niches. However, analysis of more genes and control groups will be required to exclude non-evolutionary events (e.g. migration into the Bantu populations) as the underlying cause of the observed results. According to the metabolic score, expected metabolic phenotypes were obtained: 59.12% of the population study showed a CYP2B6 extensive metabolic phenotype, whereas 36.67% and 4.21% had a reduced and increased metabolic phenotype respectively (see Table 5). The poor and the ultra-rapid metabolic phenotype were observed in 2.1% (n = 12/570) and 0.35% (n = 2/570) of individuals respectively, whereas 34.56% (n = 197/570) showed an intermediate reduced metabolic phenotype, and 3.86% (n = 22/570) had an intermediate increased metabolic phenotype. Individuals with a CYP2B6 reduced metabolic phenotype may have an increased risk of malaria treatment failure when treated with artemisinin derivatives 48 , as well as an impaired outcome of EFV or NVP-based ARTc 10,29 . This effect on malaria therapy efficacy could constitute an important obstacle in the malaria elimination phase, suggesting the need to strengthen the surveillance of AM-LU drug efficacy in Botswana. It is important to note that Chobe is a seasonal malarious area bordering with Namibia, Zambia and Zimbabwe, whereas Serowe/ Palapye is more prone to unstable epidemics based on weather patterns 49,50 . Differently, in Ghanzi, which does not usually receive high amounts of rainfall, malaria outbreaks do occur at times, as it is currently the case (http:// www.moh.gov.bw/press%20release/MALARIA%20PRESS%20RELEASE.pdf).
Concerning individuals having an inferred ultra-rapid metabolic phenotype, they are rare and the effect of their metabolic pattern seems to affect mainly EFV or NVP-based ARTc by leading to sub-therapeutic drug exposure with a potential increase of risk of selection for viral resistance 10 . Further studies should focus on this fast metaboliser fraction of the population almost neglected in the scientific literature.
Finally, it is worth noticing that drug-drug interaction in malaria and HIV co-infections can lead to therapeutic consequences. For examples, EFV was shown to reduce AM AUC by 80%, tripling the dose of AM needed to compensate EFV-inductive effect 51 . In another study, EFV was shown to reduce LU bioavailability 52 . To ensure antimalarial treatment success in HIV/malaria co-infected patients on EFV-based ART, an increase of dosage or an extension of the duration of AM-LU treatment using the current dose was proposed from several authors 17,53 . Studying how this is affected by drug metabolism phenotype is another important topic that will deserve future attention.
Moreover, malaria infection stimulates HIV replication, causing transient elevation in viral load 54 that can hamper the management of HIV infected patients, possibly amplifying HIV prevalence 55 and this could be relevant for Botswana.
Our work has some limitations one being the absence of metabolic phenotypes, since this is a pure genetics study. Furthermore, we did not measure the extent of KhoeSan ancestry, instead we based our definition of San-related ethnicity on family names and sites. However, based on the results from our current study, we can conclude that although ethnically and genetically different, populations in Botswana display convergent evolution in their drug metabolism. The presence of significant numbers of slow and fast metabolisers can significantly impact the emergence and spread of drug resistance in malaria and HIV, either by exposing pathogens to sub-lethal drug doses or inducing non-compliance in patients, with similar consequences. The high frequency of co-infections and the negative interaction between anti-HIV and anti-malaria drugs further exacerbates the risk of resistance emerging. This warrants constant monitoring in the population to identify potential patients with abnormal drug metabolism and adapt treatments accordingly.

Methods
Sample collection. The survey was performed from March to May 2012 in Botswana in the broader context of a Malaria Indicator Survey 56,57 . A total of 609 unrelated children aged 2-12 years asymptomatic for malaria were enrolled from primary schools and child welfare clinics: 288 from the Serowe/Palapye district, 159 from the Chobe district and 162 from the Ghanzi (or Gantsi) district. These districts have peculiar ethnic compositions, as shown in previous studies 24,58 . The same protocol for enrollment was followed in all sites. Written informed consent for multiple genetic and epidemiological surveys was obtained from all the subjects' parents/caregivers before enrollment in the study. This study was conducted in accordance with the guidelines of the Helsinki DNA extraction and PCR-RFLP conditions. DNA was extracted with Qiagen kits according to the manufacturer's protocol. CYP2B6 516 G > T (rs3745274) detection was carried out using a PCR-RFLP technique according to the protocol of Lavandera et al. 59 with minor modifications. For CYP2B6 983 T > C (rs28399499) detection we applied a touchdown PCR-RFLP assay developed in our laboratory 34 . Finally for the purpose of this study we adopted a new in-house protocol for the analysis of the CYP2B6 785 A > G (rs2279343) polymorphism. Briefly, we designed two primers that amplify a 223 bp fragment of the CYP2B6 gene (forward primer: 5′-AACCTGCAGGAAATCAATGC-3′; reverse primer: 5′-CCTTCTTCCCTCCCCATCTTC-3′). For PCR cycling, after 5 min of denaturation at 94 C, the PCR mixture was subjected to the following conditions: 30 s at 94 C, 30 s at 65 C and 30 s at 72 C for 35 cycles, with a final step of 10 min at 72 C. The PCR product was then incubated with NlaIV restriction enzyme that cuts the mutant allele (G) in three fragments of 144 bp, 62 bp and 17 bp; while the wild-type allele (A) is digested in two fragments of 161 bp and 62 bp. The digested fragments were visualized on a 4% agarose gel. Negative control comprised of PCR reaction without a DNA template and controls for human genotyping were utilized after sequencing the different genotypes.
Hardy-Weinberg Equilibrium evaluation. Evaluation of Hardy-Weinberg equilibrium was performed using the HWSIM software (freely available at http://krunch.med.yale.edu/hwsim/) and Monte-Carlo permutation test performed when genotypic classes had an expected cell size of less than five. Wright's F statistics was applied to evaluate the expected level of heterozygosity.
Linkage Disequilibrium, haplotype frequency analysis and Neutrality test. Arlequin v3.5 60 was used to test for linkage disequilibrium between the three loci and for haplotype reconstruction. Several other studies were able to evaluate haplotype structure and population differentiation looking at functional SNPs in the same gene [61][62][63][64] . The input consisted of diploid genotypic DNA data with unknown gametic phase and assuming co-dominance. The expectation-maximization (EM) algorithm used to test for Linkage Disequilibrium (LD) was run for 20,000 permutations and 3 initial conditions, based on recommended criteria. For haplotype reconstruction the EM algorithm was run on the same input file used for the LD test at the haplotype level, with 50 starting points and 1,000 iterations. Other parameters were set to default. The Excoffier-Laval-Balding (ELB) algorithm (with default settings) was used to generate haplotype counts. Population haplotype frequencies were determined both by haplotype counts and estimated by Maximum Likelihood (ML) and were compared using pairwise FSTs calculated using default values.
After determining the gametic phase of the samples using the ELB algorithm in Arlequin, the DnaSP (v6) software 65 was used to calculate and evaluate the Tajima's D neutrality test using the total number of mutations.
Metabolic score. We elaborated and adopted a "metabolic score" to take into account the global effect of the three polymorphisms together, according to an extensive literature linking genetic polymorphisms to functional impact 10,[66][67][68] . For the metabolic score we translated the genotype information into a measure of phenotype using an 'activity score' system already adopted for CYP2D6 47,69 , for CYP2A6 70 and for CYP2C19 71 . The metabolic score adopted is based on the algebraic sum of the individual allele values, according to an additive model for CYP2B6 72,73 . The scores were set conferring a −1 value for each slow metabolism allele and +1 for rapid metabolism allele, while an extensive metabolism allele was scored 0. Accordingly, we obtained: 516 GG = 0; 516 GT = −1; 516 TT = −2; 983 TT = 0; 983 TC = −1; 983 CC = −2; 785 AA = 0; 785 AG = +1; 785 GG = +2. Each composite genotype was attributed a final metabolic score by summing the single score for each SNP.
To test possible deviation from the normal distribution of the scores the Kolmogorov-Smirnov test was used and the statistic D values calculated for each distribution. A related P-value greater than 0.05 indicates normal distribution of data 74 . Bartlett's test was performed to assess if the assumption of equal variances among the three populations was valid 75 . Comparisons among metabolic scores were calculated using the chi-square statistic.
All the statistical calculations were performed using Statistica 13.0 software (StatSoft).