Identification of ancestry proportions in admixed groups across the Americas using clinical pharmacogenomic SNP panels

We evaluated the performance of three PGx panels to estimate biogeographical ancestry: the DMET panel, and the VIP and Preemptive PGx panels described in the literature. Our analysis indicates that the three panels capture well the individual variation in admixture proportions observed in recently admixed populations throughout the Americas, with the Preemptive PGx and DMET panels performing better than the VIP panel. We show that these panels provide reliable information about biogeographic ancestry and can be used to guide the implementation of PGx clinical decision-support (CDS) tools. We also report that these panels make it possible to control for the effects of population stratification in association studies in recently admixed populations, as exemplified with a warfarin dosing GWA study in a sample from Brazil.

Many genetic variants associated with drug response show relatively large frequency differences between human populations [1][2][3][4][5][6][7][8][9], and this has implications for the clinical implementation of pharmacogenomics (PGx) to guide drug therapy. Several recent efforts have been made to evaluate the usefulness of PGx variants to infer biogeographical ancestry [3,10,11]. This is particularly important for studies in recently admixed populations in the Americas, which are characterized by varying admixture proportions from different continental groups [12][13][14][15][16]. Variation in admixture proportions between individuals creates population structure that can cause false positives in genetic association studies [17][18][19][20]. Bonifaz-Peña et al. [3] developed a panel of 71 Ancestry Informative Markers (AIMs) extracted from the Affymetrix DMET Plus Platform to identify African, European and Native American contributions in populations across the Americas, and validated the panel using dense microarray data. Jackson et al. [11] evaluated the capacity of the Affymetrix DMET Plus microarray to estimate population substructure and concluded, based on comparisons with genome-wide HapMap data, that it was an effective tool for ancestry inference in analyses including East Asian, African, European and Mexican samples. More recently, Hernandez et al. [10] evaluated the ability of two clinical PGx panels, namely a Preemptive-PGx panel including 243 markers and a VIP panel including 122 SNPs, to estimate individual ancestry. The primary focus of the paper by Hernandez et al. [10] was the accurate identification of ancestry in European and African American populations.
Obtaining accurate estimates of individual ancestry proportions using panels of PGx markers can have important applications for PGx-informed drug prescription. For genetic association studies in targeted genomic regions, including individual admixture proportions obtained with PGx panels in the statistical models can minimize the risk of false positive associations, which can be a problem in recently admixed populations. Additionally, PGx panels can be used to assign appropriate dosing algorithms to individual patients. As an example, Hernandez et al. [10] have recently shown how estimates of individual ancestry obtained with PGx panels could be used to identify individuals with high African ancestry to whom a recently developed African-American-specific warfarin dosing algorithm could be applied [21].
In this study, we evaluated the relative performance of three different PGx panels to infer individual ancestry in recently admixed populations in the Americas. We compared ancestry estimates obtained with dense
Unsupervised and supervised ADMIXTURE analyses. The unsupervised ADMIXTURE analyses of the parental samples are presented in Supplementary Fig. 2. The genome-wide panel provides perfect discrimination between the individuals of each group (Supplementary Fig. 2A). All the individuals from each parental group belong to a different genetic cluster (AFR: orange, EUR: blue, EAS: yellow, and NAM: green). This is not the case for the three PGx panels (Supplementary Fig. 2B-D). In these analyses, individuals of each parental group have a predominant genetic cluster component, but also minor components from other clusters.
Next, we carried out supervised ADMIXTURE analyses including parental samples as reference groups and samples from the admixed populations of the Americas as test groups. These analyses provide estimates of the relative admixture proportions in individuals from the admixed samples. The results using four reference parental groups (AFR, EUR, EAS and NAM) are provided in Supplementary Fig. 3. The analyses using the genome-wide panel agree with the trends observed in the PCA plots and highlight differences in the admixture proportions between the admixed samples (Supplementary Fig. 3A). Of note, the EAS genetic contribution is very low in all the admixed samples. The results obtained with the PGx panels are quite consistent with those observed with the genome-wide panel (Supplementary Fig. 3B-D), although the three PGx panels yield higher estimates of EAS genetic contributions than the genome-wide panel.
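The supervised setup described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes PLINK-format input, and relies on the ADMIXTURE convention that supervised mode reads a `.pop` file with one line per individual in `.fam` order, giving a population label for reference individuals and `-` for individuals whose ancestry is to be estimated. The function name `build_pop_lines` and all sample IDs are hypothetical.

```python
def build_pop_lines(fam_lines, reference_labels):
    """Return .pop file lines, one per individual, in .fam order.

    fam_lines: lines of a PLINK .fam file (FID IID PID MID SEX PHENO).
    reference_labels: dict mapping individual ID (IID) to a parental
        population label such as "AFR", "EUR" or "NAM" (hypothetical input).
    Individuals absent from reference_labels are treated as admixed test
    samples and receive "-".
    """
    pop_lines = []
    for line in fam_lines:
        fields = line.split()
        if len(fields) < 2:
            raise ValueError(f"malformed .fam line: {line!r}")
        individual_id = fields[1]
        pop_lines.append(reference_labels.get(individual_id, "-"))
    return pop_lines


# Hypothetical example: two reference individuals and one admixed individual.
example_fam = [
    "F1 REF_AFR_1 0 0 1 -9",
    "F2 REF_EUR_1 0 0 2 -9",
    "F3 ADMIXED_1 0 0 1 -9",
]
example_labels = {"REF_AFR_1": "AFR", "REF_EUR_1": "EUR"}
for pop_line in build_pop_lines(example_fam, example_labels):
    print(pop_line)

# The resulting lines would be written to <prefix>.pop next to <prefix>.bed,
# and ADMIXTURE run with K equal to the number of reference groups, e.g.:
#   admixture <prefix>.bed 3 --supervised
```

Keeping the label assignment in one small function makes it easy to rerun the analysis with three instead of four parental groups by dropping one label set.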
Given the very small EAS contributions observed in the admixed samples from the Americas (less than 1% in all samples), we repeated the supervised ADMIXTURE analyses using only three parental groups as reference samples (AFR, EUR, NAM). As shown in Supplementary Fig. 4, the results obtained with each PGx panel are very consistent with those observed with the genome-wide data.
PGx panel applicability to control for population stratification in a Brazilian sample. In order to evaluate the ability of the three PGx panels to correct for the effect of population stratification we used data collected in a previous GWAS of stable warfarin dosing in a sample from Brazil [28] that included patients receiving
www.nature.com/scientificreports/

Discussion
We carried out an exhaustive analysis of the performance of three PGx panels to estimate biogeographical ancestry: the DMET panel previously reported by Bonifaz-Peña et al. [3], and the Preemptive-PGx and VIP panels recently described by Hernandez et al. [10]. It is important to note that one of the major goals of Hernandez et al. [10] was to use these panels to identify individuals with ≥ 70% African ancestry, to whom an African-American-specific warfarin dosing algorithm could be applied. For validation of the Preemptive-PGx and VIP panels, Hernandez et al. [10] used African, European and East Asian samples as reference groups, and did not include Native American samples, which represent one of the major parental groups involved in the historical admixture process throughout the Americas. The present study included four parental groups (African, European, Native American and East Asian), and we carried out ADMIXTURE analyses to evaluate the relative ancestry proportions in six admixed samples from the Americas. We observed that the East Asian contribution is very small in all these samples (lower than 1%), and focused our validation analyses mainly on models with three parental populations (African, European and Native American).
The PCA analyses show that the Preemptive-PGx panel and the DMET panel provide good discrimination of the four parental groups, which cluster with very little overlap in the plots. The VIP panel shows less discrimination than the other two panels (Fig. 1). Using supervised ADMIXTURE analyses based on three parental populations (AFR, EUR, NAM) we observed that the mean admixture proportions estimated with the PGx panels are very close to those obtained with the genome-wide panel (typically within 10% of the genome-wide estimates). The PGx panels typically underestimate the admixture proportions of the major parental group, and overestimate the admixture proportions of the minor parental groups (Fig. 2). The differences in mean admixture proportions tend to be higher with the VIP panel than with the other two PGx panels. The analysis of correlations between genome-wide and PGx panel individual admixture estimates provides more nuanced information (Supplementary Fig. 5). When considering all admixed samples in a combined analysis, the Preemptive-PGx and the DMET panels performed very well. For the Preemptive-PGx panel the R² values were 0.95 (AFR), 0.89 (NAM) and 0.86 (EUR). The R² values were almost as high for the DMET panel (AFR = 0.95, NAM = 0.85 and EUR = 0.83), even though this panel has fewer variants (67 markers) than the other two panels (219 markers for the Preemptive-PGx panel and 102 for the VIP panel). This is most probably driven by the approach used to select these markers, based on high allele frequency differences between the parental populations, which is reflected in higher mean FST values between parental populations for the DMET panel than for the Preemptive-PGx and VIP panels (Supplementary Table 2). The R² values observed for the VIP panel, while smaller than for the other two panels, were still quite high (AFR = 0.89, NAM = 0.75 and EUR = 0.73).
Notably, the correlations in the estimates of African ancestry were extremely high for the three panels, confirming the results reported by Hernandez et al. [10]. Overall, our analysis indicates that the three PGx panels capture quite well the individual variation in admixture proportions observed in recently admixed populations throughout the Americas, and that the Preemptive-PGx and DMET panels tend to perform better than the VIP panel.
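The comparison of panel-based and genome-wide estimates can be illustrated with a short sketch. It assumes that the reported R² corresponds to the squared Pearson correlation between the two sets of individual ancestry proportions (equivalent to the R² of a simple linear regression); the text does not spell out the exact computation, and the function name and the input values below are hypothetical.

```python
from statistics import mean


def r_squared(reference, panel):
    """Squared Pearson correlation between two lists of ancestry proportions.

    reference: genome-wide estimates for one ancestry component.
    panel: estimates for the same individuals from a smaller marker panel.
    """
    if len(reference) != len(panel) or len(reference) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_ref, mean_panel = mean(reference), mean(panel)
    cov = sum((r - mean_ref) * (p - mean_panel) for r, p in zip(reference, panel))
    var_ref = sum((r - mean_ref) ** 2 for r in reference)
    var_panel = sum((p - mean_panel) ** 2 for p in panel)
    if var_ref == 0 or var_panel == 0:
        raise ValueError("R² is undefined when one list has no variation")
    return cov ** 2 / (var_ref * var_panel)


# Hypothetical African ancestry proportions for five individuals.
genome_wide_afr = [0.10, 0.35, 0.60, 0.80, 0.95]
panel_afr = [0.15, 0.30, 0.55, 0.85, 0.90]
print(round(r_squared(genome_wide_afr, panel_afr), 3))
```

Note that a high R² does not by itself rule out a systematic bias, such as the underestimation of the major ancestry component described above, so mean differences are worth reporting alongside it.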
It is also relevant to discuss in more detail the results observed in the analysis of individual admixed populations (Supplementary Fig. 6), which clearly show that the correlation of the genome-wide estimates with those obtained with the PGx panels depends strongly on the range of individual ancestry proportions present in the admixed population. Comparison of results for AFR_ASW and AFR_ACB is quite illustrative. The R² values observed with the Preemptive-PGx panel for AFR and EUR ancestry for the AFR_ASW sample (AFR = 0.804 and EUR = 0.624) are substantially higher than those observed for the AFR_ACB sample (AFR = 0.322 and EUR = 0.370). This can be explained by the broader distribution of individual ancestry in the AFR_ASW than in the AFR_ACB sample (Fig. 2). Not surprisingly, the R² values tend to be very low for the ancestral groups with low average contributions and very limited ranges. In practice, this should have limited impact on the clinical utility of the PGx panels. As an example, in a hypothetical implementation of the approach described by Hernandez et al. [10] for the selection of individuals with African ancestry ≥ 70% for application of an African-American-specific warfarin dosing algorithm, 86.9% of the AFR_ASW individuals and 91.7% of AFR_ACB individuals would have been selected by both the genome-wide and the Preemptive-PGx panel.
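The threshold-based selection in this example can be sketched as a simple concordance count. This is an illustration under stated assumptions rather than the procedure actually used: the 0.70 cutoff comes from the text, while the function name, the returned fields and the input values are hypothetical.

```python
def threshold_concordance(reference, panel, cutoff=0.70):
    """Compare selection at an ancestry cutoff between two sets of estimates.

    reference: genome-wide African ancestry proportions per individual.
    panel: PGx-panel estimates for the same individuals, in the same order.
    Returns counts for each selection outcome plus two summary fractions.
    """
    if len(reference) != len(panel) or not reference:
        raise ValueError("need two non-empty lists of equal length")
    counts = {"both": 0, "reference_only": 0, "panel_only": 0, "neither": 0}
    for ref_value, panel_value in zip(reference, panel):
        ref_selected = ref_value >= cutoff
        panel_selected = panel_value >= cutoff
        if ref_selected and panel_selected:
            counts["both"] += 1
        elif ref_selected:
            counts["reference_only"] += 1
        elif panel_selected:
            counts["panel_only"] += 1
        else:
            counts["neither"] += 1
    total = len(reference)
    counts["fraction_selected_by_both"] = counts["both"] / total
    counts["fraction_in_agreement"] = (counts["both"] + counts["neither"]) / total
    return counts


# Hypothetical African ancestry proportions for four individuals.
genome_wide = [0.90, 0.75, 0.65, 0.72]
pgx_panel = [0.85, 0.68, 0.71, 0.80]
print(threshold_concordance(genome_wide, pgx_panel))
```

Reporting the discordant counts separately shows whether a panel tends to miss eligible individuals or to select ineligible ones, which matters more for dosing decisions than a single agreement figure.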
When performing association studies in recently admixed populations, an important concern is the possibility of obtaining inflated test statistics due to the effects of population stratification [20,30,31,32]. This is typically not an issue in GWAS based on microarray or whole genome data, as the individual ancestry estimates are very precise in this scenario and can be included in statistical models to control for the effects of stratification. However, when carrying out targeted association studies in limited genomic regions, it becomes more critical to ensure that there is an appropriate correction for population stratification. One possible strategy is to genotype panels including a limited number of AIMs, and use the estimates of individual ancestry obtained with these panels as covariates in the statistical models [14,33,34]. We compared the degree of inflation in the test statistics of a GWAS of warfarin dosing in a Brazilian sample [28] using no individual admixture estimates in the statistical models, or alternatively including estimates of ancestry derived from a genome-wide panel or the PGx panels. This sample is well suited for this analysis, as African ancestry shows a very strong association with high warfarin dosing (p = 0.007), in agreement with data indicating that, on average, individuals of African ancestry require higher warfarin doses than individuals of European ancestry [35][36][37][38]. As expected, if ancestry is not included in the logistic regression models, there is substantial genomic inflation (lambda = 1.18). In contrast, when estimates of individual ancestry are included in the logistic regression, lambda is reduced dramatically (genome-wide panel, lambda = 1.02; Preemptive-PGx panel, lambda = 1.02; DMET, lambda = 1.05; VIP, lambda = 1.06). In summary, the three panels substantially reduced the inflation of test statistics.
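The genomic inflation factor discussed above can be sketched as follows. This assumes the conventional genomic-control definition, in which lambda is the median of the observed 1-degree-of-freedom chi-square statistics divided by the median expected under the null (approximately 0.4549); the text does not state how lambda was computed, and the function name and the p-values below are hypothetical.

```python
from statistics import NormalDist, median

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def genomic_inflation(p_values):
    """Estimate lambda from two-sided association p-values.

    Each p-value is converted back to a 1-df chi-square statistic via the
    standard normal quantile, then lambda = median(chi2) / 0.4549.
    """
    standard_normal = NormalDist()
    chi2_stats = []
    for p in p_values:
        if not 0.0 < p <= 1.0:
            raise ValueError(f"p-value out of range: {p}")
        # The lower-tail quantile of p/2 stays numerically stable for small p.
        z = standard_normal.inv_cdf(p / 2.0)
        chi2_stats.append(z * z)
    return median(chi2_stats) / CHI2_1DF_MEDIAN


# Hypothetical p-values: an evenly spaced grid mimics the null expectation,
# while squaring them shifts the distribution toward small values.
n_tests = 1001
null_like = [(i + 0.5) / n_tests for i in range(n_tests)]
inflated = [p ** 2 for p in null_like]
print(round(genomic_inflation(null_like), 3))
print(round(genomic_inflation(inflated), 3))
```

Because lambda is based on the median, a handful of genuinely associated variants has little effect on it, so values well above 1 point to a genome-wide shift such as the one produced by uncorrected stratification.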
In conclusion, our analysis of the DMET, Preemptive-PGx and VIP panels highlights their usefulness for several PGx applications. We showed that these panels can provide reliable information about biogeographic ancestry. This information can be used to guide the implementation of PGx clinical decision-support (CDS) tools, as described by Hernandez et al. [10]. Overall, when considering how well the three PGx panels capture individual admixture proportions, the Preemptive-PGx and the DMET panels perform best, and the VIP panel provides less discrimination of the parental populations. Finally, we also show that these panels make it possible to control for the effects of population stratification in association studies in recently admixed populations, as exemplified with a warfarin dosing GWAS in Brazilian patients.

Data availability
The datasets analysed in the current study are available from the corresponding authors upon reasonable request.