Investigation of triple-negative breast cancer risk alleles in an International African-enriched cohort

Large-scale efforts to identify breast cancer (BC) risk alleles have historically taken place among women of European ancestry. Recently, there are new efforts to verify if these alleles increase risk in African American (AA) women as well. We investigated the effect of previously reported AA breast cancer and triple-negative breast cancer (TNBC) risk alleles in our African-enriched International Center for the Study of Breast Cancer Subtypes (ICSBCS) cohort. Using case–control, case-series and race-nested approaches, we report that the Duffy-null allele (rs2814778) is associated with TNBC risk (OR = 3.814, p = 0.001), specifically among AA individuals, after adjusting for self-indicated race and west African ancestry (OR = 3.368, p = 0.007). We have also validated the protective effect of the minor allele of the ANKLE1 missense variant rs2363956 among AA for TNBC (OR = 0.420, p = 0.005). Our results suggest that an ancestry-specific Duffy-null allele and differential prevalence of a polymorphic gene variant of ANKLE1 may play a role in TNBC breast cancer outcomes. These findings present opportunities for therapeutic potential and future studies to address race-specific differences in TNBC risk and disease outcome.


Results
Multi-ethnic cohort analysis of population-specific BC risk alleles reaffirms race group specific effects. Our overall BC risk assessment model was an all-inclusive analysis that included all breast cancer subtypes and all self-indicated race (SIR)/ancestral groups. We expanded the number of BC cases from Eastern and Western African nations and investigated previously published BC risk alleles that have been validated among African American women in the AMBER consortium 32 (Tables 1, 2, Fig. 1A (left)). No strong linkage disequilibrium was observed among these alleles (maximum r² of 0.44). Three alleles replicated previous associations with overall BC risk in our unadjusted models: rs2981578 (FGFR2), rs4849887 (GLI2), and rs3745185 (BABAM1). Interestingly, we found that the T allele of rs2981578 in the FGFR2 gene was associated with increased risk (OR = 1.508, p = 0.008491), which contrasts with previous reports of the C allele as the risk allele. The C allele of rs4849887 in the GLI2 gene was associated with increased risk (OR = 1.654, p = 0.006122), replicating previous findings. We also replicated the protective A allele of rs3745185 in the BABAM1 gene (OR = 0.67, p = 0.008402).
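To make the reported odds ratios easier to read, the following minimal sketch shows how an unadjusted allelic odds ratio and a Wald 95% confidence interval can be computed from a 2x2 table of allele counts. The counts are hypothetical and chosen only for illustration; the associations in this study were estimated with PLINK regression models, so this is not the authors' pipeline.

```python
import math


def allelic_odds_ratio(case_minor, case_major, control_minor, control_major):
    """Return (odds ratio, lower 95% bound, upper 95% bound) from allele counts."""
    odds_ratio = (case_minor * control_major) / (case_major * control_minor)
    # Standard error of log(OR) for a 2x2 table (Woolf's method).
    se_log_or = math.sqrt(
        1 / case_minor + 1 / case_major + 1 / control_minor + 1 / control_major
    )
    lower = math.exp(math.log(odds_ratio) - 1.96 * se_log_or)
    upper = math.exp(math.log(odds_ratio) + 1.96 * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical allele counts, for illustration only (not study data).
or_value, ci_low, ci_high = allelic_odds_ratio(120, 280, 80, 320)
print(f"OR = {or_value:.3f}, 95% CI = ({ci_low:.3f}, {ci_high:.3f})")
```

An odds ratio above 1 with a confidence interval that excludes 1 corresponds to a risk allele, and one below 1 to a protective allele, in this unadjusted setting.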
To determine whether these all-inclusive association models may be confounded by race-specific bias in age or allele frequency, we adjusted the risk model to correct for race and age. Interestingly, each unadjusted risk association loses significance in the combined race group model after adjusting for race and age, indicating that the risk alleles may have higher frequency in one of the SIR groups (see Table 1). Specifically, the frequency of the risk (C) allele of rs4849887 is roughly 10-15 percentage points lower in populations of West African descent (AA = 34.9%, Ghanaians = 32.9%) than in European Americans (49.5%) and East Africans (44.0%) in our cohort. Two additional alleles gained significance in overall BC risk associations after race and age adjustments in our all-inclusive model: rs2981579 in the FGFR2 gene (OR = 1.899, p = 0.03038) and rs3112572 in the LOC643714 gene (OR = 2.410, p = 0.03055).
Next, we tested whether the associated BC risk of our candidate alleles differed among SIR groups by performing a nested BC risk assessment within each of the SIR groups (Table 2 and Supplemental Table 1). While rs4849887 was associated with overall BC risk in the whole cohort prior to adjusting for age and race, within SIR groups this allele was associated with higher overall BC risk only in Ghanaians, prior to adjusting for age (OR = 2.472, p = 0.001032) (Fig. 1B, Supplemental Table 1). While we did not observe a significant association between rs609275 and overall BC risk in the whole-cohort assessment, a very high overall BC risk was observed specifically for AA prior to adjusting for age (OR = 5.383, p = 0.048). No significant associations were found between the previously identified variants and breast cancer risk among SIR EA in either unadjusted or age-adjusted models (Supplemental Table 1).
Scientific Reports | (2021) 11:9247 | https://doi.org/10.1038/s41598-021-88613-w www.nature.com/scientificreports/
TNBC-specific case-series analysis of population-specific BC risk alleles shows associations within ancestral groups. The higher rate of TNBC among women of African descent worldwide raises the question of whether there is a shared genetic risk among the African diaspora, and we have previously shown that quantified West African ancestry was strongly associated with TNBC disease 31. Using a case-series analysis in our African-enriched cohort, we tested whether previously reported AA-specific risk alleles were associated specifically with TNBC disease risk (Table 3, Supplemental Table 2, Fig. 1A (right)). Prior to adjusted covariate [...] The ANKLE1 variant rs2363956 replicated the previously reported TNBC/ER-negative-specific protective effect and was the only variant to retain significance after adjusting for race and age (OR = 0.542, p = 0.014). Similar to our BC case-control analysis, we used a nested risk analysis within SIR groups to test for SIR-specific risk. For the admixed AA population, we included quantified West African ancestry (WAa) in the adjusted covariate modeling. The rs2363956 variant in the ANKLE1 gene retained a protective effect for TNBC in AAs even after covariate adjustments (age and WAa adjusted OR = 0.4204, p = 0.005), indicating that this is not a mere artifact of disequilibrium or of biased distribution of the allele in African populations (Fig. 1C and Table 3).

DARC/ACKR1 alleles in BC and TNBC risk.
In addition to the previously implicated AA-risk alleles, we have also included DARC/ACKR1 alleles, including the TNBC risk associated Duffy-null allele 31, to investigate whether alternative variants may capture risk due to unique biological contributions of either isoforms or distinct gene regulation (Table 1). Our new analysis found that four DARC/ACKR1 SNVs were also significantly associated with overall BC risk in our all-inclusive analysis models (rs2814778, OR = 1.512, p < 0.001; rs17838198, OR = 4.798, p < 0.001; rs3027016, OR = 4.586, p = 0.005; and rs12075, OR = 2.534, p < 0.001). However, after adjusting for age and race, these associations were mostly lost (Table 4, Fig. 2A (left)). In our SIR nested analysis model, the DARC/ACKR1 variant rs3027013 showed a significant protective effect in EA patients, even after age-adjusted modeling (age-adjusted OR = 0.131, p = 0.03897) (Fig. 2B and Supplemental Table 3).
For DARC/ACKR1 variant associations in TNBC-specific risk, we similarly observed that seven out of eight variants were associated with TNBC disease prior to race/age adjustments, with five of the minor alleles presenting a protective effect and two showing increased risk (rs6676002, OR = 0.191, p = 0.007; rs3027008, OR = 0.134, p = 0.006; rs17838198, OR = 0.367, p = 0.015; rs3027016, OR = 0.390, p = 0.065; rs12075, OR = 0.380, p = 0.003; rs71782098, OR = 3.403, p = 0.018; and rs2814778, OR = 3.062, p < 0.001) (Table 5, Fig. 2A (right)). Interestingly, as we previously reported with only AA and EA, the Duffy-null allele, rs2814778, retained a significant TNBC-risk association with the addition of West African samples, even after age and SIR adjustments (OR = 3.814, p = 0.001). The Duffy-null (rs2814778) TNBC-risk association was also retained in our nested SIR analysis among AA, following both age and quantified West African ancestry adjustment (OR = 3.368, p = 0.007) (Fig. 2C and Table 5). This indicates that the TNBC-specific risk conferred by the Duffy-null allele in the DARC/ACKR1 gene is not an artifact of shared ancestry bias, but rather reflects an ancestry-specific risk allele.
Functional consequences of the TNBC-protective rs2363956 variant in ANKLE1. In our TNBC risk analysis, we found that the minor G allele of the rs2363956 ANKLE1 variant was protective against TNBC disease, which has previously been shown for ER-negative disease among AA 32. Given its SIR-specific effect, we investigated the frequency of the allele across global 1000 Genomes (1 KG) populations 33. Population minor allele frequency (MAF) of the protective G allele is relatively similar among European and African groups (57% vs 50%, respectively, Table 1). However, among TNBC cases in our ICSBCS cohort, the frequency of the GG genotype is much lower in AA patients than in EA patients (14% and 43%, respectively) (Fig. 3B). This roughly 20-percentage-point lower minor allele frequency in TNBC cases among AA (MAF EA = 57.1%, MAF AA = 37.2%) underlies the apparent protective effect of the minor allele and implies that the major allele may somehow drive TNBC frequency higher in AAs. [...] 32,34,35, no investigation has linked a functional impact of this variant to risk or survival in this population. Given that the variant causes a dramatic amino acid change from leucine to tryptophan (L184W, Fig. 3A), there is a high probability that the protein structure is impacted and that its function is altered as a result. We generated 3D models of the protein, comparing the structure with leucine at position 184 (Fig. 3C) to the structure with the minor allele change to tryptophan, and found a predicted destabilization of the gene product (Fig. 3D).
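As a rough consistency check on the frequencies quoted above, the sketch below compares the observed GG genotype frequencies in TNBC cases with what the quoted allele frequencies would predict under Hardy-Weinberg equilibrium. The equilibrium assumption is ours, introduced only for illustration; it is not part of the reported analysis, and case samples need not satisfy it.

```python
def hwe_genotype_frequencies(allele_freq):
    """Expected genotype frequencies for an allele at frequency q under HWE."""
    q = allele_freq
    p = 1.0 - q
    return {"other/other": p * p, "heterozygous": 2 * p * q, "allele/allele": q * q}


# G allele frequencies and GG genotype frequencies in TNBC cases, as quoted in the text.
g_allele_freq = {"EA": 0.571, "AA": 0.372}
observed_gg = {"EA": 0.43, "AA": 0.14}

for group, freq in g_allele_freq.items():
    expected_gg = hwe_genotype_frequencies(freq)["allele/allele"]
    print(f"{group}: expected GG = {expected_gg:.3f}, observed GG = {observed_gg[group]:.2f}")

# Difference in G allele frequency between EA and AA cases, in percentage points.
difference_points = 100 * (g_allele_freq["EA"] - g_allele_freq["AA"])
print(f"Difference = {difference_points:.1f} percentage points")
```

Under this assumption the AA value is close to the observed 14%, while the EA cases show more GG homozygotes than the allele frequency alone would predict.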
The allele's protective effect through destabilization of ANKLE1 structure, together with its significantly lower frequency in AAs, who suffer from higher rates of TNBC, suggests the major allele ANKLE1 protein could be a genetic driver of TNBC. We hypothesize that wildtype ANKLE1 expression suppresses TNBC progression, which is most frequently found in EA patients when caused by the rs2363956 variant. To further investigate this theory, we determined whether the expression of ANKLE1 had any impact on survival 36. We found that survival trends in TCGA breast cancer cases are significantly impacted by ANKLE1 expression, but that the advantage of ANKLE1 expression only benefits EA patients (Fig. 3E-G). Specifically, we found that when comparing high vs low/medium ANKLE1 expression within SIR groups, EA have a significant survival improvement associated with higher expression (p = 0.035), but AA did not (p = 0.83) (Fig. 3E-F). In fact, when only including patients who had high expression of ANKLE1, EA had a longer survival advantage associated with ANKLE1, compared to AA (Fig. 3G, p = 0.052). This suggests that the benefit of ANKLE1, only found in EA, could be due to the 41-53% chance that EA are expressing the polymorphic version of ANKLE1, which harbors the rs2363956 allele.

Discussion
While recent findings have delineated breast cancer risk alleles that pose increased or even decreased risk in African Americans specifically, many of these findings do not replicate in other independent multi-ethnic cohorts. This is likely because of unmeasured individual admixture among the non-white individuals, who through social history are of mixed ancestry (i.e. Caribbean, Latin American and AAs) resulting from recent genetic admixture originating from multiple ancestor lineages [37][38][39]. This complexity of AA ancestry includes heterogeneity of African origins, spanning multiple African parental lineages through dozens of generations. This undoubtedly creates confounding genetic backgrounds that still pose a significant obstacle in identifying causal risk alleles among "African" Americans. However, measuring this genetic and ancestral diversity, and accounting for ancestry substructure, would be a key first step toward clarifying the alleles that may be shared among individuals of common ancestry within SIR groups who display common disease/tumor types. Our latest race and West African ancestry adjustments in risk models demonstrate the power of combining diverse ancestral groups and utilizing ancestry estimates to identify false-positive or false-negative results that arise when models do not properly consider the underlying ancestry/genetic background of the cohorts. Our work represents a uniquely powered cohort, enriched with diverse patients and controls of African ancestry, with which to directly investigate the impact of shared African ancestry on genetic risk for TNBC. We anticipate that our observations account, at least in part, for the increased prevalence of TNBC in women of African descent. However, our analysis is still limited by the paucity of hormone receptor status in African cases, and therefore by the limited number of patients we can include in this analysis thus far. Despite this limitation, we have robust findings that merit follow-up in molecular and clinical studies.
First, our attempt to replicate and verify the findings of AA-specific risk alleles yielded somewhat tenuous results, with associations fluctuating after adjustments for age and/or race. These changes in significance after covariate adjustment reflect the varying frequency of these alleles across these strata in our cohort and possibly more broadly in the population. Specifically, rs2981578, rs3745185 and rs4849887 were found to be significant prior to and after race adjustment, and lost significance with age adjustment, while rs2981579 and rs3112572 were found to be significant after race and age adjustment. For alleles that differ significantly in frequency across age categories, their distribution may reflect a difference in early vs. late onset cancers. For alleles that differ significantly in frequency across race categories, their distribution may reflect ancestry-specific risk or population-private variants. Either scenario warrants a larger and more inclusive dataset to uncover genetic risk robustly. This is an unmet need that could be essential to cancer prevention and to much-needed improvement of cancer risk prediction models.
We have validated our previous finding 31 of the Duffy-null allele (rs2814778) as a TNBC-risk allele in our SIR all-inclusive analysis (OR = 3.814, p = 0.001). The Duffy-null allele is an ancestry-specific allele restricted to descendants of Sub-Saharan Africans. The allele arose among Sub-Saharan Africans and removed expression of DARC from erythrocytes, lending immunity from Plasmodium vivax malaria, as this malaria parasite utilized DARC as a portal of entry into erythrocytes 40,41 . The allele quickly swept to fixation across this population and is found at nearly ~ 100% among West Africans, and ~ 80% among AAs 42,43 . With the associations between WAa and TNBC that we and others have reported 31,44 , the potential association of the Duffy-null allele and TNBC is of great interest. With our expanded cohort analysis, we were able to perform the TNBC case-series risk assessment among SIR AAs only, and found that the risk was significantly retained among AA women after adjusting for both age and WAa (OR = 3.368, p = 0.007). This highlights that the Duffy null allele represents an ancestry-specific TNBC risk allele, and that the findings in our SIR all-inclusive analysis were not driven by ancestry-bias in our cohort. This is an important finding among our cohort, as the Duffy-null allele would not have been identified among previous GWAS studies underpowered with individuals of African ancestry.
Second, we have investigated the consequences of the protective rs2363956 variant on the ANKLE1 gene coding region and uncovered a potential functional reason for race-group risk distinction. The allele has repeatedly been associated with breast and ovarian cancer risk and survival 34,35 , and this association has been replicated among AA women 32 . In the present analysis, we are the first to report that the 'protective' polymorphic ANKLE1 would be the more likely version expressed in EA patients, compared to AA or Ghanaian patients (GG genotype, 43%, 14% and 25%, respectively) (Fig. 3B). This suggests that the major T allele corresponds to a TNBC-specific oncogenic version of the ANKLE1 gene. The potential mechanism of action for increased survival would appear to be DNA damage response, as ANKLE1 has repeatedly been shown to be involved in DNA repair pathways in pre-clinical and ex vivo screening, including endonuclease activity 45,46 , proliferation, and drug response hits in CRISPR screens in cancer cell lines [47][48][49][50] . Most intriguingly, one study in non-small-cell lung cancer indicated the combination of ANKLE1 RNAi with paclitaxel increased the efficacy of the drug response 51 . Altogether, this is a very promising avenue for further investigation of targeted/combinatorial therapy, with potential to be transformative in treatment of TNBC, and with specific impact in AA who have higher expression of ANKLE1.
If validated through additional clinical studies, finding a novel oncogene specific to TNBC could be transformative in two ways: (i) to improve genetic risk models or create AA-tailored risk models, and (ii) to develop prognostic tests to inform survival prediction models, which currently do not include information about ANKLE1. Specifically, if we find that the patients who have longer survival carry the minor protective allele, correlated with higher expression of this polymorphic ANKLE1, we can quickly investigate whether this is ultimately related to treatment response. Our preliminary data on survival trends suggest this could be true. The reported, albeit controversial, findings of TNBC mortality differences between women of African descent and women of European descent may be an important indicator of unknown differences in tumor biology. Here, we show that ANKLE1 expression is linked to distinct survival outcomes, and this could potentially be linked to this polymorphic version of the ANKLE1 gene. Intriguingly, this corresponds with the differential impact of the gene's expression on survival when comparing race groups among patients with high expression of the gene. While the functional consequence on mechanistic change is as yet unknown, expression is associated with survival and is therefore a candidate prognostic indicator. Excitingly, this also reveals a potential opportunity to develop immune-based inhibition of the oncogenic (major allele) version that is more likely expressed in AA. As the frequency of the oncogenic ANKLE1 allele is higher in AA populations, this could present an opportunity for additional research to address its potential in precision therapies to bridge the survival gap in TNBC among race groups. Inclusion of diverse cohorts has powered this discovery and will drive clinical applications in the future.

Methods
International center for the study of breast cancer subtypes. The mission of the International Center for the Study of Breast Cancer Subtypes (ICSBCS) is to reduce the global breast cancer burden through advances in research and delivery of care to diverse populations worldwide. The ICSBCS brings together an international consortium of breast cancer clinicians and researchers, all of whom share the goal of addressing genetic and phenotypic variation in breast cancer risk and survival outcomes. We accrued prospective breast cancer patients from 2013 to 2017 as previously described 31.
Immunohistochemistry for BC tumor subtyping. For our TNBC case-series risk analysis, we determined hormone receptor status in our ICSBCS biospecimen registry via immunohistochemistry (IHC) methods that were described in detail in our previous study 31. Expression of biomarkers was interpreted in accordance with the American Society of Clinical Oncology/College of American Pathologists guidelines 53,54. Briefly, for estrogen and progesterone receptor IHC, staining of at least 1% was determined as positive. HER2/neu staining score of 0 or 1+ was determined as negative, and 3+ was determined as positive. HER2/neu staining score of 2+ was deemed equivocal and was further evaluated by fluorescent in situ hybridization. ICSBCS cases accrued in the USA were reviewed by the treating facility. IHC and pathology review of Ghanaian and Ethiopian cases was completed in Michigan (University of Michigan and Henry Ford Health System Hospital) and New York (Weill Cornell Medicine).
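The IHC thresholds above can be summarized as a small decision rule. The sketch below is an illustrative encoding of those thresholds, not the pathology workflow used in the study; the function and parameter names are hypothetical, and the FISH result is represented as a simple amplified/not-amplified flag.

```python
from typing import Optional


def receptor_positive(percent_stained: float) -> bool:
    """ER or PR is called positive when at least 1% of cells stain."""
    return percent_stained >= 1.0


def her2_status(ihc_score: int, fish_amplified: Optional[bool] = None) -> str:
    """Map a HER2/neu IHC score (0, 1, 2 or 3) to a status string."""
    if ihc_score in (0, 1):
        return "negative"
    if ihc_score == 3:
        return "positive"
    if ihc_score == 2:
        # 2+ is equivocal and is resolved by FISH when a result is available.
        if fish_amplified is None:
            return "equivocal"
        return "positive" if fish_amplified else "negative"
    raise ValueError("ihc_score must be 0, 1, 2 or 3")


def is_tnbc(er_percent: float, pr_percent: float, her2_score: int,
            fish_amplified: Optional[bool] = None) -> Optional[bool]:
    """Return True/False for triple-negative status, or None if unresolved."""
    if receptor_positive(er_percent) or receptor_positive(pr_percent):
        return False
    status = her2_status(her2_score, fish_amplified)
    if status == "equivocal":
        return None
    return status == "negative"


print(is_tnbc(0.0, 0.5, 1))         # ER-, PR-, HER2 1+
print(is_tnbc(0.0, 0.0, 2))         # HER2 2+ without FISH result
print(is_tnbc(0.0, 0.0, 2, False))  # HER2 2+, FISH not amplified
```

Returning None for an unresolved 2+ score keeps equivocal cases out of both the TNBC and non-TNBC groups until a FISH result is recorded.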
Allele selection for BC case-control and TNBC case-series analyses. In our previous publication, we investigated nine reported AA BC risk variants in our African-enriched ICSBCS cohort, to determine BC or TNBC-specific risk within self-indicated race (SIR) groups in our cohort. We additionally included the Duffy-null allele (rs2814778), a promoter region variant of the DARC/ACKR1 gene, in our panel and demonstrated this allele to be a TNBC-specific risk allele among AA. Building upon our previous findings, we have both increased the number of samples with genotypes available across our SIR groups, and included an additional eight DARC/ACKR1 gene variants in our panel that are implicated as ancestry-specific alleles, or sit in regions that are potentially involved in DARC/ACKR1 gene regulation. These eight DARC/ACKR1 gene variants represent upstream variants, 5′ UTR variants, and variants in the coding region of the gene. All alleles that were assessed in subsequent analyses are described in Table 1. Additionally, our African-enriched ICSBCS cohort allows us to incorporate African ancestry measurements into the association model (below). PLINK (version 2.0) 55 was used to assess linkage disequilibrium among these alleles, and no strong linkage disequilibrium was observed (maximum r² of 0.44).
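For readers unfamiliar with the r² statistic, the following sketch shows one common estimate for unphased data: the squared Pearson correlation between minor-allele counts (0, 1 or 2) at two variants. It is an illustration under that assumption, with made-up genotypes, and is not a reproduction of the PLINK computation used in the study.

```python
def genotype_r2(dosages_a, dosages_b):
    """Squared Pearson correlation between allele counts at two variants."""
    n = len(dosages_a)
    if n != len(dosages_b) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 samples")
    mean_a = sum(dosages_a) / n
    mean_b = sum(dosages_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(dosages_a, dosages_b))
    var_a = sum((a - mean_a) ** 2 for a in dosages_a)
    var_b = sum((b - mean_b) ** 2 for b in dosages_b)
    if var_a == 0 or var_b == 0:
        raise ValueError("r2 is undefined for a monomorphic variant")
    return (cov * cov) / (var_a * var_b)


# Hypothetical minor-allele counts for six individuals (illustration only).
variant_1 = [0, 1, 2, 0, 1, 2]
variant_2 = [0, 0, 0, 2, 2, 2]
print(genotype_r2(variant_1, variant_1))  # a variant against itself
print(genotype_r2(variant_1, variant_2))  # two uncorrelated variants
```

Values near 1 indicate that two variants carry largely redundant information, whereas a maximum of 0.44, as reported here, suggests the candidate alleles are not simple proxies for one another.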
Global ancestry estimation and genotyping of candidate alleles. Methods to determine global genetic ancestry have been previously reported in detail 31,56. Briefly, DNA extracted from saliva samples was genotyped on the Sequenom MassARRAY iPLEX platform using an AIMs panel containing 100 markers specifically selected and validated for estimating continental ancestry among admixed populations 57,58. The Sequenom TYPER software (version 4.0) was used for genotype calls, and STRUCTURE (version 2.3) was used to calculate admixture estimates for each individual 59.
Similar to our global ancestry estimations, to obtain genotypes for our candidate variants for risk analyses (Table 1), DNA from saliva samples was genotyped for each of the variants using the Sequenom platform. For the Duffy-null allele (rs2814778), we obtained additional genotypes using single-target allele amplification reactions, as previously described 31.
Risk assessment. From our genotyping data, we used PLINK (version 2.0) 55 to determine associations between the candidate variants and breast cancer risk in a case-control analysis model, and TNBC-specific risk in a case-series analysis model, as previously described 31. In both our BC and TNBC-specific risk analyses, we performed associations without covariates (non-adjusted), with SIR adjustment, and with SIR and age adjustments. We additionally investigated variant and risk associations within each SIR group, where we performed non-adjusted and age-adjusted analyses. For our analysis within SIR AA, using the genetic ancestry estimates, we were additionally able to adjust for West African ancestry in our models. For the candidate variants, we conducted the risk association using both a dominant and a dosage statistical model 31. In the dominant model, where the genotypes are AA, Aa, aa (where a is the minor allele), the genotypes are coded as 0, 1, 1 in the analysis model, so that risk is weighted based on having at least one minor allele. In the dosage model using the same genotypes, the genotypes are coded as 0, 1, 2, so that risk is weighted by the number of minor alleles present. In the main figures and tables, we show and discuss risk assessment output from the dosage models, where the full range of genotypes is considered in the analysis. In addition, the Benjamini-Hochberg method was used to adjust for multiple comparisons while controlling the false discovery rate (FDR) at 0.05.
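The two genotype codings and the Benjamini-Hochberg adjustment described under Risk assessment can be sketched as follows. This is a minimal illustration, not the PLINK-based workflow used in the study; the function names and the example p values are hypothetical.

```python
def encode_genotype(minor_allele_count, model):
    """Code a genotype (0, 1 or 2 minor alleles) for the chosen model."""
    if minor_allele_count not in (0, 1, 2):
        raise ValueError("minor_allele_count must be 0, 1 or 2")
    if model == "dosage":
        return minor_allele_count
    if model == "dominant":
        return 1 if minor_allele_count >= 1 else 0
    raise ValueError(f"unknown model: {model}")


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# Genotypes AA, Aa, aa carry 0, 1 and 2 copies of the minor allele a.
for label, count in (("AA", 0), ("Aa", 1), ("aa", 2)):
    print(label, encode_genotype(count, "dominant"), encode_genotype(count, "dosage"))

# Hypothetical raw p values, for illustration only (not study results).
raw_p = [0.01, 0.04, 0.03, 0.005]
adjusted_p = benjamini_hochberg(raw_p)
significant = [q <= 0.05 for q in adjusted_p]
print(adjusted_p, significant)
```

A test is declared significant at an FDR of 0.05 when its adjusted p value does not exceed 0.05.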
FDR adjusted p values for Tables 2, 3 [...]
3D modeling of ANKLE1 protein. We used the cBioPortal MutationMapper online program to visualize the ANKLE1 protective variant rs2363956 in the context of the protein domain structure 60,61. For 3D modeling of the wild type and rs2363956 missense variant, the ANKLE1 amino acid sequence in FASTA format was obtained from NCBI using the GRCh37.p13 reference and was submitted to I-TASSER [62][63][64]. The amino acid sequence is 615 residues long, and we performed 3D modeling to obtain the structure with and without the ANKLE1 missense mutation included in our candidate variant analysis (rs2363956, L184W). The estimate of the accuracy of the predictions using I-TASSER is provided based on the confidence score (C-score) of the modeling. The C-score ranges over [−5, 2], where a higher C-score indicates a model with higher confidence and vice versa. Furthermore, the Chimera program 65 (version 1.14) was used for visualization and analysis of the predicted 3D ANKLE1 protein structure from I-TASSER.
ANKLE1 survival analysis. The UALCAN online database was accessed to determine potential associations between gene expression and patient survival outcomes in the TCGA BC cohort 36. ANKLE1 gene expression was assessed across the patient cohort, and the upper quartile of expression was used to dichotomize patients into high and low/medium ANKLE1 expression groups. The log rank p value obtained between comparison groups is reported on the plots.
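The upper-quartile dichotomization used in the survival analysis can be illustrated with a short sketch. The expression values below are made up, and the quantile method and the strict "greater than" cutoff are assumptions on our part; the exact convention applied by UALCAN is not stated here.

```python
import statistics


def dichotomize_by_upper_quartile(expression_values):
    """Label each value 'high' if above the third quartile, else 'low/medium'."""
    # quantiles(..., n=4) returns the three cut points Q1, Q2 and Q3.
    _, _, third_quartile = statistics.quantiles(
        expression_values, n=4, method="inclusive"
    )
    return [
        "high" if value > third_quartile else "low/medium"
        for value in expression_values
    ]


# Hypothetical expression values for eight patients (illustration only).
expression = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
groups = dichotomize_by_upper_quartile(expression)
print(list(zip(expression, groups)))
```

The resulting two groups are the ones that would then be compared with a log-rank test on survival times.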