Abstract
We performed genome-wide association studies (GWAS) of breast cancer including 18,034 cases and 22,104 controls of African ancestry. Genetic variants at 12 loci were associated with breast cancer risk (P < 5 × 10⁻⁸). These included a low-frequency missense variant, rs61751053 in ARHGEF38, associated with overall breast cancer (odds ratio (OR) = 1.48), and a common variant, rs76664032 at chromosome 2q14.2, associated with triple-negative breast cancer (TNBC) (OR = 1.30). Approximately 15.4% of cases with TNBC carried six risk alleles at three GWAS-identified TNBC risk variants (that is, they were homozygous for the risk allele at all three). These carriers had an OR of 4.21 (95% confidence interval = 2.66–7.03) compared with those carrying fewer than two risk alleles. A polygenic risk score (PRS) showed an area under the receiver operating characteristic curve of 0.60 for the prediction of breast cancer risk, outperforming a PRS derived using data from females of European ancestry. Our study markedly increases the population diversity in genetic studies for breast cancer and demonstrates the utility of PRS for risk prediction in females of African ancestry.
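Three biallelic variants can carry at most six risk alleles, so the "six risk alleles" group consists of women who are homozygous for the risk allele at every variant. The sketch below illustrates that counting and computes an unadjusted OR with a Woolf (log-scale) 95% CI from a 2 × 2 table. The function names, the grouping labels and all counts are hypothetical. The published OR was likely estimated with covariate-adjusted logistic regression, so this is an illustration of the idea, not the authors' method.

```python
import math


def risk_allele_count(dosages):
    """Total risk-allele count across variants (each dosage must be 0, 1 or 2)."""
    for d in dosages:
        if d not in (0, 1, 2):
            raise ValueError(f"dosage must be 0, 1 or 2, got {d!r}")
    return sum(dosages)


def count_group(count):
    """Hypothetical grouping that mirrors the comparison in the abstract."""
    if count == 6:
        return "six risk alleles"
    if count < 2:
        return "reference (<2 risk alleles)"
    return "intermediate"


def odds_ratio_woolf(a, b, c, d, z=1.96):
    """Unadjusted OR with a Woolf (log-scale) CI from a 2 x 2 table.

    a, b: cases and controls in the exposed group (e.g. six risk alleles)
    c, d: cases and controls in the reference group (<2 risk alleles)
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all cells must be positive for the Woolf method")
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(OR)
    return math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se)


if __name__ == "__main__":
    # Toy genotypes at three variants; NOT data from the study.
    n = risk_allele_count([2, 2, 2])
    print(n, count_group(n))
    # Toy 2 x 2 counts, purely illustrative.
    print(odds_ratio_woolf(10, 5, 20, 40))
```

Because the CI is symmetric on the log scale, it is asymmetric around the OR itself. The published interval (2.66–7.03 around 4.21) shows the same pattern.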
Data availability
Summary-level statistics data have been uploaded to the GWAS Catalog (GCST90296719 for overall breast cancer, GCST90296720 for ER+ breast cancer, GCST90296721 for ER− breast cancer and GCST90296722 for TNBC). Data from the 1000 Genomes Project can be obtained from www.internationalgenome.org/data/. The GENCODE datasets are available from www.gencodegenes.org/human/. The genomic and transcriptomic data generated from the Komen Tissue Bank samples have been uploaded to the database of Genotypes and Phenotypes (accession no. phs003535).
Code availability
The analysis code can be accessed at the GitHub repository (github.com/Damon0212/AABCG_GWAS_breast_cancer). The code has also been uploaded to Zenodo (ref. 60; https://doi.org/10.5281/zenodo.10372821).
References
Hu, C. et al. A population-based study of genes previously implicated in breast cancer. N. Engl. J. Med. 384, 440–451 (2021).
Dorling, L. et al. Breast cancer risk genes—association analysis in more than 113,000 women. N. Engl. J. Med. 384, 428–439 (2021).
Narod, S. A. Which genes for hereditary breast cancer? N. Engl. J. Med. 384, 471–473 (2021).
Michailidou, K. et al. Association analysis identifies 65 new breast cancer risk loci. Nature 551, 92–94 (2017).
Zhang, H. et al. Genome-wide association study identifies 32 novel breast cancer susceptibility loci from overall and subtype-specific analyses. Nat. Genet. 52, 572–581 (2020).
Shu, X. et al. Identification of novel breast cancer susceptibility loci in meta-analyses conducted among Asian and European descendants. Nat. Commun. 11, 1217 (2020).
Jia, G. et al. Genome- and transcriptome-wide association studies of 386,000 Asian and European-ancestry women provide new insights into breast cancer genetics. Am. J. Hum. Genet. 109, 2185–2195 (2022).
Martin, A. R. et al. Clinical use of current polygenic risk scores may exacerbate health disparities. Nat. Genet. 51, 584–591 (2019).
Huo, D. et al. Genome-wide association studies in women of African ancestry identified 3q26.21 as a novel susceptibility locus for oestrogen receptor negative breast cancer. Hum. Mol. Genet. 25, 4835–4846 (2016).
Zheng, W. et al. Genome-wide association study identifies a new breast cancer susceptibility locus at 6q25.1. Nat. Genet. 41, 324–328 (2009).
Long, J. et al. A common deletion in the APOBEC3 genes and breast cancer risk. J. Natl Cancer Inst. 105, 573–579 (2013).
Cai, Q. et al. Genome-wide association analysis in East Asians identifies breast cancer susceptibility loci at 1q32.1, 5q14.3 and 15q26.1. Nat. Genet. 46, 886–890 (2014).
Haiman, C. A. et al. A common variant at the TERT-CLPTM1L locus is associated with estrogen receptor-negative breast cancer. Nat. Genet. 43, 1210–1214 (2011).
Fejerman, L. et al. Genome-wide association study of breast cancer in Latinas identifies novel protective variants on 6q25. Nat. Commun. 5, 5260 (2014).
Iqbal, J., Ginsburg, O., Rochon, P. A., Sun, P. & Narod, S. A. Differences in breast cancer stage at diagnosis and cancer-specific survival by race and ethnicity in the United States. JAMA 313, 165–173 (2015).
Howard, F. M. & Olopade, O. I. Epidemiology of triple-negative breast cancer: a review. Cancer J. 27, 8–16 (2021).
Martin, A. R., Teferra, S., Möller, M., Hoal, E. G. & Daly, M. J. The critical needs and challenges for genetic architecture studies in Africa. Curr. Opin. Genet. Dev. 53, 113–120 (2018).
Zheng, Y. et al. Fine mapping of breast cancer genome-wide association studies loci in women of African ancestry identifies novel susceptibility markers. Carcinogenesis 34, 1520–1528 (2013).
Feng, Y. et al. A comprehensive examination of breast cancer risk loci in African American women. Hum. Mol. Genet. 23, 5518–5526 (2014).
Huo, D. et al. Evaluation of 19 susceptibility loci of breast cancer in women of African ancestry. Carcinogenesis 33, 835–840 (2012).
Mavaddat, N. et al. Polygenic risk scores for prediction of breast cancer and breast cancer subtypes. Am. J. Hum. Genet. 104, 21–34 (2019).
Du, Z. et al. Evaluating polygenic risk scores for breast cancer in women of African ancestry. J. Natl Cancer Inst. 113, 1168–1176 (2021).
Ho, W.-K. et al. Polygenic risk scores for prediction of breast cancer risk in Asian populations. Genet. Med. 24, 586–600 (2022).
Shieh, Y. et al. A polygenic risk score for breast cancer in US Latinas and Latin American women. J. Natl Cancer Inst. 112, 590–598 (2020).
Yang, Y. et al. Incorporating polygenic risk scores and nongenetic risk factors for breast cancer risk prediction among Asian women. JAMA Netw. Open 5, e2149030 (2022).
Yang, J. et al. Genetic variance estimation with imputed variants finds negligible missing heritability for human height and body mass index. Nat. Genet. 47, 1114–1120 (2015).
Zhu, Z. et al. Integration of summary data from GWAS and eQTL studies predicts complex trait gene targets. Nat. Genet. 48, 481–487 (2016).
Stacey, S. N. et al. Ancestry-shift refinement mapping of the C6orf97-ESR1 breast cancer susceptibility locus. PLoS Genet. 6, e1001029 (2010).
Gao, G. et al. A joint transcriptome-wide association study across multiple tissues identifies candidate breast cancer susceptibility genes. Am. J. Hum. Genet. 110, 950–962 (2023).
Zou, Y., Carbonetto, P., Wang, G. & Stephens, M. Fine-mapping from summary data with the ‘Sum of Single Effects’ model. PLoS Genet. 18, e1010299 (2022).
Wang, G., Sarkar, A., Carbonetto, P. & Stephens, M. A simple new approach to variable selection in regression, with application to genetic fine mapping. J. R. Stat. Soc. Series B Stat. Methodol. 82, 1273–1300 (2020).
Privé, F., Arbel, J. & Vilhjálmsson, B. J. LDpred2: better, faster, stronger. Bioinformatics 36, 5424–5431 (2020).
Giro-Perafita, A. et al. LncRNA RP11-19E11 is an E2F1 target required for proliferation and survival of basal breast cancer. NPJ Breast Cancer 6, 1 (2020).
Han, Y. J. et al. LncRNA BLAT1 is upregulated in basal-like breast cancer through epigenetic modifications. Sci. Rep. 8, 15572 (2018).
Palmer, J. R. et al. Genetic susceptibility loci for subtypes of breast cancer in an African American population. Cancer Epidemiol. Biomarkers Prev. 22, 127–134 (2013).
Park, C. et al. Overexpression and selective anticancer efficacy of ENO3 in STK11 mutant lung cancers. Mol. Cells 42, 804–809 (2019).
Liu, K. et al. ARHGEF38 as a novel biomarker to predict aggressive prostate cancer. Genes Dis. 7, 217–224 (2020).
Chen, J. W. & Dhahbi, J. Lung adenocarcinoma and lung squamous cell carcinoma cancer classification, biomarker identification, and gene expression analysis using overlapping feature selection methods. Sci. Rep. 11, 13323 (2021).
Jiang, X. et al. Shared heritability and functional enrichment across six solid cancers. Nat. Commun. 10, 431 (2019).
Gao, G. et al. Polygenic risk scores for prediction of breast cancer risk in women of African ancestry: a cross-ancestry approach. Hum. Mol. Genet. 31, 3133–3143 (2022).
Siu, A. L. Screening for breast cancer: U.S. Preventive Services Task Force recommendation statement. Ann. Intern. Med. 164, 279–296 (2016).
Chen, Z. et al. Discovery of structural deletions in breast cancer predisposition genes using whole genome sequencing data from > 2000 women of African-ancestry. Hum. Genet. 140, 1449–1457 (2021).
Kasimatis, K. R. et al. Evaluating human autosomal loci for sexually antagonistic viability selection in two large biobanks. Genetics 217, 1–10 (2021).
Adedokun, B. et al. Cross-ancestry GWAS meta-analysis identifies six breast cancer loci in African and European ancestry women. Nat. Commun. 12, 4198 (2021).
Bien, S. A. et al. Strategies for enriching variant coverage in candidate disease loci on a multiethnic genotyping array. PLoS ONE 11, e0167758 (2016).
Alexander, D. H., Novembre, J. & Lange, K. Fast model-based estimation of ancestry in unrelated individuals. Genome Res. 19, 1655–1664 (2009).
Chang, C. C. et al. Second-generation PLINK: rising to the challenge of larger and richer datasets. Gigascience 4, 7 (2015).
Devlin, B. & Roeder, K. Genomic control for association studies. Biometrics 55, 997–1004 (1999).
Willer, C. J., Li, Y. & Abecasis, G. R. METAL: fast and efficient meta-analysis of genomewide association scans. Bioinformatics 26, 2190–2191 (2010).
Yang, J., Lee, S. H., Goddard, M. E. & Visscher, P. M. GCTA: a tool for genome-wide complex trait analysis. Am. J. Hum. Genet. 88, 76–82 (2011).
Pruim, R. J. et al. LocusZoom: regional visualization of genome-wide association scan results. Bioinformatics 26, 2336–2337 (2010).
Surveillance, Epidemiology, and End Results (SEER) Program. SEER*Stat Database: Populations—Total U.S. (1969–2020) <Katrina/Rita Adjustment>—Linked to County Attributes—Total U.S., 1969–2020 Counties, National Cancer Institute, DCCPS, Surveillance Research Program; www.seer.cancer.gov (2022).
de Glas, N. A. et al. Performing survival analyses in the presence of competing risks: a clinical example in older breast cancer patients. J. Natl Cancer Inst. 108, djv366 (2016).
Bulik-Sullivan, B. K. et al. LD Score regression distinguishes confounding from polygenicity in genome-wide association studies. Nat. Genet. 47, 291–295 (2015).
Stegle, O., Parts, L., Piipari, M., Winn, J. & Durbin, R. Using probabilistic estimation of expression residuals (PEER) to obtain increased power and interpretability of gene expression analyses. Nat. Protoc. 7, 500–507 (2012).
Privé, F., Albiñana, C., Arbel, J., Pasaniuc, B. & Vilhjálmsson, B. J. Inferring disease architecture and predictive ability with LDpred2-auto. Am. J. Hum. Genet. 110, 2042–2055 (2023).
Pepe, M. S., Longton, G. & Janes, H. Estimation and comparison of receiver operating characteristic curves. Stata J. 9, 1 (2009).
Zheng, W. et al. Genetic and clinical predictors for breast cancer risk assessment and stratification among Chinese women. J. Natl Cancer Inst. 102, 972–981 (2010).
Wen, W. et al. Prediction of breast cancer risk based on common genetic variants in women of East Asian ancestry. Breast Cancer Res. 18, 124 (2016).
Jia, G. Damon0212/AABCG_GWAS_breast_cancer: AABCG_GWAS_breast_cancer (AABCG_GWAS). Zenodo. https://zenodo.org/records/10372821 (2023).
Acknowledgements
This research was supported in part by National Institutes of Health grant nos. R01 CA202981 (to W.Z., C.A.H. and J.R.P.) and R01 CA235553 (to W.Z. and J.L.). Sample preparation and genotyping assays at Vanderbilt were conducted at the Survey and Biospecimen Shared Resources and Vanderbilt Technologies for Advanced Genomics, which are supported in part by the Vanderbilt-Ingram Cancer Center (no. P30CA068485 to B. H. Park). Data analyses were conducted using the Advanced Computing Center for Research and Education at Vanderbilt University. Additional information is provided in the Supplementary Note. The content is solely the responsibility of the authors and does not necessarily represent the official views of the funding agencies. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Author information
Contributions
G.J. and W.Z. conceived and designed the study. S.A., M.E.B., Y.C., M.G.-C., J.G., J.J.H., D.H., E.M.J., C.I.L., K.L.N., B.N., O.I.O., T.P., M.F.P., M.S., D.P.S., X.-O.S., M.A.T., S.Y., P.O.A., T.A., A.M.B., A.J.M.H., T.M., P.N., K.M.O., A.F.O., M.M.O., S.R., E.N.B., M.H., A.N., C.B.A., Q.C., J.L., J.R.P., C.A.H. and W.Z. recruited the study participants, and collected the data and specimens. G.J., J.P., X.G., Y.Y., M.G., J.G., D.H., E.M.J., M.A.T., S.Y., T.A., A.F.O., E.N.B., M.H., C.B.A. and Q.C. managed sample and data preparation, or carried out quality control. G.J., J.P., X.G., Y.Y., R.T., J.L.L., H.Q. and H.Z. analyzed the data. G.J., J.P., Y.Y., B.L., J.L.L., T.P., X.-O.S., A.J.M.H., S.R., H.Z. and W.Z. interpreted the findings. G.J., X.G., R.T., M.E.B., M.G., D.H., C.I.L., J.L.L., O.I.O., T.P., M.F.P., P.O.A., T.A., A.J.M.H., T.M., P.N., M.M.O., S.R., A.N., J.L., J.R.P., C.A.H. and W.Z. drafted or substantively revised the paper. J.R.P., C.A.H. and W.Z. supervised the project.
Ethics declarations
Competing interests
O.I.O. is a cofounder of CancerIQ, serves as scientific adviser at Tempus and is on the board of 54gene. A.N. has received research support from Roche. The other authors declare no competing interests.
Peer review
Peer review information
Nature Genetics thanks Laura Fejerman, Catherine Tcheandjieu and the other anonymous reviewer(s) for their contribution to the peer review of this work.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 Genome-wide association with overall breast cancer and subtypes.
(a) Overall breast cancer, (b) ER-positive breast cancer, (c) ER-negative breast cancer and (d) triple-negative breast cancer. Associations were estimated in each dataset and meta-analyzed using a fixed-effects model. Genome-wide association analyses used logistic regression with two-sided tests. Dashed lines mark the genome-wide significance level of P = 5 × 10⁻⁸.
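A fixed-effects meta-analysis is most commonly run with inverse-variance weighting. In that scheme each dataset's log odds ratio is weighted by 1/SE², and the pooled estimate is tested against a normal distribution. The minimal sketch below follows that standard scheme and checks the 5 × 10⁻⁸ threshold from the caption. Tools such as METAL (cited above) offer an inverse-variance scheme, but the function names here are hypothetical, and the exact software and settings the authors used are described in their Methods, not reproduced here.

```python
import math

GWAS_ALPHA = 5e-8  # genome-wide significance level used in the figure


def fixed_effects_meta(betas, ses):
    """Inverse-variance-weighted fixed-effects meta-analysis.

    betas: per-dataset log odds ratios; ses: their standard errors.
    Returns (pooled beta, pooled SE, two-sided P value).
    """
    if not betas or len(betas) != len(ses):
        raise ValueError("need matching, non-empty betas and ses")
    weights = [1.0 / se ** 2 for se in ses]
    w_sum = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / w_sum
    se = math.sqrt(1.0 / w_sum)
    z = beta / se
    # Two-sided normal P value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return beta, se, p


def genome_wide_significant(p, alpha=GWAS_ALPHA):
    """True if a P value passes the genome-wide threshold."""
    return p < alpha
```

Using `math.erfc` rather than `1 - cdf` keeps very small P values (around 10⁻⁸ and below) numerically accurate.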
Extended Data Fig. 2 Forest plot for risk estimates of selected variants.
(a) Missense variant rs61751053 at locus 4q24 in association with overall breast cancer, T allele as effect allele; (b) novel variant rs10853615 at locus 18q11.2 in association with ER-positive breast cancer, A allele as effect allele; (c) novel risk variant rs76664032 at locus 2q14.2 in association with ER-negative breast cancer, A allele as effect allele; (d) novel risk variant rs76664032 at locus 2q14.2 in association with TNBC, A allele as effect allele. Studies for each dataset are described in the Supplementary Note and Supplementary Table 1. r², imputation quality score; EAF, effect allele frequency; CI, confidence interval. Logistic regression was used for association analyses with two-sided tests. The sample size (n) is shown for each participating study or dataset. In the plot, log odds ratios (β) are shown as point estimates (center), with error bars spanning β ± 1.96 × standard error (the 95% CI).
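The error bars in the caption are β ± 1.96 × SE on the log-odds scale. Exponentiating gives the OR and its 95% CI. Effect estimates are also tied to a chosen effect allele: switching allele orientation negates β and replaces EAF with 1 − EAF. The helper names below are hypothetical, and this is a minimal sketch of those two conversions only.

```python
import math


def or_with_ci(beta, se, z=1.96):
    """Return (OR, lower, upper) from a log odds ratio and its SE."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def flip_effect_allele(beta, eaf):
    """Re-express an association with the other allele as effect allele."""
    return -beta, 1.0 - eaf
```

For example, the OR of 1.48 reported in the abstract corresponds to β = ln 1.48 ≈ 0.39 for the T allele of rs61751053.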
Extended Data Fig. 3 Distribution of PRS in the testing set of samples.
(a) PRS_AFR, calculated using 94 selected variants with weights derived from the training samples; (b) PRS_EUR, constructed with previously reported weights derived from women of European ancestry. Both PRSs were centered at the mean in controls and divided by the standard deviation in controls.
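The sketch below shows three pieces that fit together. A PRS is computed as a weighted sum of effect-allele dosages. Scores are then standardized using the control mean and SD, as the caption describes. Discrimination is measured with the ROC AUC (the abstract reports 0.60), computed as the probability that a randomly chosen case scores higher than a randomly chosen control. Function names and inputs are hypothetical. The 94 variants and their trained weights are not reproduced here, and the authors' actual pipeline (for example, weight estimation) may differ.

```python
import statistics


def prs(dosages, weights):
    """Weighted sum of effect-allele dosages for one individual."""
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must align")
    return sum(d * w for d, w in zip(dosages, weights))


def standardize_to_controls(scores, is_case):
    """Center at the control mean and divide by the control SD."""
    controls = [s for s, c in zip(scores, is_case) if not c]
    mu = statistics.fmean(controls)
    sd = statistics.stdev(controls)  # sample SD among controls
    return [(s - mu) / sd for s in scores]


def auc(scores, is_case):
    """ROC AUC as P(case score > control score), counting ties as 1/2."""
    cases = [s for s, c in zip(scores, is_case) if c]
    controls = [s for s, c in zip(scores, is_case) if not c]
    wins = 0.0
    for x in cases:
        for y in controls:
            if x > y:
                wins += 1.0
            elif x == y:
                wins += 0.5
    return wins / (len(cases) * len(controls))
```

Standardizing to controls makes a one-unit PRS change mean one control SD, which is how per-SD ORs are usually reported. The AUC is unchanged by this linear rescaling. The pairwise loop is quadratic and suits a sketch; rank-based formulas scale better to tens of thousands of samples.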
Supplementary information
Supplementary Information
Supplementary Figs. 1–10, Methods and Acknowledgements.
Supplementary Tables
Supplementary Tables 1–12
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Jia, G., Ping, J., Guo, X. et al. Genome-wide association analyses of breast cancer in women of African ancestry identify new susceptibility loci and improve risk prediction. Nat Genet 56, 819–826 (2024). https://doi.org/10.1038/s41588-024-01736-4
DOI: https://doi.org/10.1038/s41588-024-01736-4