

Genome-wide association analyses of breast cancer in women of African ancestry identify new susceptibility loci and improve risk prediction

Abstract

We performed genome-wide association studies of breast cancer including 18,034 cases and 22,104 controls of African ancestry. Genetic variants at 12 loci were associated with breast cancer risk (P < 5 × 10⁻⁸), including associations of a low-frequency missense variant rs61751053 in ARHGEF38 with overall breast cancer (odds ratio (OR) = 1.48) and a common variant rs76664032 at chromosome 2q14.2 with triple-negative breast cancer (TNBC) (OR = 1.30). Approximately 15.4% of cases with TNBC carried six risk alleles in three genome-wide association study-identified TNBC risk variants, with an OR of 4.21 (95% confidence interval = 2.66–7.03) compared with those carrying fewer than two risk alleles. A polygenic risk score (PRS) showed an area under the receiver operating characteristic curve of 0.60 for the prediction of breast cancer risk, which outperformed PRS derived using data from females of European ancestry. Our study markedly increases the population diversity in genetic studies for breast cancer and demonstrates the utility of PRS for risk prediction in females of African ancestry.
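
As a rough illustration of two quantities named in the abstract, the sketch below computes a PRS as a weighted sum of risk-allele dosages and an area under the receiver operating characteristic curve as the probability that a randomly chosen case scores higher than a randomly chosen control. The function names, weights and dosages are hypothetical; nothing here is taken from the study data or its code repository.

```python
def polygenic_score(dosages, weights):
    """Weighted sum of risk-allele dosages (0, 1 or 2 per variant).

    Both arguments are hypothetical inputs for illustration only.
    """
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(d * w for d, w in zip(dosages, weights))


def auc(case_scores, control_scores):
    """Rank-based AUC: P(case score > control score), ties counted as 0.5."""
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))
```

An AUC of 0.5 corresponds to no discrimination between cases and controls, and 1.0 to perfect separation. The pairwise loop is quadratic in sample size, so it is only meant for small illustrative inputs.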


Fig. 1: Cumulative absolute risk of breast cancer according to PRS group among females of African ancestry.


Data availability

Summary-level statistics data have been uploaded to the GWAS Catalog (GCST90296719 for overall breast cancer, GCST90296720 for ER+ breast cancer, GCST90296721 for ER− breast cancer and GCST90296722 for TNBC). Data from the 1000 Genomes Project can be obtained from www.internationalgenome.org/data/. The GENCODE datasets are available from www.gencodegenes.org/human/. The genomic and transcriptomic data generated from the Komen Tissue Bank samples have been uploaded to the database of Genotypes and Phenotypes (accession no. phs003535).

Code availability

The analysis code is available in the GitHub repository (github.com/Damon0212/AABCG_GWAS_breast_cancer) and has also been archived on Zenodo (ref. 60; https://doi.org/10.5281/zenodo.10372821).

References

  1. Hu, C. et al. A population-based study of genes previously implicated in breast cancer. N. Engl. J. Med. 384, 440–451 (2021).


  2. Dorling, L. et al. Breast cancer risk genes—association analysis in more than 113,000 women. N. Engl. J. Med. 384, 428–439 (2021).


  3. Narod, S. A. Which genes for hereditary breast cancer? N. Engl. J. Med. 384, 471–473 (2021).


  4. Michailidou, K. et al. Association analysis identifies 65 new breast cancer risk loci. Nature 551, 92–94 (2017).


  5. Zhang, H. et al. Genome-wide association study identifies 32 novel breast cancer susceptibility loci from overall and subtype-specific analyses. Nat. Genet. 52, 572–581 (2020).


  6. Shu, X. et al. Identification of novel breast cancer susceptibility loci in meta-analyses conducted among Asian and European descendants. Nat. Commun. 11, 1217 (2020).


  7. Jia, G. et al. Genome- and transcriptome-wide association studies of 386,000 Asian and European-ancestry women provide new insights into breast cancer genetics. Am. J. Hum. Genet. 109, 2185–2195 (2022).


  8. Martin, A. R. et al. Clinical use of current polygenic risk scores may exacerbate health disparities. Nat. Genet. 51, 584–591 (2019).


  9. Huo, D. et al. Genome-wide association studies in women of African ancestry identified 3q26.21 as a novel susceptibility locus for oestrogen receptor negative breast cancer. Hum. Mol. Genet. 25, 4835–4846 (2016).


  10. Zheng, W. et al. Genome-wide association study identifies a new breast cancer susceptibility locus at 6q25.1. Nat. Genet. 41, 324–328 (2009).


  11. Long, J. et al. A common deletion in the APOBEC3 genes and breast cancer risk. J. Natl Cancer Inst. 105, 573–579 (2013).


  12. Cai, Q. et al. Genome-wide association analysis in East Asians identifies breast cancer susceptibility loci at 1q32.1, 5q14.3 and 15q26.1. Nat. Genet. 46, 886–890 (2014).


  13. Haiman, C. A. et al. A common variant at the TERT-CLPTM1L locus is associated with estrogen receptor-negative breast cancer. Nat. Genet. 43, 1210–1214 (2011).


  14. Fejerman, L. et al. Genome-wide association study of breast cancer in Latinas identifies novel protective variants on 6q25. Nat. Commun. 5, 5260 (2014).


  15. Iqbal, J., Ginsburg, O., Rochon, P. A., Sun, P. & Narod, S. A. Differences in breast cancer stage at diagnosis and cancer-specific survival by race and ethnicity in the United States. JAMA 313, 165–173 (2015).


  16. Howard, F. M. & Olopade, O. I. Epidemiology of triple-negative breast cancer: a review. Cancer J. 27, 8–16 (2021).


  17. Martin, A. R., Teferra, S., Möller, M., Hoal, E. G. & Daly, M. J. The critical needs and challenges for genetic architecture studies in Africa. Curr. Opin. Genet. Dev. 53, 113–120 (2018).


  18. Zheng, Y. et al. Fine mapping of breast cancer genome-wide association studies loci in women of African ancestry identifies novel susceptibility markers. Carcinogenesis 34, 1520–1528 (2013).


  19. Feng, Y. et al. A comprehensive examination of breast cancer risk loci in African American women. Hum. Mol. Genet. 23, 5518–5526 (2014).


  20. Huo, D. et al. Evaluation of 19 susceptibility loci of breast cancer in women of African ancestry. Carcinogenesis 33, 835–840 (2012).


  21. Mavaddat, N. et al. Polygenic risk scores for prediction of breast cancer and breast cancer subtypes. Am. J. Hum. Genet. 104, 21–34 (2019).


  22. Du, Z. et al. Evaluating polygenic risk scores for breast cancer in women of African ancestry. J. Natl Cancer Inst. 113, 1168–1176 (2021).


  23. Ho, W.-K. et al. Polygenic risk scores for prediction of breast cancer risk in Asian populations. Genet. Med. 24, 586–600 (2022).


  24. Shieh, Y. et al. A polygenic risk score for breast cancer in US Latinas and Latin American women. J. Natl Cancer Inst. 112, 590–598 (2020).


  25. Yang, Y. et al. Incorporating polygenic risk scores and nongenetic risk factors for breast cancer risk prediction among Asian women. JAMA Netw. Open 5, e2149030 (2022).


  26. Yang, J. et al. Genetic variance estimation with imputed variants finds negligible missing heritability for human height and body mass index. Nat. Genet. 47, 1114–1120 (2015).


  27. Zhu, Z. et al. Integration of summary data from GWAS and eQTL studies predicts complex trait gene targets. Nat. Genet. 48, 481–487 (2016).


  28. Stacey, S. N. et al. Ancestry-shift refinement mapping of the C6orf97-ESR1 breast cancer susceptibility locus. PLoS Genet. 6, e1001029 (2010).


  29. Gao, G. et al. A joint transcriptome-wide association study across multiple tissues identifies candidate breast cancer susceptibility genes. Am. J. Hum. Genet. 110, 950–962 (2023).


  30. Zou, Y., Carbonetto, P., Wang, G. & Stephens, M. Fine-mapping from summary data with the ‘Sum of Single Effects’ model. PLoS Genet. 18, e1010299 (2022).


  31. Wang, G., Sarkar, A., Carbonetto, P. & Stephens, M. A simple new approach to variable selection in regression, with application to genetic fine mapping. J. R. Stat. Soc. Series B Stat. Methodol. 82, 1273–1300 (2020).


  32. Privé, F., Arbel, J. & Vilhjálmsson, B. J. LDpred2: better, faster, stronger. Bioinformatics 36, 5424–5431 (2020).


  33. Giro-Perafita, A. et al. LncRNA RP11-19E11 is an E2F1 target required for proliferation and survival of basal breast cancer. NPJ Breast Cancer 6, 1 (2020).


  34. Han, Y. J. et al. LncRNA BLAT1 is upregulated in basal-like breast cancer through epigenetic modifications. Sci. Rep. 8, 15572 (2018).


  35. Palmer, J. R. et al. Genetic susceptibility loci for subtypes of breast cancer in an African American population. Cancer Epidemiol. Biomarkers Prev. 22, 127–134 (2013).


  36. Park, C. et al. Overexpression and selective anticancer efficacy of ENO3 in STK11 mutant lung cancers. Mol. Cells 42, 804–809 (2019).


  37. Liu, K. et al. ARHGEF38 as a novel biomarker to predict aggressive prostate cancer. Genes Dis. 7, 217–224 (2020).


  38. Chen, J. W. & Dhahbi, J. Lung adenocarcinoma and lung squamous cell carcinoma cancer classification, biomarker identification, and gene expression analysis using overlapping feature selection methods. Sci. Rep. 11, 13323 (2021).


  39. Jiang, X. et al. Shared heritability and functional enrichment across six solid cancers. Nat. Commun. 10, 431 (2019).


  40. Gao, G. et al. Polygenic risk scores for prediction of breast cancer risk in women of African ancestry: a cross-ancestry approach. Hum. Mol. Genet. 31, 3133–3143 (2022).


  41. Siu, A. L. Screening for breast cancer: U.S. Preventive Services Task Force recommendation statement. Ann. Intern. Med. 164, 279–296 (2016).


  42. Chen, Z. et al. Discovery of structural deletions in breast cancer predisposition genes using whole genome sequencing data from > 2000 women of African-ancestry. Hum. Genet. 140, 1449–1457 (2021).


  43. Kasimatis, K. R. et al. Evaluating human autosomal loci for sexually antagonistic viability selection in two large biobanks. Genetics 217, 1–10 (2021).


  44. Adedokun, B. et al. Cross-ancestry GWAS meta-analysis identifies six breast cancer loci in African and European ancestry women. Nat. Commun. 12, 4198 (2021).


  45. Bien, S. A. et al. Strategies for enriching variant coverage in candidate disease loci on a multiethnic genotyping array. PLoS ONE 11, e0167758 (2016).


  46. Alexander, D. H., Novembre, J. & Lange, K. Fast model-based estimation of ancestry in unrelated individuals. Genome Res. 19, 1655–1664 (2009).


  47. Chang, C. C. et al. Second-generation PLINK: rising to the challenge of larger and richer datasets. Gigascience 4, 7 (2015).


  48. Devlin, B. & Roeder, K. Genomic control for association studies. Biometrics 55, 997–1004 (1999).


  49. Willer, C. J., Li, Y. & Abecasis, G. R. METAL: fast and efficient meta-analysis of genomewide association scans. Bioinformatics 26, 2190–2191 (2010).


  50. Yang, J., Lee, S. H., Goddard, M. E. & Visscher, P. M. GCTA: a tool for genome-wide complex trait analysis. Am. J. Hum. Genet. 88, 76–82 (2011).


  51. Pruim, R. J. et al. LocusZoom: regional visualization of genome-wide association scan results. Bioinformatics 26, 2336–2337 (2010).


  52. Surveillance, Epidemiology, and End Results (SEER) Program. SEER*Stat Database: Populations—Total U.S. (1969–2020) <Katrina/Rita Adjustment>—Linked to County Attributes—Total U.S., 1969–2020 Counties, National Cancer Institute, DCCPS, Surveillance Research Program; www.seer.cancer.gov (2022).

  53. de Glas, N. A. et al. Performing survival analyses in the presence of competing risks: a clinical example in older breast cancer patients. J. Natl Cancer Inst. 108, djv366 (2016).


  54. Bulik-Sullivan, B. K. et al. LD Score regression distinguishes confounding from polygenicity in genome-wide association studies. Nat. Genet. 47, 291–295 (2015).


  55. Stegle, O., Parts, L., Piipari, M., Winn, J. & Durbin, R. Using probabilistic estimation of expression residuals (PEER) to obtain increased power and interpretability of gene expression analyses. Nat. Protoc. 7, 500–507 (2012).


  56. Privé, F., Albiñana, C., Arbel, J., Pasaniuc, B. & Vilhjálmsson, B. J. Inferring disease architecture and predictive ability with LDpred2-auto. Am. J. Hum. Genet. 110, 2042–2055 (2023).


  57. Pepe, M. S., Longton, G. & Janes, H. Estimation and comparison of receiver operating characteristic curves. Stata J. 9, 1 (2009).


  58. Zheng, W. et al. Genetic and clinical predictors for breast cancer risk assessment and stratification among Chinese women. J. Natl Cancer Inst. 102, 972–981 (2010).


  59. Wen, W. et al. Prediction of breast cancer risk based on common genetic variants in women of East Asian ancestry. Breast Cancer Res. 18, 124 (2016).


  60. Jia, G. Damon0212/AABCG_GWAS_breast_cancer: AABCG_GWAS_breast_cancer (AABCG_GWAS). Zenodo. https://zenodo.org/records/10372821 (2023).


Acknowledgements

This research was supported in part by National Institutes of Health grant nos. R01 CA202981 (to W.Z., C.A.H. and J.R.P.) and R01 CA235553 (to W.Z. and J.L.). Sample preparation and genotyping assays at Vanderbilt were conducted at the Survey and Biospecimen Shared Resources and Vanderbilt Technologies for Advanced Genomics, which are supported in part by the Vanderbilt-Ingram Cancer Center (no. P30CA068485 to B. H. Park). Data analyses were conducted using the Advanced Computing Center for Research and Education at Vanderbilt University. Additional information is provided in the Supplementary Note. The content is solely the responsibility of the authors and does not necessarily represent the official views of the funding agents. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Author information


Contributions

G.J. and W.Z. conceived and designed the study. S.A., M.E.B., Y.C., M.G.-C., J.G., J.J.H., D.H., E.M.J., C.I.L., K.L.N., B.N., O.I.O., T.P., M.F.P., M.S., D.P.S., X.-O.S., M.A.T., S.Y., P.O.A., T.A., A.M.B., A.J.M.H., T.M., P.N., K.M.O., A.F.O., M.M.O., S.R., E.N.B., M.H., A.N., C.B.A., Q.C., J.L., J.R.P., C.A.H. and W.Z. recruited the study participants, and collected the data and specimens. G.J., J.P., X.G., Y.Y., M.G., J.G., D.H., E.M.J., M.A.T., S.Y., T.A., A.F.O., E.N.B., M.H., C.B.A. and Q.C. managed sample and data preparation, or carried out quality control. G.J., J.P., X.G., Y.Y., R.T., J.L.L., H.Q. and H.Z. analyzed the data. G.J., J.P., Y.Y., B.L., J.L.L., T.P., X.-O.S., A.J.M.H., S.R., H.Z. and W.Z. interpreted the findings. G.J., X.G., R.T., M.E.B., M.G., D.H., C.I.L., J.L.L., O.I.O., T.P., M.F.P., P.O.A., T.A., A.J.M.H., T.M., P.N., M.M.O., S.R., A.N., J.L., J.R.P., C.A.H. and W.Z. drafted or substantively revised the paper. J.R.P., C.A.H. and W.Z. supervised the project.

Corresponding author

Correspondence to Wei Zheng.

Ethics declarations

Competing interests

O.I.O. is a cofounder of CancerIQ, serves as scientific adviser at Tempus and is on the board of 54gene. A.N. has received research support from Roche. The other authors declare no competing interests.

Peer review

Peer review information

Nature Genetics thanks Laura Fejerman, Catherine Tcheandjieu and the other anonymous reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Genome-wide association with overall breast cancer and subtypes.

(a) Overall breast cancer, (b) ER-positive breast cancer, (c) ER-negative breast cancer, (d) triple-negative breast cancer. Associations were estimated in each dataset and meta-analyzed using a fixed-effects model. Logistic regression was used for the genome-wide association analyses, with two-sided tests and 5 × 10⁻⁸ as the genome-wide significance threshold (dashed lines).
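
The fixed-effects meta-analysis mentioned in this caption can be sketched as an inverse-variance weighted average of per-dataset log odds ratios. This is a minimal illustration under that assumption and not the authors' pipeline; the function name and inputs are hypothetical, and the two-sided P value uses a normal approximation.

```python
import math


def fixed_effect_meta(betas, ses):
    """Inverse-variance fixed-effects meta-analysis of per-dataset estimates.

    betas: per-dataset log odds ratios (hypothetical inputs)
    ses:   their standard errors
    Returns (pooled_beta, pooled_se, two_sided_p).
    """
    weights = [1.0 / se ** 2 for se in ses]   # precision of each dataset
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))    # two-sided normal P value
    return beta, se, p


GENOME_WIDE_ALPHA = 5e-8  # significance threshold stated in the caption
```

A pooled variant would be called genome-wide significant in this sketch when the returned P value falls below `GENOME_WIDE_ALPHA`.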

Extended Data Fig. 2 Forest plot for risk estimates of selected variants.

(a) Missense variant rs61751053 at locus 4q24 in association with overall breast cancer, T allele as effect allele; (b) novel variant rs10853615 at locus 18q11.2 in association with ER-positive breast cancer, A allele as effect allele; (c) novel risk variant rs76664032 at locus 2q14.2 in association with ER-negative breast cancer, A allele as effect allele; (d) novel risk variant rs76664032 at locus 2q14.2 in association with TNBC, A allele as effect allele. r², imputation quality score. EAF, effect allele frequency. CI, confidence interval. Studies for each dataset are described in the Supplementary Note and Supplementary Table 1. Logistic regression was used for association analyses with two-sided tests. The sample size (n) is shown for each participating study/dataset. In the bar plot, the log odds ratios (beta) are presented as mean ± 1.96 × standard error, with the mean value as the center and the 95% CI as the minimum and maximum.
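
To relate the plotted log odds ratios to the odds ratios quoted elsewhere in the article, the interval beta ± 1.96 × standard error can be exponentiated. The sketch below shows that conversion; the function name is hypothetical and the example inputs are not study results.

```python
import math


def odds_ratio_ci(beta, se, z=1.96):
    """Convert a log odds ratio and its standard error into (OR, lower, upper).

    z=1.96 gives an approximate 95% confidence interval.
    """
    return (
        math.exp(beta),           # point estimate on the odds-ratio scale
        math.exp(beta - z * se),  # lower confidence limit
        math.exp(beta + z * se),  # upper confidence limit
    )
```

Because the interval is symmetric on the log scale, it is asymmetric around the odds ratio once exponentiated.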

Extended Data Fig. 3 Distribution of PRS in the testing set of samples.

(a) The PRS_AFR was calculated using 94 selected variants with weights derived from the training samples; (b) the PRS_EUR was constructed with previously reported weights derived from European-ancestry women. Both PRSs were centered at the mean in controls and divided by the standard deviation in controls.
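
The control-based scaling described in this caption can be sketched as follows. The function name and inputs are hypothetical, and the use of the sample (n − 1) standard deviation is an assumption, since the caption does not say which estimator was used.

```python
import statistics


def standardize_by_controls(scores, is_control):
    """Center scores at the control mean and scale by the control SD.

    scores:     raw PRS values for all participants (hypothetical input)
    is_control: booleans, True where the participant is a control
    """
    controls = [s for s, c in zip(scores, is_control) if c]
    mean = statistics.fmean(controls)
    sd = statistics.stdev(controls)  # assumption: sample SD (n - 1)
    return [(s - mean) / sd for s in scores]
```

After this transformation, controls have mean 0 and standard deviation 1, so a case's value reads directly as the number of control standard deviations from the control mean.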

Supplementary information

Supplementary Information

Supplementary Figs. 1–10, Methods and Acknowledgements.

Reporting Summary

Supplementary Tables

Supplementary Tables 1–12

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Jia, G., Ping, J., Guo, X. et al. Genome-wide association analyses of breast cancer in women of African ancestry identify new susceptibility loci and improve risk prediction. Nat Genet 56, 819–826 (2024). https://doi.org/10.1038/s41588-024-01736-4


  • DOI: https://doi.org/10.1038/s41588-024-01736-4
