Genome-wide association study identifies 32 novel breast cancer susceptibility loci from overall and subtype-specific analyses


Breast cancer susceptibility variants frequently show heterogeneity in associations by tumor subtype [1,2,3]. To identify novel loci, we performed a genome-wide association study including 133,384 breast cancer cases and 113,789 controls, plus 18,908 BRCA1 mutation carriers (9,414 with breast cancer) of European ancestry, using both standard and novel methodologies that account for underlying tumor heterogeneity by estrogen receptor, progesterone receptor and human epidermal growth factor receptor 2 status and tumor grade. We identified 32 novel susceptibility loci (P < 5.0 × 10^−8), 15 of which showed evidence for associations with at least one tumor feature (false discovery rate < 0.05). Five loci showed associations (P < 0.05) in opposite directions between luminal and non-luminal subtypes. In silico analyses showed that these five loci contained cell-specific enhancers that differed between normal luminal and basal mammary cells. The genetic correlations between five intrinsic-like subtypes ranged from 0.35 to 0.80. The proportion of genome-wide chip heritability explained by all known susceptibility loci was 54.2% for luminal A-like disease and 37.6% for triple-negative disease. The odds ratios of polygenic risk scores, which included 330 variants, for the highest 1% of quantiles compared with middle quantiles were 5.63 and 3.02 for luminal A-like and triple-negative disease, respectively. These findings provide an improved understanding of genetic predisposition to breast cancer subtypes and will inform the development of subtype-specific polygenic risk scores.
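The false discovery rate threshold mentioned in the abstract can be illustrated with the Benjamini–Hochberg step-up procedure (ref. 31 in the reference list). The following is a minimal sketch rather than the authors' code: the function name `benjamini_hochberg` and the toy P values are hypothetical, and whether the study applied exactly this variant of the procedure is an assumption.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return one boolean per input P value: True if rejected at FDR level alpha.

    Illustrative helper (hypothetical name), implementing the standard
    Benjamini-Hochberg step-up rule.
    """
    m = len(p_values)
    # Indices of the P values in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Step-up rule: find the largest 1-based rank k with p_(k) <= (k / m) * alpha.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_rank = rank

    # Reject every hypothesis whose rank is at most max_rank.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            rejected[idx] = True
    return rejected


# Toy P values (made up for illustration, not taken from the study).
toy_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(benjamini_hochberg(toy_p))  # only the first two entries are True
```

Because the rule is step-up, a P value that misses its own rank threshold can still be rejected when a larger P value further down the sorted list meets its threshold.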


Fig. 1: Ideogram of all of the independent genome-wide-significant breast cancer susceptibility variants in overall, subtype, BCAC triple-negative and CIMBA BRCA1 carrier meta-analyses.
Fig. 2: Heatmap and clustering of P values from a marker-specific heterogeneity test for 32 breast cancer susceptibility loci.
Fig. 3: Susceptibility variants with associations in opposite directions across subtypes.
Fig. 4: Heatmap of CCVs overlapping with enhancer states in primary breast subpopulations for five variants with associations in the opposite direction across subtypes.
Fig. 5: Genetic correlation between the five intrinsic-like breast cancer subtypes and breast cancer in BRCA1 mutation carriers, estimated through linkage disequilibrium score regression.

Data availability

Summary-level statistics are available from the BCAC and CIMBA websites. Requests for data can be made to the corresponding author or to the Data Access Coordination Committees (DACCs) of BCAC and CIMBA. BCAC DACC approval is required to access data from the 2SISTER, ABCS, ABCS-F, ABCTB, BBCC, BCEES, BCFR-NY, BCFR-PA, BCINIS, BIGGS, BREOGAN, BSUCH, CECILE, CGPS, CNIO-BCS, CPSII, CTS, DIETCOMPLYF, ESTHER, FHRISK, GENICA, GEPARSIXTO, HABCS, HCSC, HEBCS, HMBCS, HUBCS, KARBAC, KARMA, KBCP, KCONFAB/AOCS, LMBC, MABCS, MBCSG, MCBCS, MISS, MMHS, MTLGEBCS, NBCS, NCBCS, OBCS, ORIGO, PKARMA, PREFACE, PROCAS, RBCS, SEARCH, SKKDKFZS, SUCCESSB, SUCCESSC, SZBCS, TNBCC, UCIBCS, UKBGS, UKOPS and USRT studies (Supplementary Table 1). CIMBA DACC approval is required to access data from the BCFR-ON, CONSIT TEAM, DKFZ, EMBRACE, FPGMX, GC-HBOC, GEMO, G-FAST, HEBCS, HEBON, IHCC, INHERIT, IOVHBOCS, IPOBCS, MCGILL, MODSQUAD, NAROD, OCGN, OUH and UKGRFOCR studies (Supplementary Table 3).

Code availability

The data analysis code relevant to this paper is available online. The two-stage polytomous regression method is implemented in an R package called TOP, for which a detailed tutorial is also available.


References

  1. Michailidou, K. et al. Association analysis identifies 65 new breast cancer risk loci. Nature 551, 92–94 (2017).
  2. Milne, R. L. et al. Identification of ten variants associated with risk of estrogen-receptor-negative breast cancer. Nat. Genet. 49, 1767–1778 (2017).
  3. Garcia-Closas, M. et al. Genome-wide association studies identify four ER negative-specific breast cancer risk loci. Nat. Genet. 45, 392–398 (2013).
  4. Zhang, H. et al. A mixed-model approach for powerful testing of genetic associations with cancer risk incorporating tumor characteristics. Biostatistics (2020).
  5. Michailidou, K. et al. Large-scale genotyping identifies 41 new loci associated with breast cancer risk. Nat. Genet. 45, 353–361 (2013).
  6. Michailidou, K. et al. Genome-wide association analysis of more than 120,000 individuals identifies 15 new susceptibility loci for breast cancer. Nat. Genet. 47, 373–380 (2015).
  7. Dempster, A. P., Laird, N. M. & Rubin, D. B. Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. B 39, 1–38 (1977).
  8. Curigliano, G. et al. De-escalating and escalating treatments for early-stage breast cancer: the St. Gallen International Expert Consensus Conference on the Primary Therapy of Early Breast Cancer 2017. Ann. Oncol. 28, 1700–1712 (2017).
  9. Spurdle, A. B. et al. Refined histopathological predictors of BRCA1 and BRCA2 mutation status: a large-scale analysis of breast cancer characteristics from the BCAC, CIMBA, and ENIGMA consortia. Breast Cancer Res. 16, 3419 (2014).
  10. Phelan, C. M. et al. Identification of 12 new susceptibility loci for different histotypes of epithelial ovarian cancer. Nat. Genet. 49, 680–691 (2017).
  11. Stacey, S. N. et al. A germline variant in the TP53 polyadenylation signal confers cancer susceptibility. Nat. Genet. 43, 1098–1103 (2011).
  12. Pellacani, D. et al. Analysis of normal human mammary epigenomes reveals cell-specific active enhancer states and associated transcription factor networks. Cell Rep. 17, 2060–2074 (2016).
  13. Beesley, J. et al. Chromatin interactome mapping at 139 independent breast cancer risk signals. Genome Biol. 21, 8 (2020).
  14. Fachal, L. et al. Fine-mapping of 150 breast cancer risk regions identifies 178 high confidence target genes. Nat. Genet. 52, 56–73 (2020).
  15. Nik-Zainal, S. et al. Landscape of somatic mutations in 560 breast cancer whole-genome sequences. Nature 534, 47–54 (2016).
  16. Ferreira, M. A. et al. Genome-wide association and transcriptome studies identify target genes and risk loci for breast cancer. Nat. Commun. 10, 1741 (2019).
  17. Wu, L. et al. A transcriptome-wide association study of 229,000 women identifies new candidate susceptibility genes for breast cancer. Nat. Genet. 50, 968–978 (2018).
  18. Bulik-Sullivan, B. et al. An atlas of genetic correlations across human diseases and traits. Nat. Genet. 47, 1236–1241 (2015).
  19. Bulik-Sullivan, B. K. et al. LD score regression distinguishes confounding from polygenicity in genome-wide association studies. Nat. Genet. 47, 291–295 (2015).
  20. Finucane, H. K. et al. Partitioning heritability by functional annotation using genome-wide association summary statistics. Nat. Genet. 47, 1228–1235 (2015).
  21. Ahearn, T. U. et al. Common breast cancer risk loci predispose to distinct tumor subtypes. Preprint at bioRxiv (2019).
  22. Mavaddat, N. et al. Polygenic risk scores for prediction of breast cancer and breast cancer subtypes. Am. J. Hum. Genet. 104, 21–34 (2019).
  23. Lin, D. Y. & Sullivan, P. F. Meta-analysis of genome-wide association studies with overlapping subjects. Am. J. Hum. Genet. 85, 862–872 (2009).
  24. Chatterjee, N. A two-stage regression model for epidemiological studies with multivariate disease classification data. J. Am. Stat. Assoc. 99, 127–138 (2004).
  25. Anderson, W. F., Rosenberg, P. S., Prat, A., Perou, C. M. & Sherman, M. E. How many etiological subtypes of breast cancer: two, three, four, or more? J. Natl. Cancer Inst. 106, dju165 (2014).
  26. Barnes, D. R. et al. Evaluation of association methods for analysing modifiers of disease risk in carriers of high-risk mutations. Genet. Epidemiol. 36, 274–291 (2012).
  27. Antoniou, A. C. et al. A locus on 19p13 modifies risk of breast cancer in BRCA1 mutation carriers and is associated with hormone receptor-negative breast cancer in the general population. Nat. Genet. 42, 885–892 (2010).
  28. Mavaddat, N. et al. Pathology of breast and ovarian cancers among BRCA1 and BRCA2 mutation carriers: results from the Consortium of Investigators of Modifiers of BRCA1/2 (CIMBA). Cancer Epidemiol. Biomarkers Prev. 21, 134–147 (2012).
  29. Willer, C. J., Li, Y. & Abecasis, G. R. METAL: fast and efficient meta-analysis of genomewide association scans. Bioinformatics 26, 2190–2191 (2010).
  30. Hendricks, A. E., Dupuis, J., Logue, M. W., Myers, R. H. & Lunetta, K. L. Correction for multiple testing in a gene region. Eur. J. Hum. Genet. 22, 414–418 (2014).
  31. Benjamini, Y. & Hochberg, Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. B 57, 289–300 (1995).
  32. Udler, M. S., Tyrer, J. & Easton, D. F. Evaluating the power to discriminate between highly correlated SNPs in genetic association studies. Genet. Epidemiol. 34, 463–468 (2010).
  33. Wellcome Trust Case Control Consortium et al. Bayesian refinement of association signals for 14 loci in 3 common diseases. Nat. Genet. 44, 1294–1301 (2012).
  34. Pharoah, P. D. et al. Polygenic susceptibility to breast cancer and implications for prevention. Nat. Genet. 31, 33–36 (2002).
  35. Visscher, P. M., Hill, W. G. & Wray, N. R. Heritability in the genomics era—concepts and misconceptions. Nat. Rev. Genet. 9, 255–266 (2008).

Acknowledgements


We thank all of the individuals who took part in these studies, and all of the researchers, clinicians, technicians and administrative staff who enabled this work to be carried out. This project has been funded in part with Federal funds from the National Cancer Institute Intramural Research Program, National Institutes of Health. Genotyping for the OncoArray was funded by the government of Canada through Genome Canada and the Canadian Institutes of Health Research (GPH-129344), the Ministère de l'Économie et de la Science et de l'Innovation du Québec through Génome Québec, the Quebec Breast Cancer Foundation for the PERSPECTIVE project, the US National Institutes of Health (NIH) (1U19 CA148065 for the Discovery, Biology and Risk of Inherited Variants in Breast Cancer (DRIVE) project and X01HG007492 to the Center for Inherited Disease Research under contract HHSN268201200008I), Cancer Research UK (C1287/A16563), the Odense University Hospital Research Foundation (Denmark), the National R&D Program for Cancer Control–Ministry of Health and Welfare (Republic of Korea; 1420190), the Italian Association for Cancer Research (AIRC; IG16933), the Breast Cancer Research Foundation, the National Health and Medical Research Council (Australia) and German Cancer Aid (110837). Genotyping for the iCOGS array was funded by the European Union (HEALTH-F2-2009-223175), Cancer Research UK (C1287/A10710, C1287/A10118 and C12292/A11174), NIH grants (CA128978, CA116167 and CA176785) and the Post-Cancer GWAS initiative (1U19 CA148537, 1U19 CA148065 and 1U19 CA148112 (GAME-ON initiative)), an NCI Specialized Program of Research Excellence (SPORE) in Breast Cancer (CA116201), the Canadian Institutes of Health Research (CIHR) for the CIHR Team in Familial Risks of Breast Cancer, the Ministère de l'Économie, Innovation et Exportation du Québec (PSR-SIIRI-701), the Komen Foundation for the Cure, the Breast Cancer Research Foundation and the Ovarian Cancer Research Fund. 
Combination of the GWAS data was supported in part by the NIH Cancer Post-Cancer GWAS initiative (1U19 CA148065) (DRIVE, part of the GAME-ON initiative). Linkage disequilibrium score regression analysis was supported by grant CA194393. BCAC was funded by Cancer Research UK (C1287/A16563) and by the European Union via its Seventh Framework Programme (HEALTH-F2-2009-223175; COGS) and the Horizon 2020 Research and Innovation Programme (633784 (B-CAST) and 634935 (BRIDGES)). CIMBA was funded by Cancer Research UK (C12292/A20861 and C12292/A11174). N.C. was funded by NHGRI (1R01 HG010480-01). For a full description of funding and acknowledgments, see the Supplementary Note.

Author information





Writing group: H.Z., T.U.A., D. Barnes, J. Beesley, G.Q., X.J., T.A.O., R.L.M., P.K., J. Simard, P.D.P.P., K.M., A.C.A., M.K.S., G.C.-T., D.F.E., N.C., M.G.-C.

Statistical analysis: H.Z., T.U.A., J. Lecarpentier, D. Barnes, J. Beesley, G.Q., X.J., T.A.O., N.Z.

Provision of DNA samples and/or phenotypic data: M.K.B., A.M.D., J.D., Q.W., Z.A.F., K.A., I.L.A., H.A.-C., V.A., K.J.A., B.K.A., P.L.A., J.A., D. Barrowdale, H. Becher, M.W.B., S.B., J. Benitez, M.B., K.B., A. Blanco, C.B., N.V.B., S.E.B., B. Bonanni, D. Bondavalli, A. Borg, H. Brauch, H. Brenner, I.B., A. Broeks, S.Y.B., T.B., B. Burwinkel, S.S.B., H. Byers, T.C., M.A.C., M.C., D.C., J.E.C., J.C.-C., S.J.C., M.C., H.C., W.K.C., K.B.M.C., C.L.C., S.C., F.J.C., A.C., S.S.C., K.C., M.B.D., P.D., O.D., S.M.D., T.D., M.D., D.M.E., A.B.E., D.G.E., P.A.F., J.F., L.F., F.F., E.F., D.F., M.G.-D., S.M.G., J. Garber, J.A.G.-S., M.M.G., S.A.G., G.G.G., A.K.G., M.S.G., D.E.G., A.G.-N., M.H.G., J. Gronwald, P.G., L.H., E.H., C.A.H., C.R.H., P. Hall, U.H., E.F.H., B.A.M.H.-G., P. Hillemanns, F.B.L.H., B.H., A. Hollestelle, M.J.H., R.N.H., J.L.H., A. Howell, H.H., P.J.H., E.N.I., C.I., L.I., A. Jager, M.J., A. Jakubowska, P.J., R.J., W.J., E.M.J., M.E.J., A. Jung, R. Kaaks, P.M.K., B.Y.K., R. Keeman, S. Khan, E.K., C.M.K., Y.D.K., I.K., L.B.K., S. Koutros, V.N.K., A.-V.L., D.L., S.C.L., P.L.-P., C.L., E.L., F. Lejbkowicz, G.L., F. Lesueur, A. Lindblom, J. Lissowska, W.-Y.L., J.T.L., J. Lubinski, A. Lukomska, R.J.M., A. Mannermaa, M. Manoochehri, S. Manoukian, S. Margolin, M.E.M., L. Matricardi, L. McGuffog, C. McLean, N.M., A. Meindl, U.M., A. Miller, E.M., M. Montagna, A.M.M., C. Mulot, T.A.M., K.L.N., S.L.N., H.N., P.N., W.G.N., F.C.N., L.N.-Z., J.N., K.O., E.O., O.I.O., H.O., N.O., L.P., J. Papp, T.-W.P.-S., M.T.P., B. Peissel, A.P., B. Peshkin, P.P., J. Peto, K.-A.P., M.P., D.P.-K., K.P., R.P., D.P., B.R., P.R., S.J.R., J.R., M.U.R., G.R., H.S.R., H.A.R., A.R., M.A.R., R.R., T.R., E.S., S. Sampson, D.P.S., E.J.S., M.T.S., R.K.S., A.S., M.J.S., B.S., P. Schürmann, L.S., P. Sharma, M.E.S., X.-O.S., C.F.S., S. Smichkoska, P. Soucy, M.C.S., J.J.S., J. Stone, D.S.-L., A.J.S., C.I.S., R.M.T., W.J.T., J.A.T., M.R.T., M. Terry, M. Thomassen, D.L.T., M. Tischkowitz, A.E.T., R.A.E.M.T., I.T., D.T., M.A.T., T.T., N.T., M.U., C.M.V., A.M.W.V.D.O., L.E.V.D.K., E.M.V.V., E.J.V.R., A.V., B.W., C.R.W., J.N.W., H.W., R.W., A.W., X.R.Y., D.Y., W.Z., K.K.Z., R.L.M., P.K., J. Simard, P.D.P.P., K.M., A.C.A., M.K.S., G.C.-T., D.F.E., M.G.-C.

All authors read and approved the final version of the manuscript.

Corresponding author

Correspondence to Nilanjan Chatterjee.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Overview of the analytic strategy and results from the investigation of breast cancer susceptibility variants in women of European descent.

Analyses investigated susceptibility variants for overall breast cancer (invasive, in situ or unknown invasiveness) and susceptibility variants accounting for tumor heterogeneity according to estrogen receptor (ER), progesterone receptor (PR) and human epidermal growth factor receptor 2 (HER2) status and grade, and specifically investigated variants that predispose to the triple-negative (TN) subtype. (1) Genotyping data came from two Illumina genome-wide custom arrays, the iCOGS and OncoArray, and were imputed to the 1000 Genomes Project (Phase 3). (2) Overall breast cancer (invasive, in situ or unknown invasiveness) analyses included 82 studies from the Breast Cancer Association Consortium (BCAC; 118,474 cases and 96,201 controls) and summary-level data from 11 other breast cancer GWAS (14,910 cases and 17,588 controls; Supplementary Table 1). (3) Analyses accounting for tumor marker heterogeneity according to ER, PR, HER2 and grade included 81 studies from BCAC (106,278 invasive cases and 91,477 controls). (4) Analyses investigating TN susceptibility variants included 91,477 controls and 8,602 TN cases (effective sample; see Supplementary Note) from BCAC and 9,414 affected and 9,494 unaffected BRCA1 carriers from 60 studies from the Consortium of Investigators of Modifiers of BRCA1/2 (CIMBA; Supplementary Table 3). (5) Variants were excluded when conditional analyses showed that they were not independent (P > 1 × 10^−6) of 178 known susceptibility variants (see Methods). (6) See Supplementary Fig. 6 for results of country-specific sensitivity analyses. (7) See Supplementary Table 5 for the 22 independent susceptibility variants identified in overall breast cancer analyses. (8) See Supplementary Table 6 for the 16 independent susceptibility variants identified using two-stage polytomous regression, accounting for tumor marker heterogeneity according to ER, PR, HER2 and grade. Note that 8 of the 16 variants were also detected in the overall breast cancer analysis. (9) See Supplementary Table 7 for the 3 independent susceptibility variants identified in the CIMBA/BCAC TN meta-analysis. Note that rs78378222 was detected both in the analyses using two-stage polytomous regression and in the CIMBA/BCAC TN meta-analysis.
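The counts in this caption can be cross-checked against the totals quoted in the abstract with a few lines of arithmetic. This is only a bookkeeping sketch: the variable names are hypothetical, and the locus tally assumes that the two overlaps noted in the caption are the only ones between the three analyses.

```python
# Sample sizes quoted in the caption (step 2): BCAC plus 11 other GWAS.
bcac_cases, bcac_controls = 118_474, 96_201
other_gwas_cases, other_gwas_controls = 14_910, 17_588
total_cases = bcac_cases + other_gwas_cases
total_controls = bcac_controls + other_gwas_controls

# CIMBA carriers quoted in the caption (step 4).
affected_carriers, unaffected_carriers = 9_414, 9_494
total_carriers = affected_carriers + unaffected_carriers

# Independent variants per analysis (steps 7 to 9) and the stated overlaps.
overall_variants = 22
two_stage_variants = 16
shared_overall_and_two_stage = 8   # "8 of the 16 variants were also detected"
tn_meta_variants = 3
shared_two_stage_and_tn = 1        # rs78378222, detected in both analyses
# Assumption: no further overlaps beyond the two stated in the caption.
novel_loci = (
    overall_variants
    + two_stage_variants - shared_overall_and_two_stage
    + tn_meta_variants - shared_two_stage_and_tn
)

print(total_cases)     # 133384
print(total_controls)  # 113789
print(total_carriers)  # 18908
print(novel_loci)      # 32
```

All four values agree with the abstract: 133,384 cases, 113,789 controls, 18,908 carriers and 32 novel loci.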

Extended Data Fig. 2 Associations between three different polygenic risk scores and luminal A-like risk in the test dataset.

Odds ratios for different quantiles of the PRS against the middle quantile (40%–60%) of the PRS. The odds ratios were estimated using the test dataset (n = 7,325 luminal A-like cases, n = 20,815 controls).

Extended Data Fig. 3 Associations between three different polygenic risk scores and luminal B/HER2-negative-like risk in the test dataset.

Odds ratios for different quantiles of the PRS against the middle quantile (40%–60%) of the PRS. The odds ratios were estimated using the test dataset (n = 1,779 luminal B/HER2-negative-like cases, n = 20,815 controls).

Extended Data Fig. 4 Associations between three different polygenic risk scores and luminal B-like risk in the test dataset.

Odds ratios for different quantiles of the PRS against the middle quantile (40%–60%) of the PRS. The odds ratios were estimated using the test dataset (n = 1,682 luminal B-like cases, n = 20,815 controls).

Extended Data Fig. 5 Associations between three different polygenic risk scores and HER2-enriched-like risk in the test dataset.

Odds ratios for different quantiles of the PRS against the middle quantile (40%–60%) of the PRS. The odds ratios were estimated using the test dataset (n = 718 HER2-enriched-like cases, n = 20,815 controls).

Extended Data Fig. 6 Associations between three different polygenic risk scores and triple-negative risk in the test dataset.

Odds ratios for different quantiles of the PRS against the middle quantile (40%–60%) of the PRS. The odds ratios were estimated using the test dataset (n = 2,006 triple-negative cases, n = 20,815 controls).
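The quantity plotted in Extended Data Figs. 2–6, an odds ratio per PRS quantile bin relative to the middle (40%–60%) bin, can be sketched as follows. This is an illustrative, unadjusted calculation and not the authors' pipeline: the function names, the choice to define cut points from the control distribution, the rounded-rank quantile rule and the toy scores are all assumptions, and a real analysis would typically use logistic regression with covariate adjustment.

```python
from bisect import bisect_right


def empirical_quantile(sorted_values, q):
    """Value at fraction q of a sorted list (rounded rank, no interpolation)."""
    index = min(round(q * len(sorted_values)), len(sorted_values) - 1)
    return sorted_values[index]


def quantile_odds_ratios(case_scores, control_scores, edges, reference_bin):
    """Unadjusted odds ratio of each PRS bin versus the reference bin.

    edges: quantile fractions from 0.0 to 1.0, e.g. (0.0, 0.4, 0.6, 1.0).
    reference_bin: 0-based index of the bin used as the comparison group.
    Raises ZeroDivisionError if any bin contains no controls.
    """
    controls_sorted = sorted(control_scores)
    # Interior cut points on the score scale, taken from the controls (assumption).
    cuts = [empirical_quantile(controls_sorted, q) for q in edges[1:-1]]

    n_bins = len(edges) - 1
    case_counts = [0] * n_bins
    control_counts = [0] * n_bins
    for score in case_scores:
        case_counts[bisect_right(cuts, score)] += 1
    for score in control_scores:
        control_counts[bisect_right(cuts, score)] += 1

    # Odds of being a case within each bin, scaled by the reference-bin odds.
    reference_odds = case_counts[reference_bin] / control_counts[reference_bin]
    return [
        (case_counts[b] / control_counts[b]) / reference_odds
        for b in range(n_bins)
    ]


# Toy data (made up): controls spread evenly, cases enriched at high scores.
toy_controls = list(range(100))
toy_cases = [10] * 10 + [50] * 10 + [80] * 40
print(quantile_odds_ratios(toy_cases, toy_controls, (0.0, 0.4, 0.6, 1.0), 1))
# [0.5, 1.0, 2.0]
```

The reference bin always has an odds ratio of 1 by construction, so values above 1 indicate bins in which cases are over-represented relative to the middle of the PRS distribution.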

Supplementary information

Supplementary Information

Supplementary Figs. 1–10 and Note

Reporting Summary

Supplementary Tables

Supplementary Tables 1–19

About this article

Cite this article

Zhang, H., Ahearn, T.U., Lecarpentier, J. et al. Genome-wide association study identifies 32 novel breast cancer susceptibility loci from overall and subtype-specific analyses. Nat Genet 52, 572–581 (2020).
