Breast cancer susceptibility variants frequently show heterogeneity in associations by tumor subtype (refs. 1–3). To identify novel loci, we performed a genome-wide association study including 133,384 breast cancer cases and 113,789 controls, plus 18,908 BRCA1 mutation carriers (9,414 with breast cancer) of European ancestry, using both standard and novel methodologies that account for underlying tumor heterogeneity by estrogen receptor, progesterone receptor and human epidermal growth factor receptor 2 status and tumor grade. We identified 32 novel susceptibility loci (P < 5.0 × 10^−8), 15 of which showed evidence for associations with at least one tumor feature (false discovery rate < 0.05). Five loci showed associations (P < 0.05) in opposite directions between luminal and non-luminal subtypes. In silico analyses showed that these five loci contained cell-specific enhancers that differed between normal luminal and basal mammary cells. The genetic correlations between five intrinsic-like subtypes ranged from 0.35 to 0.80. The proportion of genome-wide chip heritability explained by all known susceptibility loci was 54.2% for luminal A-like disease and 37.6% for triple-negative disease. The odds ratios of polygenic risk scores, which included 330 variants, for the highest 1% of quantiles compared with middle quantiles were 5.63 and 3.02 for luminal A-like and triple-negative disease, respectively. These findings provide an improved understanding of genetic predisposition to breast cancer subtypes and will inform the development of subtype-specific polygenic risk scores.
Summary-level statistics are available from http://bcac.ccge.medschl.cam.ac.uk/bcacdata/ and http://cimba.ccge.medschl.cam.ac.uk/projects/. Requests for data can be made to the corresponding author or the Data Access Coordination Committees (DACCs) of BCAC (see above URL) and CIMBA (see above URL). BCAC DACC approval is required to access data from the 2SISTER, ABCS, ABCS-F, ABCTB, BBCC, BCEES, BCFR-NY, BCFR-PA, BCINIS, BIGGS, BREOGAN, BSUCH, CECILE, CGPS, CNIO-BCS, CPSII, CTS, DIETCOMPLYF, ESTHER, FHRISK, FHRISK, GENICA, GEPARSIXTO, HABCS, HCSC, HEBCS, HMBCS, HUBCS, KARBAC, KARMA, KBCP, KCONFAB/AOCS, LMBC, MABCS, MBCSG, MCBCS, MISS, MMHS, MTLGEBCS, NBCS, NCBCS, OBCS, ORIGO, PKARMA, PREFACE, PROCAS, RBCS, SEARCH, SKKDKFZS, SUCCESSB, SUCCESSC, SZBCS, TNBCC, UCIBCS, UKBGS, UKOPS and USRT studies (Supplementary Table 1). CIMBA DACC approval is required to access data from the BCFR-ON, CONSIT TEAM, DKFZ, EMBRACE, FPGMX, GC-HBOC, GEMO, G-FAST, HEBCS, HEBON, IHCC, INHERIT, IOVHBOCS, IPOBCS, MCGILL, MODSQUAD, NAROD, OCGN, OUH and UKGRFOCR studies (Supplementary Table 3).
The data analysis code relevant to this paper is available at https://github.com/andrewhaoyu/breast_cancer_data_analysis. The implementation of this two-stage polytomous regression method is available in an R package called TOP (https://github.com/andrewhaoyu/TOP), with a detailed tutorial available at https://github.com/andrewhaoyu/TOP/blob/master/inst/TOP.pdf.
Michailidou, K. et al. Association analysis identifies 65 new breast cancer risk loci. Nature 551, 92–94 (2017).
Milne, R. L. et al. Identification of ten variants associated with risk of estrogen-receptor-negative breast cancer. Nat. Genet. 49, 1767–1778 (2017).
Garcia-Closas, M. et al. Genome-wide association studies identify four ER negative-specific breast cancer risk loci. Nat. Genet. 45, 392–398 (2013).
Zhang, H. et al. A mixed-model approach for powerful testing of genetic associations with cancer risk incorporating tumor characteristics. Biostatistics https://doi.org/10.1093/biostatistics/kxz065 (2020).
Michailidou, K. et al. Large-scale genotyping identifies 41 new loci associated with breast cancer risk. Nat. Genet. 45, 353–361 (2013).
Michailidou, K. et al. Genome-wide association analysis of more than 120,000 individuals identifies 15 new susceptibility loci for breast cancer. Nat. Genet. 47, 373–380 (2015).
Dempster, A. P., Laird, N. M. & Rubin, D. B. Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. B 39, 1–38 (1977).
Curigliano, G. et al. De-escalating and escalating treatments for early-stage breast cancer: the St. Gallen International Expert Consensus Conference on the Primary Therapy of Early Breast Cancer 2017. Ann. Oncol. 28, 1700–1712 (2017).
Spurdle, A. B. et al. Refined histopathological predictors of BRCA1 and BRCA2 mutation status: a large-scale analysis of breast cancer characteristics from the BCAC, CIMBA, and ENIGMA consortia. Breast Cancer Res. 16, 3419 (2014).
Phelan, C. M. et al. Identification of 12 new susceptibility loci for different histotypes of epithelial ovarian cancer. Nat. Genet. 49, 680–691 (2017).
Stacey, S. N. et al. A germline variant in the TP53 polyadenylation signal confers cancer susceptibility. Nat. Genet. 43, 1098–1103 (2011).
Pellacani, D. et al. Analysis of normal human mammary epigenomes reveals cell-specific active enhancer states and associated transcription factor networks. Cell Rep. 17, 2060–2074 (2016).
Beesley, J. et al. Chromatin interactome mapping at 139 independent breast cancer risk signals. Genome Biol. 21, 8 (2020).
Fachal, L. et al. Fine-mapping of 150 breast cancer risk regions identifies 178 high confidence target genes. Nat. Genet. 52, 56–73 (2020).
Nik-Zainal, S. et al. Landscape of somatic mutations in 560 breast cancer whole-genome sequences. Nature 534, 47–54 (2016).
Ferreira, M. A. et al. Genome-wide association and transcriptome studies identify target genes and risk loci for breast cancer. Nat. Commun. 10, 1741 (2019).
Wu, L. et al. A transcriptome-wide association study of 229,000 women identifies new candidate susceptibility genes for breast cancer. Nat. Genet. 50, 968–978 (2018).
Bulik-Sullivan, B. et al. An atlas of genetic correlations across human diseases and traits. Nat. Genet. 47, 1236–1241 (2015).
Bulik-Sullivan, B. K. et al. LD score regression distinguishes confounding from polygenicity in genome-wide association studies. Nat. Genet. 47, 291–295 (2015).
Finucane, H. K. et al. Partitioning heritability by functional annotation using genome-wide association summary statistics. Nat. Genet. 47, 1228–1235 (2015).
Ahearn, T. U. et al. Common breast cancer risk loci predispose to distinct tumor subtypes. Preprint at bioRxiv https://www.biorxiv.org/content/10.1101/733402v1 (2019).
Mavaddat, N. et al. Polygenic risk scores for prediction of breast cancer and breast cancer subtypes. Am. J. Hum. Genet. 104, 21–34 (2019).
Lin, D. Y. & Sullivan, P. F. Meta-analysis of genome-wide association studies with overlapping subjects. Am. J. Hum. Genet. 85, 862–872 (2009).
Chatterjee, N. A two-stage regression model for epidemiological studies with multivariate disease classification data. J. Am. Stat. Assoc. 99, 127–138 (2004).
Anderson, W. F., Rosenberg, P. S., Prat, A., Perou, C. M. & Sherman, M. E. How many etiological subtypes of breast cancer: two, three, four, or more? J. Natl. Cancer Inst. 106, dju165 (2014).
Barnes, D. R. et al. Evaluation of association methods for analysing modifiers of disease risk in carriers of high-risk mutations. Genet. Epidemiol. 36, 274–291 (2012).
Antoniou, A. C. et al. A locus on 19p13 modifies risk of breast cancer in BRCA1 mutation carriers and is associated with hormone receptor-negative breast cancer in the general population. Nat. Genet. 42, 885–892 (2010).
Mavaddat, N. et al. Pathology of breast and ovarian cancers among BRCA1 and BRCA2 mutation carriers: results from the Consortium of Investigators of Modifiers of BRCA1/2 (CIMBA). Cancer Epidemiol. Biomarkers Prev. 21, 134–147 (2012).
Willer, C. J., Li, Y. & Abecasis, G. R. METAL: fast and efficient meta-analysis of genomewide association scans. Bioinformatics 26, 2190–2191 (2010).
Hendricks, A. E., Dupuis, J., Logue, M. W., Myers, R. H. & Lunetta, K. L. Correction for multiple testing in a gene region. Eur. J. Hum. Genet. 22, 414–418 (2014).
Benjamini, Y. & Hochberg, Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. B 57, 289–300 (1995).
Udler, M. S., Tyrer, J. & Easton, D. F. Evaluating the power to discriminate between highly correlated SNPs in genetic association studies. Genet. Epidemiol. 34, 463–468 (2010).
Wellcome Trust Case Control Consortium et al. Bayesian refinement of association signals for 14 loci in 3 common diseases. Nat. Genet. 44, 1294–1301 (2012).
Pharoah, P. D. et al. Polygenic susceptibility to breast cancer and implications for prevention. Nat. Genet. 31, 33–36 (2002).
Visscher, P. M., Hill, W. G. & Wray, N. R. Heritability in the genomics era—concepts and misconceptions. Nat. Rev. Genet. 9, 255–266 (2008).
We thank all of the individuals who took part in these studies, and all of the researchers, clinicians, technicians and administrative staff who enabled this work to be carried out. This project has been funded in part with Federal funds from the National Cancer Institute Intramural Research Program, National Institutes of Health. Genotyping for the OncoArray was funded by the government of Canada through Genome Canada and the Canadian Institutes of Health Research (GPH-129344), the Ministère de l'Économie et de la Science et de l'Innovation du Québec through Génome Québec, the Quebec Breast Cancer Foundation for the PERSPECTIVE project, the US National Institutes of Health (NIH) (1U19 CA148065 for the Discovery, Biology and Risk of Inherited Variants in Breast Cancer (DRIVE) project and X01HG007492 to the Center for Inherited Disease Research under contract HHSN268201200008I), Cancer Research UK (C1287/A16563), the Odense University Hospital Research Foundation (Denmark), the National R&D Program for Cancer Control–Ministry of Health and Welfare (Republic of Korea; 1420190), the Italian Association for Cancer Research (AIRC; IG16933), the Breast Cancer Research Foundation, the National Health and Medical Research Council (Australia) and German Cancer Aid (110837). Genotyping for the iCOGS array was funded by the European Union (HEALTH-F2-2009-223175), Cancer Research UK (C1287/A10710, C1287/A10118 and C12292/A11174), NIH grants (CA128978, CA116167 and CA176785) and the Post-Cancer GWAS initiative (1U19 CA148537, 1U19 CA148065 and 1U19 CA148112 (GAME-ON initiative)), an NCI Specialized Program of Research Excellence (SPORE) in Breast Cancer (CA116201), the Canadian Institutes of Health Research (CIHR) for the CIHR Team in Familial Risks of Breast Cancer, the Ministère de l'Économie, Innovation et Exportation du Québec (PSR-SIIRI-701), the Komen Foundation for the Cure, the Breast Cancer Research Foundation and the Ovarian Cancer Research Fund. 
Combination of the GWAS data was supported in part by the NIH Cancer Post-Cancer GWAS initiative (1U19 CA148065) (DRIVE, part of the GAME-ON initiative). Linkage disequilibrium score regression analysis was supported by grant CA194393. BCAC was funded by Cancer Research UK (C1287/A16563) and by the European Union via its Seventh Framework Programme (HEALTH-F2-2009-223175; COGS) and the Horizon 2020 Research and Innovation Programme (633784 (B-CAST) and 634935 (BRIDGES)). CIMBA was funded by Cancer Research UK (C12292/A20861 and C12292/A11174). N.C. was funded by NHGRI (1R01 HG010480-01). For a full description of funding and acknowledgments, see the Supplementary Note.
The authors declare no competing interests.
Extended Data Fig. 1 Overview of the analytic strategy and results from the investigation of breast cancer susceptibility variants in women of European descent.
Analyses included investigating susceptibility variants for overall breast cancer (invasive, in situ or unknown invasiveness) and susceptibility variants accounting for tumor heterogeneity according to estrogen receptor (ER), progesterone receptor (PR) and human epidermal growth factor receptor 2 (HER2) status and grade, and specifically investigating variants that predispose to the triple-negative (TN) subtype. (1) Genotyping data were from two Illumina genome-wide custom arrays, the iCOGS and OncoArray, and were imputed to the 1000 Genomes Project (Phase 3) reference panel. (2) Overall breast cancer (invasive, in situ or unknown invasiveness) analyses included 82 studies from the Breast Cancer Association Consortium (BCAC; 118,474 cases and 96,201 controls) and summary-level data from 11 other breast cancer GWAS (14,910 cases and 17,588 controls; Supplementary Table 1). (3) Analyses accounting for tumor marker heterogeneity according to ER, PR, HER2 and grade included 81 studies from BCAC (106,278 invasive cases and 91,477 controls). (4) Analyses investigating TN susceptibility variants included 91,477 controls and 8,602 TN cases (effective sample; see Supplementary Note) from BCAC and 9,414 affected and 9,494 unaffected BRCA1/2 carriers from 60 studies from the Consortium of Investigators of Modifiers of BRCA1/2 (CIMBA; Supplementary Table 3). (5) Variants were excluded when conditional analyses showed that they were not independent (P > 1 × 10^−6) of 178 known susceptibility variants (see Methods). (6) See Supplementary Fig. 6 for results of country-specific sensitivity analyses. (7) See Supplementary Table 5 for the 22 independent susceptibility variants identified in overall breast cancer analyses. (8) See Supplementary Table 6 for the 16 independent susceptibility variants identified using two-stage polytomous regression, accounting for tumor marker heterogeneity according to ER, PR, HER2 and grade. Note that 8 of the 16 variants were also detected in the overall breast cancer analysis. (9) See Supplementary Table 7 for the 3 independent susceptibility variants identified in the CIMBA/BCAC TN meta-analysis. Note that rs78378222 was detected both in the analyses using two-stage polytomous regression and in the CIMBA/BCAC TN meta-analysis.
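As a consistency check on the counts in this legend, and assuming the only overlaps between the three analyses are the two noted in items (8) and (9), the tallies reproduce the 32 novel loci reported in the abstract:

```latex
\underbrace{22}_{\text{overall analysis}}
+ \underbrace{(16 - 8)}_{\text{two-stage polytomous only}}
+ \underbrace{(3 - 1)}_{\text{TN meta-analysis only}}
= 32
```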
Extended Data Fig. 2 Associations between three different polygenic risk scores^1,2,3 and luminal A-like^4 risk in the test dataset.
Odds ratios for different quantiles of the PRS against the middle quantile (40%–60%) of the PRS. The odds ratios were estimated using the test dataset (n = 7,325 luminal A-like cases, n = 20,815 controls).
Extended Data Fig. 3 Associations between three different polygenic risk scores^1,2,3 and luminal B/HER2-negative-like^4 risk in the test dataset.
Odds ratios for different quantiles of the PRS against the middle quantile (40%–60%) of the PRS. The odds ratios were estimated using the test dataset (n = 1,779 luminal B/HER2-negative-like cases, n = 20,815 controls).
Extended Data Fig. 4 Associations between three different polygenic risk scores^1,2,3 and luminal B-like^4 risk in the test dataset.
Odds ratios for different quantiles of the PRS against the middle quantile (40%–60%) of the PRS. The odds ratios were estimated using the test dataset (n = 1,682 luminal B-like cases, n = 20,815 controls).
Extended Data Fig. 5 Associations between three different polygenic risk scores^1,2,3 and HER2-enriched-like^4 risk in the test dataset.
Odds ratios for different quantiles of the PRS against the middle quantile (40%–60%) of the PRS. The odds ratios were estimated using the test dataset (n = 718 HER2-enriched-like cases, n = 20,815 controls).
Extended Data Fig. 6 Associations between three different polygenic risk scores^1,2,3 and triple-negative^4 risk in the test dataset.
Odds ratios for different quantiles of the PRS against the middle quantile (40%–60%) of the PRS. The odds ratios were estimated using the test dataset (n = 2,006 triple-negative cases, n = 20,815 controls).
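The quantile comparison described in the legends for Extended Data Figs. 2–6 can be illustrated with a minimal sketch. It is not the authors' analysis code: it computes unadjusted odds ratios from case and control counts per PRS bin, takes percentile cut points from the control distribution, and uses illustrative bin edges and simulated scores. All of these are assumptions; only the 40%–60% reference bin comes from the legends. The function name `quantile_odds_ratios` is hypothetical.

```python
import numpy as np

# Percentile edges of the PRS bins. Only the 40-60% reference bin is taken
# from the figure legends; the other cut points are illustrative assumptions.
EDGES = [0, 1, 5, 10, 20, 40, 60, 80, 90, 95, 99, 100]
REFERENCE = (40, 60)


def quantile_odds_ratios(prs, is_case, edges=EDGES, reference=REFERENCE):
    """Return {(lo, hi): odds ratio} for each PRS percentile bin vs. the reference bin.

    Unadjusted estimate: (cases/controls in bin) / (cases/controls in reference).
    """
    prs = np.asarray(prs, dtype=float)
    is_case = np.asarray(is_case, dtype=bool)

    # Assumption: percentile cut points are defined on the controls.
    cuts = np.percentile(prs[~is_case], edges)

    # Assign everyone to a bin using the interior cut points only, so scores
    # outside the control range fall into the first or last bin.
    bin_index = np.digitize(prs, cuts[1:-1])
    n_bins = len(edges) - 1

    cases = np.bincount(bin_index[is_case], minlength=n_bins)
    controls = np.bincount(bin_index[~is_case], minlength=n_bins)
    odds = cases / controls

    labels = list(zip(edges[:-1], edges[1:]))
    ref = labels.index(tuple(reference))
    return {label: float(odds[i] / odds[ref]) for i, label in enumerate(labels)}


# Simulated example (not study data): cases have a shifted PRS distribution.
rng = np.random.default_rng(0)
sim_prs = np.concatenate([rng.normal(0.0, 1.0, 20000), rng.normal(0.5, 1.0, 5000)])
sim_case = np.concatenate([np.zeros(20000, dtype=bool), np.ones(5000, dtype=bool)])
for (lo, hi), odds_ratio in quantile_odds_ratios(sim_prs, sim_case).items():
    print(f"{lo:>3}-{hi:<3}%  OR = {odds_ratio:.2f}")
```

A covariate-adjusted version would typically fit a logistic regression with indicator variables for each bin instead of using raw counts; the count-based form is used here only to keep the sketch dependency-light.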
Cite this article
Zhang, H., Ahearn, T.U., Lecarpentier, J. et al. Genome-wide association study identifies 32 novel breast cancer susceptibility loci from overall and subtype-specific analyses. Nat Genet 52, 572–581 (2020). https://doi.org/10.1038/s41588-020-0609-2