Fine-mapping of 150 breast cancer risk regions identifies 191 likely target genes


Genome-wide association studies have identified breast cancer risk variants in over 150 genomic regions, but the mechanisms underlying risk remain largely unknown. These regions were explored by combining association analysis with in silico genomic feature annotations. We defined 205 independent risk-associated signals with the set of credible causal variants in each one. In parallel, we used a Bayesian approach (PAINTOR) that combines genetic association, linkage disequilibrium and enriched genomic features to determine variants with high posterior probabilities of being causal. Potentially causal variants were significantly over-represented in active gene regulatory regions and transcription factor binding sites. We applied our INQUISIT pipeline for prioritizing genes as targets of those potentially causal variants, using gene expression (expression quantitative trait loci), chromatin interaction and functional annotations. Known cancer drivers, transcription factors and genes in the developmental, apoptosis, immune system and DNA integrity checkpoint gene ontology pathways were over-represented among the highest-confidence target genes.
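
As a rough illustration of what "posterior probability of being causal" means in Bayesian fine-mapping, the sketch below converts per-variant log Bayes factors and prior weights into posterior probabilities, then collects a credible set. It is not PAINTOR and not the authors' pipeline: it assumes exactly one causal variant per signal, treats the priors as a stand-in for annotation enrichment, and uses hypothetical variant names and made-up input values.

```python
import math


def posterior_probabilities(log_bayes_factors, priors=None):
    """Posterior probability that each variant is the single causal one.

    Assumes exactly one causal variant in the signal (a simplification).
    `priors` stands in for annotation-informed weights; uniform if omitted.
    """
    n = len(log_bayes_factors)
    if priors is None:
        priors = [1.0 / n] * n
    # Work in log space and subtract the maximum to avoid overflow.
    log_weights = [lbf + math.log(p) for lbf, p in zip(log_bayes_factors, priors)]
    shift = max(log_weights)
    weights = [math.exp(w - shift) for w in log_weights]
    total = sum(weights)
    return [w / total for w in weights]


def credible_set(variant_ids, posteriors, coverage=0.95):
    """Smallest set of top-ranked variants whose posteriors reach `coverage`."""
    ranked = sorted(zip(variant_ids, posteriors), key=lambda t: t[1], reverse=True)
    chosen, cumulative = [], 0.0
    for variant_id, prob in ranked:
        chosen.append(variant_id)
        cumulative += prob
        if cumulative >= coverage:
            break
    return chosen


# Hypothetical inputs: four variants in one signal, illustrative values only.
ids = ["var1", "var2", "var3", "var4"]
log_bfs = [10.0, 9.0, 2.0, 0.0]

uniform_post = posterior_probabilities(log_bfs)
print(credible_set(ids, uniform_post))

# Up-weighting var2 (e.g. it overlaps an enriched annotation) changes the ranking.
annotated_post = posterior_probabilities(log_bfs, priors=[0.1, 0.7, 0.1, 0.1])
print(max(zip(annotated_post, ids))[1])
```

The second call shows the general idea behind combining association evidence with enriched genomic features: a variant with slightly weaker association can rank first once its prior is raised.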

Fig. 1: Flowchart summarizing the study design.
Fig. 2: Determining independent risk signals and CCVs.
Fig. 3: Overlap of CCVs with gene regulatory regions, gene bodies and TFBSs.
Fig. 4: Predicted target genes are enriched in known breast cancer driver genes and transcription factors.
Fig. 5: Predicted target genes by phenotype and significantly enriched pathways.

Data availability

The credible set of causal variants (determined by either multinomial stepwise regression or PAINTOR) is provided in Supplementary Table 2c. Further information and requests for resources should be directed to M.K.B.



Acknowledgements

We thank all of the individuals who took part in these studies, as well as all of the researchers, clinicians, technicians and administrative staff who enabled this work to be carried out. This work was supported by the European Union’s Horizon 2020 Research and Innovation Programme under Marie Sklodowska-Curie grant agreement number 656144. Genotyping of the OncoArray was principally funded from three sources: the PERSPECTIVE project (funded by the Government of Canada through Genome Canada and the Canadian Institutes of Health Research, the ‘Ministère de l’Économie de la Science et de l’Innovation du Québec’ (through Genome Québec) and the Quebec Breast Cancer Foundation); the NCI Genetic Associations and Mechanisms in Oncology (GAME-ON) initiative and the Discovery, Biology and Risk of Inherited Variants in Breast Cancer (DRIVE) project (NIH grants U19 CA148065 and X01HG007492); and Cancer Research UK (C1287/A10118, C8197/A16565 and C1287/A16563). BCAC is funded by Cancer Research UK (C1287/A16563), by the European Community’s Seventh Framework Programme under grant agreement 223175 (HEALTH-F2-2009-223175) (COGS) and by the European Union’s Horizon 2020 Research and Innovation Programme under grant agreements 633784 (B-CAST) and 634935 (BRIDGES). Genotyping of the iCOGS array was funded by the European Union (HEALTH-F2-2009-223175), Cancer Research UK (C1287/A10710), the Canadian Institutes of Health Research for the ‘CIHR Team in Familial Risks of Breast Cancer’ program, and the Ministry of Economic Development, Innovation and Export Trade of Quebec (grant PSR-SIIRI-701). Combining the GWAS data was supported in part by NIH Cancer Post-Cancer GWAS initiative grant U19 CA148065 (DRIVE; part of the GAME-ON initiative). For a full description of funding and acknowledgements, see the Supplementary Note.

Author information