Association analysis identifies 65 new breast cancer risk loci

This article has been updated


Breast cancer risk is influenced by rare coding variants in susceptibility genes, such as BRCA1, and many common, mostly non-coding variants. However, much of the genetic contribution to breast cancer risk remains unknown. Here we report the results of a genome-wide association study of breast cancer in 122,977 cases and 105,974 controls of European ancestry and 14,068 cases and 13,104 controls of East Asian ancestry (ref. 1). We identified 65 new loci that are associated with overall breast cancer risk at P < 5 × 10⁻⁸. The majority of credible risk single-nucleotide polymorphisms in these loci fall in distal regulatory elements, and by integrating in silico data to predict target genes in breast cells at each locus, we demonstrate a strong overlap between candidate target genes and somatic driver genes in breast tumours. We also find that heritability of breast cancer due to all single-nucleotide polymorphisms in regulatory features was 2–5-fold enriched relative to the genome-wide average, with strong enrichment for particular transcription factor binding sites. These results provide further insight into genetic susceptibility to breast cancer and will improve the use of genetic risk scores for individualized screening and prevention.

Figure 1: SNP associations with breast cancer risk.

Change history

  • 08 March 2018

    The link to Supplementary Table 20 was corrected.


  1. Amos, C. I. et al. The OncoArray Consortium: a network for understanding the genetic architecture of common cancers. Cancer Epidemiol. Biomarkers Prev. 26, 126–135 (2017)
  2. Michailidou, K. et al. Large-scale genotyping identifies 41 new loci associated with breast cancer risk. Nat. Genet. 45, 353–361 (2013)
  3. Long, J. et al. Genome-wide association study in East Asians identifies novel susceptibility loci for breast cancer. PLoS Genet. 8, e1002532 (2012)
  4. Cai, Q. et al. Genome-wide association analysis in East Asians identifies breast cancer susceptibility loci at 1q32.1, 5q14.3 and 15q26.1. Nat. Genet. 46, 886–890 (2014)
  5. Long, J. et al. A common deletion in the APOBEC3 genes and breast cancer risk. J. Natl Cancer Inst. 105, 573–579 (2013)
  6. He, C. et al. Genome-wide association studies identify loci associated with age at menarche and age at natural menopause. Nat. Genet. 41, 724–728 (2009)
  7. Kawase, T. et al. PH domain-only protein PHLDA3 is a p53-regulated repressor of Akt. Cell 136, 535–550 (2009)
  8. Corradin, O. et al. Combinatorial effects of multiple enhancer variants in linkage disequilibrium dictate levels of gene expression to confer susceptibility to common traits. Genome Res. 24, 1–13 (2014)
  9. He, B., Chen, C., Teng, L. & Tan, K. Global view of enhancer–promoter interactome in human cells. Proc. Natl Acad. Sci. USA 111, E2191–E2199 (2014)
  10. Andersson, R. et al. An atlas of active enhancers across human cell types and tissues. Nature 507, 455–461 (2014)
  11. Nik-Zainal, S. et al. Landscape of somatic mutations in 560 breast cancer whole-genome sequences. Nature 534, 47–54 (2016)
  12. Cancer Genome Atlas Network. Comprehensive molecular portraits of human breast tumours. Nature 490, 61–70 (2012)
  13. Ciriello, G. et al. Comprehensive molecular portraits of invasive lobular breast cancer. Cell 163, 506–519 (2015)
  14. Pereira, B. et al. The somatic mutation profiles of 2,433 breast cancers refine their genomic and transcriptomic landscapes. Nat. Commun. 7, 11479 (2016)
  15. Pers, T. H. et al. Biological interpretation of genome-wide association studies using predicted gene functions. Nat. Commun. 6, 5890 (2015)
  16. Merico, D., Isserlin, R., Stueker, O., Emili, A. & Bader, G. D. Enrichment map: a network-based method for gene-set enrichment visualization and interpretation. PLoS ONE 5, e13984 (2010)
  17. Turner, N. & Grose, R. Fibroblast growth factor signalling: from development to cancer. Nat. Rev. Cancer 10, 116–129 (2010)
  18. Heldin, C. H. Targeting the PDGF signaling pathway in tumor treatment. Cell Commun. Signal. 11, 97 (2013)
  19. Howe, L. R. & Brown, A. M. Wnt signaling and breast cancer. Cancer Biol. Ther. 3, 36–41 (2004)
  20. Finucane, H. K. et al. Partitioning heritability by functional annotation using genome-wide association summary statistics. Nat. Genet. 47, 1228–1235 (2015)
  21. Lin, Y., Ma, W. & Benchimol, S. Pidd, a new death-domain-containing protein, is induced by p53 and promotes apoptosis. Nat. Genet. 26, 122–127 (2000)
  22. Fox, S. B. et al. CITED4 inhibits hypoxia-activated transcription in cancer cells, and its cytoplasmic location in breast cancer is associated with elevated expression of tumor cell hypoxia-inducible factor 1α. Cancer Res. 64, 6075–6081 (2004)
  23. Michailidou, K. et al. Genome-wide association analysis of more than 120,000 individuals identifies 15 new susceptibility loci for breast cancer. Nat. Genet. 47, 373–380 (2015)
  24. O’Connell, J. et al. A general approach for haplotype phasing across the full spectrum of relatedness. PLoS Genet. 10, e1004234 (2014)
  25. Howie, B. N., Donnelly, P. & Marchini, J. A flexible and accurate genotype imputation method for the next generation of genome-wide association studies. PLoS Genet. 5, e1000529 (2009)
  26. Howie, B., Fuchsberger, C., Stephens, M., Marchini, J. & Abecasis, G. R. Fast and accurate genotype imputation in genome-wide association studies through pre-phasing. Nat. Genet. 44, 955–959 (2012)
  27. Li, Y., Willer, C. J., Ding, J., Scheet, P. & Abecasis, G. R. MaCH: using sequence and genotype data to estimate haplotypes and unobserved genotypes. Genet. Epidemiol. 34, 816–834 (2010)
  28. Aulchenko, Y. S., Struchalin, M. V. & van Duijn, C. M. ProbABEL package for genome-wide association analysis of imputed data. BMC Bioinformatics 11, 134 (2010)
  29. Willer, C. J., Li, Y. & Abecasis, G. R. METAL: fast and efficient meta-analysis of genomewide association scans. Bioinformatics 26, 2190–2191 (2010)
  30. Skol, A. D., Scott, L. J., Abecasis, G. R. & Boehnke, M. Joint analysis is more efficient than replication-based analysis for two-stage genome-wide association studies. Nat. Genet. 38, 209–213 (2006)
  31. R Core Team. R: A Language and Environment for Statistical Computing (2016)
  32. The ENCODE Project Consortium. A user’s guide to the encyclopedia of DNA elements (ENCODE). PLoS Biol. 9, e1001046 (2011)
  33. Kundaje, A. et al. Integrative analysis of 111 reference human epigenomes. Nature 518, 317–330 (2015)
  34. Udler, M. S., Tyrer, J. & Easton, D. F. Evaluating the power to discriminate between highly correlated SNPs in genetic association studies. Genet. Epidemiol. 34, 463–468 (2010)
  35. Maller, J. B. et al. Bayesian refinement of association signals for 14 loci in 3 common diseases. Nat. Genet. 44, 1294–1301 (2012)
  36. Curtis, C. et al. The genomic and transcriptomic architecture of 2,000 breast tumours reveals novel subgroups. Nature 486, 346–352 (2012)
  37. Baran, Y. et al. Fast and accurate inference of local ancestry in Latino populations. Bioinformatics 28, 1359–1367 (2012)
  38. The 1000 Genomes Project Consortium. An integrated map of genetic variation from 1,092 human genomes. Nature 491, 56–65 (2012)
  39. Li, B. & Dewey, C. N. RSEM: accurate transcript quantification from RNA-seq data with or without a reference genome. BMC Bioinformatics 12, 323 (2011)
  40. Mermel, C. H. et al. GISTIC2.0 facilitates sensitive and confident localization of the targets of focal somatic copy-number alteration in human cancers. Genome Biol. 12, R41 (2011)
  41. Li, Q. et al. Integrative eQTL-based analyses reveal the biology of breast cancer risk loci. Cell 152, 633–641 (2013)
  42. Shabalin, A. A. Matrix eQTL: ultra fast eQTL analysis via large matrix operations. Bioinformatics 28, 1353–1358 (2012)
  43. Karolchik, D. et al. The UCSC Table Browser data retrieval tool. Nucleic Acids Res. 32, D493–D496 (2004)
  44. Rao, S. S. et al. A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping. Cell 159, 1665–1680 (2014)
  45. Quinlan, A. R. & Hall, I. M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842 (2010)
  46. Harrow, J. et al. GENCODE: the reference human genome annotation for The ENCODE Project. Genome Res. 22, 1760–1774 (2012)
  47. Hnisz, D. et al. Super-enhancers in the control of cell identity and disease. Cell 155, 934–947 (2013)
  48. Joly Beauparlant, C. et al. metagene profiles analyses reveal regulatory element’s factor-specific recruitment patterns. PLOS Comput. Biol. 12, e1004751 (2016)
  49. Kircher, M. et al. A general framework for estimating the relative pathogenicity of human genetic variants. Nat. Genet. 46, 310–315 (2014)
  50. Shihab, H. A. et al. Predicting the functional, molecular, and phenotypic consequences of amino acid substitutions using hidden Markov models. Hum. Mutat. 34, 57–65 (2013)
  51. Chun, S. & Fay, J. C. Identification of deleterious mutations within three human genomes. Genome Res. 19, 1553–1561 (2009)
  52. Reva, B., Antipin, Y. & Sander, C. Predicting the functional impact of protein mutations: application to cancer genomics. Nucleic Acids Res. 39, e118 (2011)
  53. Schwarz, J. M., Cooper, D. N., Schuelke, M. & Seelow, D. MutationTaster2: mutation prediction for the deep-sequencing age. Nat. Methods 11, 361–362 (2014)
  54. Adzhubei, I. A. et al. A method and server for predicting damaging missense mutations. Nat. Methods 7, 248–249 (2010)
  55. Choi, Y., Sims, G. E., Murphy, S., Miller, J. R. & Chan, A. P. Predicting the functional effect of amino acid substitutions and indels. PLoS ONE 7, e46688 (2012)
  56. Kumar, P., Henikoff, S. & Ng, P. C. Predicting the effects of coding non-synonymous variants on protein function using the SIFT algorithm. Nat. Protoc. 4, 1073–1081 (2009)
  57. Desmet, F. O. et al. Human Splicing Finder: an online bioinformatics tool to predict splicing signals. Nucleic Acids Res. 37, e67 (2009)
  58. Yeo, G. & Burge, C. B. Maximum entropy modeling of short sequence motifs with applications to RNA splicing signals. J. Comput. Biol. 11, 377–394 (2004)
  59. Welter, D. et al. The NHGRI GWAS Catalog, a curated resource of SNP–trait associations. Nucleic Acids Res. 42, D1001–D1006 (2014)
  60. Ghoussaini, M. et al. Evidence that breast cancer risk at the 2q35 locus is mediated through IGFBP5 regulation. Nat. Commun. 4, 4999 (2014)
  61. Joshi-Tope, G. et al. Reactome: a knowledgebase of biological pathways. Nucleic Acids Res. 33, D428–D432 (2005)
  62. Schaefer, C. F. et al. PID: the pathway interaction database. Nucleic Acids Res. 37, D674–D679 (2009)
  63. Ashburner, M. et al. Gene ontology: tool for the unification of biology. Nat. Genet. 25, 25–29 (2000)
  64. Romero, P. et al. Computational prediction of human metabolic pathways from the complete human genome. Genome Biol. 6, R2 (2004)
  65. Subramanian, A. et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl Acad. Sci. USA 102, 15545–15550 (2005)
  66. Kandasamy, K. et al. NetPath: a public resource of curated signal transduction pathways. Genome Biol. 11, R3 (2010)
  67. Thomas, P. D. et al. PANTHER: a library of protein families and subfamilies indexed by function. Genome Res. 13, 2129–2141 (2003)
  68. Wang, L., Jia, P., Wolfinger, R. D., Chen, X. & Zhao, Z. Gene set analysis of genome-wide association studies: methodological issues and perspectives. Genomics 98, 1–8 (2011)
  69. Wang, K., Li, M. & Hakonarson, H. Analysing biological pathways in genome-wide association studies. Nat. Rev. Genet. 11, 843–854 (2010)
  70. Wang, K., Li, M. & Bucan, M. Pathway-based approaches for analysis of genomewide association studies. Am. J. Hum. Genet. 81, 1278–1283 (2007)
  71. Mogushi, K. & Tanaka, H. PathAct: a novel method for pathway analysis using gene expression profiles. Bioinformation 9, 394–400 (2013)
  72. Medina, I. et al. Gene set-based analysis of polymorphisms: finding pathways or biological processes associated to traits in genome-wide association studies. Nucleic Acids Res. 37, W340–W344 (2009)
  73. Lee, Y. H., Kim, J. H. & Song, G. G. Genome-wide pathway analysis of breast cancer. Tumour Biol. 35, 7699–7705 (2014)
  74. Jia, P., Zheng, S., Long, J., Zheng, W. & Zhao, Z. dmGWAS: dense module searching for genome-wide association studies in protein–protein interaction networks. Bioinformatics 27, 95–102 (2011)
  75. Braun, R. & Buetow, K. Pathways of distinction analysis: a new technique for multi-SNP analysis of GWAS data. PLoS Genet. 7, e1002101 (2011)
  76. Shannon, P. et al. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res. 13, 2498–2504 (2003)


We thank all the individuals who took part in these studies and all the researchers, clinicians, technicians and administrative staff who have enabled this work to be carried out. Genotyping of the OncoArray was principally funded from three sources: the PERSPECTIVE project, funded by the Government of Canada through Genome Canada and the Canadian Institutes of Health Research, the ‘Ministère de l’Économie, de la Science et de l’Innovation du Québec’ through Genome Québec, and the Quebec Breast Cancer Foundation; the NCI Genetic Associations and Mechanisms in Oncology (GAME-ON) initiative and Discovery, Biology and Risk of Inherited Variants in Breast Cancer (DRIVE) project (NIH Grants U19 CA148065 and X01HG007492); and Cancer Research UK (C1287/A10118 and C1287/A16563). BCAC is funded by Cancer Research UK (C1287/A16563), by the European Community’s Seventh Framework Programme under grant agreement 223175 (HEALTH-F2-2009-223175) (COGS) and by the European Union’s Horizon 2020 Research and Innovation Programme under grant agreements 633784 (B-CAST) and 634935 (BRIDGES). Genotyping of the iCOGS array was funded by the European Union (HEALTH-F2-2009-223175), Cancer Research UK (C1287/A10710), the Canadian Institutes of Health Research for the ‘CIHR Team in Familial Risks of Breast Cancer’ program, and the Ministry of Economic Development, Innovation and Export Trade of Quebec, grant PSR-SIIRI-701. Combining the GWAS data was supported in part by the National Institutes of Health (NIH) Cancer Post-Cancer GWAS initiative grant U19 CA148065 (DRIVE, part of the GAME-ON initiative). For a full description of funding and acknowledgments, see Supplementary Note.

Author information