Sustained expression of the estrogen receptor-α (ESR1) drives two-thirds of breast cancers and defines the ESR1-positive subtype. ESR1 engages enhancers upon estrogen stimulation to establish an oncogenic expression program¹. Somatic copy number alterations involving the ESR1 gene occur in approximately 1% of ESR1-positive breast cancers²,³,⁴,⁵, suggesting that other mechanisms underlie the persistent expression of ESR1. We report significant enrichment of somatic mutations within the set of regulatory elements (SRE) regulating ESR1 in 7% of ESR1-positive breast cancers. These mutations regulate ESR1 expression by modulating transcription factor binding to the DNA. The SRE includes a recurrently mutated enhancer whose activity is also affected by rs9383590, a functional inherited single-nucleotide variant (SNV) that accounts for several breast cancer risk–associated loci. Our work highlights the importance of considering the combinatorial activity of regulatory elements as a single unit to delineate the impact of noncoding genetic alterations on single genes in cancer.
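To make the notion of mutational enrichment within an SRE concrete, the following is a minimal sketch of a one-sided binomial test, assuming a uniform background mutation rate. It is an illustration only and not the statistical model used in this study, which this section does not describe. The function names and all numbers in the usage line are hypothetical.

```python
from math import exp, lgamma, log


def log_binom_pmf(k: int, n: int, p: float) -> float:
    """Log of the Binomial(n, p) probability mass at k, computed in log space."""
    return (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
            + k * log(p) + (n - k) * log(1 - p))


def enrichment_pvalue(observed: int, total: int,
                      region_bp: int, background_bp: int) -> float:
    """One-sided P(X >= observed) for X ~ Binomial(total, region_bp / background_bp).

    Assumption (not from the paper): mutations fall uniformly over the
    background, so the chance that any one mutation lands in the region
    equals the fraction of background bases the region covers.
    """
    p = region_bp / background_bp
    if not 0.0 < p < 1.0:
        raise ValueError("region must be a non-empty proper subset of the background")
    upper_tail = sum(exp(log_binom_pmf(k, total, p))
                     for k in range(observed, total + 1))
    return min(1.0, upper_tail)  # guard against rounding slightly above 1


# Hypothetical toy numbers: 7 of 2,000 mutations fall in a 50 kb set of
# regulatory elements, against a 100 Mb background.
print(enrichment_pvalue(observed=7, total=2000,
                        region_bp=50_000, background_bp=100_000_000))
```

A uniform background is a strong simplification, because local mutation density in cancer genomes varies with chromatin organization and DNA repair (refs. 9, 10). A realistic analysis would replace the single probability `p` with a covariate-aware background model.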



Primary accessions

Gene Expression Omnibus

Referenced accessions

Gene Expression Omnibus


  1. & Oestrogen-receptor-mediated transcription and the influence of co-factors and chromatin state. Nat. Rev. Cancer 7, 713–722 (2007).
  2. , , , & ESR1 gene amplification in breast cancer: a common phenomenon? Nat. Genet. 40, 809, author reply 810–812 (2008).
  3. et al. ESR1 gene amplification in breast cancer: a common phenomenon? Nat. Genet. 40, 806–807, author reply 810–812 (2008).
  4. et al. ESR1 gene amplification in breast cancer: a common phenomenon? Nat. Genet. 40, 807–808, author reply 810–812 (2008).
  5. et al. ESR1 gene amplification in breast cancer: a common phenomenon? Nat. Genet. 40, 809–810, author reply 810–812 (2008).
  6. et al. Breast cancer risk-associated SNPs modulate the affinity of chromatin for FOXA1 and alter gene expression. Nat. Genet. 44, 1191–1198 (2012).
  7. et al. Systematic localization of common disease-associated variation in regulatory DNA. Science 337, 1190–1195 (2012).
  8. , , , & Linking disease associations with regulatory information in the human genome. Genome Res. 22, 1748–1759 (2012).
  9. et al. Reduced local mutation density in regulatory DNA of cancer genomes is linked to DNA repair. Nat. Biotechnol. 32, 71–75 (2014).
  10. et al. Cell-of-origin chromatin organization shapes the mutational landscape of cancer. Nature 518, 360–364 (2015).
  11. et al. Recessive mutations in a distal PTF1A enhancer cause isolated pancreatic agenesis. Nat. Genet. 46, 61–64 (2014).
  12. et al. TERT promoter mutations in familial and sporadic melanoma. Science 339, 959–961 (2013).
  13. et al. Highly recurrent TERT promoter mutations in human melanoma. Science 339, 957–959 (2013).
  14. et al. Genome-wide association study identifies a new breast cancer susceptibility locus at 6q25.1. Nat. Genet. 41, 324–328 (2009).
  15. et al. Novel breast cancer susceptibility locus at 9q31.2: results of a genome-wide association study. J. Natl. Cancer Inst. 103, 425–435 (2011).
  16. et al. Genome-wide association study identifies five new breast cancer susceptibility loci. Nat. Genet. 42, 504–507 (2010).
  17. et al. A meta-analysis of genome-wide association studies of breast cancer identifies two novel susceptibility loci at 6q14 and 20q11. Hum. Mol. Genet. 21, 5373–5384 (2012).
  18. et al. Ancestry-shift refinement mapping of the C6orf97-ESR1 breast cancer susceptibility locus. PLoS Genet. 6, e1001029 (2010).
  19. ENCODE Project Consortium. A user's guide to the Encyclopedia of DNA Elements (ENCODE). PLoS Biol. 9, e1001046 (2011).
  20. 1000 Genomes Project Consortium. A map of human genome variation from population-scale sequencing. Nature 467, 1061–1073 (2010).
  21. et al. The accessible chromatin landscape of the human genome. Nature 489, 75–82 (2012).
  22. et al. ZNF143 provides sequence specificity to secure chromatin interactions at gene promoters. Nat. Commun. 2, 6186 (2015).
  23. et al. The genomic and transcriptomic architecture of 2,000 breast tumours reveals novel subgroups. Nature 486, 346–352 (2012).
  24. et al. Integrative eQTL-based analyses reveal the biology of breast cancer risk loci. Cell 152, 633–641 (2013).
  25. et al. Breast cancer risk variants at 6q25 display different phenotype associations and regulate ESR1, RMND1 and CCDC170. Nat. Genet. 48, 374–386 (2016).
  26. et al. Cell type–specific binding patterns reveal that TCF7L2 can be tethered to the genome by association with GATA3. Genome Biol. 13, R52 (2012).
  27. et al. Signatures of mutational processes in human cancer. Nature 500, 415–421 (2013).
  28. , , & The long-range interaction landscape of gene promoters. Nature 489, 109–113 (2012).
  29. & Discovery and characterization of chromatin states for systematic annotation of the human genome. Nat. Biotechnol. 28, 817–825 (2010).
  30. & Protein length in eukaryotic and prokaryotic proteomes. Nucleic Acids Res. 33, 3390–3400 (2005).
  31. et al. The transcription factor GABP selectively binds and activates the mutant TERT promoter in cancer. Science 348, 1036–1039 (2015).
  32. et al. ESR1 is co-expressed with closely adjacent uncharacterised genes spanning a breast cancer susceptibility locus at 6q25.1. PLoS Genet. 7, e1001382 (2011).
  33. et al. C6ORF97-ESR1 breast cancer susceptibility locus: influence on progression and survival in breast cancer patients. Eur. J. Hum. Genet. 23, 949–956 (2015).
  34. et al. Cancer genome landscapes. Science 339, 1546–1558 (2013).
  35. et al. CTCF/cohesin-binding sites are frequently mutated in cancer. Nat. Genet. 47, 818–821 (2015).
  36. , , , & Reconfiguration of nucleosome-depleted regions at distal regulatory elements accompanies DNA methylation of enhancers and insulators in cancer. Genome Res. 24, 1421–1432 (2014).
  37. et al. Integrated genotype calling and association analysis of SNPs, common copy number polymorphisms and rare CNVs. Nat. Genet. 40, 1253–1260 (2008).
  38. et al. The variant call format and VCFtools. Bioinformatics 27, 2156–2158 (2011).
  39. et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet. 81, 559–575 (2007).
  40. et al. Model-based analysis of ChIP-Seq (MACS). Genome Biol. 9, R137 (2008).
  41. et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29, 15–21 (2013).
  42. et al. Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks. Nat. Protoc. 7, 562–578 (2012).
  43. , , & A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics 19, 185–193 (2003).
  44. , , & ABC: a tool to identify SNVs causing allele-specific transcription factor binding from ChIP-Seq experiments. Bioinformatics 31, 3057–3059 (2015).
  45. & BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842 (2010).
  46. et al. Chromatin accessibility pre-determines glucocorticoid receptor binding patterns. Nat. Genet. 43, 264–268 (2011).
  47. et al. Mutational heterogeneity in cancer and the search for new cancer-associated genes. Nature 499, 214–218 (2013).
  48. et al. MuSiC: identifying mutational significance in cancer genomes. Genome Res. 22, 1589–1598 (2012).
  49. et al. Exonic transcription factor binding directs codon choice and affects protein evolution. Science 342, 1367–1372 (2013).
  50. et al. Frequent somatic transfer of mitochondrial DNA into the nuclear genome of human cancer cells. Genome Res. 25, 814–824 (2015).
  51. & Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25, 1754–1760 (2009).
  52. et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20, 1297–1303 (2010).
  53. et al. Sensitive detection of somatic point mutations in impure and heterogeneous cancer samples. Nat. Biotechnol. 31, 213–219 (2013).
  54. et al. HOCOMOCO: a comprehensive collection of human transcription factor binding sites models. Nucleic Acids Res. 41, D195–D202 (2013).
  55. , & FIMO: scanning for occurrences of a given motif. Bioinformatics 27, 1017–1018 (2011).
  56. & Analysis of relative gene expression data using real-time quantitative PCR and the 2(−ΔΔCT) method. Methods 25, 402–408 (2001).
  57. , , & Genotyping with TaqMAMA. Genomics 83, 311–320 (2004).
  58. et al. Quantitative analysis of chromosome conformation capture assays (3C-qPCR). Nat. Protoc. 2, 1722–1733 (2007).



We thank A. Razak, C. Elser, D. Cescon, D. Warr, E. Amir, L. Siu, N. Leighl and S. Sridhar for their involvement in recruiting the IMPACT and COMPACT samples used in this study. We also thank M. Lemaire for helpful discussions, and R. Rottapel and O. Kent for use of and help with the Glomax Multi-Detection system. We acknowledge the ENCODE consortium and the ENCODE production laboratories that generated the data sets provided by the ENCODE Data Coordination Center used in the manuscript. We also acknowledge the Cancer Genome Project, for making all the called breast cancer and liver cancer mutations publicly available, and the Molecular Taxonomy of Breast Cancer International Consortium (METABRIC), for making the genotyping and expression data from primary breast tumors available. We acknowledge the Princess Margaret Genomics Centre and the Bioinformatics group for providing the infrastructure assisting us with the targeted sequencing and analysis of the ESR1 SRE. This work was supported by the National Cancer Institute (NCI) at the National Institutes of Health (NIH) (R01CA155004 to M.L.), the Princess Margaret Cancer Foundation (T.J.P. and M.L.), the Canadian Cancer Society (CCSRI702922 to M.L.), the Susan G. Komen Foundation (CCR15332792 to T.J.P.) and the Gattuso-Slaight Personalized Cancer Medicine Fund/PMCF (B.H.-K.). M.L. is funded by a young investigator award from the Ontario Institute for Cancer Research (OICR), a new investigator salary award from the Canadian Institutes of Health Research (CIHR) and a Movember Rising Star award from Prostate Cancer Canada (PCC) (RS2014-04). K.J.K. and R.C.P. are supported by Canadian Breast Cancer Foundation (CBCF) postdoctoral fellowships. S.D.B. is supported by a Knudson and CIHR postdoctoral fellowship.

Author information

Author notes

    • Xue Wu

    Present address: Geneseeq Technology, Inc., Toronto, Ontario, Canada.

    • Swneke D Bailey & Kinjal Desai

    These authors contributed equally to this work.


  1. Princess Margaret Cancer Centre, University Health Network, Toronto, Ontario, Canada.

    • Swneke D Bailey, Ken J Kron, Parisa Mazrooei, Aislinn E Treloar, Mark Dowar, David W Cescon, S Y Cindy Yang, Xue Wu, Rossanna C Pezo, Benjamin Haibe-Kains, Philippe L Bedard, Trevor J Pugh & Mathieu Lupien
  2. Department of Medical Biophysics, University of Toronto, Toronto, Ontario, Canada.

    • Swneke D Bailey, Ken J Kron, Parisa Mazrooei, Aislinn E Treloar, S Y Cindy Yang, Benjamin Haibe-Kains, Tak W Mak, Trevor J Pugh & Mathieu Lupien
  3. Department of Genetics, Norris Cotton Cancer Center, Dartmouth Medical School, Lebanon, New Hampshire, USA.

    • Kinjal Desai
  4. Department of Genetics, Stanford University School of Medicine, Stanford, California, USA.

    • Nicholas A Sinnott-Armstrong
  5. Campbell Family Institute for Breast Cancer Research, Princess Margaret Cancer Centre, University Health Network, Toronto, Ontario, Canada.

    • Kelsie L Thu, David W Cescon, Jennifer Silvester & Tak W Mak
  6. Department of Computer Science, University of Toronto, Toronto, Ontario, Canada.

    • Benjamin Haibe-Kains
  7. Division of Medical Oncology, Department of Medicine, University of Toronto, Toronto, Ontario, Canada.

    • Philippe L Bedard
  8. Massachusetts Institute of Technology (MIT), Cambridge, Massachusetts, USA.

    • Richard C Sallari
  9. Ontario Institute for Cancer Research, Toronto, Ontario, Canada.

    • Mathieu Lupien




The concept of interrogating the mutational load in regulatory elements converging on single genes arose through discussions between S.D.B., N.A.S.-A., R.C.S. and M.L. S.D.B. designed and/or implemented all the computational and statistical approaches except for IGR and analyzed the results under the supervision of M.L. Experimental assessment of the effect of SNVs on enhancer activity, transcription factor binding and gene expression was designed by K.D., S.D.B. and M.L. and conducted by K.D. with assistance from K.J.K., A.E.T. and X.W. The CRISPR–Cas9-based enhancer deletion was conducted by K.D., K.J.K., K.L.T., J.S. and D.W.C. under the supervision of T.W.M. and M.L. P.M. and N.A.S.-A. implemented the IGR approach to predict allele-bias binding of transcription factors on SNVs after improvements to IGR by N.A.S.-A. and R.C.S. R.C.P. and P.L.B. assessed the ESR1, PR and HER2 expression status on primary breast tumors included in our validation cohort. S.Y.C.Y. performed the alignment and gene expression quantification of the TCGA RNA-seq data. M.D. assisted in DNA capture sequencing of the primary breast tumor validation cohort under T.J.P.'s supervision. B.H.-K. oversaw the expression analysis of the METABRIC data set. M.L. oversaw the project. Figures were designed and prepared by S.D.B. and K.D. The manuscript was written by S.D.B., K.D. and M.L. with assistance from all other authors.

Competing interests

The authors declare no competing financial interests.

Corresponding author

Correspondence to Mathieu Lupien.

Supplementary information

PDF files

  1. Supplementary Text and Figures

    Supplementary Figures 1–17, Supplementary Tables 1–9
