A somatic-mutational process recurrently duplicates germline susceptibility loci and tissue-specific super-enhancers in breast cancers

  • A Corrigendum to this article was published on 27 October 2017


Somatic rearrangements contribute to the mutagenized landscape of cancer genomes. Here, we systematically interrogated rearrangements in 560 breast cancers by using a piecewise constant fitting approach. We identified 33 hotspots of large (>100 kb) tandem duplications, a mutational signature associated with homologous-recombination-repair deficiency. Notably, these tandem-duplication hotspots were enriched in breast cancer germline susceptibility loci (odds ratio (OR) = 4.28) and breast-specific 'super-enhancer' regulatory elements (OR = 3.54). These hotspots may be sites of selective susceptibility to double-strand-break damage due to high transcriptional activity or, through incrementally increasing copy number, may be sites of secondary selective pressure. The transcriptomic consequences ranged from strong individual oncogene effects to weak but quantifiable multigene expression effects. We thus present a somatic-rearrangement mutational process affecting coding sequences and noncoding regulatory elements and contributing a continuum of driver consequences, from modest to strong effects, thereby supporting a polygenic model of cancer development.


Figure 1: Spectrum of distribution of rearrangements in human breast cancers.
Figure 2: Identifying hotspots of rearrangements.
Figure 3: Hotspots of dispersed rearrangements.
Figure 4: Genomic consequences of the tandem-duplication signatures.
Figure 5: From selective susceptibility to selective pressure.

Change history

  • 13 February 2017

    In the version of this article initially published online, in the Methods section, under subheading "Rearrangement signatures," the statement "Putative regions of clustered rearrangements were identified as having an average inter-rearrangement distance at least ten times greater than the whole-genome average for the individual sample" should have read "ten times less than." The error has been corrected in the print, PDF and HTML versions of this article.




Acknowledgements

Data used in this analysis were funded through the ICGC Breast Cancer Working group by the Breast Cancer Somatic Genetics Study (BASIS), a European research project funded by the European Community's Seventh Framework Programme (FP7/2010-2014) under grant agreement number 242006; the Triple Negative project, funded by the Wellcome Trust (grant reference 077012/Z/05/Z); and the HER2+ project, funded by Institut National du Cancer (INCa) in France (grant nos. 226-2009, 02-2011, 41-2012, 144-2008 and 06-2012). J.W.M.M. received funding for this project through an ERC Advanced grant (no. 322737). G.K. is supported by National Research Foundation of Korea grants (NRF 2015R1A2A1A10052578). The ICGC Asian Breast Cancer Project was funded through a grant of the Korean Health Technology R&D Project, Ministry of Health & Welfare, Republic of Korea (A111218-SC01). D.G. is supported by the EU-FP7-SUPPRESSTEM project. S.N.-Z. is funded by a Wellcome Trust Intermediate Fellowship (WT100183MA) and is supported as a Wellcome Beit Fellow.

Author information

D.G. and S.N.-Z. designed the study, analyzed data and wrote the manuscript. M.R.S., P.J.C., D.E. and G.E. contributed to idea development. D.G. and S.M. performed all statistical analyses. H.D., S.M., J.D.-P., J.S., M.S. and X.Z. performed curation and contributed to analyses. M.S. contributed to curation and analysis of transcriptomic data. Y.L. and L.B.A. contributed to analysis. C.A.P., P.T.S., S.R.L., I.H.R. and H.R. contributed pathology assessment and/or samples and FISH analyses. K.R. contributed IT expertise. A.B.B., A.M.T., E.B., H.G.S., M.J.v.d.V., J.W.M.M., A.-L.B.-D., A.L.R., G.K. and A.V. contributed samples, clinical data collection and intellectual input to the project. All authors discussed the results and commented on the manuscript.

Correspondence to Serena Nik-Zainal.

Ethics declarations

Competing interests

D.G. and S.N.-Z. are inventors on a patent application relating to the use of hotspots as breast and ovarian cancer diagnostics.

Integrated supplementary information

Supplementary Figure 1 Relationship between genomic features and distributions of RS1 and RS3 rearrangements in breast cancer genomes.

(a,b) Values of coefficients associated with genomic features, shown separately for RS1 (a) and RS3 (b). The coefficients and their 95% confidence intervals were obtained through negative binomial regression, for which we divided the genome into 0.5-Mb bins. The panels show the exponentiated values, e^(w_i), for ease of interpretation. The further an exponentiated coefficient deviates from 1, the more the corresponding feature influences the expected number of breakpoints in a genomic region.
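The binning and the reading of an exponentiated coefficient can be illustrated with a minimal sketch. It assumes a log link, so that exp(w_i) acts as a multiplier on the expected breakpoint count; the function names and the input layout (chromosome, position) are hypothetical and are not taken from the study's code. The regression fit itself is not reproduced here.

```python
import math
from collections import Counter

BIN_SIZE = 500_000  # 0.5-Mb bins, as stated in the legend


def count_breakpoints_per_bin(breakpoints, bin_size=BIN_SIZE):
    """Count breakpoints per (chromosome, bin index).

    `breakpoints` is an iterable of (chromosome, position) pairs
    (a hypothetical input layout chosen for this sketch).
    """
    counts = Counter()
    for chrom, pos in breakpoints:
        counts[(chrom, pos // bin_size)] += 1
    return counts


def rate_multiplier(coefficient, feature_change=1.0):
    """Multiplicative change in the expected count under a log link.

    A coefficient of 0 gives a multiplier of 1 (no effect); the
    further exp(w) lies from 1, the stronger the feature's effect.
    """
    return math.exp(coefficient * feature_change)
```

For example, a coefficient of log(2) would double the expected number of breakpoints in a bin for each unit increase of the feature.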

Supplementary Figure 2 Tuning settings of the PCF algorithm for identification of hotspots.

This figure summarizes the experiments conducted to gauge optimal parameters. Experiments were performed on observed data as well as on simulations of rearrangements that took into account the background model of rearrangements. The x-axis indicates the setting of the PCF parameters (g and i). The y-axis indicates the number of hotspots found in the observed (black dots) and simulated (grey dots) datasets. The blue rectangles highlight the PCF parameters that were finally selected to categorize hotspots of rearrangements in the observed data. The error bars at the grey dots denote the standard deviation of the count across 10 different simulated datasets. Red stars show the estimated false discovery rate for the range of algorithm settings.
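One plausible way to turn the observed and simulated hotspot counts into a false discovery rate is sketched below. The legend does not give the exact estimator, so the ratio used here (mean simulated count divided by observed count) is an assumption, as is the use of the sample standard deviation; the function names are hypothetical.

```python
from statistics import mean, stdev


def summarize_simulations(simulated_counts):
    """Mean and sample standard deviation of hotspot counts
    across simulated datasets (e.g. 10 simulations per setting)."""
    return mean(simulated_counts), stdev(simulated_counts)


def estimated_fdr(observed_count, simulated_counts):
    """Assumed estimator: hotspots expected under the background
    model divided by hotspots found in the observed data."""
    if observed_count == 0:
        return 0.0
    return min(1.0, mean(simulated_counts) / observed_count)
```

With toy numbers, an average of 2 hotspots per simulated dataset against 40 observed hotspots would give an estimated FDR of 0.05 for that parameter setting.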

Supplementary Figure 3 Visualization of 33 hotspots of large (>100 kb) tandem duplications.

The images display overlap of the rearrangements across the cohort, by showing cumulative number of samples with a tandem duplication involving each of the genomic regions. Dashed vertical lines represent boundaries of the hotspots. Thick red lines represent breast-tissue specific super enhancers. Blue vertical line represents position of germline susceptibility locus of breast cancer. Black lines above show positions of genes.

Supplementary Figure 4 Tandem-duplication hotspots are enriched in breast-tissue-specific super-enhancers and germline breast cancer–susceptibility loci.

(a) The likelihood of observing germline susceptibility loci coinciding with tandem duplication hotspots. Single-sided Poisson test. OR, odds ratio; error bars denote 95% confidence levels. (b) The likelihood of observing super-enhancers falling into tandem duplication hotspots. Density of breast-tissue specific super-enhancer and germline susceptibility loci for tandem duplication hotspots versus other tandemly duplicated regions that do not fall within hotspots. Single-sided Poisson test. OR, odds ratio; error bars denote 95% confidence levels. (c) Simulations were used to obtain an empirical null distribution of number of super-enhancer elements within the hotspots, presented as a histogram. We observed 59 super-enhancers in the hotspots. The likelihood of that observation according to the simulations is <0.0001.
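The simulation-based comparison in panel (c) amounts to an empirical P value. A minimal sketch follows, assuming the common add-one estimator so that the result is never exactly zero; the legend does not say which estimator was used, and the function name is hypothetical.

```python
def empirical_p_value(observed, simulated):
    """One-sided empirical P value: the share of simulated datasets
    with at least as many super-enhancers in hotspots as observed.

    Uses the (k + 1) / (n + 1) estimator, which is an assumption of
    this sketch and bounds the result below by 1 / (n + 1).
    """
    at_least_as_extreme = sum(1 for value in simulated if value >= observed)
    return (at_least_as_extreme + 1) / (len(simulated) + 1)
```

Under this estimator, a reported value below 0.0001 requires on the order of 10,000 simulated datasets in which none reaches the observed count of 59.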

Supplementary Figure 5 Enrichment of hotspots in breast-tissue super-enhancers and germline breast cancer–susceptibility loci is robust with respect to the parameters of the PCF algorithm.

The x-axis shows the parameter i of the PCF algorithm. The top panel shows which hotspots are detected at more stringent values of the i parameter. The second panel shows the number of hotspots detected. The third and fourth panels depict the enrichments of breast cancer SNP loci and super-enhancers at more stringent values of the i parameter. Error bars denote 95% confidence intervals for the enrichment from Fisher’s exact test.
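For readers unfamiliar with the enrichment statistic, the sketch below computes an odds ratio and a one-sided Fisher's exact P value from a 2×2 table using only the standard library. The table layout (elements inside or outside hotspots, overlapping or not overlapping a feature) and the one-sided alternative are assumptions of this sketch; confidence intervals are not computed.

```python
from math import comb


def odds_ratio(a, b, c, d):
    """Sample odds ratio for the 2x2 table [[a, b], [c, d]]."""
    if b * c == 0:
        return float("inf")
    return (a * d) / (b * c)


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test, P(X >= a), where X follows the
    hypergeometric distribution with the table's margins fixed."""
    row1 = a + b
    col1 = a + c
    total = a + b + c + d
    upper = min(row1, col1)
    tail = sum(
        comb(row1, k) * comb(total - row1, col1 - k)
        for k in range(a, upper + 1)
    )
    return tail / comb(total, col1)
```

For the toy table [[3, 1], [1, 3]], the odds ratio is 9 and the one-sided P value is 17/70.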

Supplementary Figure 6 Relationship between tandem-duplicated segments and breast-tissue super-enhancer loci and germline breast cancer–susceptibility SNP loci.

In this analysis, all tandem duplications that had a breakpoint within 1 Mb of super-enhancers (SENH, top panel) and/or breast cancer susceptibility SNPs (lower panel) were included. The x-axis spans a genomic window extending 1 Mb on either side of the SENH and SNPs, respectively. The y-axis reports the fraction of tandem duplications that have duplicated any given location within the 2-Mb window, out of all rearrangements in each group. The data are presented for RS1 tandem duplications in hotspots, RS1 tandem duplications that are not within hotspots and simulated RS1 rearrangements. Note the peak for hotspot tandem duplications centered on the regulatory element/SNP, which is not exhibited by tandem duplications outside hotspots or by simulated data.
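The y-axis quantity can be sketched as a coverage profile around a single element. The interval representation (start, end), the sampling step and the function name are hypothetical choices made for illustration; the original analysis pools many elements and may differ in detail.

```python
def coverage_profile(duplications, center, half_window=1_000_000, step=100_000):
    """Fraction of tandem duplications covering each sampled offset
    within +/- half_window of `center` (a super-enhancer or SNP).

    `duplications` is a list of (start, end) coordinates on the same
    chromosome as `center`. Returns a dict mapping offset -> fraction.
    """
    total = len(duplications)
    profile = {}
    for offset in range(-half_window, half_window + 1, step):
        position = center + offset
        covered = sum(1 for start, end in duplications if start <= position <= end)
        profile[offset] = covered / total if total else 0.0
    return profile
```

A peak at offset 0 would indicate that duplications tend to span the element itself rather than merely lying near it.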

Supplementary Figure 7 Tandem duplications wholly or partially increase the number of copies of ESR1, which correlates with high expression of the gene.

The top panel compares the expression of ESR1 between samples with and without tandem duplications in the hotspot. Samples in which ESR1 is tandemly duplicated, even by just a single tandem duplication, have ESR1 expression levels in a similarly high range to ER-positive tumors and distinctly elevated compared with triple-negative tumors. The boxes highlight the median expression level of the gene, with lower and upper quartiles. The second panel shows expression of ESR1 in individual samples with tandem duplications in the hotspot. The bottom panel shows the position of the rearrangements with respect to the ESR1 gene body on the left, and across the entire chromosome 6 on the right. Copy number (y-axis) is depicted as black dots (10-kb bins). Green lines mark tandem duplication breakpoints.

Supplementary Figure 8 Tandem duplications in some hotspots (wholly or partially) increase the number of copies of specific driver genes associated with breast cancer, even if by only one or two copies.

Left panels focus on the hotspot. Right panels show the entire chromosome containing the hotspot. Rows correspond to individual samples. Copy number (y-axis) is depicted as black dots (10-kb bins). Green lines mark tandem duplication breakpoints. The ZNF217 locus is an example of a tandem duplication hotspot. Each patient has an apparent increase in copy number through a long tandem duplication that spans the whole gene. This site is enriched for breast tissue-specific super-enhancers.

Supplementary Figure 9 Tandem duplications in the hotspots are a feature of samples with many or few rearrangements in their genomes.

A histogram of the frequency of each of the 33 RS1-enriched tandem duplication hotspots is shown in the topmost panel with the 33 hotspots noted across the horizontal axis. The number of samples with rearrangements within any of the 33 hotspots is noted on the vertical axis on the left. A histogram of the number of hotspots per sample is provided on the right (purple, BRCA1-intact HR-deficient cancers; blue, BRCA1-null HR-deficient cancers; black, all other groups). Central matrix depicts the relationship between samples and number of hotspots (black, hotspot rearrangement present).

Supplementary Figure 10 Expression of MYC in samples with and without tandem duplications in the hotspot, distinguishing among breast cancer subtypes.

The boxes highlight the median expression level of the gene, with lower and upper quartiles. These data were used to fit a linear model, suggesting that a tandem duplication in the hotspot was correlated with increased expression of the gene by 0.99 log2 FPKM, with P = 4.4 × 10^-4.
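To make the size of this effect concrete, the sketch below fits a one-predictor least-squares line to a binary duplication indicator and converts a log2 effect into a fold change. This is a simplification: the model in the legend also distinguishes breast cancer subtypes, which the sketch omits, and the function names are hypothetical.

```python
def fit_simple_linear(x, y):
    """Ordinary least squares for y = intercept + slope * x.

    With a 0/1 indicator for x (duplication absent/present), the
    slope equals the difference between the two group means.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def log2_effect_to_fold_change(effect):
    """Convert an additive effect on the log2 scale to a fold change."""
    return 2.0 ** effect
```

An increase of 0.99 on the log2 scale corresponds to roughly a 1.99-fold, that is, nearly twofold, increase in expression.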

Supplementary Figure 11 Hotspots of tandem duplications can be detected only in cohorts with an adequate number of rearrangements.

We sub-sampled the rearrangement dataset from the breast cancer cohort in order to assess how many hotspots we could have detected in smaller cohorts. The number of RS1 rearrangements in the ovarian cohort was sufficient to detect hotspots, and indeed, in the ovarian cohort we detected seven hotspots. The number of rearrangements in the pancreatic cohort was insufficient to detect hotspots, and indeed we detected none there.
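The sub-sampling experiment can be sketched as follows. The hotspot detector is passed in as a callable because the PCF-based procedure is not reproduced here; `detect_hotspots`, the fixed seed and the other names are hypothetical placeholders rather than the study's implementation.

```python
import random


def subsample_rearrangements(rearrangements, fraction, seed=0):
    """Draw a random subset of rearrangements without replacement."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    rng = random.Random(seed)  # fixed seed for reproducibility
    k = round(len(rearrangements) * fraction)
    return rng.sample(list(rearrangements), k)


def hotspots_vs_cohort_size(rearrangements, fractions, detect_hotspots, seed=0):
    """Number of hotspots detected at each sub-sampling fraction.

    `detect_hotspots` is any callable that takes a list of
    rearrangements and returns a list of hotspots.
    """
    return {
        fraction: len(
            detect_hotspots(subsample_rearrangements(rearrangements, fraction, seed))
        )
        for fraction in fractions
    }
```

Plotting the returned counts against the number of sub-sampled rearrangements shows how many rearrangements a cohort needs before hotspots become detectable.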

Supplementary Figure 12 A visualization of the RS1 hotspots in ovarian cancers.

The images display overlap of the rearrangements across the cohort, by showing cumulative number of samples with a tandem duplication involving each of the genomic regions. Thick red lines represent ovarian-tissue specific super enhancers. Black lines above show positions of genes. Dashed vertical lines represent boundaries of the hotspots.

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–12 and Supplementary Note (PDF 2351 kb)

Supplementary Table 1

Hotspots of rearrangement signatures RS1 and RS3 identified through a PCF-based method. (a) Description of headers. (b) Summary of hotspots. (XLSX 63 kb)

Supplementary Table 2

Genomic consequences of RS1 and RS3 duplications (related to Fig. 4). Numbers of duplications and transections of genomic elements, separately for RS1 and RS3, inside and outside of the hotspots. (XLSX 39 kb)

Supplementary Table 3

Hotspots of other rearrangement signatures (RS2, RS4, RS5, RS6) identified through PCF-based method. (a) Description of headers. (b) Summary of hotspots. (XLSX 81 kb)

Supplementary Table 4

Genomic features of the RS1 hotspots. Comparison with the rest of tandem-duplicated genome with respect to: breast cancer susceptibility SNPs, breast tissue super-enhancers, non-breast super-enhancers, known oncogenes, promoters, enhancers, broad fragile sites, narrow fragile sites. (a) Description of headers. (b) Associations. (XLSX 44 kb)

Supplementary Table 5

Modeling the effects of RS1 tandem duplications on gene expression. Rows, coefficients used in the regression models. Columns, experiments with different sets of genes. In the table we show the fitted values of regression coefficients. (XLSX 37 kb)

Supplementary Table 6

Hotspots of rearrangement signatures RS1 and RS3 identified through PCF-based method in ovarian tumors. (a) Description of headers. (b) Summary of hotspots. (XLSX 55 kb)

Cite this article

Glodzik, D., Morganella, S., Davies, H. et al. A somatic-mutational process recurrently duplicates germline susceptibility loci and tissue-specific super-enhancers in breast cancers. Nat Genet 49, 341–348 (2017) doi:10.1038/ng.3771
