We analysed whole-genome sequences of 560 breast cancers to advance understanding of the driver mutations that confer clonal advantage and of the mutational processes that generate somatic mutations. We found that 93 protein-coding cancer genes carried probable driver mutations. Some non-coding regions exhibited high mutation frequencies, but most have distinctive structural features that probably cause elevated mutation rates, and they do not contain driver mutations. Mutational signature analysis was extended to genome rearrangements and revealed twelve base substitution and six rearrangement signatures. Three rearrangement signatures, characterized by tandem duplications or deletions, appear to be associated with defective homologous-recombination-based DNA repair: one with deficient BRCA1 function, another with deficient BRCA1 or BRCA2 function, and a third of unknown cause. This analysis of all classes of somatic mutation across exons, introns and intergenic regions highlights the repertoire of cancer genes and mutational processes operating in breast cancer, and progresses towards a comprehensive account of its somatic genetic basis.
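Signature analysis starts from a catalogue of mutation counts per category for each cancer. As an illustration only, assuming the widely used 96-channel scheme for base substitutions (six substitution classes referenced to the pyrimidine base, times the four possible 5' and four possible 3' neighbouring bases), a catalogue could be built as below. The function names are hypothetical, and the signatures themselves would be extracted afterwards from many such catalogues by a separate decomposition step that is not shown.

```python
# Minimal sketch: classify single-base substitutions into the 96 trinucleotide
# channels commonly used for base-substitution signature analysis.
# Assumption: each mutation is (5' base, ref base, alt base, 3' base) on the
# reference strand; all names here are hypothetical, for illustration only.
from collections import Counter
from itertools import product

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def channel(five: str, ref: str, alt: str, three: str) -> str:
    """Return a label such as 'T[C>T]G', always expressed with a pyrimidine (C or T) reference."""
    if ref in ("A", "G"):
        # Report the substitution on the opposite strand so the reference is a pyrimidine.
        five, ref, three = reverse_complement(five + ref + three)
        alt = COMPLEMENT[alt]
    return f"{five}[{ref}>{alt}]{three}"


# 6 substitution classes x 4 possible 5' bases x 4 possible 3' bases = 96 channels.
CHANNELS = [
    f"{five}[{ref}>{alt}]{three}"
    for ref, alts in (("C", "AGT"), ("T", "ACG"))
    for alt in alts
    for five, three in product("ACGT", repeat=2)
]


def catalogue(mutations) -> list:
    """Count mutations per channel; returns 96 counts in CHANNELS order."""
    counts = Counter(channel(*m) for m in mutations)
    return [counts[label] for label in CHANNELS]
```

For example, `channel("C", "G", "A", "A")` returns `'T[C>T]G'`: a G>A change at CGA is the same event as a C>T change at TCG read on the other strand, so both fall into one channel.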
Extended data figures and tables
Extended Data Figures
- Extended Data Figure 1: Landscape of driver mutations.
a, Summary of subtypes in the cohort of 560 breast cancers. b, Driver mutations by mutation type. c, Distribution of rearrangements throughout the genome. The black line represents the background rearrangement density (calculated from rearrangement breakpoints in intergenic regions only). Red lines represent the frequency of rearrangement within breast cancer genes.
- Extended Data Figure 2: Rearrangements in oncogenes.
a, Variation in rearrangement and copy number events affecting ESR1: clear amplification (top), transection of ESR1 (middle) and focal tandem duplications (bottom). b, Predicted outcomes of some rearrangements affecting ETV6. Red crosses indicate exons deleted as a result of rearrangements within the ETV6 gene; black dotted lines indicate rearrangement breakpoints that result in fusions between ETV6 and ERC, WNK1, ATP2B1 or LRP6. The ETV6 domains shown are the N-terminal (NT) pointed domain and the E26 transformation-specific (ETS) DNA-binding domain.
- Extended Data Figure 3: Recurrent non-coding events in breast cancers.
a, Manhattan plot showing the sites with the most significant P values identified by the binning analysis. Sites highlighted in purple were also detected by the method that tests for recurrence after partitioning the genome by genomic features. b, Locus at chr11 65 Mb, identified by independent analyses as more mutated than expected by chance. Bottom, a rearrangement hotspot analysis identified this region as a tandem duplication hotspot, with nested tandem duplications at this site. Top panels, an analysis of substitutions and indels, with the genome partitioned into different regulatory elements, identified the lncRNAs MALAT1 and NEAT1 with significant P values.
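Panel a refers to a binning analysis. A minimal sketch of a bin-level recurrence test follows, assuming fixed-size bins, a uniform background mutation rate and a one-sided Poisson test; the published analysis may model the background quite differently, and the function names are hypothetical.

```python
# Minimal sketch of a bin-level recurrence test (illustrative assumptions:
# fixed-size bins, a uniform background rate, one-sided Poisson P values).
# Function names are hypothetical; no multiple-testing correction is applied.
import math
from collections import Counter


def poisson_sf(k: int, mu: float) -> float:
    """One-sided P(X >= k) for X ~ Poisson(mu), summing the upper tail directly.

    Intended for the small expected counts typical of short bins."""
    if k <= 0:
        return 1.0
    if mu <= 0:
        return 0.0
    # P(X = k), computed in log space to avoid overflow in mu**k and k!.
    term = math.exp(-mu + k * math.log(mu) - math.lgamma(k + 1))
    total, i = 0.0, k
    # Add terms while they are still rising (i < mu) or not yet negligible.
    while i < mu or term > total * 1e-17:
        total += term
        i += 1
        term *= mu / i
    return min(total, 1.0)


def bin_pvalues(positions, genome_length: int, bin_size: int) -> dict:
    """Return {bin_index: (observed_count, p_value)} for every bin containing a mutation."""
    n_bins = math.ceil(genome_length / bin_size)
    expected = len(positions) / n_bins  # uniform background: same expectation in every bin
    counts = Counter(pos // bin_size for pos in positions)
    return {b: (c, poisson_sf(c, expected)) for b, c in sorted(counts.items())}
```

With many bins tested at once, the resulting P values would still need a multiple-testing correction before any bin is called recurrent.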
- Extended Data Figure 4: Copy number analyses.
a, Frequency of copy number aberrations across the cohort. x axis, chromosome position; y axis, frequency of copy number gains (red) and losses (green). b, Identification of focal recurrent copy number gains by the GISTIC method (Supplementary Methods). c, Identification of focal recurrent copy number losses by the GISTIC method. d, Heatmap of GISTIC regions following unsupervised hierarchical clustering. Five cluster groups are noted, and their relationships are shown with expression subtype (basal, red; luminal B, light blue; luminal A, dark blue), immunohistopathology status (ER, PR and HER2 status; black, positive), abrogation of BRCA1 (red) and BRCA2 (blue) (whether germline, somatic or through promoter hypermethylation), driver mutations (black, positive) and HRD index (top 25% or lowest 25%; black, positive).
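Panel a plots, for each genomic position, the fraction of the cohort with a copy number gain or loss. A minimal sketch of such a frequency track is shown below, assuming copy number is available per sample on a shared set of segments and that a segment counts as gained or lost when its total copy number differs from the sample's ploidy by more than a margin; the margin, the per-ploidy rule and the names are hypothetical choices for illustration, not the paper's thresholds.

```python
# Minimal sketch of a per-segment aberration frequency track (as in panel a).
# Assumptions: copy number is given per sample on a shared set of segments, and a
# segment counts as gained/lost when its total copy number differs from the
# sample's ploidy by more than `margin`. Names and the margin are hypothetical.
from typing import List, Tuple


def aberration_frequencies(copy_number: List[List[float]], ploidy: List[float],
                           margin: float = 0.5) -> Tuple[List[float], List[float]]:
    """copy_number[s][j] is the total copy number of sample s at segment j.

    Returns (gain_frequency, loss_frequency), one value per segment."""
    n_samples = len(copy_number)
    n_segments = len(copy_number[0])
    gains = [0] * n_segments
    losses = [0] * n_segments
    for sample_cn, sample_ploidy in zip(copy_number, ploidy):
        for j, cn in enumerate(sample_cn):
            if cn > sample_ploidy + margin:
                gains[j] += 1
            elif cn < sample_ploidy - margin:
                losses[j] += 1
    return [g / n_samples for g in gains], [n / n_samples for n in losses]
```

Comparing against each sample's ploidy rather than a fixed diploid baseline keeps a whole-genome-doubled tumour from appearing gained everywhere.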
- Extended Data Figure 5: miRNA analyses.
Hierarchical clustering of the most variable miRNAs using complete linkage and Euclidean distance. miRNA clusters were assigned with the partitioning algorithm using recursive thresholding (PART) method. Five main patient clusters were revealed. The horizontal annotation bars show (from top to bottom): PART cluster group, PAM50 mRNA expression subtype, GISTIC cluster, rearrangement cluster, lymphocyte infiltration score and histological grade. The heatmap shows clustered and centred miRNA expression data (log2 transformed). The colour coding of the annotation bars is detailed below the heatmap.
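The clustering in this figure uses complete linkage with Euclidean distance. The small pure-Python sketch below shows that agglomerative procedure, cut into a chosen number of clusters; it is an illustration only, does not implement the PART cluster assignment, and its function names are hypothetical.

```python
# Minimal sketch of agglomerative hierarchical clustering with complete linkage
# and Euclidean distance, cut into k clusters. Pure Python and roughly O(n^3),
# so only suitable for small examples; PART is not reproduced here.
import math
from itertools import combinations
from typing import List, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def complete_linkage(points: List[Sequence[float]], k: int) -> List[int]:
    """Merge clusters until k remain; return a cluster label for each point."""
    clusters = [[i] for i in range(len(points))]
    dist = [[euclidean(p, q) for q in points] for p in points]

    def linkage(a: List[int], b: List[int]) -> float:
        # Complete linkage: the distance between the two farthest members.
        return max(dist[i][j] for i in a for j in b)

    while len(clusters) > k:
        # Merge the pair of clusters with the smallest complete-linkage distance.
        x, y = min(combinations(range(len(clusters)), 2),
                   key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]))
        clusters[x].extend(clusters[y])
        del clusters[y]  # safe because combinations() yields x < y

    labels = [0] * len(points)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels
```

Complete linkage tends to produce compact clusters of similar diameter, because two groups are joined only when all of their members are close to each other.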
- Extended Data Figure 6: Rearrangement cluster groups and associated features.
a, Overall survival (OS) by rearrangement cluster group. b, Age at diagnosis. c, Tumour grade. d, Menopausal status. e, ER status. f, Immune response metagene panel. g, Lymphocytic infiltration score.
- Extended Data Figure 7: Contrasting tandem duplication phenotypes.
Contrasting tandem duplication phenotypes of two breast cancers, shown for chromosome X. Copy number (y axis) is depicted as black dots. Lines represent rearrangement breakpoints (green, tandem duplications; pink, deletions; blue, inversions; black, translocations, with the partner breakpoint given). Top, PD4841a has numerous large tandem duplications (>100 kb, rearrangement signature 1), whereas PD4833a has many short tandem duplications (<10 kb, rearrangement signature 3), which appear as ‘single’ lines in its plot.
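The contrast in this figure rests on rearrangement size. A minimal sketch follows, assuming rearrangements are summarized by type and by a logarithmic size class; the bin edges are illustrative assumptions, chosen only so that the <10 kb and >100 kb tandem duplications described above fall into different classes, and the function and label names are hypothetical.

```python
# Minimal sketch: assign rearrangements to a type and a size class, so that short
# (<10 kb) and large (>100 kb) tandem duplications land in different categories.
# The bin edges and labels are illustrative assumptions; names are hypothetical.
SIZE_BINS = [
    (1_000, "<1kb"),
    (10_000, "1-10kb"),
    (100_000, "10-100kb"),
    (1_000_000, "100kb-1Mb"),
    (10_000_000, "1-10Mb"),
]


def size_class(length: int) -> str:
    """Return the first size bin whose exclusive upper edge exceeds `length`."""
    for upper, label in SIZE_BINS:
        if length < upper:
            return label
    return ">10Mb"


def classify(kind: str, chrom1: str, pos1: int, chrom2: str, pos2: int) -> str:
    """kind is e.g. 'deletion', 'tandem-duplication' or 'inversion'.

    Breakpoints on different chromosomes are reported as a translocation (no size)."""
    if chrom1 != chrom2:
        return "translocation"
    return f"{kind}:{size_class(abs(pos2 - pos1))}"
```

Counting such labels per cancer gives a rearrangement catalogue analogous to the substitution catalogue, from which size-dependent patterns like the two in this figure could be picked up.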
- Extended Data Figure 8: Hotspots of tandem duplications.
A tandem duplication hotspot occurring in six different patients.
- Extended Data Figure 9: Rearrangement breakpoint junctions.
a, Breakpoint features of rearrangements in 560 breast cancers by rearrangement signature. b, Breakpoint features in BRCA and non-BRCA cancers.
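This figure summarizes features at rearrangement breakpoint junctions. Assuming one such feature is microhomology (a stretch of identical sequence at the two breakpoints that makes the exact junction position ambiguous), the sketch below measures it for a simple deletion on a single reference string. The function name and coordinate convention are hypothetical, and a real pipeline would work on assembled breakpoint sequences.

```python
# Minimal sketch: length of microhomology at a simple deletion junction, i.e. the
# number of bases by which the junction can slide without changing the resulting
# sequence. Assumes 0-based, end-exclusive coordinates on one reference string;
# the function name and conventions are hypothetical.
def deletion_microhomology(ref: str, start: int, end: int) -> int:
    """Microhomology length for a deletion removing ref[start:end]."""
    # Slide right while each base at the start of the deleted segment matches
    # the base the same distance past the end of the deletion.
    right = 0
    while end + right < len(ref) and ref[start + right] == ref[end + right]:
        right += 1
    # Slide left while each base before the deletion matches the base the same
    # distance before the deletion's end.
    left = 0
    while start - left - 1 >= 0 and ref[start - left - 1] == ref[end - left - 1]:
        left += 1
    return left + right
```

For example, deleting `CTGGGG` from `AAACTGGGGCTGTTT` gives the same product as deleting `GGGCTG`, so the junction carries 3 bp of microhomology (`CTG`) whichever placement is reported. Inside tandem repeats the value can exceed the deletion size, and a real analysis would probably treat such cases separately.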
- Extended Data Figure 10: Signatures of focal hypermutation.
a, Kataegis and alternative kataegis occurring at the same locus (ERBB2 amplicon in PD13164a). Copy number (y axis) is depicted as black dots. Lines represent rearrangement breakpoints (green, tandem duplications; pink, deletions; blue, inversions). Top, an ~10 Mb region including the ERBB2 locus. Middle, a tenfold zoom into an ~1 Mb window, highlighting the co-occurrence of rearrangement breakpoints, copy number changes and three different kataegis loci. Bottom, the kataegis loci in more detail, with log10 intermutation distance on the y axis. Black arrow, kataegis; blue arrows, alternative kataegis. b, Sequence context of kataegis and alternative kataegis identified in this data set.
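The bottom panel plots log10 intermutation distance, the quantity on which calls of focal hypermutation are based. The minimal sketch below assumes a rule of at least six consecutive substitutions with a mean intermutation distance of 1 kb or less. Treat both thresholds, the greedy run extension and the function names as illustrative assumptions. The sequence-context criteria that separate kataegis from alternative kataegis are not modelled.

```python
# Minimal sketch of kataegis calling from intermutation distances on one chromosome.
# Assumed rule (hypothetical parameters): at least `min_mutations` consecutive
# substitutions whose mean intermutation distance is at most `max_mean_distance` bp.
from typing import List, Tuple


def intermutation_distances(positions: List[int]) -> List[int]:
    """Distance from each mutation to the previous one (positions sorted first)."""
    pos = sorted(positions)
    return [b - a for a, b in zip(pos, pos[1:])]


def find_kataegis(positions: List[int], min_mutations: int = 6,
                  max_mean_distance: float = 1000.0) -> List[Tuple[int, int]]:
    """Return (first, last) positions of runs found by greedy extension."""
    pos = sorted(positions)
    foci = []
    i = 0
    while i < len(pos):
        # Extend the run from pos[i] while its mean intermutation distance stays within the limit.
        j = i
        while j + 1 < len(pos) and (pos[j + 1] - pos[i]) / (j + 1 - i) <= max_mean_distance:
            j += 1
        if j - i + 1 >= min_mutations:
            foci.append((pos[i], pos[j]))
            i = j + 1
        else:
            i += 1
    return foci
```

Greedy extension stops at the first mutation that pushes the mean above the limit, so a run whose mean would dip back below it further on is split. This keeps the sketch simple but is one place where a production caller might behave differently.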
- Supplementary Information
This file contains Supplementary Methods and Data and additional references.
- Supplementary Information
This file contains some acknowledgements and the EGA accession numbers.
- Supplementary Tables
This zipped file contains Supplementary Tables 1–21.