Breast carcinoma is the leading cause of cancer-related mortality in women worldwide, with an estimated 1.38 million new cases and 458,000 deaths in 2008 alone1. This malignancy represents a heterogeneous group of tumours with characteristic molecular features, prognosis and responses to available therapy2,3,4. Recurrent somatic alterations in breast cancer have been described, including mutations and copy number alterations, notably ERBB2 amplifications, the first successful therapy target defined by a genomic aberration5. Previous DNA sequencing studies of breast cancer genomes have revealed additional candidate mutations and gene rearrangements6,7,8,9,10. Here we report the whole-exome sequences of DNA from 103 human breast cancers of diverse subtypes from patients in Mexico and Vietnam compared to matched-normal DNA, together with whole-genome sequences of 22 breast cancer/normal pairs. Beyond confirming recurrent somatic mutations in PIK3CA (ref. 11), TP53 (ref. 6), AKT1 (ref. 12), GATA3 (ref. 13) and MAP3K1 (ref. 10), we discovered recurrent mutations in the CBFB transcription factor gene and deletions of its partner RUNX1. Furthermore, we have identified a recurrent MAGI3–AKT3 fusion enriched in triple-negative breast cancer lacking oestrogen and progesterone receptors and ERBB2 expression. The MAGI3–AKT3 fusion leads to constitutive activation of AKT kinase, which is abolished by treatment with an ATP-competitive AKT small-molecule inhibitor.
Breast cancers are classified according to gene-expression subtypes: luminal A, luminal B, Her2-enriched (Her2 is also known as ERBB2), and basal-like14. Luminal subtypes are associated with expression of oestrogen and progesterone receptors and differentiated luminal epithelial cell markers. The subtypes differ in genomic complexity, key genetic alterations and clinical prognosis2,3,4,15. To discover genomic alterations in breast cancers, we performed whole-genome and whole-exome sequencing of 108 primary, treatment-naive, breast carcinoma/normal DNA pairs from all major expression subtypes (Table 1 and Supplementary Tables 1–3): 17 cases by both whole-exome and whole-genome sequencing, 5 cases by whole-genome sequencing alone, and 86 cases by whole-exome sequencing alone.
In total, whole-exome sequencing was performed on 103 tumour/normal pairs, 54 from Mexico and 49 from Vietnam, targeting 189,980 exons comprising 33 megabases (Mb) of the genome and with a median of 85.1% of targeted bases covered at least 30-fold across the sample set. This analysis revealed a total of 4,985 candidate somatic substitutions (see https://confluence.broadinstitute.org/display/CGATools/MuTect for methods and data sets) and insertions/deletions (indels, see https://confluence.broadinstitute.org/display/CGATools/Indelocator for methods) in the target protein-coding regions and the adjacent splice sites, ranging from 14 to 307 putative events in individual samples (Supplementary Table 4). These mutations represented 3,153 missense, 1,157 silent, 242 nonsense, 97 splice site, 194 deletions, 110 insertions and 32 other mutations (Supplementary Table 5). The total mutation rate was 1.66 per Mb (range 0.47–10.5) with a non-silent mutation rate of 1.27 per Mb (range 0.31–8.05), similar to previous reports in breast carcinoma6,7,8,9. The mutation rate in breast cancer exceeds that of haematologic malignancies and prostate cancer, but is significantly lower than in lung cancer and melanoma10,16,17,18,19. The most common mutation events observed are C to T transition events in CpG dinucleotides (Fig. 1 and Supplementary Fig. 4).
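As a quick consistency check on the counts above, the per-category tallies can be summed and the non-silent share compared with the ratio of the two reported rates. The sketch below is illustrative only: the dictionary keys and variable names are ours, and treating every category other than "silent" as non-silent is an assumption rather than a definition stated in the text.

```python
# Mutation categories and counts as reported in the text (Supplementary Table 5).
mutation_counts = {
    "missense": 3153,
    "silent": 1157,
    "nonsense": 242,
    "splice_site": 97,
    "deletion": 194,
    "insertion": 110,
    "other": 32,
}

total = sum(mutation_counts.values())

# Assumption: every category except "silent" is counted as non-silent.
non_silent = total - mutation_counts["silent"]
non_silent_fraction = non_silent / total

# Ratio of the reported non-silent and total mutation rates (both per Mb).
reported_rate_ratio = 1.27 / 1.66

print(f"total candidate mutations: {total}")
print(f"non-silent fraction from counts: {non_silent_fraction:.3f}")
print(f"non-silent/total ratio of reported rates: {reported_rate_ratio:.3f}")
```

Under this assumption the categories sum to the reported 4,985 events, and the two fractions agree to within about one percentage point.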
We performed validation experiments on 494 candidate mutations (representing all significantly mutated genes and genes in significantly mutated gene sets) using a combination of mass-spectrometric genotyping, 454 pyrosequencing, Pacific Biosciences sequencing and Illumina sequencing of matched formalin-fixed paraffin-embedded tissue, and confirmed the presence of 94% of protein-altering point mutations (Supplementary Table 4 and Supplementary Fig. 5); this validation rate is consistent with previous results that 95% of point mutations can be validated with orthogonal methods16,17. Only 18 out of 39 (46%) indels among significantly mutated genes were confirmed.
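To give a sense of the uncertainty attached to a confirmation rate based on a small number of events, such as the 18 of 39 indels above, one can compute a binomial confidence interval. The sketch below uses the Wilson score interval; this choice of interval, the function name and the 95% level are our assumptions and are not part of the reported analysis.

```python
import math


def wilson_interval(successes, trials, z=1.96):
    """Return the (low, high) Wilson score interval for a binomial proportion.

    z=1.96 corresponds to an approximate 95% interval.
    """
    if trials <= 0:
        raise ValueError("trials must be positive")
    p_hat = successes / trials
    denom = 1 + z**2 / trials
    centre = (p_hat + z**2 / (2 * trials)) / denom
    half_width = (
        z * math.sqrt(p_hat * (1 - p_hat) / trials + z**2 / (4 * trials**2)) / denom
    )
    return centre - half_width, centre + half_width


# Indel confirmation counts taken from the text: 18 confirmed out of 39 tested.
low, high = wilson_interval(18, 39)
print(f"observed rate: {18 / 39:.2f}, approx. 95% interval: {low:.2f} to {high:.2f}")
```

The interval is wide because only 39 indels were tested, so the confirmation rate for indels is known much less precisely than that for point mutations.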
Six genes were found to be mutated with significant recurrence in the 103 whole-exome sequenced samples, by analysis with the MutSig algorithm16,17 (https://confluence.broadinstitute.org/display/CGATools/MutSig) at a false discovery rate (FDR) < 0.1 after correction for multiple hypothesis testing (Supplementary Table 6a), manual review of reads, and subsequent orthogonal confirmation of somatic events (Fig. 1 and Supplementary Fig. 6). One gene, CBFB, is identified for the first time as a significantly mutated gene in breast cancer or any other epithelial cancer, to our knowledge, whereas the other five genes (TP53, PIK3CA, AKT1, GATA3 and MAP3K1) have previously been reported as mutated in breast cancer7,10,13. This list of significantly mutated genes, like any list produced by a statistical method, is probably incomplete and reflects the statistical power afforded by our cohort size; larger sample sets will provide further statistical power.
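The FDR threshold mentioned above can be illustrated with a generic multiple-testing correction. The sketch below implements the Benjamini-Hochberg procedure in plain Python; that MutSig uses this particular procedure is an assumption on our part (the text says only that an FDR < 0.1 was applied), and the gene labels and p-values are hypothetical placeholders, not results from this study.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg q-values in the same order as the input."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so q-values stay monotone in rank.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        q_values[idx] = running_min
    return q_values


# Hypothetical per-gene p-values, for illustration only.
gene_p = {"GENE_A": 0.0001, "GENE_B": 0.004, "GENE_C": 0.03, "GENE_D": 0.2, "GENE_E": 0.6}
genes = list(gene_p)
q = benjamini_hochberg([gene_p[g] for g in genes])

# Keep genes whose q-value falls below the FDR threshold used in the text.
significant = [g for g, qv in zip(genes, q) if qv < 0.1]
print(significant)
```

In a real analysis the number of hypotheses would be the number of genes tested, which is why a gene needs a very small raw p-value to pass an exome-wide FDR threshold.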
Somatic mutations in TP53 and PIK3CA were each present in 27% of samples, consistent with published frequencies10,20 (Fig. 1). TP53 mutations occurred in samples with a higher mutation rate (t-test P = 0.0079 comparing samples with mutation rates greater than or less than the median 1.66 mutations per Mb) and were distributed across the gene in sites reported in COSMIC (http://www.sanger.ac.uk/genetics/CGP/cosmic/). Also, using the ABSOLUTE algorithm for determining allele-specific copy number21, we observed that 21 out of 31 TP53 mutations were homozygous (Supplementary Table 4). PIK3CA mutations were clustered in the helical (amino acids 542/545; 40%) and kinase domains (amino acid 1047; 47%)20. Six samples harboured the AKT1 E17K mutation that alters the pleckstrin-homology (PH) domain and leads to activation of the kinase12. AKT1 and PIK3CA mutations, which activate the phosphatidylinositol-3-kinase (PI3K) pathway, were mutually exclusive in our data set. MAP3K1, recently reported as mutated in oestrogen-receptor-positive breast cancers10, harboured five mutations in three patients with oestrogen-receptor-positive disease, and followed a pattern consistent with positive selection for recessive inactivation of the gene. In total, two frameshift, two nonsense and one missense mutation, combined with a homozygous deletion spanning the coding region, were observed. The point mutations seemed to be heterozygous by copy-number analysis, but two patients harboured dual mutations, consistent with compound heterozygous inactivation; confirmatory phasing data were not available. The GATA3 transcription factor gene harboured mutations in four patients with luminal tumours, including three previously unknown frameshift mutations near the 3′-end of the coding sequence. We also identified one previously described splice-site mutation that disrupts zinc-finger domains in GATA3 required for DNA binding13.
CBFB, encoding the core-binding-factor beta subunit, was mutated in four oestrogen-receptor-positive samples, with one nonsense mutation and three truncating frameshift mutations (Fig. 2a). CBFB somatic mutations have been noted in isolated cases of breast cancer6,10. This is the first report of these mutations recurring at a significant rate above background; the sample size is not sufficient to determine whether these mutations are specific for oestrogen-receptor-positive subtypes. CBFB encodes the non-DNA-binding component of a heterodimeric protein complex, together with the DNA-binding RUNX proteins encoded by RUNX1, RUNX2 and RUNX3. Copy-number analysis, using the ABSOLUTE algorithm21, provides further evidence for loss of function of the RUNX1/CBFB complex in breast cancer: the cases with CBFB mutations seem to have hemizygous deletions of one parental allele, whereas two additional cases harbour homozygous deletions of RUNX1 (Fig. 2b, c and Supplementary Figs 7 and 8). Oncogenic rearrangements of RUNX1 or CBFB are common in acute myeloid leukaemia22,23 (including the CBFB–MYH11 translocation believed to have dominant negative function22). This is to our knowledge the first report of inactivation of this transcription factor complex in epithelial cancers.
Significance analysis restricted to somatic mutations in genes reported in COSMIC revealed three significantly mutated genes, PIK3CA, TP53 and ERBB2, the latter below the significance threshold in the complete analysis (Supplementary Table 7). ERBB2 contained somatic mutations in three samples, with two being identical S310F mutations (these two samples are distinct on the basis of their germline and somatic genotypes). The S310F mutation can activate ERBB2 and is transforming in vitro (personal communication from H. Greulich). Neither sample with the S310F activating mutation has ERBB2 amplification (Supplementary Fig. 9). The two samples belong to the Her2-enriched and luminal B subtypes, which typically have ERBB2 amplification; this supports the notion that the observed mutations have a driving role in these tumours10,24.
To identify candidate genomic rearrangements, we applied the dRanger algorithm16,17 to the 22 cases with paired tumour/normal whole-genome sequencing data (Supplementary Table 8). The median number of rearrangements per sample ranged from 30 in the luminal A subtype (range 0–218) to 237 and 246 in the basal-like and Her2-enriched subtypes, respectively (Supplementary Fig. 10); these rates are similar to those in a recent report15. We performed polymerase chain reaction (PCR) amplification on a subset of the candidate rearrangements (Supplementary Methods) and confirmed 89 out of 165 events (54%). No rearrangement was seen in more than one sample (Supplementary Table 8). In addition, we did not identify rearrangements previously observed by either DNA sequencing15 or complementary DNA (cDNA) sequencing, including MAST and NOTCH family-gene fusions25.
The discovery of recurrent driver rearrangements in other epithelial cancers26,27 led to a closer examination of the list of confirmed rearrangements. In a triple-negative, basal-like subtype tumour, we observed a rearrangement between the genes MAGI3 (membrane-associated guanylate kinase, WW and PDZ domain containing 3) on chromosome 1p and AKT3 (v-akt murine thymoma viral oncogene homologue 3) on chromosome 1q, resulting in a balanced translocation from intron 9 in MAGI3 to intron 1 of AKT3 (Fig. 3a). The previously unknown fusion genes were confirmed in tumour DNA by sequencing the product of PCR amplification (Fig. 3b). The MAGI3 disruption is complemented by a hemizygous deletion of the other allele (Supplementary Fig. 11a). The expression levels of individual exons of MAGI3 and AKT3 correspond to the predicted 5′-MAGI3–AKT3-3′ fusion (Supplementary Fig. 11b), with this sample having the highest AKT3 expression in the data set. Expression of the fusion gene was confirmed in the tumour sample by PCR amplification of the cDNA (Fig. 3b).
The rearrangement produces an in-frame fusion gene with a predicted MAGI3–AKT3 fusion protein that combines MAGI3 lacking the second PDZ domain, reported to bind to PTEN and be required for the inhibitory effect of PTEN on the PI3K pathway28, together with an AKT3 region that retains an intact kinase domain but has a disruption of the pleckstrin homology domain before the glutamate at position 17 (Fig. 3c). AKT3 shares significant homology to AKT1 and is reported to be the dominant AKT family member expressed in hormone-receptor-negative breast cancers29. Together, the MAGI3–AKT3 translocation and deletion of MAGI3 could result in the combined loss of function of a tumour suppressor gene (PTEN) and activation of an oncogene (AKT3).
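Whether a fusion of this kind is in-frame comes down to simple modular arithmetic on the coding sequence retained from each partner. The sketch below is a generic illustration of that rule, not the procedure used in the study; the function name is ours and the base counts in the example calls are hypothetical, not the actual MAGI3 or AKT3 coordinates.

```python
def fusion_is_in_frame(five_prime_cds_bases, three_prime_cds_offset):
    """Check whether a two-gene fusion preserves the 3' partner's reading frame.

    five_prime_cds_bases: number of coding bases retained from the 5' partner,
        counted from its start codon up to the breakpoint.
    three_prime_cds_offset: 0-based position, within the 3' partner's own coding
        sequence, of the first base retained in the fusion transcript.

    The first retained 3' base lands at index `five_prime_cds_bases` of the fused
    coding sequence, so the frame is kept when both positions agree modulo 3.
    """
    if five_prime_cds_bases < 0 or three_prime_cds_offset < 0:
        raise ValueError("base counts must be non-negative")
    return five_prime_cds_bases % 3 == three_prime_cds_offset % 3


# Hypothetical values chosen only to show both outcomes.
print(fusion_is_in_frame(900, 45))  # both positions are codon-aligned
print(fusion_is_in_frame(900, 46))  # the 3' partner is shifted by one base
```

Because breakpoints in introns join whole exons, the retained base counts are fixed by exon boundaries, which is why only certain intron pairs can yield an in-frame product.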
To evaluate oncogenic activity of the MAGI3–AKT3 fusion, we expressed the fusion gene ectopically in ZR-75 cells. The MAGI3–AKT3 fusion protein is constitutively phosphorylated at serine 473 in the AKT3 kinase domain (numbered according to the wild-type protein) in the absence of growth factors (Fig. 3d); ectopically expressed AKT1 with an engineered E17K mutation is likewise constitutively phosphorylated (Fig. 3d), as previously reported12. Constitutive activation of the MAGI3–AKT3 kinase in turn activates downstream pathways as demonstrated by phosphorylation of GSK3β, an AKT substrate (Fig. 3d). Phosphorylation of GSK3β by the MAGI3–AKT3 fusion can be inhibited with an ATP-competitive small molecule AKT inhibitor, GSK-690693, but not with an allosteric AKT inhibitor, MK-2206, that interacts with the PH domain of AKT (Fig. 3d). Overexpression of the MAGI3–AKT3 fusion gene in Rat-1 fibroblast cell lines led to loss of contact inhibition and focus formation (Fig. 3e).
We screened 235 additional breast cancer samples for the presence of the 5′-MAGI3–AKT3-3′ fusion event by PCR with reverse transcription (RT–PCR) of cDNA followed by Sanger sequencing of breakpoints. The fusion was present in 8 of the 235 samples, including 5 out of 72 triple-negative (oestrogen-receptor-, progesterone-receptor- and Her2-negative) samples (Supplementary Fig. 12).
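The screening counts above translate directly into frequencies, and the degree of enrichment in triple-negative samples can be explored with a hypergeometric (one-sided Fisher-type) calculation. This is a sketch under stated assumptions: it treats the 72 triple-negative samples as a subset of the 235 screened samples with receptor status known for all of them, which the text does not state explicitly, and the function name is ours. It is not an analysis reported by the authors.

```python
from math import comb


def hypergeom_upper_tail(population, positives, draws, observed):
    """Return P(X >= observed) for draws made without replacement.

    population: total number of samples
    positives: number of fusion-positive samples in the population
    draws: size of the subgroup of interest
    observed: fusion-positive samples seen in that subgroup
    """
    upper = min(positives, draws)
    favourable = sum(
        comb(positives, k) * comb(population - positives, draws - k)
        for k in range(observed, upper + 1)
    )
    return favourable / comb(population, draws)


# Counts taken from the text.
screened, fusion_positive = 235, 8
triple_negative, tn_fusion_positive = 72, 5

overall_freq = fusion_positive / screened
tn_freq = tn_fusion_positive / triple_negative

# Assumption: the triple-negative cases are a subset of the 235 screened samples.
p_enrichment = hypergeom_upper_tail(
    screened, fusion_positive, triple_negative, tn_fusion_positive
)

print(f"overall frequency: {overall_freq:.1%}")
print(f"triple-negative frequency: {tn_freq:.1%}")
print(f"one-sided hypergeometric P: {p_enrichment:.3f}")
```

The tail probability depends on the assumed denominators, so it should be read as a rough guide to how strongly these counts alone support enrichment.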
The power provided by whole-genome and whole-exome sequencing of a relatively large and diverse breast cancer sample set has enabled several significant discoveries, including the identification of recurrent inactivating mutations in CBFB and of a recurrent translocation of MAGI3–AKT3. The mutations in CBFB, RUNX1 and GATA3 suggest the importance of understanding epithelial cell differentiation and its regulatory transcription factors in breast cancer pathogenesis. The recurrent genomic fusion involving AKT3 suggests that the use of ATP-competitive AKT inhibitors should be evaluated in clinical trials for the treatment of fusion-positive triple-negative breast cancers, a subtype where limited therapeutic options exist beyond systemic cytotoxic chemotherapy.
All samples were obtained under institutional IRB approval and with documented informed consent. Breast cancer specimens from Mexico were paired with peripheral blood normal DNA whereas the Vietnamese samples were paired with DNA from normal adjacent breast tissue. Tumour RNA for each case was analysed on exon arrays to determine breast cancer expression subtype using the PAM50 classification method, whereas tumour/normal DNA pairs were analysed for copy number, allelic imbalance, and ancestry using single nucleotide polymorphism (SNP) arrays. A total of 108 samples (17 with both whole-genome and whole-exome sequencing, 86 with whole-exome sequencing only, and 5 with whole-genome sequencing only) passed initial qualification metrics and library construction and achieved the desired sequencing depth (100× whole-exome sequencing; 30× whole-genome sequencing) on the Illumina sequencing platform (Supplementary Figs 1–3, Supplementary Tables 2 and 3). Tumour-specific point mutations, small insertions/deletions (indels), and rearrangements were detected by comparing tumour DNA to its paired normal DNA and using a series of algorithms to identify somatic events (Supplementary Fig. 2)16,17. Additional mutation calling was performed separately on tumour and normal DNA to identify germline mutation events that may confer susceptibility to breast carcinoma. Allele-specific copy number of each gene/mutation was determined using the HAPSEG and ABSOLUTE analysis methods. Confirmation of point mutations and indels was performed using mass-spectrometry-based genotyping and orthogonal next-generation sequencing methods, whereas putative in-frame genomic rearrangements were PCR-amplified from DNA to confirm the presence of the event.
A complete description of the materials and methods is provided in the Supplementary Information. Access to the data and computational algorithms used in this study can be found at https://confluence.broadinstitute.org/display/CGATools/Home.
We would like to thank all patients who contributed samples to this study. This study was a collaboration of the Broad Institute in Cambridge, Massachusetts, USA, and the National Institute of Genomic Medicine (INMEGEN) in Mexico City, Mexico. The work was conducted as part of the Slim Initiative for Genomic Medicine, a project funded by the Carlos Slim Health Institute in Mexico. This work is part of a global effort in collaboration with the International Cancer Genome Consortium (ICGC). The authors would also like to acknowledge J. Barretina and H. Greulich for their critical review of the manuscript. In addition, we would like to acknowledge the technical expertise and data generation efforts of The Broad Institute Biological Samples, Genome Sequencing, and Genetic Analysis Platforms. S.B. has received fellowship support co-sponsored by CancerCare Manitoba and the University of Manitoba. K.K.B. is a recipient of the John Gavin Post-doctoral Fellowship, Genesis Oncology Trust of New Zealand. R.R.-V. and S.L.R.-C. received a scholarship from the Mexican Council of Science and Technology (CONACyT). R.B. is a V Foundation Scholar. A.T. is funded by NIH grant CA122099. This work was partially supported by the Dana-Farber/Harvard SPORE in breast cancer under NCI grant reference CA089393.
Excel spreadsheet containing details of all mutations identified using whole-exome sequencing.
Excel spreadsheet with multiple tabs detailing significantly mutated genes in indicated breast cancer subtypes.
Excel spreadsheet detailing all rearrangements identified using dRanger and associated PCR validation.