The Cancer Genome Atlas (TCGA) has used the latest sequencing and analysis methods to identify somatic variants across thousands of tumours. Here we present data and analytical results for point mutations and small insertions/deletions from 3,281 tumours across 12 tumour types as part of the TCGA Pan-Cancer effort. We illustrate the distributions of mutation frequencies, types and contexts across tumour types, and establish their links to tissues of origin, environmental/carcinogen influences, and DNA repair defects. Using the integrated data sets, we identified 127 significantly mutated genes (SMGs) from well-known (for example, mitogen-activated protein kinase, phosphatidylinositol-3-OH kinase, Wnt/β-catenin and receptor tyrosine kinase signalling pathways, and cell cycle control) and emerging (for example, histone, histone modification, splicing, metabolism and proteolysis) cellular processes in cancer. The average number of mutations in these SMGs varies across tumour types; most tumours have two to six, indicating that the number of driver mutations required during oncogenesis is relatively small. Mutations in transcription factors/regulators show tissue specificity, whereas histone modifiers are often mutated across several cancer types. Clinical association analysis identifies genes having a significant effect on survival, and investigations of mutations with respect to clonal/subclonal architecture delineate their temporal order during tumorigenesis. Taken together, these results lay the groundwork for developing new diagnostics and individualizing cancer treatment.
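The per-tumour SMG burden summarized above ("most tumours have two to six") can be illustrated with a minimal Python sketch. It assumes mutations are supplied as hypothetical `(tumour, gene)` pairs and that every tumour has a type label; it counts every mutation separately (two hits in one gene count twice) and reports the mean per tumour type. Neither choice is specified in the text, so treat them as illustrative assumptions.

```python
from statistics import mean


def mean_smg_mutations_by_type(mutations, smgs, tumour_type):
    """Mean number of SMG mutations per tumour, grouped by tumour type.

    mutations   -- iterable of (tumour_id, gene) pairs (hypothetical input format)
    smgs        -- collection of significantly mutated gene symbols
    tumour_type -- dict mapping tumour_id -> tumour type label
    """
    smg_set = set(smgs)
    # Start every tumour at zero so tumours without SMG mutations still count.
    counts = {t: 0 for t in tumour_type}
    for tumour, gene in mutations:
        if gene in smg_set and tumour in counts:
            counts[tumour] += 1
    # Group the per-tumour counts by tumour type, then average each group.
    by_type = {}
    for tumour, n in counts.items():
        by_type.setdefault(tumour_type[tumour], []).append(n)
    return {ty: mean(ns) for ty, ns in by_type.items()}
```

Initializing every tumour at zero matters: dropping tumours with no SMG mutations would inflate the average.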
- SomaticSniper: identification of somatic point mutations in whole genome sequencing data. Bioinformatics 28, 311–317 (2012)
- VarScan 2: somatic mutation and copy number alteration discovery in cancer by exome sequencing. Genome Res. 22, 568–576 (2012)
- MuSiC: identifying mutational significance in cancer genomes. Genome Res. 22, 1589–1598 (2012)
- JointSNVMix: a probabilistic model for accurate detection of somatic mutations in normal/tumour paired next-generation sequencing data. Bioinformatics 28, 907–913 (2012)
- Sensitive detection of somatic point mutations in impure and heterogeneous cancer samples. Nature Biotechnol. 31, 213–219 (2013)
- Core signaling pathways in human pancreatic cancers revealed by global genomic analyses. Science 321, 1801–1806 (2008)
- An integrated genomic analysis of human glioblastoma multiforme. Science 321, 1807–1812 (2008)
- The consensus coding sequences of human breast and colorectal cancers. Science 314, 268–274 (2006)
- The Cancer Genome Atlas Research Network. Comprehensive genomic characterization defines human glioblastoma genes and core pathways. Nature 455, 1061–1068 (2008)
- Somatic mutations affect key pathways in lung adenocarcinoma. Nature 455, 1069–1075 (2008)
- The genomic landscapes of human breast and colorectal cancers. Science 318, 1108–1113 (2007)
- The Cancer Genome Atlas Research Network. Integrated genomic analyses of ovarian carcinoma. Nature 474, 609–615 (2011)
- The Cancer Genome Atlas Network. Comprehensive molecular portraits of human breast tumours. Nature 490, 61–70 (2012)
- Cancer Genome Atlas Research Network. Integrated genomic characterization of endometrial carcinoma. Nature 497, 67–73 (2013)
- The Cancer Genome Atlas Research Network. Genomic and epigenomic landscapes of adult de novo acute myeloid leukemia. N. Engl. J. Med. 368, 2059–2074 (2013)
- The Cancer Genome Atlas Network. Comprehensive molecular characterization of human colon and rectal cancer. Nature 487, 330–337 (2012)
- Whole-genome analysis informs breast cancer response to aromatase inhibition. Nature 486, 353–360 (2012)
- The Cancer Genome Atlas Research Network. Comprehensive molecular characterization of clear cell renal cell carcinoma. Nature 499, 43–49 (2013)
- The hallmarks of cancer. Cell 100, 57–70 (2000)
- The Pediatric Cancer Genome Project. Nature Genet. 44, 619–622 (2012)
- Bayesian estimation of beta mixture models with variational inference. IEEE Trans. Pattern Anal. Mach. Intell. 33, 2160–2173 (2011)
- Mutational heterogeneity in cancer and the search for new cancer-associated genes. Nature 499, 214–218 (2013)
- DNA methylation in endometrial cancer. Epigenetics 5, 491–498 (2010)
- DNA methylation in glioblastoma: impact on gene expression and clinical outcome. BMC Genomics 11, 701 (2010)
- Exome sequencing identifies frequent mutation of the SWI/SNF complex gene PBRM1 in renal carcinoma. Nature 469, 539–542 (2011)
- BAP1 loss defines a new class of renal cell carcinoma. Nature Genet. 44, 751–759 (2012)
- The biology of chromatin remodeling complexes. Annu. Rev. Biochem. 78, 273–304 (2009)
- Effects on survival of BAP1 and PBRM1 mutations in sporadic clear-cell renal-cell carcinoma: a retrospective analysis with independent validation. Lancet Oncol. 14, 159–167 (2013)
- Frequent ATRX, CIC, FUBP1 and IDH1 mutations refine the classification of malignant gliomas. Oncotarget 3, 709–722 (2012)
- De novo discovery of mutated driver pathways in cancer. Genome Res. 22, 375–385 (2012)
- Recurrent SETBP1 mutations in atypical chronic myeloid leukemia. Nature Genet. 45, 18–24 (2013)
- Association of BRCA1 and BRCA2 mutations with survival, chemotherapy sensitivity, and gene mutator phenotype in patients with ovarian cancer. J. Am. Med. Assoc. 306, 1557–1565 (2011)
- Association between BRCA1 and BRCA2 mutations and survival in women with invasive epithelial ovarian cancer. J. Am. Med. Assoc. 307, 382–390 (2012)
- DNMT3A mutations in acute myeloid leukemia. N. Engl. J. Med. 363, 2424–2433 (2010)
- IDH1 mutation of gliomas with long-term survival analysis. Oncol. Rep. 28, 1639–1644 (2012)
- Clonal evolution in relapsed acute myeloid leukaemia revealed by whole-genome sequencing. Nature 481, 506–510 (2012)
- The origin and evolution of mutations in acute myeloid leukemia. Cell 150, 264–278 (2012)
- Cancer genome landscapes. Science 339, 1546–1558 (2013)
Extended data figures and tables
Extended Data Figures
- Extended Data Figure 1: Mutation context across 12 cancer types.
Mutation context showing proportions of A, T, C and G nucleotides within ±5 bp for all validated mutations of type C>G/G>C and C>T/G>A across all 12 cancer types. The y axis denotes the total number of mutations in each category.
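A context profile of this kind can be sketched in Python as follows. The sketch assumes 0-based coordinates, a hypothetical in-memory `genome` dict (chromosome name to sequence string; in practice this would come from a reference FASTA), and mutations given as `(chrom, pos, ref)` tuples. It also assumes that G and A reference bases are reverse-complemented so that each pair (for example, C>T/G>A) is reported on the C strand; the caption does not say whether strands were collapsed this way.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def context_profile(genome, mutations, flank=5):
    """Per-offset A/C/G/T proportions within +/- flank bp of each mutation.

    genome    -- dict chrom -> reference sequence (hypothetical, 0-based)
    mutations -- iterable of (chrom, pos, ref_base) tuples
    Returns a list of 2*flank+1 dicts; index `flank` is the mutated base.
    """
    counts = [Counter() for _ in range(2 * flank + 1)]
    for chrom, pos, ref in mutations:
        seq = genome[chrom]
        if pos - flank < 0 or pos + flank >= len(seq):
            continue  # window would run off the end of the sequence
        window = seq[pos - flank: pos + flank + 1].upper()
        if window[flank] != ref.upper():
            continue  # reference mismatch: skip rather than guess
        if ref.upper() in "GA":
            window = revcomp(window)  # assumed strand collapse onto C/T
        for i, base in enumerate(window):
            counts[i][base] += 1
    profile = []
    for c in counts:
        total = sum(c[b] for b in "ACGT")  # ignore N and other symbols
        profile.append({b: (c[b] / total if total else 0.0) for b in "ACGT"})
    return profile
```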
- Extended Data Figure 2: The distribution of KRAS hotspot mutations across tumour types.
Distribution of amino acid changes caused by mutations at the KRAS hotspot codons 12 and 13. Lung adenocarcinoma has a significantly higher proportion of Gly12Cys mutations than other cancers (P < 3.2 × 10⁻¹⁰), which results from the increase in C>A transversions in the genomic DNA at that position.
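The caption does not name the test behind the P value. One standard way to test whether one tumour type carries a higher proportion of a given hotspot change is a one-sided Fisher's exact test on a 2×2 table (rows: lung adenocarcinoma versus other cancers; columns: Gly12Cys versus other codon 12/13 changes). The Python sketch below substitutes that test as an assumption; it computes the upper hypergeometric tail exactly with `math.comb`, and no counts from the study are included.

```python
from math import comb


def fisher_greater(a, b, c, d):
    """One-sided Fisher's exact P value for enrichment in cell `a` of
    the 2x2 table [[a, b], [c, d]], e.g.
        [[LUAD Gly12Cys,  LUAD other 12/13 changes],
         [other Gly12Cys, other 12/13 changes]].
    """
    row1 = a + b          # hotspot mutations in the tumour type of interest
    col1 = a + c          # Gly12Cys mutations in all tumour types
    n = a + b + c + d     # all hotspot mutations
    denom = comb(n, col1)
    hi = min(row1, col1)
    # Sum hypergeometric probabilities of tables at least as extreme as `a`.
    return sum(comb(row1, k) * comb(n - row1, col1 - k)
               for k in range(a, hi + 1)) / denom
```

Exact integer arithmetic avoids the underflow problems that can appear with very small P values when probabilities are multiplied as floats.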
- Extended Data Figure 3: Unsupervised clustering based on the mutation status of SMGs.
Tumours with no mutations or with more than 500 mutations were excluded to reduce noise. A mutation status matrix was constructed for 2,611 tumours. Major clusters of mutations detected in UCEC, COAD, GBM, AML, KIRC, OV and BRCA are highlighted. An abridged version is shown in Fig. 4.
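The filtering and matrix construction that precede clustering can be sketched in Python. The caption does not name the clustering algorithm or distance measure, so the sketch stops at a binary tumour-by-SMG matrix plus a Jaccard distance (an assumed, commonly used choice for binary profiles) that any hierarchical clustering routine could consume. It also assumes that "no mutation" refers to a tumour's total mutation count; the input names are hypothetical.

```python
def mutation_status_matrix(mutations, burden, smgs, min_burden=1, max_burden=500):
    """Binary tumour x SMG matrix after burden filtering.

    mutations -- iterable of (tumour_id, gene) pairs (hypothetical format)
    burden    -- dict tumour_id -> total mutation count (assumed filter basis)
    smgs      -- ordered list of SMG symbols (matrix columns)
    Tumours with fewer than min_burden or more than max_burden mutations are dropped.
    """
    keep = sorted(t for t, n in burden.items() if min_burden <= n <= max_burden)
    row_index = {t: i for i, t in enumerate(keep)}
    col_index = {g: j for j, g in enumerate(smgs)}
    matrix = [[0] * len(smgs) for _ in keep]
    for tumour, gene in mutations:
        if tumour in row_index and gene in col_index:
            matrix[row_index[tumour]][col_index[gene]] = 1
    return keep, matrix


def jaccard_distance(u, v):
    """1 - |intersection| / |union| for two 0/1 vectors (0 if both are empty)."""
    union = sum(1 for a, b in zip(u, v) if a or b)
    inter = sum(1 for a, b in zip(u, v) if a and b)
    return 1.0 - inter / union if union else 0.0
```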
- Extended Data Figure 4: Mutation relation analysis in individual tumour types and the Pan-Cancer set.
a, Exclusivity and co-occurrence between SMGs in each tumour type. The −log₁₀ P value is shown in red if the pair shows exclusivity and in green if it shows co-occurrence. b, Exclusivity and co-occurrence between genes in the most significant (q < 0.05) pairs in the Pan-Cancer set. Colour scheme as in a.
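The caption does not state which test produced these pairwise P values. As an assumed stand-in, the Python sketch below scores a gene pair with the two one-sided tails of the hypergeometric (Fisher's exact) distribution of their overlap. The lower tail tests exclusivity and the upper tail tests co-occurrence; the smaller tail decides the label, and its −log₁₀ P value is returned. Ties are labelled co-occurring, which is an arbitrary choice.

```python
from math import comb, log10


def _hypergeom_tail(a, row1, col1, n, upper):
    """P(X >= a) if upper else P(X <= a), where X is the overlap of a set of
    size row1 with a set of size col1, both drawn from n items."""
    lo = max(0, row1 + col1 - n)
    hi = min(row1, col1)
    ks = range(a, hi + 1) if upper else range(lo, a + 1)
    return sum(comb(col1, k) * comb(n - col1, row1 - k) for k in ks) / comb(n, row1)


def pair_relation(x, y):
    """Label a gene pair as exclusive or co-occurring and return -log10 P.

    x, y -- 0/1 mutation status vectors for two genes over the same tumours.
    """
    n = len(x)
    a = sum(1 for i, j in zip(x, y) if i and j)  # tumours mutated in both
    row1, col1 = sum(x), sum(y)
    p_co = _hypergeom_tail(a, row1, col1, n, upper=True)
    p_ex = _hypergeom_tail(a, row1, col1, n, upper=False)
    if p_ex < p_co:
        return "exclusive", -log10(p_ex)
    return "co-occurring", -log10(p_co)
```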
- Extended Data Figure 5: Mutually exclusive mutations identified by Dendrix in the Pan-Cancer and individual cancer type data sets.
a, The highest-scoring exclusive set of mutated genes among the 127 SMGs contains several genes that are strongly associated with one cancer type. b, The highest-scoring exclusive set of mutations among the top 600 genes (not enriched for mutations in one cancer type) reported by MuSiC. c, Relationships between exclusive gene sets identified by Dendrix in individual cancer types. Eight types include TP53 in the most exclusive set, three include KRAS and two include PTEN; the remaining genes appear in only a single type. d, Exclusivity and co-occurrence assessed at the Pan-Cancer level. The −log₁₀ P value is shown in red if the pair shows exclusivity and in green if it shows co-occurrence. KIRC is the most exclusive with respect to other tumour types, whereas COAD/READ shows strong co-occurrence with other types.
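Dendrix scores a gene set M by the weight W(M) = |Γ(M)| − ω(M), where Γ(M) is the set of tumours with a mutation in at least one gene of M and ω(M) = Σ_g |Γ(g)| − |Γ(M)| is the coverage overlap. High coverage with little overlap therefore scores well. Dendrix searches for the best set with a Markov chain Monte Carlo sampler. The Python sketch below keeps the weight but replaces the search with exhaustive enumeration, which is feasible only for small gene lists and set sizes; the input dictionary is hypothetical.

```python
from itertools import combinations


def dendrix_weight(gene_set, patients_by_gene):
    """W(M) = |Gamma(M)| - omega(M): coverage minus coverage overlap."""
    covered = set()
    total = 0
    for g in gene_set:
        covered |= patients_by_gene[g]
        total += len(patients_by_gene[g])
    coverage = len(covered)
    overlap = total - coverage  # omega(M)
    return coverage - overlap


def best_exclusive_set(patients_by_gene, k):
    """Exhaustive search (not Dendrix's MCMC) for the size-k set maximizing W."""
    return max(combinations(sorted(patients_by_gene), k),
               key=lambda s: dendrix_weight(s, patients_by_gene))
```

Usage: `best_exclusive_set({"A": {1, 2}, "B": {3, 4}, "C": {1, 3}}, 2)` returns `("A", "B")`, because A and B together cover four tumours with no overlap.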
- Extended Data Figure 6: Kaplan–Meier plots for genes significantly associated with survival.
Plots are shown for 24 genes with a significant (P ≤ 0.05) association with survival in individual cancer types. Although NPM1 mutations are relatively benign in patients with AML of intermediate cytogenetic risk who lack internal tandem duplications in FLT3, we did not stratify patients by cytogenetics or FLT3 internal tandem duplication status in this analysis and therefore cannot discern this effect. Because most patients with OV (95%) have TP53 mutations, we could not obtain sufficient TP53-wild-type controls to confidently dissect the relationship between TP53 status and survival in OV.
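The curves in these plots are Kaplan–Meier product-limit estimates. The caption does not say which test produced the P values, so only the estimator is sketched here in Python. At each distinct event time t, survival is multiplied by (1 − d/n), where d is the number of deaths at t and n is the number still at risk. Censored patients leave the risk set without reducing survival.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.

    times  -- follow-up times
    events -- 1 if death was observed at that time, 0 if censored
    Returns [(t, S(t))] at each distinct time with at least one death.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Pool all patients sharing this time (deaths and censorings).
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed  # everyone observed at t leaves the risk set
    return curve
```

Usage: comparing mutant and wild-type groups means calling `kaplan_meier` once per group and plotting both step functions.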
- Extended Data Figure 7: Variant allele frequency (VAF) distribution of mutations in SMGs across tumours from BLCA, KIRC, HNSC, LUAD, LUSC, COAD/READ, OV and GBM.
To minimize the effect of copy number alterations on VAFs, only mutations residing in copy-number-neutral segments were used. Only mutation sites with ≥20× coverage were used for analysis and plotting, and only SMGs with at least five data points are plotted.
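The three filters in this caption (copy-number-neutral segments, at least 20× coverage, at least five points per SMG) can be expressed as a short Python sketch. The per-site dictionary keys are hypothetical. Depth is assumed to be reference plus variant read counts, and VAF is the variant reads divided by that depth.

```python
from collections import defaultdict


def vafs_for_plot(sites, min_depth=20, min_points=5):
    """Group VAFs by gene after the copy-number, depth and point-count filters.

    sites -- iterable of dicts with hypothetical keys:
             'gene', 'ref_reads', 'alt_reads', 'copy_neutral' (bool)
    """
    by_gene = defaultdict(list)
    for s in sites:
        depth = s["ref_reads"] + s["alt_reads"]  # assumed definition of depth
        if not s["copy_neutral"] or depth < min_depth:
            continue
        by_gene[s["gene"]].append(s["alt_reads"] / depth)
    # Keep only genes with enough surviving sites to plot a distribution.
    return {g: v for g, v in by_gene.items() if len(v) >= min_points}
```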
- Extended Data Figure 8: Mutation expression and tumour clonal architecture in AML, BRCA and UCEC.
a, Density plots of expressed VAFs for mutations in SMGs (blue) and non-SMGs (red). b, Example SciClone clonality plots for AML (validation data), BRCA and UCEC. Two plots are shown for each case: a kernel density plot (top) and a plot of tumour VAF against sequencing depth for sites from selected copy-number-neutral regions (bottom). Mutations in SMGs are annotated.
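The density panels can be approximated with a Gaussian kernel density estimate of the VAFs. This Python sketch is only that smoothing step. SciClone's clustering itself is based on variational Bayesian mixture models, which are not reproduced here. The default bandwidth uses Silverman's rule of thumb (1.06 · σ · n^(−1/5)), which is an assumption, since the caption does not state the bandwidth used.

```python
import math


def gaussian_kde(values, grid, bandwidth=None):
    """Gaussian kernel density estimate of `values` evaluated at `grid` points.

    bandwidth defaults to Silverman's rule of thumb (an assumed choice).
    """
    n = len(values)
    if bandwidth is None:
        if n < 2:
            raise ValueError("need at least two values to estimate a bandwidth")
        mu = sum(values) / n
        sd = math.sqrt(sum((v - mu) ** 2 for v in values) / (n - 1))
        bandwidth = 1.06 * sd * n ** (-1 / 5)
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))
    # Sum one Gaussian kernel per observation at every grid point.
    return [norm * sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values)
            for x in grid]
```

Usage: evaluating over `grid = [i / 100 for i in range(101)]` gives a curve on the VAF range 0 to 1. Separate peaks in that curve suggest separate clonal or subclonal populations.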
- Extended Data Figure 9: Summary of major findings in Pan-Cancer 12.
Systematic analysis of the TCGA Pan-Cancer mutation dataset identifies SMGs, cancer-related cellular processes, and genes associated with clinical features and tumour progression.