Mutational landscape and significance across 12 major cancer types

Journal name: Nature
Volume: 502
Pages: 333–339
DOI: 10.1038/nature12634

Abstract

The Cancer Genome Atlas (TCGA) has used the latest sequencing and analysis methods to identify somatic variants across thousands of tumours. Here we present data and analytical results for point mutations and small insertions/deletions from 3,281 tumours across 12 tumour types as part of the TCGA Pan-Cancer effort. We illustrate the distributions of mutation frequencies, types and contexts across tumour types, and establish their links to tissues of origin, environmental/carcinogen influences, and DNA repair defects. Using the integrated data sets, we identified 127 significantly mutated genes from well-known (for example, mitogen-activated protein kinase, phosphatidylinositol-3-OH kinase, Wnt/β-catenin and receptor tyrosine kinase signalling pathways, and cell cycle control) and emerging (for example, histone, histone modification, splicing, metabolism and proteolysis) cellular processes in cancer. The average number of mutations in these significantly mutated genes varies across tumour types; most tumours have two to six, indicating that the number of driver mutations required during oncogenesis is relatively small. Mutations in transcriptional factors/regulators show tissue specificity, whereas histone modifiers are often mutated across several cancer types. Clinical association analysis identifies genes having a significant effect on survival, and investigations of mutations with respect to clonal/subclonal architecture delineate their temporal orders during tumorigenesis. Taken together, these results lay the groundwork for developing new diagnostics and individualizing cancer treatment.

At a glance

Figures

  1. Figure 1: Mutation frequencies, spectra and contexts across 12 cancer types.

    a, Distribution of mutation frequencies across 12 cancer types. Dashed grey and solid white lines denote average across cancer types and median for each type, respectively. b, Mutation spectrum of six transition (Ti) and transversion (Tv) categories for each cancer type. c, Hierarchically clustered mutation context (defined by the proportion of A, T, C and G nucleotides within ±2 bp of variant site) for six mutation categories. Cancer types correspond to colours in a. Colour denotes degree of correlation: yellow (r = 0.75) and red (r = 1).

  2. Figure 2: The 127 SMGs from 20 cellular processes in cancer identified in 12 cancer types.

    Percentages of samples mutated in individual tumour types and Pan-Cancer are shown, with the highest percentage in each gene among 12 cancer types in bold.

  3. Figure 3: Distribution of mutations in 127 SMGs across Pan-Cancer cohort.

    Box plot displays median numbers of non-synonymous mutations, with outliers shown as dots. In total, 3,210 tumours were used for this analysis (hypermutators excluded).

  4. Figure 4: Unsupervised clustering based on mutation status of SMGs.

    Tumours having no mutation or more than 500 mutations were excluded. A mutation status matrix was constructed for 2,611 tumours. Major clusters of mutations detected in UCEC, COAD, GBM, AML, KIRC, OV and BRCA were highlighted. Complete gene list shown in Extended Data Fig. 3.

  5. Figure 5: Driver initiation and progression mutations and tumour clonal architecture.

    a, Variant allele fraction (VAF) distribution of mutations in SMGs across tumours from AML, BRCA and UCEC for mutations (≥20× coverage) in copy neutral segments. SMGs having ≥5 mutation data points were included. ChrX, chromosome X. b, In AML sample TCGA-AB-2968 (WGS), two DNMT3A mutations are in the founding clone, and one NRAS mutation is in the subclone. In BRCA tumour TCGA-BH-A18P (exome), one FOXA1 mutation is in the founding clone, and PIK3R1 and MLL3 mutations are in the subclone. In UCEC tumour TCGA-B5-A0JV (exome), PIK3CA, ARID1A and CTCF mutations are in the founding clone, and NRAS, PTEN and KRAS mutations are in the secondary clone. Asterisk denotes stop codon.

  6. Extended Data Fig. 1: Mutation context across 12 cancer types.

    Mutation context showing proportions of A, T, C and G nucleotides within ±5 bp for all validated mutations of type C>G/G>C and C>T/G>A across all 12 cancer types. The y axis denotes the total number of mutations in each category.

  7. Extended Data Fig. 2: The distribution of KRAS hotspot mutations across tumour types.

    Distribution of changes caused by mutations of the KRAS hotspot at amino acids 12 and 13. Lung adenocarcinoma has a significantly higher proportion of Gly12Cys mutations than other cancers (P < 3.2 × 10⁻¹⁰), caused by the increase in C>A transversions in the genomic DNA at that location.

  8. Extended Data Fig. 3: Unsupervised clustering based on mutation status of SMGs.

    Tumours having no mutation or more than 500 mutations were excluded to reduce noise. A mutation status matrix was constructed for 2,611 tumours. Major clusters of mutations detected in UCEC, COAD, GBM, AML, KIRC, OV and BRCA were highlighted. The shorter version is shown in Fig. 4.

  9. Extended Data Fig. 4: Mutation relation analysis in individual tumour types and the Pan-Cancer set.

    a, Exclusivity and co-occurrence between SMGs in each tumour type. The −log₁₀ P value appears in either red or green if the pair shows exclusivity or co-occurrence, respectively. b, Exclusivity and co-occurrence between genes in the most significant (q < 0.05) pairs in Pan-Cancer set. Colour scheme is as in a.

  10. Extended Data Fig. 5: Mutually exclusive mutations identified by Dendrix in the Pan-Cancer and individual cancer type data sets.

    a, The highest scoring exclusive set of mutated genes in 127 SMGs contains several genes that are strongly associated with one cancer type. b, The highest scoring exclusive set of mutations in the top 600 genes (not enriched for mutations in one cancer type) reported by MuSiC. c, Relationships between exclusive gene sets identified by Dendrix in individual cancer types. Eight types include TP53 in the most exclusive set, three include KRAS, and two include PTEN, with the remaining genes appearing in only a single type. d, Exclusivity and co-occurrence assessed at the Pan-Cancer level. The −log₁₀ P value appears in red or green if the pair shows exclusivity or co-occurrence, respectively. KIRC is most exclusive to other tumour types, whereas COAD/READ presented strong co-occurrence with other types.

  11. Extended Data Fig. 6: Kaplan–Meier plots for genes significantly associated with survival.

    Plots are shown for 24 genes showing significant (P ≤ 0.05) association in individual cancer types. Although NPM1 mutations in patients with AML having intermediate cytogenetic risk are relatively benign in the absence of internal tandem duplications in FLT3, we did not stratify patients based on cytogenetics or FLT3 internal tandem duplication status in this analysis, and cannot discern this effect. Because most patients with OV (95%) have TP53 mutations, we could not obtain sufficient non-TP53 mutant controls for confidently dissecting the relationship between TP53 status and survival in OV.

  12. Extended Data Fig. 7: VAF distribution of mutations in SMGs across tumours from BLCA, KIRC, HNSC, LUAD, LUSC, COAD/READ, OV and GBM.

    To minimize the effect of copy number alterations on VAFs, only mutations residing in copy number neutral segments were used for this analysis. Only mutation sites with ≥20× coverage were used for analysis and plotting. SMGs with at least five data points were included in the plot.

  13. Extended Data Fig. 8: Mutation expression and tumour clonal architecture in AML, BRCA and UCEC.

    a, Density plots of expressed VAFs for mutations in SMGs (blue) and non-SMGs (red). b, SciClone clonality example plots for AML (validation data), BRCA and UCEC. Two plots are shown for each case: kernel density (top), followed by the plot of tumour VAF by sequence depth for sites from selected copy number neutral regions. Mutations (with annotations) in SMGs were shown.

  14. Extended Data Fig. 9: Summary of major findings in Pan-Cancer 12.

    Systematic analysis of the TCGA Pan-Cancer mutation dataset identifies SMGs, cancer-related cellular processes, and genes associated with clinical features and tumour progression.
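
Two conventions used in the captions above can be illustrated with a short sketch: the six collapsed substitution categories of Figure 1b (complementary changes such as C>T and G>A are counted together) and the ≥20× coverage filter applied before plotting VAFs in Figure 5a. This is a hypothetical illustration, not the authors' pipeline. The function names, the (ref, alt) tuple input, and the definition of coverage as reference reads plus variant reads are all assumptions made for the example.

```python
from collections import Counter

# Complement map used to fold every substitution onto a pyrimidine (C or T) reference.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

# The six collapsed categories, keyed by the pyrimidine-referenced change.
# C>T/G>A and A>G/T>C are transitions; the other four are transversions.
CATEGORIES = {
    ("C", "A"): "C>A/G>T",
    ("C", "G"): "C>G/G>C",
    ("C", "T"): "C>T/G>A",
    ("T", "A"): "A>T/T>A",
    ("T", "C"): "A>G/T>C",
    ("T", "G"): "A>C/T>G",
}


def spectrum_category(ref, alt):
    """Return the collapsed category for a single-nucleotide substitution."""
    ref, alt = ref.upper(), alt.upper()
    if ref not in COMPLEMENT or alt not in COMPLEMENT or ref == alt:
        raise ValueError(f"not a single-nucleotide substitution: {ref}>{alt}")
    if ref in ("A", "G"):
        # Purine reference: describe the change on the opposite strand instead.
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    return CATEGORIES[(ref, alt)]


def spectrum(snvs):
    """Proportion of each of the six categories among (ref, alt) pairs."""
    counts = Counter(spectrum_category(ref, alt) for ref, alt in snvs)
    total = sum(counts.values())
    labels = sorted(set(CATEGORIES.values()))
    if total == 0:
        return {label: 0.0 for label in labels}
    return {label: counts[label] / total for label in labels}


def variant_allele_fraction(ref_reads, alt_reads, min_coverage=20):
    """VAF = variant reads / total reads, or None when coverage is too low.

    Assumption: coverage is taken as ref_reads + alt_reads at the site.
    """
    depth = ref_reads + alt_reads
    if depth < min_coverage:
        return None
    return alt_reads / depth
```

For example, `spectrum_category("G", "A")` falls in the same category as a C>T change, and a site with 15 reference reads and 4 variant reads is dropped by the coverage filter because its depth is below 20.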

Tables

  1. Extended Data Table 1: Clinical correlation and survival analysis for genes mutated at ≥2% frequency in at least 2 tumour types

Author information

  1. These authors contributed equally to this work.

    • Cyriac Kandoth &
    • Michael D. McLellan

Affiliations

  1. The Genome Institute, Washington University in St Louis, Missouri 63108, USA

    • Cyriac Kandoth,
    • Michael D. McLellan,
    • Kai Ye,
    • Beifang Niu,
    • Charles Lu,
    • Mingchao Xie,
    • Qunyuan Zhang,
    • Joshua F. McMichael,
    • Matthew A. Wyczalkowski,
    • Christopher A. Miller,
    • Michael C. Wendl,
    • Timothy J. Ley,
    • Richard K. Wilson &
    • Li Ding
  2. Department of Computer Science, Brown University, Providence, Rhode Island 02912, USA

    • Fabio Vandin,
    • Mark D. M. Leiserson &
    • Benjamin J. Raphael
  3. Department of Genetics, Washington University in St Louis, Missouri 63108, USA

    • Kai Ye,
    • Qunyuan Zhang,
    • Michael C. Wendl,
    • Timothy J. Ley,
    • Richard K. Wilson &
    • Li Ding
  4. Department of Medicine, Washington University in St Louis, Missouri 63108, USA

    • John S. Welch,
    • Matthew J. Walter,
    • Timothy J. Ley &
    • Li Ding
  5. Siteman Cancer Center, Washington University in St Louis, Missouri 63108, USA

    • John S. Welch,
    • Matthew J. Walter,
    • Timothy J. Ley,
    • Richard K. Wilson &
    • Li Ding
  6. Department of Mathematics, Washington University in St Louis, Missouri 63108, USA

    • Michael C. Wendl

Contributions

L.D. and R.K.W. supervised the research. L.D., C.K., M.D.M., F.V., K.Y., B.N., C.L., M.X., M.D.M.L., M.A.W., J.F.M., M.J.W., C.A.M., J.S.W. and B.J.R. analysed the data. M.C.W. and Q.Z. performed statistical analysis. M.D.M., C.K., F.V., C.L., M.X., K.Y., B.N., Q.Z., M.C.W., J.F.M., M.D.M., M.A.W. and L.D. prepared the figures and tables. L.D., T.J.L., C.K. and B.J.R. conceived and designed the experiments. L.D., M.D.M., C.K., F.V., C.L., B.J.R., K.Y. and M.C.W. wrote the manuscript.

Competing financial interests

The authors declare no competing financial interests.

Supplementary information

Zip files

  1. Supplementary Data (22.2 MB)

    This zipped file contains Supplementary Tables 1 and 3-14.

  2. Supplementary Data (32 MB)

    This zipped file contains Supplementary Table 2.
