Discovery and saturation analysis of cancer genes across 21 tumour types

Journal name: Nature
Volume: 505
Pages: 495–501
DOI: 10.1038/nature12912

Abstract

Although a few cancer genes are mutated in a high proportion of tumours of a given type (>20%), most are mutated at intermediate frequencies (2–20%). To explore the feasibility of creating a comprehensive catalogue of cancer genes, we analysed somatic point mutations in exome sequences from 4,742 human cancers and their matched normal-tissue samples across 21 cancer types. We found that large-scale genomic analysis can identify nearly all known cancer genes in these tumour types. Our analysis also identified 33 genes that were not previously known to be significantly mutated in cancer, including genes related to proliferation, apoptosis, genome stability, chromatin regulation, immune evasion, RNA processing and protein homeostasis. Down-sampling analysis indicates that larger sample sizes will reveal many more genes mutated at clinically important frequencies. We estimate that near-saturation may be achieved with 600–5,000 samples per tumour type, depending on background mutation frequency. The results may help to guide the next stage of cancer genomics.
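The down-sampling idea in the abstract can be illustrated with a minimal sketch: draw random patient subsets of increasing size, test every gene for excess mutations, and count how many genes pass an FDR cutoff. The sketch makes strong simplifying assumptions that are not the paper's method: it replaces the MutSig tests with a single one-sided binomial test, uses one background rate for all genes, and applies a Benjamini-Hochberg correction with a cutoff of 0.1. All function and parameter names (`binom_sf`, `bh_qvalues`, `count_significant`, `down_sample_curve`, `background`) are hypothetical.

```python
import math
import random


def binom_sf(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summed in log space to avoid overflow."""
    total = 0.0
    for i in range(k, n + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                   + i * math.log(p) + (n - i) * math.log1p(-p))
        total += math.exp(log_pmf)
    return total


def bh_qvalues(pvals):
    """Benjamini-Hochberg q values, returned in the same order as the input."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so q values stay monotone (and <= 1).
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        q[i] = running_min
    return q


def count_significant(mutated, patients, background, q_cutoff=0.1):
    """Count genes with q <= q_cutoff in the given patient subset.

    mutated: dict mapping gene name -> set of patient ids with a mutation.
    background: assumed per-patient chance that a gene is mutated by chance.
    """
    subset = set(patients)
    n = len(subset)
    genes = sorted(mutated)
    pvals = [binom_sf(len(mutated[g] & subset), n, background) for g in genes]
    return sum(q <= q_cutoff for q in bh_qvalues(pvals))


def down_sample_curve(mutated, all_patients, sizes, background, seed=0):
    """Return (subset size, number of significant genes) for one random draw per size."""
    rng = random.Random(seed)
    return [(n, count_significant(mutated, rng.sample(all_patients, n), background))
            for n in sizes]
```

Plotting the returned pairs for many random draws would give a curve analogous in spirit to the down-sampling plots: a curve that is still rising at the full cohort size suggests that more genes remain to be found.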

Figures

  1. Mutation patterns for one known and two novel cancer genes.

    EGFR shows distinctive tumour-type-specific concentrations of mutations in different regions of the gene. RHEB, which encodes a small GTPase in the Ras superfamily, shows a mutational hotspot in the effector domain. RHOA, another member of the Ras superfamily, also shows a mutational hotspot in the effector domain. Coloured bars after tumour-type names are copy-ratio distributions for the gene, when available (red, amplified; blue, deleted). Missense mutations are represented by green circles of varying saturation indicating the degree of evolutionary conservation of the mutated base pair, from highly conserved (dark green), to medium conservation (light green), to totally unconserved (white). Tumour types with names shown in red were significantly mutated in this gene, those in dark red were nearly significantly mutated, and those in black were not significantly mutated. Thin red strokes in the protein ideogram represent splice sites (see also Supplementary Fig. 4; similar diagrams for all genes are available at http://www.tumorportal.org).

  2. Cancer genes in selected tumour types.

    Genes are arranged on the horizontal line according to P value (combined value for the three tests in MutSig). The yellow region contains genes that achieve FDR q ≤ 0.1. The orange interval contains P values for the next 20 genes. Gene-name colour indicates whether the gene is a known cancer gene (blue), a novel gene with a clear connection to cancer (red, discussed in text), or an additional novel gene (black). Circle colour indicates the frequency (percentage of patients carrying non-silent somatic mutations) in that tumour type; see also Supplementary Fig. 5.

  3. Cancer genes identified from a data set of 4,742 tumours.

    Genes are plotted by the q value (FDR) in the most significant of the 21 tumour types (x axis) and the q value when the 4,742 tumours are analysed as a combined (‘pan-cancer’) cohort (y axis). Genes in the top-left quadrant reached significance only in the combined analysis. Genes in the bottom-right quadrant reached significance only in one or more single-type analyses. Genes in the top-right quadrant were significant in both the combined set and in individual tumour types. Colour of gene names is as in Fig. 2.

  4. Down-sampling analysis shows that gene discovery is continuing as samples and tumour types are added.

    a, Analysis within tumour types. Each point represents a random subset of patients. The line is a smoothed fit. b, Analysis by adding tumour types. Each grey line represents a random ordering of the 21 tumour types. c, Analysis by adding samples. Each point is a random subset of the 4,742 patients. d, Analysis in c broken down by mutation frequency. Genes mutated at frequencies ≥20% are nearing saturation, and intermediate frequencies show steep growth; see also Supplementary Figs 7 and 8.

  5. Number of samples needed to detect significantly mutated genes, as a function of a tumour type’s median background mutation frequency and a cancer gene’s mutation rate above background.

    The number of samples needed to achieve 90% power for 90% of genes (y axis). Grey vertical lines indicate tumour type median background mutation frequencies (x axis). Black dots indicate sample sizes in the current study. For most tumour types, the current sample size is inadequate to reliably detect genes mutated at 5% or less above background; see also Supplementary Fig. 9. Adeno., adenocarcinoma.
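The relationship described in the Figure 5 legend (more samples are needed as the background mutation frequency rises or as a gene's excess over background shrinks) can be sketched with an exact binomial power calculation. This is a simplified illustration under stated assumptions, not the calculation used in the paper: each patient is treated as an independent trial, `p0` is an assumed per-patient probability that the gene is mutated by background alone, `excess` is the gene's mutation rate above background, and `alpha=5e-6` is a hypothetical exome-wide significance threshold (roughly 0.1 divided by 20,000 genes). All names are hypothetical.

```python
import math


def upper_tails(n, p):
    """Return a list t where t[k] = P(X >= k) for X ~ Binomial(n, p), k = 0..n+1."""
    tails = [0.0] * (n + 2)
    log_p, log_q = math.log(p), math.log1p(-p)
    # Accumulate from the top so the smallest terms are added first.
    for k in range(n, -1, -1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                   + k * log_p + (n - k) * log_q)
        tails[k] = tails[k + 1] + math.exp(log_pmf)
    return tails


def power_at(n, p0, p1, alpha):
    """Power of a one-sided exact binomial test with n patients.

    The critical count is the smallest k whose tail probability under the
    background rate p0 is <= alpha; power is the chance of reaching that
    count when the true per-patient mutation probability is p1.
    """
    null_tail = upper_tails(n, p0)
    alt_tail = upper_tails(n, p1)
    for k in range(n + 2):
        if null_tail[k] <= alpha:
            return alt_tail[k]
    return 0.0


def samples_needed(p0, excess, alpha=5e-6, target_power=0.9, max_n=10000):
    """Smallest n (up to max_n) reaching the target power, or None if not reached."""
    for n in range(1, max_n + 1):
        if power_at(n, p0, p0 + excess, alpha) >= target_power:
            return n
    return None
```

Calling `samples_needed(0.01, 0.05)` and `samples_needed(0.03, 0.05)` shows the qualitative trend: at a fixed excess of 5%, the higher assumed background requires a larger cohort. The linear scan is chosen for clarity rather than speed, because exact binomial power is not strictly monotone in n.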

Author information

  1. These authors contributed equally to this work.

    • Eric S. Lander &
    • Gad Getz

Affiliations

  1. Broad Institute of MIT and Harvard, 7 Cambridge Center, Cambridge, Massachusetts 02142, USA

    • Michael S. Lawrence,
    • Petar Stojanov,
    • Craig H. Mermel,
    • James T. Robinson,
    • Levi A. Garraway,
    • Todd R. Golub,
    • Matthew Meyerson,
    • Stacey B. Gabriel,
    • Eric S. Lander &
    • Gad Getz
  2. Dana-Farber Cancer Institute, 450 Brookline Avenue, Boston, Massachusetts 02215, USA

    • Petar Stojanov,
    • Levi A. Garraway,
    • Todd R. Golub &
    • Matthew Meyerson
  3. Massachusetts General Hospital, Cancer Center and Department of Pathology, 55 Fruit Street, Boston, Massachusetts 02114, USA

    • Craig H. Mermel &
    • Gad Getz
  4. Harvard Medical School, 25 Shattuck Street, Boston, Massachusetts 02115, USA

    • Levi A. Garraway,
    • Todd R. Golub,
    • Matthew Meyerson,
    • Eric S. Lander &
    • Gad Getz
  5. Howard Hughes Medical Institute, 4000 Jones Bridge Road, Chevy Chase, Maryland 20815, USA

    • Todd R. Golub
  6. Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, USA

    • Eric S. Lander

Contributions

G.G., E.S.L., T.R.G., M.M., L.A.G. and S.B.G. conceived the project and provided leadership. M.S.L., G.G., E.S.L., P.S. and C.H.M. analysed the data and contributed to scientific discussions. M.S.L., E.S.L. and G.G. wrote the paper. J.T.R., M.S.L., E.S.L. and G.G. created the website for visualizing this data set.

Competing financial interests

A patent related to this work has been filed.

Corresponding authors

Correspondence to:

The data analysed in this manuscript have been deposited in Synapse (http://www.synapse.org), accession number syn1729383, and in dbGaP (http://www.ncbi.nlm.nih.gov/gap), accession numbers phs000330.v1.p1, phs000348.v1.p1, phs000369.v1.p1, phs000370.v1.p1, phs000374.v1.p1, phs000435.v2.p1, phs000447.v1.p1, phs000450.v1.p1, phs000452.v1.p1, phs000467.v6.p1, phs000488.v1.p1, phs000504.v1.p1, phs000508.v1.p1, phs000579.v1.p1, phs000598.v1.p1.

Supplementary information

PDF files

  1. Supplementary Information (1.4 MB)

    This file contains Supplementary Figures 1-9 and legends for Supplementary Tables 1-6 (see separate files for tables).

Excel files

  1. Supplementary Table 1 (28 KB)

    This file contains a list of source datasets analyzed in this work, and references to the corresponding publications.

  2. Supplementary Table 2 (709 KB)

    This file contains the 260 significantly mutated cancer genes found by analysis with the MutSig suite (see Supplementary Information file for full legend).

  3. Supplementary Table 3 (16 KB)

    This file contains a list of the 21 tumor types studied, and the significantly mutated genes found by the MutSig suite in each tumor type (see Supplementary Information file for full legend).

  4. Supplementary Table 4 (58 KB)

    This file contains a list of references reporting the identification of candidate cancer genes (see Supplementary Information file for full legend).

  5. Supplementary Table 5 (21 KB)

    This file contains a list of references to biological literature supporting the 33 novel candidate cancer genes with clear and compelling connections to cancer biology.

  6. Supplementary Table 6 (24 KB)

    This file contains a summary of the analysis comparing the performance of each of the three MutSig metrics separately, in pairwise combinations, and all three combined as in the main analysis (see Supplementary Information file for full legend).
