Abstract

Many cancer-associated somatic copy number alterations (SCNAs) are known. Currently, one of the challenges is to identify the molecular downstream effects of these variants. Although several SCNAs are known to change gene expression levels, it is not clear whether each individual SCNA affects gene expression. We reanalyzed 77,840 expression profiles and observed a limited set of 'transcriptional components' that describe well-known biology, explain the vast majority of variation in gene expression and enable us to predict the biological function of genes. On correcting expression profiles for these components, we observed that the residual expression levels (in 'functional genomic mRNA' profiling) correlated strongly with copy number. DNA copy number correlated positively with expression levels for 99% of all abundantly expressed human genes, indicating global gene dosage sensitivity. By applying this method to 16,172 patient-derived tumor samples, we replicated many loci with aberrant copy numbers and identified recurrently disrupted genes in genomically unstable cancers.
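The correction step described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the data are simulated, and the function name `remove_top_components`, the number of components and all effect sizes are assumptions chosen for illustration. The sketch removes the leading principal components from an expression matrix and compares how strongly one gene's expression correlates with its copy number before and after the correction.

```python
import numpy as np


def remove_top_components(expr, n_components):
    """Center a samples-x-genes matrix and subtract its top principal components."""
    centered = expr - expr.mean(axis=0)
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    k = n_components
    # Rank-k reconstruction from the k largest singular values.
    low_rank = (u[:, :k] * s[:k]) @ vt[:k, :]
    return centered - low_rank


# Simulated data (hypothetical sizes and effects, not taken from the paper).
rng = np.random.default_rng(0)
n_samples, n_genes, n_factors = 300, 200, 3

# A few strong latent factors stand in for the 'transcriptional components'.
factors = rng.normal(size=(n_samples, n_factors))
loadings = rng.normal(scale=3.0, size=(n_factors, n_genes))
expr = factors @ loadings + rng.normal(scale=0.5, size=(n_samples, n_genes))

# Gene 0 gets a modest dosage effect: 1, 2 or 3 copies per sample.
copy_number = rng.integers(1, 4, size=n_samples).astype(float)
expr[:, 0] += 1.0 * (copy_number - 2.0)

residual = remove_top_components(expr, n_factors)

r_raw = np.corrcoef(expr[:, 0], copy_number)[0, 1]
r_res = np.corrcoef(residual[:, 0], copy_number)[0, 1]
print(f"correlation with copy number, raw expression:      {r_raw:.2f}")
print(f"correlation with copy number, residual expression: {r_res:.2f}")
```

In this toy setting the dosage effect is small relative to the variation driven by the latent factors, so it becomes visible mainly after those factors are removed. The number of components to remove is a free parameter here; the abstract does not state how many were used.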

Acknowledgements

We thank J.L. Senior for editing the manuscript. This work was financially supported by grants from the Netherlands Organization for Scientific Research (NWO-VENI grant 916-10135 to L.F., NWO VIDI grant 916-76062 to M.A.T.M.v.V. and NWO VIDI grant 917-14374 to L.F.), a Horizon Breakthrough grant from the Netherlands Genomics Initiative (grant 92519031 to L.F.), a grant from the Van der Meer–Boerema Foundation to M.K. and grants from the Dutch Cancer Society (RUG 2011-5093 to M.A.T.M.v.V. and RUG 2013-5960 to R.S.N.F.). In addition, this study was financed in part by the SIA-raakPRO subsidy for project BioCOMP. The research leading to these results has received funding from the European Community's Health Seventh Framework Programme (FP7/2007–2013) under grant agreement 259867. This publication was made possible through the support of a grant from the John Templeton Foundation. Additional support for E.A.S. was provided by the Leiden Centre for Data Sciences.

Author information

Author notes

    • Rudolf S N Fehrmann & Juha M Karjalainen

    These authors contributed equally to this work.

Affiliations

  1. Department of Medical Oncology, University of Groningen, University Medical Center Groningen, Groningen, the Netherlands.

    • Rudolf S N Fehrmann, Małgorzata Krajewska, Elisabeth G E de Vries & Marcel A T M van Vugt
  2. Department of Genetics, University of Groningen, University Medical Center Groningen, Groningen, the Netherlands.

    • Rudolf S N Fehrmann, Juha M Karjalainen, Harm-Jan Westra, Gerard J te Meerman, Cisca Wijmenga & Lude Franke
  3. National Center for Advancing Translational Sciences, US National Institutes of Health, Rockville, Maryland, USA.

    • David Maloney & Anton Simeonov
  4. Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA.

    • Tune H Pers & Joel N Hirschhorn
  5. Division of Endocrinology, Children's Hospital Boston, Boston, Massachusetts, USA.

    • Tune H Pers & Joel N Hirschhorn
  6. Center for Basic and Translational Obesity Research, Children's Hospital Boston, Boston, Massachusetts, USA.

    • Tune H Pers & Joel N Hirschhorn
  7. Department of Systems Biology, Center for Biological Sequence Analysis, Technical University of Denmark, Lyngby, Denmark.

    • Tune H Pers
  8. Department of Genetics, Harvard Medical School, Boston, Massachusetts, USA.

    • Joel N Hirschhorn
  9. Groningen Bioinformatics Centre, Groningen Biomolecular Sciences and Biotechnology Institute, University of Groningen, Haren, the Netherlands.

    • Ritsert C Jansen
  10. Department of Human Genetics, Leiden University Medical Center, Leiden, the Netherlands.

    • Erik A Schultes & Herman H H B M van Haagen
  11. BioSemantics Group, Leiden Institute of Advanced Computer Science, Leiden University, Leiden, the Netherlands.

    • Erik A Schultes

Contributions

R.S.N.F. and L.F. conceived this study. M.K. performed the in vitro experiments. D.M. and A.S. synthesized chemical compounds. R.S.N.F., J.M.K., T.H.P., J.N.H., R.C.J., E.A.S., H.H.H.B.M.v.H., H.-J.W., G.J.t.M., M.A.T.M.v.V. and L.F. performed analyses. R.S.N.F., J.M.K., T.H.P., E.G.E.d.V., C.W., M.A.T.M.v.V. and L.F. wrote the manuscript.

Competing interests

The authors declare no competing financial interests.

Corresponding authors

Correspondence to Rudolf S N Fehrmann or Lude Franke.

Supplementary information

PDF files

  1. Supplementary Text and Figures

    Supplementary Figures 1–9 and Supplementary Note.

Excel files

  1. Supplementary Table 1

    Platform descriptives. The first column contains the platform identifiers used by the manufacturer (Affymetrix). The second column contains the accession identifier for the Gene Expression Omnibus (GEO). The third column contains the number of probes on each platform. The fourth column describes the species for which the platform was designed. The fifth column contains the number of samples initially downloaded from GEO. The sixth column contains the number of samples remaining after quality control.

  2. Supplementary Table 2

    Distribution of samples according to MeSH anatomy tree structure. Overview of the MeSH annotation of samples according to the MeSH anatomy tree structure.

  3. Supplementary Table 3

    Principal-component analysis summary for GPL96. PCA summary for GPL96 (Homo sapiens, HG-U133A). The second column contains the percentage of explained variance per transcriptional component (TC). The third column contains the cumulative explained variance. The fourth column contains the split-half correlation per TC. The fifth column contains the Cronbach's α per TC.

  4. Supplementary Table 4

    Principal-component analysis summary for GPL570. PCA summary for GPL570 (Homo sapiens, HG-U133 2.0). The second column contains the percentage of explained variance per transcriptional component (TC). The third column contains the cumulative explained variance. The fourth column contains the split-half correlation per TC. The fifth column contains the Cronbach's α per TC.

  5. Supplementary Table 5

    Principal-component analysis summary for GPL1261. PCA summary for GPL1261 (Mus musculus, MG-430 2.0). The second column contains the percentage of explained variance per transcriptional component (TC). The third column contains the cumulative explained variance. The fourth column contains the split-half correlation per TC. The fifth column contains the Cronbach's α per TC.

  6. Supplementary Table 6

    Principal-component analysis summary for GPL1355. PCA summary for GPL1355 (Rattus norvegicus, RG-230 2.0). The second column contains the percentage of explained variance per transcriptional component (TC). The third column contains the cumulative explained variance. The fourth column contains the split-half correlation per TC. The fifth column contains the Cronbach's α per TC.

  7. Supplementary Table 7

    Gene set enrichment summary for GPL96. This table contains the number of gene sets per transcriptional component and per gene set database that were significantly enriched at a permutation-based false discovery rate of 5% for GPL96 (Homo sapiens, HG-U133A).

  8. Supplementary Table 8

    Gene set enrichment summary for GPL570. This table contains the number of gene sets per transcriptional component and per gene set database that were significantly enriched at a permutation-based false discovery rate of 5% for GPL570 (Homo sapiens, HG-U133 2.0).

  9. Supplementary Table 9

    Gene set enrichment summary for GPL1261. This table contains the number of gene sets per transcriptional component and per gene set database that were significantly enriched at a permutation-based false discovery rate of 5% for GPL1261 (Mus musculus, MG-430 2.0).

  10. Supplementary Table 10

    Gene set enrichment summary for GPL1355. This table contains the number of gene sets per transcriptional component and per gene set database that were significantly enriched at a permutation-based false discovery rate of 5% for GPL1355 (Rattus norvegicus, RG-230 2.0).

  11. Supplementary Table 11

    Estimated sensitivity and specificity. We estimated the sensitivity for the detection of copy number variations of different segment lengths (gain and loss). In addition, specificity was estimated for the different segment lengths.

  12. Supplementary Table 12

    Distribution of tumor subtypes. We annotated each of the 16,172 cancer samples according to 1 of 41 cancer subtypes. The distribution of these 41 cancer subtypes is shown here.

  13. Supplementary Table 13

    Final meta-analysis statistics for gene association with the degree of genomic instability. Genome-wide association analysis between individual genes (i.e., functional genomic mRNA expression signal) and the degree of genomic instability was performed in 15,204 unrelated, patient-derived tumor samples. Association was determined by the Pearson product-moment correlation coefficient within meta-analysis batches (272 meta-analysis batches with a range of 10–488 tumor samples per batch). Meta-analysis P values were calculated according to Liptak's trend method (z-transformed P values, weighted according to the square root of the number of samples in a meta-analysis batch).
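The weighted combination described for Supplementary Table 13 can be sketched as follows. This is a minimal illustration of a Liptak-style weighted z-method, not the authors' code: the function name `liptak_weighted_z`, the example P values and the batch sizes are hypothetical, and the use of the correlation sign to give each z-score a direction is an assumption, since the description above only mentions z-transformed P values and square-root-of-sample-size weights.

```python
import math
from statistics import NormalDist

_NORMAL = NormalDist()  # standard normal distribution


def liptak_weighted_z(p_values, signs, sample_sizes):
    """Combine two-sided P values from several batches into one z-score and P value.

    p_values     -- two-sided P values, each strictly between 0 and 1 (1 is allowed)
    signs        -- +1 or -1 per batch, the direction of the correlation (assumption)
    sample_sizes -- number of samples per batch; the weight is sqrt(n)
    """
    numerator = 0.0
    sum_sq_weights = 0.0
    for p, sign, n in zip(p_values, signs, sample_sizes):
        z = sign * _NORMAL.inv_cdf(1.0 - p / 2.0)  # z-transform of the P value
        w = math.sqrt(n)
        numerator += w * z
        sum_sq_weights += w * w
    z_meta = numerator / math.sqrt(sum_sq_weights)
    p_meta = 2.0 * (1.0 - _NORMAL.cdf(abs(z_meta)))
    return z_meta, p_meta


# Hypothetical example: three batches whose sizes fall in the 10-488 range.
z_meta, p_meta = liptak_weighted_z(
    p_values=[0.04, 0.20, 0.01],
    signs=[+1, +1, +1],
    sample_sizes=[10, 120, 488],
)
print(f"meta-analysis z = {z_meta:.3f}, P = {p_meta:.3g}")
```

Because the weights are the square roots of the batch sizes, a batch of 488 samples contributes about seven times the weight of a batch of 10, and batches pointing in opposite directions partially cancel.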

Text files

  1. Supplementary Table 14

    Predicted frequencies of increased signals. For each individual gene, the percentage of samples with a significantly increased signal was quantified across 41 tumor types.

  2. Supplementary Table 15

    Predicted frequencies of decreased signals. For each individual gene, the percentage of samples with a significantly decreased signal was quantified across 41 tumor types.

About this article

DOI

https://doi.org/10.1038/ng.3173
