Principles and methods of integrative genomic analyses in cancer

Key Points

  • Genomic, metabolomic and clinical data on a range of solid cancers and model systems are emerging and can be used to identify novel patient subgroups for tailored therapy and monitoring.

  • Molecular markers identified at the DNA, mRNA, microRNA and protein levels have been used to develop profiles associated with taxonomy, tumour aggressiveness, response to therapy and patient outcome.

  • The information content is higher in integrated analysis than in any of the molecular levels studied separately, and a large number of statistical methods for the integration of 'omics' data have emerged.

  • Access to the large data sets made available by the International Cancer Genome Consortium (ICGC) and The Cancer Genome Atlas (TCGA) has made it possible to compare the performance of some of the statistical methods for omics data integration on the same data set.

  • These recent developments will fundamentally alter the way that we statistically model and evaluate treatment strategies, from identifying patient groups that respond to treatment more often than expected by chance, to identifying pathways and biological entities that are druggable and altered more often than expected by chance.

  • A shift from large randomized clinical trials towards treatment modalities that are tailored for stratified patient groups, down to N-of-1 trials, in which a single patient constitutes the entire trial, will require new statistical methods.

  • Opening data to outside analysts and seeking solutions through open competitions will allow new ideas to emerge quickly, helping researchers to 'embrace the complexity' that is associated with the exponentially increasing amounts of data and to find new ways of shared analysis.
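
One way to read "altered above random" in the key points is as an over-representation question: does a pathway contain more altered genes than chance alone would explain? The sketch below is only an illustration under that assumption, not a method taken from the article. It computes a hypergeometric tail probability, and the function name and all of the numbers are hypothetical.

```python
from math import comb


def hypergeom_tail(population, successes, draws, observed):
    """P(X >= observed) when `draws` genes are sampled without replacement
    from `population` genes, of which `successes` are altered."""
    upper = min(successes, draws)
    total = comb(population, draws)
    favourable = sum(
        comb(successes, k) * comb(population - successes, draws - k)
        for k in range(observed, upper + 1)
    )
    return favourable / total


# Hypothetical example: 20,000 genes, 400 of them altered in a cohort,
# and a 50-gene pathway that contains 6 of the altered genes.
# By chance alone, about 50 * 400 / 20,000 = 1 altered gene is expected.
p_value = hypergeom_tail(20_000, 400, 50, 6)
print(f"P(at least 6 altered genes by chance) = {p_value:.2e}")
```

A small tail probability suggests that the pathway is altered more often than expected by chance. In practice such tests would need correction for multiple testing across pathways.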

Abstract

Combined analyses of molecular data, such as DNA copy-number alteration, mRNA and protein expression, point to biological functions and molecular pathways being deregulated in multiple cancers. Genomic, metabolomic and clinical data from various solid cancers and model systems are emerging and can be used to identify novel patient subgroups for tailored therapy and monitoring. The integrative genomics methodologies that are used to interpret these data require expertise in different disciplines, such as biology, medicine, mathematics, statistics and bioinformatics, and they can seem daunting. The objectives, methods and computational tools of integrative genomics that are available to date are reviewed here, as is their implementation in cancer research.

Figure 1: The systems biology of breast cancer.
Figure 2: Classifying breast cancer using unsupervised clustering.
Figure 3: Classifying breast cancer using PARADIGM.
Figure 4: Classifying breast cancer using clustering of clusters.
Figure 5: Classifying breast cancer using integrative clustering.
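
The figure captions name "clustering of clusters" without describing it in this preview. As a rough illustration of the general idea only, and not the procedure used in the article, the sketch below takes cluster labels obtained separately on each data type, measures how often each pair of samples is co-assigned, and groups samples that agree on most platforms. The platform names, the labels and the simple threshold-based grouping (a stand-in for the consensus or hierarchical clustering step that is more commonly used) are all assumptions.

```python
from itertools import combinations


def co_assignment(labels_by_platform):
    """Fraction of platforms on which each pair of samples shares a cluster."""
    platforms = list(labels_by_platform.values())
    n = len(platforms[0])
    matrix = [[1.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        shared = sum(labels[i] == labels[j] for labels in platforms)
        matrix[i][j] = matrix[j][i] = shared / len(platforms)
    return matrix


def group_samples(matrix, threshold=0.5):
    """Connected components of the graph linking pairs whose agreement
    exceeds `threshold` (a simplification of a full clustering step)."""
    n = len(matrix)
    group = [None] * n
    next_id = 0
    for start in range(n):
        if group[start] is not None:
            continue
        group[start] = next_id
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if group[j] is None and matrix[i][j] > threshold:
                    group[j] = next_id
                    stack.append(j)
        next_id += 1
    return group


# Hypothetical per-platform cluster labels for six tumour samples.
labels = {
    "mRNA":        [0, 0, 0, 1, 1, 1],
    "copy_number": [0, 0, 1, 1, 1, 1],
    "methylation": [1, 1, 1, 0, 0, 0],
}
consensus = group_samples(co_assignment(labels))
print(consensus)
```

Cluster label values are arbitrary within each platform, so only co-membership is compared, never the labels themselves.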

References

  1. Hood, L., Heath, J. R., Phelps, M. E. & Lin, B. Systems biology and new technologies enable predictive and preventative medicine. Science 306, 640–643 (2004).

    Article  CAS  PubMed  Google Scholar 

  2. Ideker, T., Galitski, T. & Hood, L. A new approach to decoding life: systems biology. Annu. Rev. Genomics Hum. Genet. 2, 343–372 (2001).

    Article  CAS  PubMed  Google Scholar 

  3. Auffray, C. & Hood, L. Editorial: Systems biology and personalized medicine - the future is now. Biotechnol. J. 7, 938–939 (2012). This paper outlines the definitions and state of the art methodology in systems biology.

    Article  CAS  PubMed  Google Scholar 

  4. Tian, Q., Price, N. D. & Hood, L. Systems cancer medicine: towards realization of predictive, preventive, personalized and participatory (P4) medicine. J. Intern. Med. 271, 111–121 (2012).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  5. Schadt, E. Eric Schadt. Interview by H. Craig Mak. Nature Biotech. 30, 769–770 (2012).

    Article  CAS  Google Scholar 

  6. Joyce, A. R. & Palsson, B. Ø. The model organism as a system: integrating 'omics' data sets. Nat. Rev. Mol. Cell. Biol. 7, 198–210 (2006).

    Article  CAS  PubMed  Google Scholar 

  7. Martin, M. Semantic Web may be cancer information's next step forward. J. Natl. Cancer Inst. 103, 1215–1218 (2011).

    Article  PubMed  Google Scholar 

  8. Forbes, S. A. et al. COSMIC: mining complete cancer genomes in the Catalogue of Somatic Mutations in Cancer. Nucleic Acids Res. 39, D945–D950 (2011).

    Article  CAS  PubMed  Google Scholar 

  9. Cheung, H. W. et al. Systematic investigation of genetic vulnerabilities across cancer cell lines reveals lineage-specific dependencies in ovarian cancer. Proc. Natl Acad. Sci. USA 108, 12372–12377 (2011).

    Article  PubMed  PubMed Central  Google Scholar 

  10. Martin, M. Rewriting the mathematics of tumor growth. J. Natl Cancer Inst. 103, 1564–1565 (2011).

    Article  PubMed  Google Scholar 

  11. Forbes, S. A. et al. The Catalogue of Somatic Mutations in Cancer (COSMIC). Curr. Protoc. Hum. Genet. Chapter 10, Unit 10.11 (2008).

  12. Stratton, M. R., Campbell, P. J. & Futreal, P. A. The cancer genome. Nature 458, 719–724 (2009).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  13. International Cancer Genome Consortium. International network of cancer genome projects. Nature 464, 993–998 (2010). This is a description and the first results of the ICGC, a worldwide endeavour to characterize a wide range of tumours by next-generation sequencing.

  14. The Cancer Genome Atlas Research Network. The Cancer Genome Atlas Pan-Cancer analysis project. Nature Genet. 45, 1113–1120 (2013).

  15. ENCODE Project Consortium. A user's guide to the encyclopedia of DNA elements (ENCODE). PLoS Biol. 9, e1001046 (2011). This is a genome-wide encyclopaedia of structural and regulatory elements in the genome.

  16. Quigley, D. A. et al. The 5p12 breast cancer susceptibility locus affects MRPS30 expression in estrogen-receptor positive tumors. Mol. Oncol. 8, 273–284 (2013).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  17. Fletcher, M. N. C. et al. Master regulators of FGFR2 signalling and breast cancer risk. Nature Commun. 4, 2464 (2013).

    Article  CAS  Google Scholar 

  18. Brower, V. Epigenetics: Unravelling the cancer code. Nature 471, S12–13 (2011).

    Article  CAS  PubMed  Google Scholar 

  19. Chin, L., Andersen, J. N. & Futreal, P. A. Cancer genomics: from discovery science to personalized medicine. Nature Med. 17, 297–303 (2011).

    Article  CAS  PubMed  Google Scholar 

  20. Yuan, Y. et al. Quantitative image analysis of cellular heterogeneity in breast tumors complements genomic profiling. Sci. Transl. Med. 4, 157ra143–157ra143 (2012).

    Article  PubMed  Google Scholar 

  21. Kumar, V. et al. Radiomics: the process and the challenges. Magn. Reson. Imaging 30, 1234–1248 (2012).

    Article  PubMed  PubMed Central  Google Scholar 

  22. Kilpinen, S. et al. Systematic bioinformatic analysis of expression levels of 17,330 human genes across 9,783 samples from 175 types of healthy and pathological tissues. Genome Biol. 9, R139 (2008).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  23. Wong, A. K. et al. IMP: a multi-species functional genomics portal for integration, visualization and prediction of protein functions and networks. Nucleic Acids Res. 40, W484–W490 (2012).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  24. Engreitz, J. M., Daigle, B. J., Marshall, J. J. & Altman, R. B. Independent component analysis: mining microarray data for fundamental human gene expression modules. J. Biomed. Inform. 43, 932–944 (2010).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  25. Engreitz, J. M. et al. ProfileChaser: searching microarray repositories based on genome-wide patterns of differential expression. Bioinformatics 27, 3317–3318 (2011).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  26. Rhodes, D. R. et al. ONCOMINE: a cancer microarray database and integrated data-mining platform. Neoplasia 6, 1–6 (2004).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  27. Madhavan, S. et al. Rembrandt: helping personalized medicine become a reality through integrative translational research. Mol. Cancer Res. 7, 157–167 (2009). This paper describes integrated genomic analyses in medicine.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  28. Gentleman, R. C. et al. Bioconductor: open software development for computational biology and bioinformatics. Genome Biol. 5, R80 (2004).

    Article  PubMed  PubMed Central  Google Scholar 

  29. Saito, R. et al. A travel guide to Cytoscape plugins. Nature Methods 9, 1069–1076 (2012).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  30. Cline, M. S. et al. Integration of biological networks and gene expression data using Cytoscape. Nature Protocol. 2, 2366–2382 (2007). This paper describes a widely used space for genomic analysis and visualization.

    Article  CAS  Google Scholar 

  31. Gundem, G. et al. IntOGen: integration and data mining of multidimensional oncogenomic data. Nature Methods 7, 92–93 (2010).

    Article  CAS  PubMed  Google Scholar 

  32. Gonzalez-Perez, A. & López-Bigas, N. Functional impact bias reveals cancer drivers. Nucleic Acids Res. 40, e169 (2012).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  33. Margolin, A. A. et al. Systematic analysis of challenge-driven improvements in molecular prognostic models for breast cancer. Sci. Transl. Med. 5, 181re1–181re1 (2013).

    Article  PubMed  PubMed Central  Google Scholar 

  34. Schadt, E. E., Linderman, M. D., Sorenson, J., Lee, L. & Nolan, G. P. Computational solutions to large-scale data management and analysis. Nature Rev. Genet. 11, 647–657 (2010).

    Article  CAS  PubMed  Google Scholar 

  35. Quigley, D. & Balmain, A. Systems genetics analysis of cancer susceptibility: from mouse models to humans. Nature Rev. Genet. 10, 651–657 (2009).

    Article  CAS  PubMed  Google Scholar 

  36. Lappalainen, T. et al. Transcriptome and genome sequencing uncovers functional variation in humans. Nature 501, 506–511 (2013). This paper describes an integration of next-generation sequencing data from DNA and RNA levels that reveals the structure of many regulatory elements.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  37. Chin, K. et al. Genomic and transcriptional aberrations linked to breast cancer pathophysiologies. Cancer Cell 10, 529–541 (2006).

    Article  CAS  PubMed  Google Scholar 

  38. Lando, M. et al. Gene dosage, expression, and ontology analysis identifies driver genes in the carcinogenesis and chemoradioresistance of cervical cancer. PLoS Genet. 5, e1000719 (2009).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  39. Beroukhim, R. et al. The landscape of somatic copy-number alteration across human cancers. Nature 463, 899–905 (2010).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  40. Sun, Z. et al. Integrated analysis of gene expression, CpG island methylation, and gene copy number in breast cancer cells by deep sequencing. PLoS ONE 6, e17490 (2011).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  41. Ovaska, K. et al. Large-scale data integration framework provides a comprehensive view on glioblastoma multiforme. Genome Med. 2, 65 (2010).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  42. Aure, M. R. et al. Identifying in-trans process associated genes in breast cancer by integrated analysis of copy number and expression data. PLoS ONE 8, e53014 (2013).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  43. Chibon, F. et al. Validated prediction of clinical outcome in sarcomas and multiple types of cancer on the basis of a gene expression signature related to genome complexity. Nature Med. 16, 781–787 (2010).

    Article  CAS  PubMed  Google Scholar 

  44. Chari, R., Coe, B. P., Vucic, E. A., Lockwood, W. W. & Lam, W. L. An integrative multi-dimensional genetic and epigenetic strategy to identify aberrant genes and pathways in cancer. BMC Syst. Biol. 4, 67 (2010).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  45. Louhimo, R. & Hautaniemi, S. CNAmet: an R package for integrating copy number, methylation and expression data. Bioinformatics 27, 887–888 (2011).

    Article  CAS  PubMed  Google Scholar 

  46. R Core Team (2013). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL http://www.R-project.org/

  47. Shen, Y., Sun, W. & Li, K.-C. Dynamically weighted clustering with noise set. Bioinformatics 26, 341–347 (2010).

    Article  CAS  PubMed  Google Scholar 

  48. Shen, R. et al. Integrative subtype discovery in glioblastoma using iCluster. PLoS ONE 7, e35236 (2012).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  49. Curtis, C. et al. The genomic and transcriptomic architecture of 2,000 breast tumours reveals novel subgroups. Nature 486, 346–352 (2012).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  50. Yuan, Y., Savage, R. S. & Markowetz, F. Patient-specific data fusion defines prognostic cancer subtypes. PLoS Comput. Biol. 7, e1002227 (2011).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  51. Bøvelstad, H. M. et al. Predicting survival from microarray data—a comparative study. Bioinformatics 23, 2080–2087 (2007).

    Article  CAS  PubMed  Google Scholar 

  52. Tibshirani, R. Regression shrinkage and selection via the Lasso. J. R. Statist. Soc. Series B. 58, 267–288 (1996).

    Google Scholar 

  53. Nowak, G., Hastie, T., Pollack, J. R. & Tibshirani, R. A fused lasso latent feature model for analyzing multi-sample aCGH data. Biostatistics 12, 776–791 (2011).

    Article  PubMed  PubMed Central  Google Scholar 

  54. Mankoo, P. K., Shen, R., Schultz, N., Levine, D. A. & Sander, C. Time to recurrence and survival in serous ovarian tumors predicted from integrated genomic profiles. PLoS ONE 6, e24709 (2011).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  55. Zou, H. & Hastie, T. Regularization and variable selection via the elastic net. J. R. Statist. Soc.: Series B (Statist. Methodol.) 67, 301–320 (2005).

    Article  Google Scholar 

  56. Segal, E., Friedman, N., Koller, D. & Regev, A. A module map showing conditional activity of expression modules in cancer. Nature Genet. 36 1090–1098 (2004). This landmark publication establishes the principles of identification of regulatory modules.

    Article  PubMed  Google Scholar 

  57. Kelder, T. et al. WikiPathways: building research communities on biological pathways. Nucleic Acids Res. 40, D1301–D1307 (2012).

    Article  CAS  PubMed  Google Scholar 

  58. Rhee, S. Y., Wood, V., Dolinski, K. & Draghici, S. Use and misuse of the gene ontology annotations. Nature Rev. Genet. 9, 509–515 (2008).

    Article  CAS  PubMed  Google Scholar 

  59. Subramanian, A. et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl Acad. Sci. USA 102, 15545–15550 (2005).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  60. Dittrich, M. T., Klau, G. W., Rosenwald, A., Dandekar, T. & Müller, T. Identifying functional modules in protein-protein interaction networks: an integrated exact approach. Bioinformatics 24, i223–i231 (2008).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  61. Qiu, Y.-Q., Zhang, S., Zhang, X.-S. & Chen, L. Detecting disease associated modules and prioritizing active genes based on high throughput data. BMC Bioinformatics 11, 26 (2010).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  62. Guo, Z. et al. Edge-based scoring and searching method for identifying condition-responsive protein-protein interaction sub-network. Bioinformatics 23, 2121–2128 (2007).

    Article  CAS  PubMed  Google Scholar 

  63. Chuang, H.-Y. et al. Subnetwork-based analysis of chronic lymphocytic leukemia identifies pathways that associate with disease progression. Blood 120, 2639–2649 (2012).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  64. Doniger, S. W. et al. MAPPFinder: using Gene Ontology and GenMAPP to create a global gene-expression profile from microarray data. Genome Biol. 4, R7 (2003).

    Article  PubMed  PubMed Central  Google Scholar 

  65. Tarca, A. L. et al. A novel signaling pathway impact analysis. Bioinformatics 25, 75–82 (2009).

    Article  CAS  PubMed  Google Scholar 

  66. Efroni, S., Schaefer, C. F. & Buetow, K. H. Identification of key processes underlying cancer phenotypes using biologic pathway analysis. PLoS ONE 2, e425 (2007).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  67. Drier, Y., Sheffer, M. & Domany, E. Pathway-based personalized analysis of cancer. Proc. Natl Acad. Sci. USA 110, 6388–6393 (2013).

    Article  PubMed  PubMed Central  Google Scholar 

  68. Huttenhower, C. et al. Detailing regulatory networks through large scale data integration. Bioinformatics 25, 3267–3274 (2009).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  69. Huttenhower, C. et al. Exploring the human genome with functional maps. Genome Res. 19, 1093–1106 (2009).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  70. Mayer, C.-D., Lorent, J. & Horgan, G. W. Exploratory analysis of multiple omics datasets using the adjusted RV coefficient. Stat. Appl. Genet. Mol. Biol. 10, Article 14 (2011).

  71. Quigley, D. A. et al. Genetic architecture of mouse skin inflammation and tumour susceptibility. Nature 458, 505–508 (2009).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  72. Lê Cao, K.-A., González, I. & Déjean, S. integrOmics: an R package to unravel relationships between two omics datasets. Bioinformatics 25, 2855–2856 (2009).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  73. Langfelder, P. & Horvath, S. WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics 9, 559 (2008).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  74. Margolin, A. A., Wang, K., Califano, A. & Nemenman, I. Multivariate dependence and genetic networks inference. IET Syst. Biol. 4, 428–440 (2010).

    Article  CAS  PubMed  Google Scholar 

  75. Margolin, A. A. & Califano, A. Theory and limitations of genetic network inference from microarray data. Ann. NY Acad. Sci. 1115, 51–72 (2007).

    Article  CAS  PubMed  Google Scholar 

  76. Koller, D. & Friedman, N. Probabilistic graphical models: principles and techniques. (Massachusetts Institute of Technology, 2009). This study describes one of the basic approaches for studying gene–gene dependencies.

    Google Scholar 

  77. Califano, A., Butte, A. J., Friend, S., Ideker, T. & Schadt, E. Leveraging models of cell regulation and GWAS data in integrative network-based association studies. Nature Genet. 44, 841–847 (2012)). This paper describes a fundamental attempt to identify genotype–phenotype interactions.

    Article  CAS  PubMed  Google Scholar 

  78. Ideker, T., Ozier, O., Schwikowski, B. & Siegel, A. F. Discovering regulatory and signalling circuits in molecular interaction networks. Bioinformatics 18 (Suppl. 1), S233–240 (2002).

    Article  PubMed  Google Scholar 

  79. Breitling, R., Amtmann, A. & Herzyk, P. Graph-based iterative Group Analysis enhances microarray interpretation. BMC Bioinformatics 5, 100 (2004).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  80. Ideker, T. & Krogan, N. J. Differential network biology. Mol. Syst. Biol. 8, 565 (2012).

    Article  PubMed  PubMed Central  Google Scholar 

  81. Stingo, F. C. & Vannucci, M. Variable selection for discriminant analysis with Markov random field priors for the analysis of microarray data. Bioinformatics 27, 495–501 (2011).

    Article  CAS  PubMed  Google Scholar 

  82. Bauer, S., Gagneur, J. & Robinson, P. N. GOing Bayesian: model-based gene set analysis of genome-scale data. Nucleic Acids Res. 38, 3523–3532 (2010).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  83. Newton, M. A., He, Q. & Kendziorski, C. A model-based analysis to infer the functional content of a gene list. Stat. Appl. Genet. Mol. Biol. 11, http://dx.doi.org/10.2202/1544-6115.1716 (2012).

  84. Segal, E. et al. Module networks: identifying regulatory modules and their condition-specific regulators from gene expression data. Nature Genet. 34, 166–176 (2003).

    Article  PubMed  Google Scholar 

  85. Segal, E., Friedman, N., Kaminski, N., Regev, A. & Koller, D. From signatures to models: understanding cancer using microarrays. Nature Genet. 37 S38–S45 (2005).

    Article  CAS  PubMed  Google Scholar 

  86. Vaske, C. J. et al. Inference of patient-specific pathway activities from multi-dimensional cancer genomics data using PARADIGM. Bioinformatics 26, i237–i245 (2010). This paper describes an application of approaches from the probabilistic graphical models in the identification of pathways or dependencies deviating from a given norm.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  87. Kristensen, V. N. et al. Integrated molecular profiles of invasive breast tumors and ductal carcinoma in situ (DCIS) reveal differential vascular and interleukin signaling. Proc. Natl Acad. Sci. USA 109, 2802–2807 (2012).

    Article  PubMed  Google Scholar 

  88. Ferkingstad, E., Frigessi, A. & Lyng, H. Indirect genomic effects on survival from gene expression data. Genome Biol. 9, R58 (2008).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  89. Imoto, S. et al. Combining microarrays and biological knowledge for estimating gene networks via bayesian networks. J. Bioinform. Comput. Biol. 2, 77–98 (2004).

    Article  CAS  PubMed  Google Scholar 

  90. Bottolo, L. et al. Bayesian detection of expression quantitative trait loci hot spots. Genetics 189, 1449–1459 (2011).

    Article  PubMed  PubMed Central  Google Scholar 

  91. Akavia, U. D. et al. An integrated approach to uncover drivers of cancer. Cell 143, 1005–1017 (2010).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  92. Birtwistle, M. R. et al. Ligand-dependent responses of the ErbB signaling network: experimental and modeling analyses. Mol. Syst. Biol. 3, 144 (2007).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  93. Nik-Zainal, S. A. et al. The life history of 21 breast cancers. Cell 149, 994–1007 (2012).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  94. Shah, S. P. et al. The clonal and mutational evolution spectrum of primary triple-negative breast cancers. Nature 486, 395–399 (2012).

    Article  CAS  PubMed  Google Scholar 

  95. Cancer, Genome Atlas Research Network. Comprehensive genomic characterization defines human glioblastoma genes and core pathways. Nature 455, 1061–1068 (2008).

  96. Ciriello, G. et al. Emerging landscape of oncogenic signatures across human cancers. Nature Genet. 45, 1127–1133 (2013).

    Article  CAS  PubMed  Google Scholar 

  97. Zack, T. I. et al. Pan-cancer patterns of somatic copy number alteration. Nature Genet. 45, 1134–1140 (2013).

    Article  CAS  PubMed  Google Scholar 

  98. Cancer Genome Atlas Research Network. The Cancer Genome Atlas Pan-Cancer analysis project. Nature Genet. 45, 1113–1120 (2013).

  99. Newman, M. E. J. Fast algorithm for detecting community structure in networks. Phys. Rev. E Stat. Nonlin Soft Matter Phys. 69, 066133 (2004).

    Article  CAS  PubMed  Google Scholar 

  100. Louhimo, R., Lepikhova, T., Monni, O. & Hautaniemi, S. Comparative analysis of algorithms for integration of copy number and expression data. Nature Methods 9, 351–355 (2012).

    Article  CAS  PubMed  Google Scholar 

  101. Solvang, H. K., Lingjærde, O. C., Frigessi, A., Børresen-Dale, A.-L. & Kristensen, V. N. Linear and non-linear dependencies between copy number aberrations and mRNA expression reveal distinct molecular pathways in breast cancer. BMC Bioinformatics 12, 197 (2011).

    Article  PubMed  PubMed Central  Google Scholar 

  102. Heiser, L. M. et al. Subtype and pathway specific responses to anticancer compounds in breast cancer. Proc. Natl Acad. Sci. USA 109, 2724–2729 (2012).

    Article  PubMed  Google Scholar 

  103. Hoshino, D. et al. Network analysis of the focal adhesion to invadopodia transition identifies a PI3K-PKCα invasive signaling axis. Sci. Signal. 5, ra66 (2012).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  104. Stronach, E. A. et al. DNA-PK mediates AKT activation and apoptosis inhibition in clinically acquired platinum resistance. Neoplasia 13, 1069–1080 (2011).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  105. Mok, T. S. et al. Gefitinib or carboplatin-paclitaxel in pulmonary adenocarcinoma. N. Engl. J. Med. 361, 947–957 (2009).

    Article  CAS  PubMed  Google Scholar 

  106. Shepherd, F. A. et al. Erlotinib in previously treated non-small-cell lung cancer. N. Engl. J. Med. 353, 123–132 (2005).

    Article  CAS  PubMed  Google Scholar 

  107. Piccart-Gebhart, M. J. et al. Trastuzumab after adjuvant chemotherapy in HER2-positive breast cancer. N. Engl. J. Med. 353, 1659–1672 (2005).

    Article  CAS  PubMed  Google Scholar 

  108. Romond, E. H. et al. Trastuzumab plus adjuvant chemotherapy for operable HER2-positive breast cancer. N. Engl. J. Med. 353, 1673–1684 (2005).

    Article  CAS  PubMed  Google Scholar 

  109. Chapman, P. B. et al. Improved survival with vemurafenib in melanoma with BRAF V600E mutation. N. Engl. J. Med. 364, 2507–2516 (2011).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  110. Jonker, D. J. et al. Cetuximab for the treatment of colorectal cancer. N. Engl. J. Med. 357, 2040–2048 (2007).

    Article  CAS  PubMed  Google Scholar 

  111. Karapetis, C. S. et al. K-ras mutations and benefit from cetuximab in advanced colorectal cancer. N. Engl. J. Med. 359, 1757–1765 (2008).

    Article  CAS  PubMed  Google Scholar 

  112. Iadevaia, S., Lu, Y., Morales, F. C., Mills, G. B. & Ram, P. T. Identification of optimal drug combinations targeting cellular networks: integrating phospho-proteomics and computational network analysis. Cancer Res. 70, 6704–6714 (2010).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  113. van de Vijver, M. J. et al. A gene-expression signature as a predictor of survival in breast cancer. N. Engl. J. Med. 347, 1999–2009 (2002).

    Article  CAS  PubMed  Google Scholar 

  114. Paik, S. et al. A multigene assay to predict recurrence of tamoxifen-treated, node-negative breast cancer. N. Engl. J. Med. 351, 2817–2826 (2004).

    Article  CAS  PubMed  Google Scholar 

  115. Cooper, S. et al. Predicting protein structures with a multiplayer online game. Nature 466, 756–760 (2010).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  116. Radivojac, P. et al. A large-scale evaluation of computational protein function prediction. Nature Methods 10, 221–227 (2013).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  117. Cheng, W.-Y., Ou Yang, T.-H. & Anastassiou, D. Biomolecular events in cancer revealed by attractor metagenes. PLoS Comput. Biol. 9, e1002920 (2013).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  118. Cheng, W.-Y., Ou Yang, T.-H. & Anastassiou, D. Development of a prognostic model for breast cancer survival in an open challenge environment. Sci. Transl. Med. 5, 181ra50–181ra50 (2013).

    Article  PubMed  Google Scholar 

  119. Perou, C. M. et al. Molecular portraits of human breast tumours. Nature 406, 747–752 (2000).

    Article  CAS  PubMed  Google Scholar 

  120. Sørlie, T. et al. Gene expression patterns of breast carcinomas distinguish tumor subclasses with clinical implications. Proc. Natl Acad. Sci. USA 98, 10869–10874 (2001).

    Article  PubMed  PubMed Central  Google Scholar 

  121. Sørlie, T. et al. Repeated observation of breast tumor subtypes in independent gene expression data sets. Proc. Natl Acad. Sci. USA 100, 8418–8423 (2003).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  122. Russnes, H. G. et al. Genomic architecture characterizes tumor progression paths and fate in breast cancer patients.Sci. Transl. Med. 2, 38ra47–38ra47 (2010).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  123. Chin, S.-F. et al. Using array-comparative genomic hybridization to define molecular portraits of primary breast cancers. Oncogene 26, 1959–1970 (2007).

    Article  CAS  PubMed  Google Scholar 

  124. Stephens, P. J. et al. The landscape of cancer genes and mutational processes in breast cancer. Nature 486, 400–404 (2012).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  125. Cancer Genome Atlas Research Network. Comprehensive molecular portraits of human breast tumours. Nature 490, 61–70 (2012).

  126. Naume, B. et al. Presence of bone marrow micrometastasis is associated with different recurrence risk within molecular subtypes of breast cancer. Mol. Oncol. 1, 160–171 (2007).

    Article  PubMed  PubMed Central  Google Scholar 

  127. Nordgard, S. H. et al. Genome-wide analysis identifies 16q deletion associated with survival, molecular subtypes, mRNA expression, and germline haplotypes in breast cancer patients. Genes Chromosomes Cancer 47, 680–696 (2008).

    Article  CAS  PubMed  Google Scholar 

  128. Rønneberg, J. A. et al. Methylation profiling with a panel of cancer related genes: association with estrogen receptor, TP53 mutation status and expression subtypes in sporadic breast cancer. Mol. Oncol. 5, 61–76 (2011).

    Article  CAS  PubMed  Google Scholar 

  129. Enerly, E. et al. miRNA-mRNA integrated analysis reveals roles for miRNAs in primary breast tumors. PLoS ONE 6, e16915 (2011).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  130. Joshi, H., Bhanot, G., Børresen-Dale, A.-L. & Kristensen, V. N. Potential tumorigenic programs associated with TP53 mutation status reveal role of VEGF pathway. Br. J. Cancer 107, 1722–1728 (2012).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  131. Stephens, P. J. et al. Complex landscapes of somatic rearrangement in human breast cancer genomes. Nature 462, 1005–1010 (2009).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  132. Sun, Z. et al. Batch effect correction for genome-wide methylation data with Illumina Infinium platform. BMC Med. Genom. 4, 84 (2011).

    Article  CAS  Google Scholar 

  133. Strehl, A. & Ghosh, J. Cluster ensembles — a knowledge reuse framework for combining partitionings. Journal of Machine Learning 3, 583–617 (2002).

    Google Scholar 

  134. Monti, S., Tamayo, P., Mesirov, J. & Golub, T. Consensus clustering: a resampling-based method for class discovery and visualization of gene expression microarray data. Machine Learn. 52, 91–118 (2003).

    Article  Google Scholar 

  135. Collisson, E. A. et al. Subtypes of pancreatic ductal adenocarcinoma and their differing responses to therapy. Nature Med. 17, 500–503 (2011).

    Article  CAS  PubMed  Google Scholar 

  136. Lancichinetti, A. & Fortunato, S. Consensus clustering in complex networks. Sci. Rep. 2, 336 (2012).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  137. Lee, M. & Kim, Y. CHESS (CgHExpreSS): a comprehensive analysis tool for the analysis of genomic alterations and their effects on the expression profile of the genome. BMC Bioinformatics 10, 424 (2009).

  138. Shen, R., Olshen, A. B. & Ladanyi, M. Integrative clustering of multiple genomic data types using a joint latent variable model with application to breast and lung cancer subtype analysis. Bioinformatics 25, 2906–2912 (2009).

  139. Leday, G. G. R. & van de Wiel, M. A. PLRS: a flexible tool for the joint analysis of DNA copy number and mRNA expression data. Bioinformatics 29, 1081–1082 (2013).

  140. Chen, B.-J. et al. Harnessing gene expression to identify the genetic basis of drug resistance. Mol. Syst. Biol. 5, 310 (2009).

  141. Yuan, Y., Curtis, C., Caldas, C. & Markowetz, F. A sparse regulatory network of copy-number driven gene expression reveals putative breast cancer oncogenes. IEEE/ACM Trans. Comput. Biol. Bioinform. 9, 947–954 (2012).

  142. Carro, M. S. et al. The transcriptional network for mesenchymal transformation of brain tumours. Nature 463, 318–325 (2010).

  143. Saadi, A. et al. Stromal genes discriminate preinvasive from invasive disease, predict outcome, and highlight inflammatory pathways in digestive cancers. Proc. Natl Acad. Sci. USA 107, 2177–2182 (2010).

  144. Hamatani, T. et al. Global gene expression analysis identifies molecular pathways distinguishing blastocyst dormancy and activation. Proc. Natl Acad. Sci. USA 101, 10326–10331 (2004).

  145. Draghici, S. et al. A systems biology approach for pathway level analysis. Genome Res. 17, 1537–1545 (2007).

  146. Engström, P. G. et al. Digital transcriptome profiling of normal and glioblastoma-derived neural stem cells identifies genes associated with patient survival. Genome Med. 4, 76 (2012).

  147. Wu, J., Mao, X., Cai, T., Luo, J. & Wei, L. KOBAS server: a web-based platform for automated annotation and pathway identification. Nucleic Acids Res. 34, W720–W724 (2006).

  148. Xie, C. et al. KOBAS 2.0: a web server for annotation and identification of enriched pathways and diseases. Nucleic Acids Res. 39, W316–W322 (2011).

  149. Li, C. et al. SubpathwayMiner: a software package for flexible identification of pathways. Nucleic Acids Res. 37, e131 (2009).

  150. Chang, H.-T. et al. Comprehensive analysis of microRNAs in breast cancer. BMC Genomics 13, S18 (2012).

  151. Tamborero, D., Lopez-Bigas, N. & Gonzalez-Perez, A. Oncodrive-CIS: a method to reveal likely driver genes based on the impact of their copy number changes on expression. PLoS ONE 8, e55489 (2013).

  152. Warsow, G. et al. ExprEssence—revealing the essence of differential experimental data in the context of an interaction/regulation net-work. BMC Syst. Biol. 4, 164 (2010).

  153. Deshpande, R., Sharma, S., Verfaillie, C. M., Hu, W.-S. & Myers, C. L. A scalable approach for discovering conserved active subnetworks across species. PLoS Comput. Biol. 6, e1001028 (2010).

  154. Goffard, N., Frickey, T. & Weiller, G. PathExpress update: the enzyme neighbourhood method of associating gene-expression data with metabolic pathways. Nucleic Acids Res. 37, W335–W339 (2009).

  155. Bryant, W. A., Sternberg, M. J. E. & Pinney, J. W. AMBIENT: Active Modules for Bipartite Networks—using high-throughput transcriptomic data to dissect metabolic response. BMC Syst. Biol. 7, 26 (2013).

  156. Kirk, P., Griffin, J. E., Savage, R. S., Ghahramani, Z. & Wild, D. L. Bayesian correlated clustering to integrate multiple datasets. Bioinformatics 28, 3290–3297 (2012).

  157. Brodtkorb, M. et al. Whole-genome integrative analysis reveals expression signatures predicting transformation in follicular lymphoma. Blood 123, 1051–1054 (2014).

Acknowledgements

The authors thank numerous collaborators, most notably D. Quigley, R. Sachidanandam, S. Hautaniemi, P. van Loo and C. Vaske, for their critical reading of the manuscript, for sharing their overview of the field and for valuable discussions. Special thanks to C. Perou and C. Creighton of The Cancer Genome Atlas (TCGA) and O. Rueda and C. Caldas of the METABRIC study, as well as M. M. Holmen from Oslo University Hospital, for providing original images. The authors also thank the Norwegian Cancer Society, the K.G. Jebsen Foundation, the Norwegian Research Council, Health Region South East, and the Norwegian Radium Hospital's Foundation for financial support over many years.

Author information

Corresponding author

Correspondence to Anne-Lise Børresen-Dale.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Related links

FURTHER INFORMATION

Cancer Genome Project

Catalogue of Somatic Mutations in Cancer (COSMIC) database

ENCyclopedia Of DNA Elements (ENCODE)

International Cancer Genome Consortium (ICGC)

NCI/TCGA

The Cancer Genome Atlas (TCGA)

Bioconductor

Bionimbus

Cytoscape

Federation of SAGE

Synapse

HPRD

Kyoto Encyclopedia of Genes and Genomes (KEGG)

MIPS (Mammalian protein–protein interaction)

PID Pathway Interaction Database (NCI)

Reactome

WikiPathways

Biowaver

DAVID

GOLEM

GRIFn

HEFalMp

MEFIT

MSigDB Molecular Signatures Database

Oncomine

Rembrandt

Search-Based Exploration of Expression Compendium (SEEK)

Sleipnir

Summary of gene ontology tools

Combinatorial ALgorithm for Expression and Sequence-based Cluster Extraction (COALESCE)

Copy Number and Expression In Cancer (CONEXIC)

DR-Integrator

IntOGen

Magellan

OncoDrive

PAthway Recognition Algorithm using Data Integration on Genomic Models (PARADIGM)

Glossary

Information theory

A branch of applied mathematics that quantifies the value of information in data.
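
One standard quantity from information theory is Shannon entropy, which measures the uncertainty of a distribution in bits. The sketch below is illustrative only; the function name and the toy nucleotide sequences are assumptions and do not come from the article.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Shannon entropy, in bits, of the empirical distribution of `values`."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Four equally likely symbols carry the maximum uncertainty for four states.
uniform_entropy = shannon_entropy(["A", "C", "G", "T"])
# A constant sequence carries no uncertainty at all.
constant_entropy = shannon_entropy(["A", "A", "A", "A"])
```

The uniform sequence gives 2 bits and the constant sequence gives 0 bits, the two extremes for a four-letter alphabet.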

Bioconductor

A free, open-source and open-development software project for the analysis of high-throughput genomic data. Based on the statistical programming language R, the project was started in 2001 and now contains more than 750 packages to carry out data handling, visualization and analysis.

Expression quantitative trait loci

(eQTL). Genomic loci that regulate expression levels of mRNAs or proteins.

Over-fitting

In statistics, over-fitting occurs when a statistical model describes random noise instead of the underlying relationship.

T-test statistic

A t-test is used to determine whether the mean of a continuous variable differs between two groups of individuals. It is based on a quantity called the t-test statistic, which is computed from the data and reflects the signal-to-noise ratio.
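
The signal-to-noise interpretation can be made concrete with a short sketch. It assumes the unequal-variance (Welch) form of the statistic, which is one common variant; the function name and the toy measurements are hypothetical.

```python
import math
import statistics


def welch_t_statistic(group_a, group_b):
    """Difference in group means (signal) divided by its standard error (noise)."""
    signal = statistics.mean(group_a) - statistics.mean(group_b)
    noise = math.sqrt(
        statistics.variance(group_a) / len(group_a)
        + statistics.variance(group_b) / len(group_b)
    )
    return signal / noise


# Toy expression values for one gene in two groups of three samples each.
t_value = welch_t_statistic([2.0, 4.0, 6.0], [1.0, 3.0, 5.0])
```

Here the means differ by 1 while each group has a sample variance of 4, so the statistic is small; turning it into a P value would additionally require the t distribution with the appropriate degrees of freedom.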

Expectation-maximization algorithm

(EM algorithm). An iterative algorithm for estimating parameters in statistical models that depend on unobserved variables. A limitation of EM is that it requires initial values to be specified for the iteration, and the estimated parameters may depend on these.
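
The dependence on initial values can be seen in a deliberately simplified sketch: a mixture of two one-dimensional Gaussians in which the variances are fixed at 1 and the mixing weights at 0.5, so that only the two means are estimated. These simplifications, the function name and the toy data are all assumptions made for illustration.

```python
import math


def em_two_means(data, mu_init, n_iter=50):
    """EM for a two-component 1-D Gaussian mixture (unit variances, equal weights)."""
    mu0, mu1 = mu_init
    for _ in range(n_iter):
        # E-step: posterior probability that each point came from component 1.
        resp = []
        for x in data:
            p0 = math.exp(-0.5 * (x - mu0) ** 2)
            p1 = math.exp(-0.5 * (x - mu1) ** 2)
            resp.append(p1 / (p0 + p1))
        # M-step: each mean becomes a responsibility-weighted average of the data.
        w1 = sum(resp)
        w0 = len(data) - w1
        mu0 = sum((1.0 - r) * x for r, x in zip(resp, data)) / w0
        mu1 = sum(r * x for r, x in zip(resp, data)) / w1
    return mu0, mu1


data = [-0.5, 0.0, 0.5, 9.5, 10.0, 10.5]           # two well-separated groups
good_start = em_two_means(data, mu_init=(1.0, 9.0))  # means near 0 and 10
bad_start = em_two_means(data, mu_init=(4.0, 4.0))   # both means stay at 5
```

With distinct starting means the iteration recovers the two groups, whereas identical starting means give every point a responsibility of 0.5 and the estimates never separate. In practice several random starts are often compared for this reason.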

Lasso

A shrinkage and variable selection method for linear regression, used in particular when there are many covariates (for example, genes).
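
Both effects, shrinkage and selection, are visible in the special case of an orthonormal design matrix. There, with the objective written as half the residual sum of squares plus a penalty lam times the sum of absolute coefficients, the lasso estimate is the soft-thresholded ordinary least-squares coefficient. The sketch below assumes that special case; the coefficient values are invented for illustration.

```python
def soft_threshold(z, lam):
    """Minimizer of 0.5 * (z - b) ** 2 + lam * abs(b) with respect to b."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


# Hypothetical least-squares coefficients for four covariates (for example, genes).
ols_coefficients = [3.0, -0.4, 0.1, -2.5]
lasso_coefficients = [soft_threshold(b, 1.0) for b in ols_coefficients]
```

Large coefficients are shrunk towards zero by the penalty, and coefficients smaller in magnitude than the penalty are set exactly to zero, which is what removes covariates from the model. For general design matrices the estimate has no closed form and is computed iteratively.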

Maximum entropy techniques

An alternative to maximum likelihood, maximum entropy techniques are a way to estimate models from data, by finding the most random probability distribution that fits the data.

Simulated annealing

A global optimization algorithm that seeks a good approximation to the global maximum of a function.
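
A minimal sketch of the idea follows, assuming a one-dimensional search space, uniform random proposals and a geometric cooling schedule. All parameter names, default values and the test function are hypothetical choices, not taken from the article.

```python
import math
import random


def simulated_annealing(f, x0, step=0.5, t_start=1.0, t_end=1e-3,
                        n_steps=5000, seed=0):
    """Approximately maximize f, starting from x0; returns (best_x, best_value)."""
    rng = random.Random(seed)
    x, fx = x0, f(x0)
    best_x, best_fx = x, fx
    for i in range(n_steps):
        # Geometric cooling from t_start down to t_end.
        t = t_start * (t_end / t_start) ** (i / (n_steps - 1))
        candidate = x + rng.uniform(-step, step)
        f_candidate = f(candidate)
        # Uphill moves are always accepted; downhill moves are accepted with
        # a probability that shrinks as the temperature falls.
        if f_candidate >= fx or rng.random() < math.exp((f_candidate - fx) / t):
            x, fx = candidate, f_candidate
            if fx > best_fx:
                best_x, best_fx = x, fx
    return best_x, best_fx


def bumpy(x):
    """A function with several local maxima."""
    return math.sin(3 * x) - 0.1 * (x - 2) ** 2


start = -4.0
best_x, best_value = simulated_annealing(bumpy, start)
```

Accepting occasional downhill moves early on is what allows the search to leave a local maximum. The result is an approximation and depends on the cooling schedule and the random seed.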

Greedy search algorithms

In optimization, a greedy algorithm is an iterative algorithm that makes the locally optimal (or near-optimal) choice at every step, in the hope of reaching the global solution at convergence. These algorithms do not generally result in optimal solutions and are used when the determination of a global solution would require an unacceptable amount of computing time.
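
A small example shows how a sequence of locally best choices can miss the global solution. The sketch uses a coin-change problem, chosen only for brevity: the task is to pay an amount with as few coins as possible. The coin values and function names are hypothetical.

```python
def greedy_change(amount, coins):
    """Repeatedly take the largest coin that still fits."""
    chosen = []
    for coin in sorted(coins, reverse=True):
        while amount >= coin:
            chosen.append(coin)
            amount -= coin
    return chosen


def min_coins(amount, coins):
    """Exact minimum number of coins, found by dynamic programming."""
    best = [0] + [None] * amount
    for a in range(1, amount + 1):
        options = [best[a - c] for c in coins
                   if c <= a and best[a - c] is not None]
        best[a] = min(options) + 1 if options else None
    return best[amount]


coins = [4, 3, 1]
greedy_solution = greedy_change(6, coins)  # takes 4 first, then 1 and 1
optimal_count = min_coins(6, coins)        # 3 + 3 needs only two coins
```

The greedy answer uses three coins where two suffice. The exact search is cheap here, but for problems such as network or module search the exhaustive alternative quickly becomes infeasible, which is when greedy strategies are used.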

Bayesian approach

An approach to statistics that involves starting from our current (a priori) level of knowledge, collecting data and then using both to infer our (a posteriori) knowledge. Bayesian inference allows the incorporation of additional external knowledge into the estimation process.
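
The step from a priori to a posteriori knowledge can be sketched with the simplest conjugate case: a Beta prior on an unknown proportion (for example, a response rate) updated with binomial data. The prior parameters and the counts below are invented for illustration.

```python
def beta_binomial_update(prior_a, prior_b, successes, trials):
    """Posterior Beta parameters after observing `successes` in `trials`."""
    return prior_a + successes, prior_b + (trials - successes)


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


# A priori: Beta(2, 2), a weak belief centred on 0.5.
prior = (2, 2)
# Data: 7 responders among 10 patients (hypothetical numbers).
posterior = beta_binomial_update(*prior, successes=7, trials=10)
prior_mean = beta_mean(*prior)
posterior_mean = beta_mean(*posterior)
```

The posterior mean lies between the prior mean (0.5) and the observed proportion (0.7), and moves closer to the data as more observations accumulate.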

Latent variables

In statistics, latent variables (as opposed to observable data) are not measured but must be estimated from data, similar to parameters. However, contrary to parameters, latent variables are random and have a distribution. Latent models are inherently Bayesian.

Support vector machines

In machine learning, support vector machines are supervised learning models that are used for classification and regression analysis.

Cite this article

Kristensen, V., Lingjærde, O., Russnes, H. et al. Principles and methods of integrative genomic analyses in cancer. Nat Rev Cancer 14, 299–313 (2014). https://doi.org/10.1038/nrc3721

  • DOI: https://doi.org/10.1038/nrc3721
