Key Points
- Genomic, metabolomic and clinical data on a range of solid cancers and model systems are emerging and can be used to identify novel patient subgroups for tailored therapy and monitoring.
- Molecular markers identified at the DNA, mRNA, microRNA and protein levels have been used to develop profiles associated with taxonomy, tumour aggressiveness, response to therapy and patient outcome.
- The information content is higher in integrated analysis than in any of the molecular levels studied separately, and a large number of statistical methods for the integration of 'omics' data have emerged.
- The access to large data sets that have been made available by the International Cancer Genome Consortium (ICGC) and The Cancer Genome Atlas (TCGA) has made it possible to compare the performance of some of the statistical methods of omic data integration on the same data set.
- These recent developments will fundamentally alter the way that we statistically model and evaluate treatment strategies, from identifying patient groups that respond to treatment above random, to identifying pathways and biological entities that are druggable and altered above random.
- A shift from large randomized clinical trials towards treatment modalities that are tailored for stratified patient groups, down to N-of-1 trials, in which a single patient constitutes the entire trial, will require new statistical methods.
- Outsourcing data and searching for solutions in open competition will allow new ideas to instantly emerge to 'embrace the complexity' that is associated with the exponentially increasing amounts of data and find new ways of shared analysis.
Abstract
Combined analyses of molecular data, such as DNA copy-number alteration, mRNA and protein expression, point to biological functions and molecular pathways being deregulated in multiple cancers. Genomic, metabolomic and clinical data from various solid cancers and model systems are emerging and can be used to identify novel patient subgroups for tailored therapy and monitoring. The integrative genomics methodologies that are used to interpret these data require expertise in different disciplines, such as biology, medicine, mathematics, statistics and bioinformatics, and they can seem daunting. The objectives, methods and computational tools of integrative genomics that are available to date are reviewed here, as is their implementation in cancer research.
Acknowledgements
The authors thank numerous collaborators, most notably D. Quigley, R. Sachidanandam, S. Hautaniemi, P. van Loo and C. Vaske, for the critical reading of the manuscript and for sharing their overview of the field and valuable discussions. Special thanks to C. Perou and C. Creighton of The Cancer Genome Atlas (TCGA) and O. Rueda and C. Caldas of the METABRIC study, as well as M. M. Holmen from Oslo University Hospital, for providing original images. The authors also thank the Norwegian Cancer Society, the K.G. Jebsen Foundation, the Norwegian Research Council, Health Region South East, and the Norwegian Radium Hospital's Foundation for financial support over many years.
Ethics declarations
Competing interests
The authors declare no competing financial interests.
Glossary
- Information theory: A branch of applied mathematics that quantifies the value of information in data.
- Bioconductor: A free, open-source and open-development software project for the analysis of high-throughput genomic data. Based on the statistical programming language R, the project was started in 2001 and now contains more than 750 packages to carry out data handling, visualization and analysis.
- Expression quantitative trait loci (eQTL): Genomic loci that regulate expression levels of mRNAs or proteins.
- Over-fitting: In statistics, over-fitting occurs when a statistical model describes random noise instead of the underlying relationship.
- T-test statistic: T-tests are used to determine whether the mean of a continuous variable differs between two groups of individuals. The test is based on a quantity called the t-test statistic, which is computed from the data and reflects the signal-to-noise ratio.
- Expectation-maximization algorithm (EM algorithm): An iterative algorithm for the estimation of parameters in statistical models that depend on unobserved variables. A limitation of EM is that it requires the specification of initial values for the iteration, and the estimated parameters may depend on these.
- Lasso: A shrinkage and variable selection method for linear regression, used in particular when there are many covariates (for example, genes).
- Maximum entropy techniques: An alternative to maximum likelihood, maximum entropy techniques are a way to estimate models from data by finding the most random probability distribution that fits the data.
- Simulated annealing: A global optimization algorithm that seeks a good approximation to the point of absolute maximum of a function.
- Greedy search algorithms: In optimization, a greedy algorithm is an iterative algorithm that takes an optimal (or semi-optimal) choice at every step, in the hope of obtaining the global solution at convergence. These algorithms do not generally result in optimal solutions and are used when the determination of a global solution would require an unacceptable amount of computing time.
- Bayesian approach: An approach to statistics that involves starting from our current (a priori) level of knowledge, collecting data and then using both to infer our (a posteriori) knowledge. Bayesian inference allows the incorporation of additional external knowledge into the estimation process.
- Latent variables: In statistics, latent variables (as opposed to observable data) are not measured but must be estimated from data, similar to parameters. However, contrary to parameters, latent variables are random and have a distribution. Latent models are inherently Bayesian.
- Support vector machines: In machine learning, support vector machines are supervised learning models that are used for classification and regression analysis.
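The glossary entry on the t-test statistic describes it as a quantity computed from the data that reflects the signal-to-noise ratio. As a minimal sketch of that idea, the Python snippet below computes the statistic for two independent groups. It assumes the unequal-variance (Welch) form, which the glossary does not specify; the function name `welch_t_statistic` and the expression values are hypothetical and serve only as an illustration.

```python
import math
from statistics import mean, variance


def welch_t_statistic(group_a, group_b):
    """Return the two-sample t statistic (Welch form, an assumption here)."""
    n_a, n_b = len(group_a), len(group_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each group needs at least two observations")
    # Signal: difference between the two group means.
    signal = mean(group_a) - mean(group_b)
    # Noise: standard error of that difference, built from the
    # sample variances (n - 1 denominator) of each group.
    noise = math.sqrt(variance(group_a) / n_a + variance(group_b) / n_b)
    if noise == 0:
        raise ValueError("both groups have zero variance")
    return signal / noise


# Hypothetical expression values of a single gene in two patient groups.
responders = [5.1, 4.9, 5.6, 5.3, 5.0]
non_responders = [4.2, 4.4, 3.9, 4.1, 4.5]

t = welch_t_statistic(responders, non_responders)
print(f"t = {t:.2f}")
```

A large absolute value of `t` means the difference in group means is large relative to its sampling variability. In an omics setting the same calculation would be repeated for every gene, which is one reason multiple-testing correction and shrinkage methods such as the lasso become relevant.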
Cite this article
Kristensen, V., Lingjærde, O., Russnes, H. et al. Principles and methods of integrative genomic analyses in cancer. Nat Rev Cancer 14, 299–313 (2014). https://doi.org/10.1038/nrc3721
DOI: https://doi.org/10.1038/nrc3721