Abstract
Chromosomal instability is a hallmark of cancer, and genes that display abnormal expression in aberrant chromosomal regions are likely to be key players in tumor progression. Reliably identifying such driver genes requires computational methods that can integrate genome-scale data from several sources. We compared the performance of ten algorithms for integrating copy-number and transcriptomics data, using data from 15 head and neck squamous cell carcinoma cell lines, 129 lung squamous cell carcinoma primary tumors and simulated datasets. Our results revealed clear differences between the methods in sensitivity and specificity, as well as in their performance with small and large sample sizes. Results of the comparison are available at http://csbi.ltdk.helsinki.fi/cn2gealgo/.
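As a deliberately simple illustration of what "abnormal expression in aberrant chromosomal regions" means computationally, the sketch below scores one gene by comparing its expression in samples where the gene is gained against all other samples, using Welch's t-statistic. It is a generic example, not one of the ten benchmarked algorithms; the +1/0/-1 call encoding, the gained-versus-rest split and the function names `welch_t` and `score_gene` are assumptions for illustration.

```python
from math import copysign, sqrt
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch's t-statistic for the difference in means of two samples."""
    se = sqrt(variance(group_a) / len(group_a) + variance(group_b) / len(group_b))
    diff = mean(group_a) - mean(group_b)
    if se == 0:
        # Both groups are constant: 0 if the means agree, otherwise +/- infinity.
        return 0.0 if diff == 0 else copysign(float("inf"), diff)
    return diff / se


def score_gene(calls, expression):
    """Score one gene (hypothetical helper, not from the article).

    calls      -- per-sample copy-number calls, assumed encoded as +1 gain, 0 neutral, -1 loss
    expression -- per-sample expression values, in the same sample order
    Returns Welch's t for gained vs. non-gained samples, or None when either
    group has fewer than two samples (the sample variance is then undefined).
    """
    gained = [e for c, e in zip(calls, expression) if c > 0]
    rest = [e for c, e in zip(calls, expression) if c <= 0]
    if len(gained) < 2 or len(rest) < 2:
        return None
    return welch_t(gained, rest)


# Toy data for six samples: the three gained samples show higher expression.
calls = [1, 1, 1, 0, 0, -1]
expression = [5.1, 4.8, 5.3, 1.0, 1.2, 0.9]
print(score_gene(calls, expression))  # large positive t, about 23.7
```

Practical integration methods add steps this sketch omits, such as significance estimation and multiple-testing correction across thousands of genes.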


Accession codes
Acknowledgements
This work was supported by the Academy of Finland (grants 125826 and 255523 (S.H.), and 218022 (O.M.)), Biocentrum Helsinki, Helsinki Biomedical Graduate School (R.L.), The Finnish Cancer Organizations and the Sigrid Jusélius Foundation.
Author information
Contributions
R.L. developed the analysis pipeline, implemented all algorithms and analyzed data. R.L. and S.H. designed the experiments and analyzed results. T.L. performed the qRT-PCR experiments and compiled the ground-truth gene list with R.L. R.L. and S.H. wrote the manuscript. O.M. provided the array CGH and gene expression data on HNSCC samples. R.L. and T.L. wrote the validation sections of the Online Methods.
Ethics declarations
Competing interests
The authors declare no competing financial interests.
Supplementary information
Supplementary Text and Figures
Supplementary Figures 1–4, Supplementary Table 2, Supplementary Note, Supplementary Results (PDF 1608 kb)
Supplementary Table 1
Sensitivity, specificity and MCC scores for all algorithms in all datasets. (XLS 11 kb)
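Supplementary Table 1 reports sensitivity, specificity and the Matthews correlation coefficient (MCC). A minimal sketch of the standard definitions follows, assuming each algorithm's output is reduced to a set of called genes and scored against a ground-truth gene set within a fixed universe of tested genes. The function names `confusion_counts` and `scores`, and the convention of returning 0 when a denominator is zero, are assumptions rather than details taken from the article.

```python
from math import sqrt


def confusion_counts(predicted, truth, universe):
    """Confusion-matrix counts for gene sets (hypothetical helper).

    Assumes both `predicted` and `truth` are subsets of `universe`.
    """
    predicted, truth, universe = set(predicted), set(truth), set(universe)
    tp = len(predicted & truth)             # called and truly driver
    fp = len(predicted - truth)             # called but not in ground truth
    fn = len(truth - predicted)             # ground-truth gene that was missed
    tn = len(universe - predicted - truth)  # neither called nor in ground truth
    return tp, fp, fn, tn


def scores(tp, fp, fn, tn):
    """Sensitivity, specificity and MCC; a zero denominator yields 0.0 (assumed convention)."""
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sensitivity, specificity, mcc


# Toy example: 100 tested genes, 10 true drivers (0-9), 10 genes called (2-11).
counts = confusion_counts(range(2, 12), range(10), range(100))
print(counts)           # (8, 2, 2, 88)
print(scores(*counts))  # sensitivity 0.8, specificity ~0.978, MCC ~0.778
```

MCC uses all four cells of the confusion matrix, which makes it informative when true positives are a small fraction of all genes and specificity alone would look high for almost any method.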
About this article
Cite this article
Louhimo, R., Lepikhova, T., Monni, O. et al. Comparative analysis of algorithms for integration of copy number and expression data. Nat Methods 9, 351–355 (2012). https://doi.org/10.1038/nmeth.1893
DOI: https://doi.org/10.1038/nmeth.1893
This article is cited by
- Integration of transcriptome and proteome profiles in glioblastoma: looking for the missing link. BMC Molecular Biology (2018)
- A survey of best practices for RNA-seq data analysis. Genome Biology (2016)
- Data integration to prioritize drugs using genomics and curated data. BioData Mining (2016)
- Importance of rare gene copy number alterations for personalized tumor characterization and survival analysis. Genome Biology (2016)
- Liprin-α1 is a regulator of vimentin intermediate filament network in the cancer cell adhesion machinery. Scientific Reports (2016)