  • Analysis

Comparative analysis of algorithms for integration of copy number and expression data

Abstract

Chromosomal instability is a hallmark of cancer, and genes that display abnormal expression in aberrant chromosomal regions are likely to be key players in tumor progression. Identifying such driver genes reliably requires computational methods that can integrate genome-scale data from several sources. We compared the performance of ten algorithms that integrate copy-number and transcriptomics data, using data from 15 head and neck squamous cell carcinoma cell lines, 129 primary lung squamous cell carcinomas and simulated data. The results revealed clear differences between the methods in sensitivity and specificity, and in how they performed with small versus large sample sizes. Results of the comparison are available at http://csbi.ltdk.helsinki.fi/cn2gealgo/.
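To make concrete what "integrating" copy-number and expression data means, the simplest idea is to ask, gene by gene, whether expression rises with copy number across the same samples. The sketch below is a hypothetical baseline, not any of the ten algorithms compared in the article: it uses per-gene Pearson correlation with an arbitrary cut-off (`min_r`), and the function names and toy data are illustrative assumptions. It does no significance testing or multiple-testing correction.

```python
import math
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("sequences must have equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # no variation, so no detectable dosage effect
    return sxy / math.sqrt(sxx * syy)


def flag_dosage_genes(
    copy_number: Dict[str, List[float]],
    expression: Dict[str, List[float]],
    min_r: float = 0.5,  # hypothetical cut-off, not taken from the article
) -> List[str]:
    """Return genes whose expression rises with copy number across the same samples."""
    hits = []
    for gene, cn_values in copy_number.items():
        expr_values = expression.get(gene)
        if expr_values is None:
            continue  # gene measured on only one platform; skip it
        if pearson(cn_values, expr_values) >= min_r:
            hits.append(gene)
    return sorted(hits)


# Toy data: each list holds the same four samples, in the same order, in both dicts.
cn = {"GENE_A": [0.0, 0.5, 1.0, 1.5], "GENE_B": [0.0, 0.5, 1.0, 1.5]}
expr = {"GENE_A": [1.0, 2.0, 3.0, 4.0], "GENE_B": [4.0, 1.0, 3.0, 2.0]}
print(flag_dosage_genes(cn, expr))  # ['GENE_A']
```

Here GENE_A's expression tracks its copy number perfectly (r = 1), while GENE_B's correlation is negative, so only GENE_A is flagged. Real methods differ in how they model copy-number calls, sample size and noise, which is exactly where the compared algorithms diverge.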

Figure 1: Expression and copy-number values in simulated samples.
Figure 2: Sensitivity, specificity and MCC scores for all methods in simulated and real datasets.
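Figure 2 summarizes each method with sensitivity, specificity and the Matthews correlation coefficient (MCC), where MCC = (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). As a hedged illustration of how these three numbers follow from the genes a method calls and a ground-truth gene list, the sketch below computes them from confusion-matrix counts. The function name, the toy gene names, and the convention that every other gene in the tested universe counts as a true negative are assumptions for illustration, not the authors' evaluation code.

```python
import math
from typing import Dict, Set


def classification_scores(
    predicted: Set[str], truth: Set[str], universe: Set[str]
) -> Dict[str, float]:
    """Sensitivity, specificity and MCC of a predicted gene set against a reference set.

    Assumes every gene in `universe` that is in neither set is a true negative.
    """
    predicted = predicted & universe  # ignore calls outside the tested genes
    truth = truth & universe
    tp = len(predicted & truth)  # called and in the reference list
    fp = len(predicted - truth)  # called but not in the reference list
    fn = len(truth - predicted)  # in the reference list but missed
    tn = len(universe) - tp - fp - fn  # everything else
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0  # 0 by convention if undefined
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn,
            "sensitivity": sensitivity, "specificity": specificity, "mcc": mcc}


universe = {f"g{i}" for i in range(1, 11)}  # ten hypothetical genes
truth = {"g1", "g2", "g3"}
predicted = {"g1", "g2", "g4"}
s = classification_scores(predicted, truth, universe)
print(s["sensitivity"], s["specificity"], s["mcc"])
```

With two true positives, one false positive, one false negative and six true negatives, this gives sensitivity 2/3, specificity 6/7 and MCC 11/21 (about 0.52). MCC is useful here because driver genes are a small minority of all genes, so specificity alone can look high even for a method that calls little of interest.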

Accession codes

Accessions

ArrayExpress

Acknowledgements

This work was supported by the Academy of Finland (grants 125826 and 255523 (S.H.), and 218022 (O.M.)), Biocentrum Helsinki, Helsinki Biomedical Graduate School (R.L.), The Finnish Cancer Organizations and the Sigrid Jusélius Foundation.

Author information

Contributions

R.L. developed the analysis pipeline, implemented all algorithms and analyzed the data. R.L. and S.H. designed the experiments and analyzed the results. T.L. performed the qRT-PCR experiments and compiled the ground-truth gene list with R.L. R.L. and S.H. wrote the manuscript. O.M. provided the array CGH and gene expression data for the HNSCC samples. R.L. and T.L. wrote the validation sections of the Online Methods.

Corresponding author

Correspondence to Sampsa Hautaniemi.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–4, Supplementary Table 2, Supplementary Note, Supplementary Results (PDF 1608 kb)

Supplementary Table 1

Sensitivity, specificity and MCC scores for all algorithms in all datasets. (XLS 11 kb)

About this article

Cite this article

Louhimo, R., Lepikhova, T., Monni, O. et al. Comparative analysis of algorithms for integration of copy number and expression data. Nat Methods 9, 351–355 (2012). https://doi.org/10.1038/nmeth.1893

  • DOI: https://doi.org/10.1038/nmeth.1893
