Predicting the best treatment strategy from genomic information is a core goal of precision medicine. Here we focus on predicting drug response based on a cohort of genomic, epigenomic and proteomic profiling data sets measured in human breast cancer cell lines. Through a collaborative effort between the National Cancer Institute (NCI) and the Dialogue on Reverse Engineering Assessment and Methods (DREAM) project, we analyzed a total of 44 drug sensitivity prediction algorithms. The top-performing approaches modeled nonlinear relationships and incorporated biological pathway information. We found that gene expression microarrays consistently provided the best predictive power among the individual profiling data sets; however, performance improved when multiple, independent data sets were included. We discuss the innovations underlying the top-performing methodology, Bayesian multitask multiple kernel learning (MKL), and we provide detailed descriptions of all methods. This study establishes benchmarks for drug sensitivity prediction and identifies approaches that can be leveraged for the development of new methods.