A community effort to assess and improve drug sensitivity prediction algorithms

Journal name: Nature Biotechnology
Volume: 32
Pages: 1202–1212
DOI: 10.1038/nbt.2877

Abstract

Predicting the best treatment strategy from genomic information is a core goal of precision medicine. Here we focus on predicting drug response based on a cohort of genomic, epigenomic and proteomic profiling data sets measured in human breast cancer cell lines. Through a collaborative effort between the National Cancer Institute (NCI) and the Dialogue on Reverse Engineering Assessment and Methods (DREAM) project, we analyzed a total of 44 drug sensitivity prediction algorithms. The top-performing approaches modeled nonlinear relationships and incorporated biological pathway information. We found that gene expression microarrays consistently provided the best predictive power of the individual profiling data sets; however, performance was increased by including multiple, independent data sets. We discuss the innovations underlying the top-performing methodology, Bayesian multitask MKL, and we provide detailed descriptions of all methods. This study establishes benchmarks for drug sensitivity prediction and identifies approaches that can be leveraged for the development of new methods.

Figures

  1. The NCI-DREAM drug sensitivity challenge.

    (a) Six genomic, epigenomic, and proteomic profiling data sets were generated for 53 breast cancer cell lines, which were previously described (ref. 23). Drug responses as measured by growth inhibition were assessed after treating the 53 cell lines with 28 drugs. Participants were supplied with all six profiling data sets and dose-response data for 35 cell lines and all 28 compounds (training set). Cell line names were released, but drug names were anonymized. The challenge was to predict the response (ranking from most sensitive to most resistant) for the 18 held-out cell lines (test set). The training and test cell lines were balanced for cancer subtype, dynamic range and missing values (Supplementary Fig. 11). Submissions were scored on their weighted average performance on ranking the 18 cell lines for 28 compounds. (b) Dose-response values for the training and test cell lines displayed as heatmaps.

  2. Evaluation of individual drug sensitivity prediction algorithms.

    Prediction algorithms (n = 44) are indexed according to Table 1. (a) Team performance was evaluated using the weighted, probabilistic concordance index (wpc-index), which accounts for the experimental variation measured across cell lines and between compounds. Overall team ranks are listed on top of each bar. The gray line represents the mean random prediction score. (b,c) Robustness analysis was performed by randomly masking 10% of the test data set for 10,000 iterations. Performing this procedure repeatedly generates a distribution of wpc-index scores for each team (b). Additionally, after each iteration, teams were re-ranked to create a distribution of rank orders (c). The top two teams were reliably ranked the best and second-best performers (one-sided Wilcoxon signed-rank test for b and c, FDR ≪ 10⁻¹⁰).
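
    The masking-and-re-ranking procedure in panels b and c can be sketched roughly as follows. This is a minimal illustration, not the challenge's scoring code: it uses a plain concordance index as a stand-in for the wpc-index (whose weighting and probabilistic terms are not given in this caption), scores a single hypothetical drug, and the function names, team labels and default of 1,000 iterations are assumptions made for the example.

```python
import numpy as np

def concordance_index(true_vals, pred_scores):
    """Plain concordance index (stand-in for the wpc-index, which is not reproduced here)."""
    true_vals = np.asarray(true_vals, dtype=float)
    pred_scores = np.asarray(pred_scores, dtype=float)
    concordant, comparable = 0.0, 0
    n = len(true_vals)
    for i in range(n):
        for j in range(i + 1, n):
            if true_vals[i] == true_vals[j]:
                continue  # ties in the gold standard are not comparable
            comparable += 1
            d = (true_vals[i] - true_vals[j]) * (pred_scores[i] - pred_scores[j])
            if d > 0:
                concordant += 1.0
            elif d == 0:
                concordant += 0.5  # tied predictions receive half credit
    return concordant / comparable if comparable else float("nan")

def robustness_analysis(gold, predictions, n_iter=1000, mask_frac=0.1, seed=0):
    """Repeatedly mask a fraction of the test cell lines, then re-score and re-rank teams.

    gold: measured responses for the test cell lines (one hypothetical drug).
    predictions: dict mapping a team label to its predicted scores.
    Returns (teams, scores, ranks); scores and ranks have shape (n_iter, n_teams).
    """
    rng = np.random.default_rng(seed)
    gold = np.asarray(gold, dtype=float)
    teams = list(predictions)
    n = len(gold)
    n_keep = n - int(round(mask_frac * n))
    scores = np.empty((n_iter, len(teams)))
    for it in range(n_iter):
        keep = rng.choice(n, size=n_keep, replace=False)  # unmasked cell lines
        for t, team in enumerate(teams):
            pred = np.asarray(predictions[team], dtype=float)
            scores[it, t] = concordance_index(gold[keep], pred[keep])
    # Rank 1 is the best-scoring team within each iteration.
    ranks = (-scores).argsort(axis=1).argsort(axis=1) + 1
    return teams, scores, ranks
```

    The per-team columns of `scores` and `ranks` correspond to the distributions shown in panels b and c; a paired test across iterations could then compare neighboring teams.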

  3. The method implemented by the best performing team.

    (a) In addition to the six profiling data sets, three different categories of data views were compiled using prior biological knowledge, yielding in total 22 genomic views of each cell line. (b) Bayesian multitask MKL combines nonlinear regression, multiview learning, multitask learning and Bayesian inference. Nonlinear regression: response values were computed not directly from the input features but from kernels, which define similarity measures between cell lines. Each of the K data views was converted into an N × N kernel matrix K_k (k = 1,...,K), where N is the number of training cell lines. Specifically, the Gaussian kernel was used for real-valued data, and the Jaccard similarity coefficient for binary-valued data. Multiview learning: a combined kernel matrix K* was constructed as a weighted sum of the view-specific kernel matrices K_k, k = 1,...,K. The kernel weights were obtained by multiple kernel learning. Multitask learning: training was performed for all drugs simultaneously, sharing the kernel weights across drugs but allowing for drug-specific regression parameters, which for each drug consisted of a weight vector for the training cell lines and an intercept term. Bayesian inference: the model parameters were assumed to be random variables that follow specific probability distributions. Instead of learning point estimates for model parameters, the parameters of these distributions were learned using a variational approximation scheme.
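
    As a rough illustration of the kernel construction described in panel b, the sketch below builds a Gaussian kernel for a real-valued view, a Jaccard kernel for a binary view, and their weighted sum. It is not the team's implementation: the kernel weights are supplied by hand rather than learned by multiple kernel learning, no Bayesian inference is performed, and the bandwidth default and the handling of all-zero binary rows are assumptions made for the example.

```python
import numpy as np

def gaussian_kernel(X, sigma=None):
    """Gaussian kernel between the rows (cell lines) of a real-valued view."""
    X = np.asarray(X, dtype=float)
    sq = np.sum(X**2, axis=1)
    # Squared Euclidean distances, clipped at zero to absorb rounding error.
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    if sigma is None:
        sigma = np.sqrt(X.shape[1])  # assumed bandwidth heuristic, not from the paper
    return np.exp(-d2 / (2.0 * sigma**2))

def jaccard_kernel(B):
    """Jaccard similarity coefficient between the rows of a binary view."""
    B = np.asarray(B, dtype=bool).astype(float)
    inter = B @ B.T
    counts = B.sum(axis=1)
    union = counts[:, None] + counts[None, :] - inter
    safe_union = np.where(union > 0, union, 1.0)
    # Assumption: two all-zero rows are treated as identical (similarity 1).
    return np.where(union > 0, inter / safe_union, 1.0)

def combine_kernels(kernels, weights):
    """Weighted sum of view-specific N x N kernel matrices K_k."""
    weights = np.asarray(weights, dtype=float)
    return sum(w * K for w, K in zip(weights, kernels))
```

    For example, `combine_kernels([gaussian_kernel(X_expr), jaccard_kernel(X_mut)], [0.7, 0.3])` would yield one N × N matrix for two hypothetical views `X_expr` and `X_mut`; in the method described above these weights are learned and shared across drugs.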

  4. Performance comparison of data set views.

    The top-performing method, Bayesian multitask MKL, and an elastic net predictor were trained on (a) the original profiling data sets, (b) computed views, (c) groups of data views, and (d) the fully integrated set of all data views. Boxplots represent the distribution of 50 random simulations matching the NCI-DREAM challenge parameters; whiskers indicate the upper and lower range limits, and the black line indicates the median. (b) The computed views were derived from gene sets, from combined data sets (calculated as the product of values between data sets), and by discretizing continuous measures into binary values. (c) Data view groups were defined as all views derived from one profiling data set. (d) For Bayesian multitask MKL, the integration of all data views achieves the best performance. Gene expression is the most predictive profiling data set, slightly outperformed by gene set views of expression data and the integration of original and gene set expression data.
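
    The three kinds of computed views named in panel b could look roughly like the sketch below. The caption only states that gene sets, products of values between data sets, and discretization were used; averaging over gene-set members, the discretization threshold, and all function and gene names here are assumptions made for illustration.

```python
import numpy as np

def product_view(A, B):
    """Combined view: element-wise product of two data sets with matching shape."""
    return np.asarray(A, dtype=float) * np.asarray(B, dtype=float)

def binary_view(X, threshold):
    """Discretized view: 1 where a continuous measure exceeds an assumed threshold."""
    return (np.asarray(X, dtype=float) > threshold).astype(int)

def gene_set_view(X, gene_names, gene_sets):
    """Gene set view: average the columns of X that belong to each gene set.

    Averaging is an assumed aggregation; the caption does not say how gene
    set values were computed.
    """
    X = np.asarray(X, dtype=float)
    index = {gene: i for i, gene in enumerate(gene_names)}
    columns = []
    for name, members in gene_sets.items():
        cols = [index[g] for g in members if g in index]
        if not cols:
            raise ValueError(f"gene set {name!r} has no measured genes")
        columns.append(X[:, cols].mean(axis=1))
    return np.column_stack(columns)
```

    Each function returns a matrix with one row per cell line, so the result can be fed to the same kernel functions as an original profiling data set.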

  5. Prediction performance on individual drugs.

    Prediction algorithms are indexed and colored according to Table 1. (a) The heatplot illustrates participant performance on individual drugs, grouped by drug class (values can be found in Supplementary Table 9). Drug weights, which take into account the number of missing values and the noise in the −log10(GI50) measurements, are displayed at the top of the heatplot. Team submissions are ordered according to their overall performance, with the best performer at the top of the list. (b) The dynamic range of drugs across all cell lines was compared to the median team score. The node size reflects the number of distinct −log10(GI50) values for each drug across all 53 cell lines. The node colors reflect mode-of-action classes. The gray horizontal line is the mean score of random predictions and the vertical gray line separates low dynamic range (<2) from high dynamic range (>2), where the dynamic range for a drug is the maximum −log10(GI50) minus the minimum −log10(GI50). (c) The distribution of team scores (n = 44) for individual drugs was compared to the null model of random predictions (gray line where pc-index = 0.5). The red points correspond to the maximum possible pc-index (pc-index of the gold standard in the test data). On average, 21/28 drugs performed better than the null model; using the Kolmogorov-Smirnov test, 16/28 drugs were significantly better than the null model (*FDR < 0.05; **FDR < 0.01; ***FDR < 0.001).
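
    The dynamic range used in panel b is simple enough to write out. The sketch below assumes missing measurements are stored as NaN and skipped, and it labels a value exactly at the cutoff separately because the caption only defines the <2 and >2 cases; the drug labels and response values are made up for the example.

```python
import numpy as np

def dynamic_range(neg_log10_gi50):
    """Maximum minus minimum -log10(GI50) across cell lines, skipping NaN values."""
    values = np.asarray(neg_log10_gi50, dtype=float)
    values = values[~np.isnan(values)]
    if values.size == 0:
        return float("nan")
    return float(values.max() - values.min())

def range_class(dr, cutoff=2.0):
    """Label a dynamic range as low (<cutoff) or high (>cutoff)."""
    if np.isnan(dr):
        return "undefined"
    if dr > cutoff:
        return "high"
    if dr < cutoff:
        return "low"
    return "at cutoff"  # the caption does not say which side this falls on

# Hypothetical -log10(GI50) values for two anonymized drugs.
responses = {
    "drug_A": [4.5, 5.0, np.nan, 7.2],
    "drug_B": [5.1, 5.3, 5.6, 5.2],
}
classes = {drug: range_class(dynamic_range(vals)) for drug, vals in responses.items()}
```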

Accession codes

Primary accessions

ArrayExpress

Gene Expression Omnibus

Change history

Corrected online 10 October 2014
In the version of this article initially published online, in Table 1, the FDRs should all have had negative rather than positive exponents. Several citations were incorrect, including on p.1, two lines from end of page, “proposed1,3,20,21,22” should have been “proposed1,3,20–23”; p.2, first paragraph of Results, “abundance23” should have been “abundance20”; p.9, end of penultimate paragraph, “models22” should have been “models23.” The errors have been corrected for the print, PDF and HTML versions of this article.

References

  1. Barretina, J. et al. The Cancer Cell Line Encyclopedia enables predictive modelling of anticancer drug sensitivity. Nature 483, 603–607 (2012).
  2. Cancer Genome Atlas Network. Comprehensive molecular portraits of human breast tumours. Nature 490, 61–70 (2012).
  3. Garnett, M.J. et al. Systematic identification of genomic markers of drug sensitivity in cancer cells. Nature 483, 570–575 (2012).
  4. Heiser, L.M. et al. Subtype and pathway specific responses to anticancer compounds in breast cancer. Proc. Natl. Acad. Sci. USA 109, 2724–2729 (2012).
  5. International Cancer Genome Consortium. et al. International network of cancer genome projects. Nature 464, 993–998 (2010).
  6. Lamb, J. et al. The Connectivity Map: using gene-expression signatures to connect small molecules, genes, and disease. Science 313, 1929–1935 (2006).
  7. Yang, W. et al. Genomics of Drug Sensitivity in Cancer (GDSC): a resource for therapeutic biomarker discovery in cancer cells. Nucleic Acids Res. 41, D955–D961 (2013).
  8. Shoemaker, R.H. The NCI60 human tumour cell line anticancer drug screen. Nat. Rev. Cancer 6, 813–823 (2006).
  9. Wilson, T.R. et al. Widespread potential for growth-factor-driven resistance to anticancer kinase inhibitors. Nature 487, 505–509 (2012).
  10. Curtis, C. et al. The genomic and transcriptomic architecture of 2,000 breast tumours reveals novel subgroups. Nature 486, 346–352 (2012).
  11. Reis-Filho, J.S. & Pusztai, L. Gene expression profiling in breast cancer: classification, prognostication, and prediction. Lancet 378, 1812–1823 (2011).
  12. Sorlie, T. et al. Gene expression patterns of breast carcinomas distinguish tumor subclasses with clinical implications. Proc. Natl. Acad. Sci. USA 98, 10869–10874 (2001).
  13. van 't Veer, L.J. et al. Gene expression profiling predicts clinical outcome of breast cancer. Nature 415, 530–536 (2002).
  14. Wu, J. et al. Identification and functional analysis of 9p24 amplified genes in human breast cancer. Oncogene 31, 333–341 (2012).
  15. Howlader, N. et al. SEER Cancer Statistics Review, 1975–2010 (National Cancer Institute, Bethesda, MD, 2013).
  16. Stephens, P.J. et al. The landscape of cancer genes and mutational processes in breast cancer. Nature 486, 400–404 (2012).
  17. Wood, L.D. et al. The genomic landscapes of human breast and colorectal cancers. Science 318, 1108–1113 (2007).
  18. Kao, J. et al. Molecular profiling of breast cancer cell lines defines relevant tumor models and provides a resource for cancer gene discovery. PLoS ONE 4, e6146 (2009).
  19. Neve, R.M. et al. A collection of breast cancer cell lines for the study of functionally distinct cancer subtypes. Cancer Cell 10, 515–527 (2006).
  20. Daemen, A. et al. Modeling precision treatment in breast cancer. Genome Biol. 14, R110 (2013).
  21. Bussey, K.J. et al. Integrating data on DNA copy number with gene expression levels and drug sensitivities in the NCI-60 cell line panel. Mol. Cancer Ther. 5, 853–867 (2006).
  22. Masica, D.L. & Karchin, R. Collections of simultaneously altered genes as biomarkers of cancer cell drug response. Cancer Res. 73, 1699–1708 (2013).
  23. Menden, M.P. et al. Machine learning prediction of cancer cell sensitivity to drugs based on genomic and chemical properties. PLoS ONE 8, e61318 (2013).
  24. Harrell, F.E. Regression Modeling Strategies (Springer, New York, 2001).
  25. Marbach, D. et al. Wisdom of crowds for robust gene network inference. Nat. Methods 9, 796–804 (2012).
  26. Kanehisa, M. & Goto, S. KEGG: kyoto encyclopedia of genes and genomes. Nucleic Acids Res. 28, 27–30 (2000).
  27. Subramanian, A. et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl. Acad. Sci. USA 102, 15545–15550 (2005).
  28. Schölkopf, B. & Smola, A.J. Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond (MIT Press, 2001).
  29. Shawe-Taylor, J. & Cristianini, N. Kernel Methods for Pattern Analysis (Cambridge University Press, New York, NY, 2004).
  30. Liberzon, A. et al. Molecular signatures database (MSigDB) 3.0. Bioinformatics 27, 1739–1740 (2011).
  31. Vaske, C.J. et al. Inference of patient-specific pathway activities from multi-dimensional cancer genomics data using PARADIGM. Bioinformatics 26, i237–i245 (2010).
  32. Gönen, M. & Alpaydin, E. Multiple kernel learning algorithms. J. Mach. Learn. Res. 12, 2211–2268 (2011).
  33. Caruana, R. Multitask learning. Mach. Learn. 28, 41–75 (1997).
  34. Breiman, L. Random forests. Mach. Learn. 45, 5–32 (2001).
  35. Friedman, J., Hastie, T. & Tibshirani, R. Regularization paths for generalized linear models via coordinate descent. J. Stat. Softw. 33, 1–22 (2010).
  36. Leiserson, M.D., Blokh, D., Sharan, R. & Raphael, B.J. Simultaneous identification of multiple driver pathways in cancer. PLoS Comput. Biol. 9, e1003054 (2013).
  37. Fallahi-Sichani, M., Honarnejad, S., Heiser, L.M., Gray, J.W. & Sorger, P.K. Comparing drug activity across cell line banks reveals systematic variation in properties other than potency. Nat. Chem. Biol. 9, 708–714 (2013).
  38. Kwong, L.N. et al. Oncogenic NRAS signaling differentially regulates survival and proliferation in melanoma. Nat. Med. 18, 1503–1510 (2012).
  39. Hanahan, D. & Weinberg, R.A. Hallmarks of cancer: the next generation. Cell 144, 646–674 (2011).
  40. Rantala, L.M., Kwon, S., Korkola, J.E. & Gray, J.W. Expanding the diversity of image-based RNAi screen applications using cell spot microarrays. Microarrays 2, 97–114 (2013).
  41. Margolin, A.A. et al. Systematic analysis of challenge-driven improvements in molecular prognostic models for breast cancer. Sci. Transl. Med. 5, 181re1 (2013).
  42. Costello, J.C. & Stolovitzky, G. Seeking the wisdom of crowds through challenge-based competitions in biomedical research. Clin. Pharmacol. Ther. 93, 396–398 (2013).
  43. Venkatraman, E.S. & Olshen, A.B. A faster circular binary segmentation algorithm for the analysis of array CGH data. Bioinformatics 23, 657–663 (2007).
  44. Bengtsson, H., Wirapati, P. & Speed, T.P. A single-array preprocessing method for estimating full-resolution raw copy numbers from all Affymetrix genotyping arrays including GenomeWideSNP 5 & 6. Bioinformatics 25, 2149–2156 (2009).
  45. Griffith, M. et al. Alternative expression analysis by RNA sequencing. Nat. Methods 7, 843–847 (2010).
  46. Fackler, M.J. et al. Genome-wide methylation analysis identifies genes specific to breast cancer hormone receptor status and risk of recurrence. Cancer Res. 71, 6195–6207 (2011).
  47. Tibes, R. et al. Reverse phase protein array: validation of a novel proteomic technology and utility for analysis of primary leukemia specimens and hematopoietic stem cells. Mol. Cancer Ther. 5, 2512–2521 (2006).
  48. Kuo, W.L. et al. A systems analysis of the chemosensitivity of breast cancer cells to the polyamine analogue PG-11047. BMC Med. 7, 77 (2009).
  49. Monks, A. et al. Feasibility of a high-flux anticancer drug screen using a diverse panel of cultured human tumor cell lines. J. Natl. Cancer Inst. 83, 757–766 (1991).

Author information

  1. These authors contributed equally to this work.

    • James C Costello,
    • Laura M Heiser &
    • Elisabeth Georgii

Affiliations

  1. Howard Hughes Medical Institute, Boston University, Boston, Massachusetts, USA.

    • James C Costello,
    • James C Collins &
    • James J Collins
  2. Department of Biomedical Engineering, Boston University, Boston, Massachusetts, USA.

    • James C Costello,
    • James C Collins &
    • James J Collins
  3. Department of Biomedical Engineering, Oregon Health and Science University, Portland, Oregon, USA.

    • Laura M Heiser,
    • Nicholas J Wang &
    • Joe W Gray
  4. Helsinki Institute for Information Technology HIIT, Department of Information and Computer Science, Aalto University, Espoo, Finland.

    • Elisabeth Georgii,
    • Mehmet Gönen,
    • Muhammad Ammad-ud-din,
    • Suleiman A Khan &
    • Samuel Kaski
  5. European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Cambridge, UK.

    • Michael P Menden,
    • Thomas Cokelaer &
    • Julio Saez-Rodriguez
  6. Department of Systems Biology, Center for Computational Biology and Bioinformatics, Columbia University, New York, New York, USA.

    • Mukesh Bansal &
    • Andrea Califano
  7. Institute for Molecular Medicine Finland FIMM, University of Helsinki, Helsinki, Finland.

    • Petteri Hintsanen,
    • John-Patrick Mpindi,
    • Olli Kallioniemi,
    • Tero Aittokallio &
    • Krister Wennerberg
  8. Helsinki Institute for Information Technology HIIT, Department of Computer Science, University of Helsinki, Helsinki, Finland.

    • Antti Honkela &
    • Samuel Kaski
  9. A list of participants and affiliations appears at the end of the paper.

    • NCI DREAM Community
  10. Wyss Institute for Biologically Inspired Engineering, Harvard University, Boston, Massachusetts, USA.

    • James C Collins &
    • James J Collins
  11. National Cancer Institute, National Institutes of Health, Bethesda, Maryland, USA.

    • Dan Gallahan &
    • Dinah Singer
  12. IBM T.J. Watson Research Center, IBM, Yorktown Heights, New York, USA.

    • Gustavo Stolovitzky
  13. Present address: Department of Pharmacology, University of Colorado Anschutz Medical Campus, Aurora, Colorado, USA.

    • James C Costello
  14. Swiss Institute for Experimental Cancer Research (ISREC), Swiss Federal Institute of Technology Lausanne (EPFL), Lausanne, Switzerland.

    • Jean-Paul Abbuehl,
    • Jonathan Bernard,
    • Krisztian Homicsko &
    • Anguraj Sadanandam
  15. Quantitative Biomedical Research Center, University of Texas Southwestern Medical Center, Dallas, Texas, USA.

    • Jeffrey Allen,
    • Beibei Chen,
    • Min Kim,
    • Hao Tang,
    • Guanghua Xiao,
    • Yang Xie &
    • Jichen Yang
  16. Departments of Genetics and Bioengineering, Stanford University, Stanford, California, USA.

    • Russ B Altman &
    • Assaf Gottlieb
  17. Department of Computer Science and Engineering, University of Minnesota, Minneapolis, Minnesota, USA.

    • Shawn Balcome,
    • Raamesh Deshpande,
    • Chad L Myers,
    • Wen Wang &
    • Tian Xia
  18. Department of Computer Science, Stanford University, Palo Alto, California, USA.

    • Alexis Battle,
    • David A Knowles &
    • Daphne Koller
  19. Unilever Centre, Cambridge University, Cambridge, UK.

    • Andreas Bender
  20. Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, Massachusetts, USA.

    • Bonnie Berger
  21. Department of Statistics, University of Pune, Pune, India.

    • Madhuchhanda Bhattacharjee
  22. School of Mathematics and Statistics, University of Hyderabad, Hyderabad, India.

    • Madhuchhanda Bhattacharjee
  23. Innovation Center for Biomedical Informatics, Georgetown University Medical Center, Washington, DC, USA.

    • Krithika Bhuvaneshwar,
    • Robinder Gauba,
    • Yuriy Gusev,
    • Michael Harris,
    • Subha Madhavan,
    • Lei Song &
    • Difei Wang
  24. Department of Medicinal Chemistry and Molecular Pharmacology, Purdue University, W. Lafayette, Indiana, USA.

    • Andrew A Bieberich &
    • V Jo Davisson
  25. Department of Statistics, University of Wisconsin, Madison, Wisconsin, USA.

    • Fred Boehm,
    • Haoyang Fan,
    • Nicholas Henderson,
    • Sunduz Keles,
    • Christina Kendziorski,
    • Michael A Newton,
    • Tram Ta,
    • Zhishi Wang &
    • Chandler Zuo
  26. Department of Biostatistics and Medical Informatics, University of Wisconsin, Madison, Wisconsin, USA.

    • Fred Boehm,
    • Haoyang Fan,
    • Nicholas Henderson,
    • Sunduz Keles,
    • Christina Kendziorski,
    • Michael A Newton,
    • Tram Ta,
    • Zhishi Wang &
    • Chandler Zuo
  27. Department of Computer Science and Engineering, Michigan State University, East Lansing, Michigan, USA.

    • Christina Chan
  28. Department of Chemical Engineering and Materials Science, Michigan State University, East Lansing, Michigan, USA.

    • Christina Chan
  29. Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan, USA.

    • Christina Chan
  30. Department of Biostatistics, University of North Carolina, Chapel Hill, North Carolina, USA.

    • Ting-Huei Chen,
    • Min Jin Ha &
    • Wei Sun
  31. Korea Advanced Institute of Science and Technology, Daejeon, Korea.

    • Jaejoon Choi,
    • Woochang Hwang,
    • Junho Kim &
    • Junehawk Lee
  32. Instituto de Medicina Molecular, Faculdade de Medicina da Universidade de Lisboa, Lisbon, Portugal.

    • Luis Pedro Coelho
  33. Department of Medicine, Dan L. Duncan Center Division of Biostatistics, Baylor College of Medicine, Houston, Texas, USA.

    • Chad J Creighton
  34. Translational Medicine, Millennium Pharmaceuticals, Cambridge, Massachusetts, USA.

    • Jike Cui,
    • Bin Li,
    • Hyunjin Shin &
    • William Trepicchio
  35. Center for Integrated Bioinformatics, Drexel University, Philadelphia, Pennsylvania, USA.

    • Will Dampier,
    • Richard Pestell &
    • Aydin Tozeren
  36. Department of Mathematical Modelling, Statistics and Bioinformatics, Ghent University, Ghent, Belgium.

    • Bernard De Baets,
    • Michiel Stock &
    • Willem Waegeman
  37. Department of Information Engineering, University of Padova, Padova, Italy.

    • Barbara DiCamillo,
    • Francesco Sambo &
    • Gianna Maria Toffolo
  38. Computer and Information Science Department, IUPUI, Indianapolis, Indiana, USA.

    • Murat Dundar
  39. National Center for Mathematics and Interdisciplinary Sciences, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China.

    • Zhana Duren,
    • Yong Wang,
    • Shihua Zhang,
    • Xiang-Sun Zhang &
    • Junfei Zhao
  40. Jefferson Kimmel Cancer Center, Drexel University, Philadelphia, Pennsylvania, USA.

    • Adam Ertel
  41. Department of Biostatistics, Bioinformatics and Biomathematics, Georgetown University Medical Center, Washington, DC, USA.

    • Hongbin Fang &
    • Ming Tan
  42. Department of Physics, University of Marburg, Marburg, Germany.

    • Michael Grau
  43. Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, Texas, USA.

    • Leng Han,
    • Jun Li,
    • Han Liang,
    • Yanxun Xu &
    • Yuan Yuan
  44. Department of Computer Science and Engineering, Michigan State University, East Lansing, Michigan, USA.

    • Hussein A Hejase
  45. Department of Bioengineering and Institute for Genomic Biology, University of Illinois, Champaign-Urbana, Illinois, USA.

    • Jack P Hou &
    • Jian Ma
  46. Leiden Academic Center for Drug Research, University of Leiden, Leiden, Netherlands.

    • Adriaan P IJzerman &
    • Eelke B Lenselink
  47. Izmir Institute of Technology, Izmir, Turkey.

    • Bilge Karacali
  48. Division of Biostatistics, University of Virginia School of Medicine, Charlottesville, Virginia, USA.

    • Youngchul Kim &
    • Jae K Lee
  49. Korea Institute of Science and Technology Information, Daejeon, Korea.

    • Junehawk Lee
  50. Buck Institute, Novato, California, USA.

    • Biao Li &
    • Sean Mooney
  51. CAS-MPG Partner Institute for Computational Biology, Key Laboratory of Computational Biology, Shanghai Institute for Biological Sciences, Chinese Academy of Sciences, Shanghai, P.R. China.

    • Jun Li
  52. Graduate Program in Structural and Computational Biology and Molecular Biophysics, Baylor College of Medicine, Houston, Texas, USA.

    • Han Liang &
    • Yuan Yuan
  53. Department of Oncology, Lombardi Comprehensive Cancer Center, Georgetown University Medical Center, Washington, DC, USA.

    • Subha Madhavan &
    • Difei Wang
  54. ChEMBL Group, The EMBL-European Bioinformatics Institute, Cambridge, UK.

    • John P Overington &
    • Gerard J P van Westen
  55. Electrical and Computer Engineering, Texas Tech University, Lubbock, Texas, USA.

    • Ranadip Pal &
    • Qian Wan
  56. Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, Massachusetts, USA.

    • Jian Peng
  57. IBM Almaden Research Center, San Jose, California, USA.

    • Robert J Prill
  58. Department of Bioinformatics and Computational Biology, University of Texas MD Anderson Cancer Center, Houston, Texas, USA.

    • Peng Qiu
  59. Bindley Bioscience Center, Purdue University, W. Lafayette, Indiana, USA.

    • Bartek Rajwa
  60. Department of Animal and Avian Science, University of Maryland, College Park, Maryland, USA.

    • Jiuzhou Song
  61. Embedded Systems Laboratory (ESL), Institute of Electrical Engineering, Swiss Federal Institute of Technology Lausanne (EPFL), Lausanne, Switzerland.

    • Arvind Sridhar
  62. Department of Mathematics and Statistics, Georgetown University, Washington, DC, USA.

    • Mahlet Tadesse
  63. The University of Colorado Cancer Center, University of Colorado School of Medicine, Aurora, Colorado, USA.

    • Dan Theodorescu
  64. Centre for Computational Biology, Mines ParisTech, Fontainebleau, France.

    • Nelle Varoquaux,
    • Jean-Philippe Vert &
    • Thomas Walter
  65. Institut Curie, Paris, France.

    • Nelle Varoquaux,
    • Jean-Philippe Vert &
    • Thomas Walter
  66. INSERM U900, Paris, France.

    • Nelle Varoquaux,
    • Jean-Philippe Vert &
    • Thomas Walter
  67. Janssen Pharmaceutica, Beerse, Belgium.

    • Joerg K Wegner &
    • Herman W T van Vlijmen
  68. Department of Biostatistics and Computational Biology, Rochester University Medical Center, Rochester, New York, USA.

    • Tongtong Wu
  69. Department of Statistics, Rice University, Houston, Texas, USA.

    • Yanxun Xu

Consortia

  1. NCI DREAM Community

    • Jean-Paul Abbuehl,
    • Tero Aittokallio,
    • Jeffrey Allen,
    • Russ B Altman,
    • Muhammad Ammad-ud-din,
    • Shawn Balcome,
    • Mukesh Bansal,
    • Alexis Battle,
    • Andreas Bender,
    • Bonnie Berger,
    • Jonathan Bernard,
    • Madhuchhanda Bhattacharjee,
    • Krithika Bhuvaneshwar,
    • Andrew A Bieberich,
    • Fred Boehm,
    • Andrea Califano,
    • Christina Chan,
    • Beibei Chen,
    • Ting-Huei Chen,
    • Jaejoon Choi,
    • Luis Pedro Coelho,
    • Thomas Cokelaer,
    • James C Collins,
    • James C Costello,
    • Chad J Creighton,
    • Jike Cui,
    • Will Dampier,
    • V Jo Davisson,
    • Bernard De Baets,
    • Raamesh Deshpande,
    • Barbara DiCamillo,
    • Murat Dundar,
    • Zhana Duren,
    • Adam Ertel,
    • Haoyang Fan,
    • Hongbin Fang,
    • Dan Gallahan,
    • Robinder Gauba,
    • Elisabeth Georgii,
    • Mehmet Gönen,
    • Assaf Gottlieb,
    • Michael Grau,
    • Joe W Gray,
    • Yuriy Gusev,
    • Min Jin Ha,
    • Leng Han,
    • Michael Harris,
    • Laura M Heiser,
    • Nicholas Henderson,
    • Hussein A Hejase,
    • Petteri Hintsanen,
    • Krisztian Homicsko,
    • Antti Honkela,
    • Jack P Hou,
    • Woochang Hwang,
    • Adriaan P IJzerman,
    • Olli Kallioniemi,
    • Bilge Karacali,
    • Samuel Kaski,
    • Sunduz Keles,
    • Christina Kendziorski,
    • Suleiman A Khan,
    • Junho Kim,
    • Min Kim,
    • Youngchul Kim,
    • David A Knowles,
    • Daphne Koller,
    • Junehawk Lee,
    • Jae K Lee,
    • Eelke B Lenselink,
    • Biao Li,
    • Bin Li,
    • Jun Li,
    • Han Liang,
    • Jian Ma,
    • Subha Madhavan,
    • Michael P Menden,
    • Sean Mooney,
    • John-Patrick Mpindi,
    • Chad L Myers,
    • Michael A Newton,
    • John P Overington,
    • Ranadip Pal,
    • Jian Peng,
    • Richard Pestell,
    • Robert J Prill,
    • Peng Qiu,
    • Bartek Rajwa,
    • Anguraj Sadanandam,
    • Julio Saez-Rodriguez,
    • Francesco Sambo,
    • Hyunjin Shin,
    • Dinah Singer,
    • Jiuzhou Song,
    • Lei Song,
    • Arvind Sridhar,
    • Michiel Stock,
    • Gustavo Stolovitzky,
    • Wei Sun,
    • Tram Ta,
    • Mahlet Tadesse,
    • Ming Tan,
    • Hao Tang,
    • Dan Theodorescu,
    • Gianna Maria Toffolo,
    • Aydin Tozeren,
    • William Trepicchio,
    • Nelle Varoquaux,
    • Jean-Philippe Vert,
    • Willem Waegeman,
    • Thomas Walter,
    • Qian Wan,
    • Difei Wang,
    • Nicholas J Wang,
    • Wen Wang,
    • Yong Wang,
    • Zhishi Wang,
    • Joerg K Wegner,
    • Krister Wennerberg,
    • Tongtong Wu,
    • Tian Xia,
    • Guanghua Xiao,
    • Yang Xie,
    • Yanxun Xu,
    • Jichen Yang,
    • Yuan Yuan,
    • Shihua Zhang,
    • Xiang-Sun Zhang,
    • Junfei Zhao,
    • Chandler Zuo,
    • Herman W T van Vlijmen &
    • Gerard J P van Westen

Contributions

J.C.C., M.P.M., L.M.H., M.B., D.G., D.S., J.S.-R., J.J.C., J.W.G. and G.S. designed the challenge. The top-performing approach was designed by E.G., M.G., M.A., P.H., S.A.K., J.-P.M., O.K., A.H., T.A., K.W. and S.K. Data analysis for the top-performing approach was conducted by E.G., M.G., M.A., P.H., S.A.K. and S.K. M.G. and S.K. designed the Bayesian model and M.G. implemented the inference algorithm for the top-performing approach. The NCI-DREAM Community provided drug sensitivity predictions and the descriptions in Supplementary Note 1. J.C.C., L.M.H. and M.P.M. performed analysis of challenge predictions. J.C.C., L.M.H., E.G., M.P.M., J.S.-R., S.K. and G.S. interpreted the results of the challenge and performed follow-up analyses for the manuscript. L.M.H., N.J.W. and J.W.G. generated experimental data. J.C.C., L.M.H., E.G., M.G., M.P.M., J.J.C., J.S.-R., S.K., J.W.G. and G.S. wrote the paper.

Competing financial interests

The authors declare no competing financial interests.

Supplementary information

PDF files

  1. Supplementary Text and Figures (21,694 KB)

    Supplementary Figures 1–11 and Supplementary Notes 1–4

Excel files

  1. Supplementary Tables (954 KB)

    Supplementary Tables 1–10

Zip files

  1. Supplementary Software (6,483 KB)

    Supplementary Software - Top performing team code.