Haibe-Kains et al.1 reported inconsistency between two large-scale pharmacogenomic studies: the Cancer Cell Line Encyclopedia (CCLE)2 and the Cancer Genome Project (CGP)3. On careful analysis of the same data, we reached quite different, and much more positive, conclusions. Here we highlight the most important reasons for this. There is a Reply to this Comment by Safikhani, Z. et al. Nature 540, http://dx.doi.org/10.1038/nature19839 (2016).
To assess the concordance of two large studies of the efficacy of cancer drugs, Haibe-Kains et al.1 compared the correlation of drug sensitivity measures with the correlation of gene expression measured on the same human cancer cell lines. The authors reported correlations ‘between’ cell lines for gene expression but, inconsistently, ‘across’ cell lines for drug sensitivity (see Methods). On re-analysis, we found much higher correlations between cell lines than across cell lines for both gene expression and drug sensitivity measures. For expression, the median Spearman’s rank correlation coefficient (rs) was 0.88 between cell lines and 0.56 across cell lines; for area under the curve (AUC), a drug sensitivity measure, the median rs was 0.62 between cell lines and 0.35 across cell lines. Thus, once this inconsistency is corrected, the correlations for expression and drug sensitivity data are far more similar than originally reported, which markedly undermines the authors’ interpretation of the relative quality of the expression and drug sensitivity datasets.
More fundamentally, the Spearman correlation coefficients reported by the authors do not fairly reflect the concordance of drug sensitivity between the studies, because many of the drugs assessed are highly targeted and therefore produce little variability in response across cell lines. To see why correlation is not an appropriate measure of biological concordance for these data, consider a hypothetical drug that is not effective against any cell line (a real possibility for an experimental drug). In such a case, the random measurement error inherent in biological assays would dominate, because there is no biological variability to measure; there would then be no reason to expect correlation between repeated measures of drug sensitivity, even with all other experimental variables held constant. Many of the drugs compared were highly targeted agents, which by design require specific, and often rare, molecular targets for response (see Supplementary Table 1). Consider nilotinib, which targets the product of the BCR-ABL1 fusion gene and was described in ref. 1 as exhibiting ‘poor consistency’ between CGP and CCLE (rs = 0.1 for AUC). In CGP, BCR-ABL1 status was reported to be strongly associated with nilotinib sensitivity (P = 2.54 × 10⁻⁶⁵), accurately reflecting the known biology. CCLE did not report BCR-ABL1 status; however, on re-analysis we identified three BCR-ABL1-positive cell lines among the 189 nilotinib-treated cell lines that overlapped with CGP, and these were also the three most sensitive samples (P = 9 × 10⁻⁷). Hence, although the drug sensitivity data accurately recapitulated biological expectations in both studies, the authors’ criteria misclassified nilotinib as inconsistent. Of the 577 cell lines screened in CGP, 573 do not contain the nilotinib target, the BCR-ABL1 fusion gene.
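The reported P = 9 × 10⁻⁷ for the three BCR-ABL1-positive lines being the three most sensitive of 189 matches a simple permutation argument: if sensitivity ranks were assigned at random (no ties), exactly one of the C(189, 3) equally likely sets of three lines would occupy the top three positions. The short Python sketch below reproduces that arithmetic. It assumes, rather than knows, that this is how the P value was derived (the original analysis may have used a different test), and the function name is hypothetical.

```python
from math import comb


def p_top_k_by_chance(n_lines: int, k_positive: int) -> float:
    """Probability that k specific cell lines are exactly the k most sensitive
    of n_lines when ranks are assigned uniformly at random with no ties:
    one favourable subset out of C(n_lines, k_positive) equally likely ones."""
    return 1 / comb(n_lines, k_positive)


# 3 BCR-ABL1-positive lines among 189 overlapping nilotinib-treated lines
p = p_top_k_by_chance(189, 3)
print(f"{p:.2e}")  # 9.03e-07
```

Here C(189, 3) = 1,107,414, so the probability is about 9.0 × 10⁻⁷, in line with the value quoted in the text.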
Because, as expected, almost none of the screened cell lines responded to nilotinib (median AUC across all cell lines = 0.99, where an AUC of 1 represents no drug response; Fig. 1a, Supplementary Table 1), there was little biological variability across most of the cell lines. This resulted in low correlation between the repeated measurements made by CCLE and CGP, despite clearly concordant results. Similarly, most of the other drugs that the authors compared were also targeted agents, so this lack of response was common: for 10 of the 15 drugs, the median AUC was greater than 0.90 in CGP, and 8 of these 10 also had median AUC values greater than 0.90 in CCLE, leaving little variability across most cell lines treated with these drugs. We identified a systematic relationship between the variability in drug response in either study and the correlation between the two studies, with less variable drugs showing lower correlation (Fig. 1b). A valid comparison of CGP and CCLE should therefore consider the pharmacology of the drugs screened, and in particular the differences in response variability induced by different drugs. Nilotinib was not an isolated case; despite the highly experimental nature of many of the compounds screened by CCLE and CGP, we still identified several expected associations that were consistently reported by both studies, including ERBB2 for lapatinib4; NQO1 expression for 17-AAG5; BRAF mutation for PD-0325901 (ref. 6), AZD6244 (ref. 7) and PLX4720 (ref. 8); MDM2 for nutlin-3a (ref. 9); and MET for crizotinib10 (Supplementary Table 1). Finally, the utility of these pharmacogenomic datasets is further supported by findings that models fitted using CGP data could reliably predict drug response in several clinical trials11,12.
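The variability argument can be made concrete with a toy simulation in Python (standard library only). It draws two independent noisy "screens" of the same underlying AUCs, once for a nilotinib-like targeted drug (3 responsive lines out of 189, every other line at AUC = 1) and once for a hypothetical drug whose true AUCs are spread evenly between 0.3 and 1.0. The noise level, the AUC values and all function names are illustrative assumptions, not parameters estimated from CGP or CCLE, and the model ignores the fact that real AUCs are bounded.

```python
import random


def average_ranks(values):
    """Ranks 1..n; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rs computed as the Pearson correlation of average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def two_noisy_screens(true_auc, noise_sd, rng):
    """Toy model: each screen measures the same true AUC plus Gaussian noise."""
    screen_a = [t + rng.gauss(0, noise_sd) for t in true_auc]
    screen_b = [t + rng.gauss(0, noise_sd) for t in true_auc]
    return screen_a, screen_b


def most_sensitive(aucs, k):
    """Indices of the k lowest AUCs (lower AUC = more sensitive)."""
    return set(sorted(range(len(aucs)), key=lambda i: aucs[i])[:k])


rng = random.Random(0)
N_LINES = 189    # overlapping nilotinib-treated lines, as in the text
NOISE_SD = 0.03  # assumed measurement noise, purely illustrative

# Nilotinib-like drug: 3 responsive lines (indices 0-2), the rest at AUC = 1.
targeted = [0.5] * 3 + [1.0] * (N_LINES - 3)
t_a, t_b = two_noisy_screens(targeted, NOISE_SD, rng)
rs_targeted = spearman(t_a, t_b)
same_top3 = most_sensitive(t_a, 3) == most_sensitive(t_b, 3)

# Hypothetical broadly active drug: true AUCs spread evenly from 0.3 to 1.0.
broad = [0.3 + 0.7 * i / (N_LINES - 1) for i in range(N_LINES)]
b_a, b_b = two_noisy_screens(broad, NOISE_SD, rng)
rs_broad = spearman(b_a, b_b)

print(f"targeted drug: rs = {rs_targeted:.2f}; same 3 most sensitive lines: {same_top3}")
print(f"broad drug:    rs = {rs_broad:.2f}")
```

With identical measurement noise in both cases, the targeted drug gives a low rs even though both simulated screens pick out the same three sensitive lines, whereas the broadly active drug gives an rs close to 1. In this toy model, what differs between the two drugs is the biological variability, not the quality of the assay.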
In summary, our analysis shows that the conclusions of Haibe-Kains et al.1 are unsubstantiated. We propose that a fair assessment of concordance between large pharmacogenomic datasets will require new or adapted methods that account for the issues raised here, although great care will be needed to ensure that such methods do not introduce unforeseen biases of their own.
In CGP and CCLE, after restricting both datasets to the cell lines, genes and drugs common to both studies and ordering them identically, gene expression and drug sensitivity (AUC) values can be arranged in n₁ × m and n₂ × m matrices, respectively, where m is the number of cell lines, n₁ is the number of genes and n₂ is the number of drugs common to both studies. Correlations ‘between’ cell lines are the correlations of matching columns of the CGP and CCLE matrices (vectors of length n₁ for expression or n₂ for AUC). Correlations ‘across’ cell lines are the correlations of matching rows (vectors of length m for both data types).
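The two definitions can be sketched in Python (standard library only). Each matrix is a list of rows, with rows as genes (or drugs) and columns as cell lines, ordered identically in both studies. The helper `between_and_across` is hypothetical, not a function from the authors' repository, and the simulated expression model (large gene-to-gene differences in mean level, smaller cell-line-specific variation, independent noise in each study) is an assumption chosen only to illustrate why the two kinds of correlation need not agree.

```python
import random
import statistics


def average_ranks(values):
    """Ranks 1..n; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rs computed as the Pearson correlation of average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def between_and_across(mat_a, mat_b):
    """Return (between, across) Spearman correlations for two matrices
    whose rows (genes or drugs) and columns (cell lines) match."""
    n_rows, n_cols = len(mat_a), len(mat_a[0])
    # 'Between' cell lines: one value per matching column (length n_rows).
    between = [
        spearman([mat_a[i][j] for i in range(n_rows)],
                 [mat_b[i][j] for i in range(n_rows)])
        for j in range(n_cols)
    ]
    # 'Across' cell lines: one value per matching row (length n_cols).
    across = [spearman(mat_a[i], mat_b[i]) for i in range(n_rows)]
    return between, across


rng = random.Random(1)
n_genes, n_lines = 200, 50
gene_level = [rng.gauss(0, 3) for _ in range(n_genes)]  # assumed gene means
truth = [[mu + rng.gauss(0, 1) for _ in range(n_lines)] for mu in gene_level]
study_a = [[v + rng.gauss(0, 0.5) for v in row] for row in truth]
study_b = [[v + rng.gauss(0, 0.5) for v in row] for row in truth]

between, across = between_and_across(study_a, study_b)
print(f"median rs between cell lines: {statistics.median(between):.2f}")
print(f"median rs across cell lines:  {statistics.median(across):.2f}")
```

In this toy model the ‘between’ correlations are high largely because differences in mean expression between genes are shared by every cell line in both studies, while the ‘across’ correlations depend only on the smaller variation from one cell line to the next. Comparing a ‘between’ statistic for one data type with an ‘across’ statistic for another therefore mixes two different quantities.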
To facilitate reproduction of our results, the source code for our analysis is available in a GitHub repository (https://github.com/paulgeeleher/nature_bca).
1. Haibe-Kains, B. et al. Inconsistency in large pharmacogenomic studies. Nature 504, 389–393 (2013).
2. Barretina, J. et al. The Cancer Cell Line Encyclopedia enables predictive modelling of anticancer drug sensitivity. Nature 483, 603–607 (2012).
3. Garnett, M. J. et al. Systematic identification of genomic markers of drug sensitivity in cancer cells. Nature 483, 570–575 (2012).
4. Konecny, G. E. et al. Activity of the dual kinase inhibitor lapatinib (GW572016) against HER-2-overexpressing and trastuzumab-treated breast cancer cells. Cancer Res. 66, 1630–1639 (2006).
5. Kelland, L. R., Sharp, S. Y., Rogers, P. M., Myers, T. G. & Workman, P. DT-Diaphorase expression and tumor cell sensitivity to 17-allylamino, 17-demethoxygeldanamycin, an inhibitor of heat shock protein 90. J. Natl. Cancer Inst. 91, 1940–1949 (1999).
6. Solit, D. B. et al. BRAF mutation predicts sensitivity to MEK inhibition. Nature 439, 358–362 (2006).
7. Dry, J. R. et al. Transcriptional pathway signatures predict MEK addiction and response to selumetinib (AZD6244). Cancer Res. 70, 2264–2273 (2010).
8. Tsai, J. et al. Discovery of a selective inhibitor of oncogenic B-Raf kinase with potent antimelanoma activity. Proc. Natl Acad. Sci. USA 105, 3041–3046 (2008).
9. Müller, C. R. et al. Potential for treatment of liposarcomas with the MDM2 antagonist Nutlin-3A. Int. J. Cancer 121, 199–205 (2007).
10. Timm, A. & Kolesar, J. M. Crizotinib for the treatment of non-small-cell lung cancer. Am. J. Health Syst. Pharm. 70, 943–947 (2013).
11. Geeleher, P., Cox, N. J. & Huang, R. S. Clinical drug response can be predicted using baseline gene expression levels and in vitro drug sensitivity in cell lines. Genome Biol. 15, R47 (2014).
12. Falgreen, S. et al. Predicting response to multidrug regimens in cancer patients using cell line experiments and regularised regression models. BMC Cancer 15, 235 (2015).
The authors declare no competing financial interests.
This file summarizes the results reported in the Supplementary Data of CGP and CCLE for the 15 compounds assessed by Haibe-Kains et al. These results were obtained either from the CGP web portal (www.cancerrxgene.org) or from Supplementary Tables S6 and S7 or Supplementary Figure 9 of the CCLE publication. These previously published results show that both studies reliably identified many canonical drug targets. (XLSX 15 kb)
Cite this article
Geeleher, P., Gamazon, E., Seoighe, C. et al. Consistency in large pharmacogenomic studies. Nature 540, E1–E2 (2016). https://doi.org/10.1038/nature19838