Consistency in large pharmacogenomic studies

arising from B. Haibe-Kains et al. Nature 504, 389–393 (2013); 10.1038/nature12831

Haibe-Kains et al.1 reported inconsistency between two large-scale pharmacogenomic studies—the Cancer Cell Line Encyclopedia (CCLE)2 and the Cancer Genome Project (CGP)3. Upon careful analysis of the same data, we have come to quite different and much more positive conclusions. Here we highlight the most important reasons for this. There is a Reply to this Comment by Safikhani, Z. et al. Nature 540, (2016).

To assess the concordance of two large studies of the efficacy of cancer drugs, Haibe-Kains et al.1 compared the correlation in drug sensitivity measures with correlation in gene expression measured on the same human cancer cell lines. The authors reported correlation ‘between’ cell lines for gene expression but, inconsistently, ‘across’ cell lines for drug sensitivity (see Methods). On re-analysis, we found much higher correlations between cell lines than across cell lines for both gene expression and drug sensitivity measures (median Spearman’s rank correlation coefficient (rs) = 0.88 between cell lines, rs = 0.56 across cell lines for expression; median rs = 0.62 between cell lines and rs = 0.35 across cell lines for area under the curve (AUC), a drug sensitivity measure). Thus, by correcting this inconsistency, the correlations for expression and drug sensitivity data were far more similar than was originally reported, which markedly undermines the authors’ interpretation of the relative quality of expression and drug sensitivity datasets.

In addition, the fundamental issue is that the authors’ reported Spearman’s correlation coefficients do not fairly reflect the concordance of drug sensitivity between the studies, because of the lack of variability in drug response, which arises owing to the highly targeted nature of many of the drugs assessed. To see why correlation is not an appropriate measure of biological concordance for these data, consider the hypothetical example of a drug that is not effective against any cell lines, which is a possibility for an experimental drug. In such a case, the randomly fluctuating measurement error, which is inherent in biological assays, will dominate over the non-existent biological variability, meaning that there could be no expectation of correlation between repeated measures of drug sensitivity—assuming other experimental variables are held constant. In this study, many of the drugs were highly targeted agents, which by design require specific, and often rare, molecular targets for response (see Supplementary Table 1). Consider nilotinib, which targets the BCR-ABL1 fusion gene and was suggested in ref. 1 to exhibit ‘poor consistency’ between CGP and CCLE (rs = 0.1 for AUC). In CGP, BCR-ABL1 status was reported to be strongly associated with drug sensitivity (P = 2.54 × 10−65), accurately reflecting the known biology. BCR-ABL1 status was not reported by CCLE; however, upon re-analysis we identified three BCR-ABL1-positive cell lines among the 189 nilotinib-treated cell lines that overlapped CGP, and these were also the three most sensitive samples (P = 9 × 10−7). Hence, despite the fact that these drug sensitivity data were accurately recapitulating biological expectations in both studies, the authors’ criteria classified nilotinib sensitivity incorrectly. Of the 577 cell lines screened in CGP, 573 do not contain the nilotinib target, that is, the BCR-ABL1 fusion gene. 
Thus, given (as expected) no drug response in almost all cell lines screened (median AUC across all cell lines = 0.99; AUC of 1 represents no drug response; Fig. 1a, Supplementary Table 1), there was little biological variability across most of the cell lines, resulting in low correlation between the repeated measurements made by CCLE and CGP, despite clearly concordant results. Similarly, most other drugs that the authors compared were also targeted agents, meaning that this lack of drug response was common; for 10 of the 15 drugs, the median AUC was greater than 0.90 in CGP, and 8 of these 10 also have median AUC values greater than 0.9 in CCLE, resulting in little variability across most cell lines when treated with these drugs. We identified a systematic relationship between variability in drug response in either study and correlation between the two studies (Fig. 1b). A valid comparison of CGP and CCLE should consider the pharmacology of the drugs screened and in particular the differences in the variability induced by different drugs. Nilotinib was not an isolated case; despite the highly experimental nature of many of the compounds screened by CCLE and CGP, we still identified several expected associations that were consistently reported by both studies, including ERBB2 for lapatinib4, NQO1 expression for 17-AAG5, BRAF mutation for PD-0325901 (ref. 6), AZD6244 (ref. 7), and PLX4720 (ref. 8), MDM2 for nutlin-3a (ref. 9), and MET for crizotinib10 (Supplementary Table 1). Finally, the utility of these pharmacogenomic datasets is now further supported by the findings that models fit using data from CGP could reliably predict drug response in several clinical trials11,12.
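The nilotinib argument can be illustrated with a small simulation. The following is a minimal sketch under stated assumptions, not the analysis performed here: the true AUC values (1.0 for non-responders, 0.3 for responders), the Gaussian measurement noise (standard deviation 0.03) and all function names are hypothetical choices made for illustration. It simulates two independent screens of 189 cell lines, of which three respond, and contrasts this with a hypothetical drug whose true AUC varies widely across cell lines. The final lines compute 1/C(189, 3), the chance that three pre-specified cell lines occupy the three most sensitive positions under a random ordering. The test behind the reported P = 9 × 10−7 is not named in the text, so reading it as this combinatorial probability is also an assumption.

```python
import random
from math import comb, sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / sqrt(sxx * syy)


def screen(true_auc, noise_sd, rng):
    """One simulated screen: true AUC plus independent Gaussian noise."""
    return [a + rng.gauss(0.0, noise_sd) for a in true_auc]


def most_sensitive(auc, k):
    """Indices of the k cell lines with the lowest AUC (strongest response)."""
    return set(sorted(range(len(auc)), key=lambda i: auc[i])[:k])


rng = random.Random(0)
n_lines, n_responders, noise_sd = 189, 3, 0.03  # noise_sd is an assumption

# Targeted drug: only the first three cell lines respond (hypothetical values).
targeted = [0.3] * n_responders + [1.0] * (n_lines - n_responders)
# Broadly active drug: true AUC spread over a wide range (hypothetical values).
broad = [rng.uniform(0.2, 1.0) for _ in range(n_lines)]

# Two independent "studies" of each drug with identical underlying biology.
targeted_a, targeted_b = screen(targeted, noise_sd, rng), screen(targeted, noise_sd, rng)
broad_a, broad_b = screen(broad, noise_sd, rng), screen(broad, noise_sd, rng)

rs_targeted = spearman(targeted_a, targeted_b)
rs_broad = spearman(broad_a, broad_b)

# Do both simulated studies rank the three responders as most sensitive?
responders = set(range(n_responders))
same_top3 = (most_sensitive(targeted_a, n_responders) == responders
             and most_sensitive(targeted_b, n_responders) == responders)

# Chance that three pre-specified lines are the top three under random ordering.
p_top3 = 1 / comb(n_lines, n_responders)

print(f"targeted drug: rs = {rs_targeted:.2f}, responders top-ranked in both: {same_top3}")
print(f"broadly active drug: rs = {rs_broad:.2f}")
print(f"1 / C(189, 3) = {p_top3:.2e}")
```

Under these assumptions, both simulated screens rank the three responders as the most sensitive cell lines, yet the rank correlation for the targeted drug is dominated by noise among the 186 non-responders, whereas the broadly active drug, measured with the same noise, is highly correlated. The combinatorial probability is 1/1,107,414, or approximately 9.0 × 10−7.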

Figure 1: Limitations of using a correlation metric for the assessment of concordance between the CGP and CCLE drug sensitivity data.

a, Highly targeted agents (such as nilotinib) highlight a major limitation of the test for concordance reported in ref. 1. Scatterplot shows the nilotinib AUC values (in CGP) for the 189 cell lines that were screened by both CGP and CCLE. Only a very small proportion of cell lines achieve a response, that is the three BCR-ABL1-positive cell lines highlighted in red. This almost complete lack of biological variability renders a Spearman correlation ineffective as a means to assess concordance. b, The test for concordance in ref. 1 is confounded by variability in the drug response. Scatterplot shows the strong association between ‘Spearman’s correlation of AUC between CCLE and CGP’ and ‘variance of AUC in CCLE’. Drugs with a more variable AUC value are more likely to be highly correlated between CCLE and CGP (rs = 0.83, P = 1.9 × 10−4). The points have been colour coded by their variance of AUC in CGP, which is also significantly associated with both ‘variance of AUC in CCLE’ and ‘Spearman’s correlation of AUC between CCLE and CGP’. σ2 CGP denotes the variance of AUC in CGP.

In summary, our analysis shows that the conclusions of Haibe-Kains et al.1 are unsubstantiated, and we propose that a fair assessment of concordance between large pharmacogenomic datasets will require the development or adaptation of methods that account for the issues raised here, although great care will be required to ensure that such methods do not introduce their own unforeseen biases.


Methods

In CGP and CCLE, using ordered data common to both studies, gene expression and drug sensitivity (AUC) values can be arranged in n1 × m and n2 × m matrices, respectively, in which m is the number of cell lines, n1 is the number of genes and n2 is the number of drugs common to both studies. Correlations ‘between’ cell lines are calculated by correlating matching columns of the CGP and CCLE matrices (vectors of length n1 for expression or n2 for AUC). Correlations ‘across’ cell lines are the correlations of matching rows (vectors of length m for both data types).
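The two correlation schemes can be sketched in a few lines of plain Python. This is an illustration only, not the code from the repository: the function names and the toy AUC matrices (three drugs by four cell lines) are hypothetical, and a small dependency-free Spearman implementation stands in for a statistics library.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks.

    Raises ZeroDivisionError if either vector is constant.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / sqrt(sxx * syy)


def between_and_across(cgp, ccle):
    """Correlate two matrices given as lists of rows (features x cell lines).

    'between' has one value per cell line (matching columns);
    'across' has one value per gene or drug (matching rows).
    """
    if len(cgp) != len(ccle) or any(len(r) != len(s) for r, s in zip(cgp, ccle)):
        raise ValueError("matrices must have identical shape and ordering")
    n_rows, n_cols = len(cgp), len(cgp[0])
    between = [
        spearman([cgp[i][j] for i in range(n_rows)],
                 [ccle[i][j] for i in range(n_rows)])
        for j in range(n_cols)
    ]
    across = [spearman(cgp[i], ccle[i]) for i in range(n_rows)]
    return between, across


# Hypothetical AUC values: rows are drugs, columns are cell lines.
cgp = [[0.95, 0.40, 0.99, 0.70],
       [0.60, 0.90, 0.30, 0.85],
       [0.99, 0.97, 0.50, 0.98]]
ccle = [[0.90, 0.50, 0.98, 0.60],
        [0.70, 0.95, 0.20, 0.80],
        [0.97, 0.99, 0.60, 0.96]]

between, across = between_and_across(cgp, ccle)
print("between cell lines:", [round(r, 2) for r in between])
print("across cell lines: ", [round(r, 2) for r in across])
```

The medians of the two returned lists correspond to the ‘between’ and ‘across’ summaries; applying the same function to the expression matrices and to the AUC matrices keeps the comparison like for like.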

To achieve easy reproduction of our results, we have made the source code for our analysis available in a GitHub repository (

Change history

  • 13 December 2016

    The received date and affiliation details for author P.G. were corrected in the HTML.


  1. Haibe-Kains, B. et al. Inconsistency in large pharmacogenomic studies. Nature 504, 389–393 (2013)
  2. Barretina, J. et al. The Cancer Cell Line Encyclopedia enables predictive modelling of anticancer drug sensitivity. Nature 483, 603–607 (2012)
  3. Garnett, M. J. et al. Systematic identification of genomic markers of drug sensitivity in cancer cells. Nature 483, 570–575 (2012)
  4. Konecny, G. E. et al. Activity of the dual kinase inhibitor lapatinib (GW572016) against HER-2-overexpressing and trastuzumab-treated breast cancer cells. Cancer Res. 66, 1630–1639 (2006)
  5. Kelland, L. R., Sharp, S. Y., Rogers, P. M., Myers, T. G. & Workman, P. DT-Diaphorase expression and tumor cell sensitivity to 17-allylamino, 17-demethoxygeldanamycin, an inhibitor of heat shock protein 90. J. Natl. Cancer Inst. 91, 1940–1949 (1999)
  6. Solit, D. B. et al. BRAF mutation predicts sensitivity to MEK inhibition. Nature 439, 358–362 (2006)
  7. Dry, J. R. et al. Transcriptional pathway signatures predict MEK addiction and response to selumetinib (AZD6244). Cancer Res. 70, 2264–2273 (2010)
  8. Tsai, J. et al. Discovery of a selective inhibitor of oncogenic B-Raf kinase with potent antimelanoma activity. Proc. Natl Acad. Sci. USA 105, 3041–3046 (2008)
  9. Müller, C. R. et al. Potential for treatment of liposarcomas with the MDM2 antagonist Nutlin-3A. Int. J. Cancer 121, 199–205 (2007)
  10. Timm, A. & Kolesar, J. M. Crizotinib for the treatment of non-small-cell lung cancer. Am. J. Health Syst. Pharm. 70, 943–947 (2013)
  11. Geeleher, P., Cox, N. J. & Huang, R. S. Clinical drug response can be predicted using baseline gene expression levels and in vitro drug sensitivity in cell lines. Genome Biol. 15, R47 (2014)
  12. Falgreen, S. et al. Predicting response to multidrug regimens in cancer patients using cell line experiments and regularised regression models. BMC Cancer 15, 235 (2015)

Author information



Corresponding authors

Correspondence to Nancy J. Cox or R. Stephanie Huang.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Supplementary information

Supplementary Table 1

This file shows a summary of the results reported with the Supplementary Data of CGP and CCLE for the 15 compounds assessed by Haibe-Kains et al. These results were obtained from either the CGP web portal ( or the supplementary tables S6 & S7 or supplementary figure 9 provided with the publication of the CCLE data. These already published results show a very clear trend of both studies reliably identifying many canonical drug targets. (XLSX 15 kb)

About this article

Cite this article

Geeleher, P., Gamazon, E., Seoighe, C. et al. Consistency in large pharmacogenomic studies. Nature 540, E1–E2 (2016).
