Inconsistency in large pharmacogenomic studies

Abstract

Two large-scale pharmacogenomic studies were published recently in this journal. Genomic data are well correlated between studies; however, the measured drug response data are highly discordant. Although the source of the inconsistencies remains uncertain, this discordance has potential implications for using these outcome measures to assess gene–drug associations or to select potential anticancer drugs on the basis of their reported results.

Figure 1: Consistency between gene expression profiles of cell lines in the CGP (ref. 6) and CCLE (ref. 7) studies.
Figure 2: Consistency between drug sensitivity data published in CGP and CCLE studies.
Figure 3: Consistency of associations of genomics features with drug sensitivity.
Figure 4: Effects on consistency of intermixing CGP (ref. 6) and CCLE (ref. 7) data.

References

  1. Roden, D. M. & George, A. L., Jr. The genetic basis of variability in drug responses. Nature Rev. Drug Discov. 1, 37–44 (2002)

  2. Shoemaker, R. H. The NCI60 human tumour cell line anticancer drug screen. Nature Rev. Cancer 6, 813–823 (2006)

  3. Weinstein, J. N. Drug discovery: Cell lines battle cancer. Nature 483, 544–545 (2012)

  4. Heiser, L. M. et al. Subtype and pathway specific responses to anticancer compounds in breast cancer. Proc. Natl Acad. Sci. USA 109, 2724–2729 (2012)

  5. Yamori, T. Panel of human cancer cell lines provides valuable database for drug discovery and bioinformatics. Cancer Chemother. Pharmacol. 52 (Suppl. 1), 74–79 (2003)

  6. Garnett, M. J. et al. Systematic identification of genomic markers of drug sensitivity in cancer cells. Nature 483, 570–575 (2012)

  7. Barretina, J. et al. The Cancer Cell Line Encyclopedia enables predictive modelling of anticancer drug sensitivity. Nature 483, 603–607 (2012)

  8. Wu, R. & Lin, M. Statistical and Computational Pharmacogenomics (Chapman and Hall/CRC, 2010)

  9. Subramanian, A. et al. Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl Acad. Sci. USA 102, 15545–15550 (2005)

  10. Greshock, J. et al. Molecular target class is predictive of in vitro response profile. Cancer Res. 70, 3677–3686 (2010)

  11. Papillon-Cavanagh, S. et al. Comparison and validation of genomic predictors for anticancer drug sensitivity. J. Am. Med. Inform. Assoc. 20, 597–602 (2013)

  12. Spearman, C. The proof and measurement of association between two things. Int. J. Epidemiol. 39, 1137–1150 (2010)

  13. Barretina, J. et al. Addendum: The Cancer Cell Line Encyclopedia enables predictive modelling of anticancer drug sensitivity. Nature 492, 290 (2012)

  14. Parkinson, H. et al. ArrayExpress–a public database of microarray experiments and gene expression profiles. Nucleic Acids Res. 35, D747–D750 (2007)

  15. McCall, M. N., Bolstad, B. M. & Irizarry, R. A. Frozen robust multiarray analysis (fRMA). Biostatistics 11, 242–253 (2010)

  16. Li, Q., Birkbak, N. J., Győrffy, B., Szallasi, Z. & Eklund, A. C. Jetset: selecting the optimal microarray probe set to represent a gene. BMC Bioinformatics 12, 474 (2011)

  17. Ashburner, M. et al. Gene Ontology: tool for the unification of biology. Nature Genet. 25, 25–29 (2000)

  18. Sim, J. & Wright, C. C. The kappa statistic in reliability studies: use, interpretation, and sample size requirements. Phys. Ther. 85, 257–268 (2005)

Acknowledgements

We thank J. Archambault for his insightful comments on the comparative study between experimental protocols used in the large pharmacogenomic studies investigated in this work. The authors would like to thank the investigators of the Cancer Genome Project, the Cancer Cell Line Encyclopedia and the GlaxoSmithKline cell line study who have made their invaluable data available to the scientific community. N.E.-H. was supported by an IRCM doctoral fellowship. A.H.B. was supported by an award from the Klarman Family Foundation and by NIH grant CA087969. N.J.B. was funded by The Villum Kann Rasmussen Foundation. J.Q. was supported by grants from the Dr Miriam and Sheldon G. Adelson Medical Research Foundation and from the NCI GAME-ON Cancer Post-GWAS initiative (U19 CA148065-01).

Author information

B.H.-K. conceived the study with major contributions from N.J.B., H.J.W.L.A. and J.Q. B.H.-K. and N.E.-H. collected and curated the gene expression profiles and drug phenotypic data. A.C.J. and A.H.B. collected and curated the mutation data. B.H.-K. performed all the analyses and wrote the code with contributions from N.E.-H. and A.C.J. N.E.-H., A.C.J. and A.H.B. compared the experimental protocols of the pharmacogenomic studies. B.H.-K., A.H.B., H.J.W.L.A. and J.Q. supervised the study. B.H.-K., A.H.B., H.J.W.L.A. and J.Q. wrote the manuscript with contributions from N.E.-H. and N.J.B. All authors discussed the results and commented on the manuscript.

Correspondence to Benjamin Haibe-Kains.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Additional information

The R code enables one to download and process the pharmacogenomic data and to generate all the results presented in the paper and its Supplementary Information.

Extended data figures and tables

Extended Data Figure 1 Intersection between the pharmacogenomic studies in terms of drugs, cell lines and genes.

a, Venn diagram reporting the number of drugs shared between the CGP and CCLE studies. b, Description of the 15 anticancer drugs screened in both the CGP and CCLE studies. c, Venn diagram reporting the number of drugs shared among the CGP, CCLE and GSK studies. d, Venn diagram reporting the number of cell lines shared by the CGP and CCLE studies. e, Number of cell lines for each tissue type among the 471 common to the CGP and CCLE studies. f, Venn diagram reporting the number of cell lines shared among the CGP, CCLE and GSK studies. g, Venn diagram reporting the number of genes whose mutation status was assessed in both the CGP and CCLE studies. h, Venn diagram reporting the number of genes whose expression was assessed in both the CGP and CCLE studies.
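Panels d and f count cell lines present in more than one study, which first requires matching cell line names across data sets. The sketch below illustrates only the counting step, assuming a hypothetical normalization rule (uppercase, strip non-alphanumeric characters). It is not the authors' curation procedure, and in practice distinct cell lines can collide after such normalization, so matches need manual review.

```python
import re


def normalize_cell_line(name: str) -> str:
    """Hypothetical rule: uppercase and drop every non-alphanumeric character."""
    return re.sub(r"[^A-Z0-9]", "", name.upper())


def venn2_counts(study_a, study_b):
    """Return (only in A, only in B, shared) counts after name normalization."""
    a = {normalize_cell_line(n) for n in study_a}
    b = {normalize_cell_line(n) for n in study_b}
    return len(a - b), len(b - a), len(a & b)


def shared_across(*studies):
    """Cell lines present in every study (the centre of a multi-set Venn diagram)."""
    sets = [{normalize_cell_line(n) for n in s} for s in studies]
    return set.intersection(*sets)


# Illustrative names only; not taken from the CGP or CCLE releases.
cgp_like = ["NCI-H460", "MCF7", "A549"]
ccle_like = ["NCIH460", "MCF-7", "HELA", "A-549"]
print(venn2_counts(cgp_like, ccle_like))
```

The same set operations extend to drugs (panels a and c) and genes (panels g and h), where identifier mapping is usually the harder part.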

Extended Data Figure 2 Box plot of the correlations of missense mutation profiles between identical cell lines in CGP and CCLE.

A two‐sided Wilcoxon rank‐sum test was used to test whether agreement (Cohen’s κ coefficient) was significantly higher between identical cell lines than between different cell lines; the test result is shown in the upper‐right corner.
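Cohen’s κ (ref. 18) corrects the raw fraction of matching mutation calls for the agreement expected by chance. A minimal sketch, assuming each cell line's profile is encoded as a list of 0/1 calls (1 = mutated) over the same ordered set of genes in both studies:

```python
def cohens_kappa(calls_a, calls_b):
    """Cohen's kappa for two equal-length sequences of binary calls (1 = mutated)."""
    if len(calls_a) != len(calls_b) or not calls_a:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(calls_a)
    # Observed agreement: fraction of genes with the same call in both studies.
    p_obs = sum(a == b for a, b in zip(calls_a, calls_b)) / n
    # Expected agreement if the two studies called mutations independently.
    pa = sum(calls_a) / n
    pb = sum(calls_b) / n
    p_exp = pa * pb + (1 - pa) * (1 - pb)
    if p_exp == 1:
        return float("nan")  # both profiles constant and identical: kappa undefined
    return (p_obs - p_exp) / (1 - p_exp)


print(cohens_kappa([1, 1, 0, 0], [1, 0, 0, 0]))
```

The κ values for identical and for different cell line pairs would then be compared with a rank-sum test, for example via `scipy.stats.mannwhitneyu` with `alternative="two-sided"`; that choice of library is an assumption, not necessarily what the authors' R code used.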

Extended Data Figure 3 Scatter plot reporting the IC50 values of camptothecin for 252 cell lines screened within the CGP project, as measured at the facilities of the Massachusetts General Hospital (MGH) and the Wellcome Trust Sanger Institute (WTSI).

Spearman’s rank correlation coefficient (rs) is reported in the upper‐left corner. Its significance, for a test of positive correlation, is reported as a one‐sided P value.
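Spearman’s rs (ref. 12) is the Pearson correlation of the two variables' ranks, with tied values sharing their average rank. The sketch below computes rs and, as a dependency-free stand-in for whatever test the authors' R code used, a one-sided permutation P value for a positive correlation; the permutation count and seed are arbitrary illustrative choices.

```python
import math
import random


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_one_sided(x, y, n_perm=2000, seed=0):
    """Spearman's rs and a one-sided permutation P value for rs > 0."""
    rx, ry = average_ranks(x), average_ranks(y)
    observed = pearson(rx, ry)
    rng = random.Random(seed)
    shuffled = list(ry)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        # Count random re-pairings at least as positively correlated as observed.
        if pearson(rx, shuffled) >= observed - 1e-12:
            hits += 1
    # Add-one correction keeps the estimated P value strictly positive.
    return observed, (hits + 1) / (n_perm + 1)
```

For a sample the size of the camptothecin comparison, an asymptotic test would normally be preferred for speed; the permutation version is shown because it makes the one-sided hypothesis explicit.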

Extended Data Figure 4 Consistency of IC50 values within the range of tested concentrations between CGP and CCLE.

a, Scatter plots of drug sensitivity restricted to IC50 values within the range of tested concentrations (thus excluding extrapolated IC50 values in CGP and placeholder values in CCLE), across the 471 cell lines for each of the 15 drugs investigated in both CGP and CCLE. b, Spearman’s rank correlation coefficient (rs) for each drug; an asterisk marks coefficients with a one‐sided P value < 0.05.
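The restriction in panel a keeps only cell line pairs whose IC50 lies inside the tested concentration range in both studies. A minimal sketch of that filter follows. It assumes (as a simplification) that a placeholder equals the highest tested concentration, so the upper bound is treated as exclusive, and that extrapolated values fall outside the range; the concentration ranges used below are hypothetical, since the real ranges differ by drug and by study.

```python
def within_tested_range(pairs, range_a, range_b):
    """Keep (ic50_a, ic50_b) pairs where both values lie inside their tested range.

    Each range is (lowest, highest) tested concentration, in the same units as the
    IC50 values. The upper bound is exclusive, on the assumption that a value equal
    to the top concentration is a placeholder for "IC50 not reached".
    """
    lo_a, hi_a = range_a
    lo_b, hi_b = range_b
    return [(a, b) for a, b in pairs if lo_a <= a < hi_a and lo_b <= b < hi_b]


# Hypothetical IC50 pairs (µM) and tested ranges for one drug.
pairs = [(0.5, 1.0), (50.0, 2.0), (1.0, 8.0), (0.01, 0.1)]
print(within_tested_range(pairs, (0.001, 10.0), (0.0025, 8.0)))
# -> [(0.5, 1.0), (0.01, 0.1)]
```

Filtering before computing rs trades sample size for measurements that were actually observed rather than extrapolated or imputed.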

Extended Data Figure 5 Consistency of AUC values between CGP and CCLE.

a, Scatter plots of drug sensitivity (AUC) measured in the 471 cell lines for each of the 15 drugs investigated in both CGP and CCLE. b, Spearman’s rank correlation coefficient (rs) for each drug; an asterisk marks coefficients with a one‐sided P value < 0.05.
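An AUC summarizes the whole dose–response curve rather than a single point on it. One generic definition, shown here as an assumption rather than the exact metric used by CGP, CCLE or this analysis, integrates fractional viability over log10 concentration with the trapezoidal rule and divides by the log-concentration span, so values lie in [0, 1] when viability does. Under this definition a higher AUC means a less sensitive cell line.

```python
import math


def normalized_auc(concentrations, viability):
    """Trapezoidal AUC of viability versus log10(concentration), scaled to the log span."""
    if len(concentrations) != len(viability) or len(concentrations) < 2:
        raise ValueError("need at least two (concentration, viability) points")
    # Sort points by log-concentration so the trapezoids are contiguous.
    pts = sorted(zip((math.log10(c) for c in concentrations), viability))
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    span = pts[-1][0] - pts[0][0]
    return area / span


# Hypothetical dose series (µM) with viability falling linearly in log-dose.
print(normalized_auc([0.01, 0.1, 1, 10], [1, 2 / 3, 1 / 3, 0]))
```

Because the result depends on the tested concentration range, AUCs from studies with different dose ranges are not directly comparable without care, which is one reason rank-based statistics such as rs are used for the comparison.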

Extended Data Figure 6 Consistency of AUC‐based gene–drug associations between CGP and CCLE.

a, Scatter plots of gene–drug associations computed with AUC, quantified as the standardized coefficient of the gene of interest in a linear model controlled for tissue type, across the 471 cell lines for each of the 15 drugs investigated in both CGP and CCLE. b, Spearman’s rank correlation coefficient (rs) for each drug; an asterisk marks coefficients with a one‐sided P value < 0.05.
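A gene–drug association "controlled for tissue type" can be read as the coefficient of gene expression in the model sensitivity ~ expression + tissue, with tissue as a categorical factor. By the Frisch–Waugh–Lovell theorem, that coefficient equals the slope obtained after centring both variables within each tissue type. The sketch below assumes "standardized" means both variables are z-scored across all cell lines before fitting; that interpretation and the example data are hypothetical.

```python
import math
from collections import defaultdict


def zscore(values):
    """Scale to mean 0 and unit (sample) standard deviation."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]


def tissue_adjusted_coefficient(expression, sensitivity, tissue):
    """Standardized coefficient of expression in: sensitivity ~ expression + tissue."""
    x, y = zscore(expression), zscore(sensitivity)
    totals = defaultdict(lambda: [0.0, 0.0, 0])  # tissue -> [sum x, sum y, count]
    for xi, yi, t in zip(x, y, tissue):
        totals[t][0] += xi
        totals[t][1] += yi
        totals[t][2] += 1
    means = {t: (sx / n, sy / n) for t, (sx, sy, n) in totals.items()}
    # Slope of within-tissue-centred y on within-tissue-centred x.
    sxy = sxx = 0.0
    for xi, yi, t in zip(x, y, tissue):
        mx, my = means[t]
        sxy += (xi - mx) * (yi - my)
        sxx += (xi - mx) ** 2
    return sxy / sxx  # ZeroDivisionError if expression is constant within every tissue


# Hypothetical data: same within-tissue slope, different tissue baselines.
expr = [1, 2, 3, 1, 2, 3]
sens = [2, 4, 6, 12, 14, 16]
tissues = ["lung"] * 3 + ["breast"] * 3
print(tissue_adjusted_coefficient(expr, sens, tissues))
```

Centring within tissue avoids building a design matrix of tissue indicators, and it makes explicit that tissue-level differences in baseline sensitivity do not contribute to the gene coefficient.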

Extended Data Figure 7 Consistency of AUC‐based pathway–drug associations between CGP and CCLE.

a, Scatter plots of pathway–drug associations computed with AUC, quantified as the standardized coefficient of the gene of interest in a linear model controlled for tissue type, across the 471 cell lines for each of the 15 drugs investigated in both CGP and CCLE. b, Spearman’s rank correlation coefficient (rs) for each drug; an asterisk marks coefficients with a one‐sided P value < 0.05.

Extended Data Figure 8 Consistency of AUC‐based mutation–drug associations between CGP and CCLE.

a, Scatter plots of mutation–drug associations computed with AUC, quantified as the standardized coefficient of the gene of interest in a linear model controlled for tissue type, across the 471 cell lines for each of the 15 drugs investigated in both CGP and CCLE. b, Spearman’s rank correlation coefficient (rs) for each drug; an asterisk marks coefficients with a one‐sided P value < 0.05.

Extended Data Figure 9 Comparison of drug sensitivity measured in CGP and CCLE with GSK.

a, Scatter plots of drug sensitivity (IC50) for all drugs and cell lines screened in both the CGP and GSK data sets (2 drugs in 249 cell lines). b, Scatter plots of drug sensitivity (IC50) for all drugs and cell lines screened in both the CCLE and GSK data sets (5 drugs in 231 cell lines). The significance of Spearman’s rank correlation coefficient, for a test of positive correlation, is reported as a one‐sided P value.

Extended Data Table 1 Spearman’s rank correlation coefficients and significance for consistency of drug sensitivity, gene–drug and pathway–drug associations for IC50 (a) and AUC (b)

Supplementary information

Supplementary Information

This file contains a list of abbreviations, the instructions to fully reproduce the analysis results from the R scripts, the comparison of pharmacological assays, Supplementary Tables 1-2, Supplementary Figures 1-23 and Supplementary References. (PDF 13904 kb)

Supplementary Data

This file contains Supplementary Data set 1, the R scripts. The archive (zip) contains the R scripts and accompanying files needed to fully reproduce the analysis results. (ZIP 11045 kb)

Supplementary Data

This zipped file contains Supplementary Data sets 2 and 3. Supplementary Data set 2, "Statistics for the gene–drug associations for IC50 in CGP", reports the gene–drug associations using IC50 as the drug sensitivity measure, including the standardized coefficient, its standard error, t statistic, nominal P value and false discovery rate (FDR), for the 12,187 genes and 15 drugs screened in CGP. Supplementary Data set 3, "Statistics for the gene–drug associations for IC50 in CCLE", reports the same statistics for the 12,187 genes and 15 drugs screened in CCLE. (ZIP 25103 kb)
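The data sets report both a nominal P value and an FDR for each gene–drug pair. The description does not state how the FDR was derived; the sketch below assumes the Benjamini–Hochberg procedure, a common choice (available in R as `p.adjust(p, method = "BH")`), applied across all genes for one drug.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P values (FDR), returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical nominal P values for four genes against one drug.
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

With 12,187 genes per drug, the adjustment is substantial: a nominal P value of 0.001 ranked near the middle of the list would not survive a 5% FDR threshold.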

Supplementary Data

This zipped file contains Supplementary Data sets 4-13 and a guide to the data. (ZIP 30820 kb)

About this article

Cite this article

Haibe-Kains, B., El-Hachem, N., Birkbak, N. et al. Inconsistency in large pharmacogenomic studies. Nature 504, 389–393 (2013) doi:10.1038/nature12831
