Inconsistency in large pharmacogenomic studies

Haibe-Kains, Benjamin; El-Hachem, Nehme; Birkbak, Nicolai Juul; Jin, Andrew C.; Beck, Andrew H.; Aerts, Hugo J. W. L.; Quackenbush, John

doi:10.1038/nature12831

Analysis
Published: 27 November 2013

Benjamin Haibe-Kains¹,², Nehme El-Hachem¹, Nicolai Juul Birkbak³, Andrew C. Jin⁴, Andrew H. Beck⁴, Hugo J. W. L. Aerts⁵,⁶,⁷ & John Quackenbush⁵,⁸

Nature volume 504, pages 389–393 (2013)

Abstract

Two large-scale pharmacogenomic studies were published recently in this journal. Genomic data are well correlated between studies; however, the measured drug response data are highly discordant. Although the source of inconsistencies remains uncertain, it has potential implications for using these outcome measures to assess gene–drug associations or select potential anticancer drugs on the basis of their reported results.

**Figure 1: Consistency between gene expression profiles of cell lines in CGP⁶ and CCLE⁷ studies.**

**Figure 2: Consistency between drug sensitivity data published in CGP and CCLE studies.**

**Figure 3: Consistency of associations of genomics features with drug sensitivity.**

**Figure 4: Effects on consistency by intermixing CGP⁶ and CCLE⁷ data.**

References

Roden, D. M. & George, A. L. Jr The genetic basis of variability in drug responses. Nature Rev. Drug Discov. 1, 37–44 (2002)
Shoemaker, R. H. The NCI60 human tumour cell line anticancer drug screen. Nature Rev. Cancer 6, 813–823 (2006)
Weinstein, J. N. Drug discovery: Cell lines battle cancer. Nature 483, 544–545 (2012)
Heiser, L. M. et al. Subtype and pathway specific responses to anticancer compounds in breast cancer. Proc. Natl Acad. Sci. USA 109, 2724–2729 (2012)
Yamori, T. Panel of human cancer cell lines provides valuable database for drug discovery and bioinformatics. Cancer Chemother. Pharmacol. 52 (Suppl. 1), 74–79 (2003)
Garnett, M. J. et al. Systematic identification of genomic markers of drug sensitivity in cancer cells. Nature 483, 570–575 (2012)
Barretina, J. et al. The Cancer Cell Line Encyclopedia enables predictive modelling of anticancer drug sensitivity. Nature 483, 603–607 (2012)
Wu, R. & Lin, M. Statistical and Computational Pharmacogenomics (Chapman and Hall/CRC, 2010)
Subramanian, A. et al. Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl Acad. Sci. USA 102, 15545–15550 (2005)
Greshock, J. et al. Molecular target class is predictive of in vitro response profile. Cancer Res. 70, 3677–3686 (2010)
Papillon-Cavanagh, S. et al. Comparison and validation of genomic predictors for anticancer drug sensitivity. J. Am. Med. Inform. Assoc. 20, 597–602 (2013)
Spearman, C. The proof and measurement of association between two things. Int. J. Epidemiol. 39, 1137–1150 (2010)
Barretina, J. et al. Addendum: The Cancer Cell Line Encyclopedia enables predictive modelling of anticancer drug sensitivity. Nature 492, 290 (2012)
Parkinson, H. et al. ArrayExpress–a public database of microarray experiments and gene expression profiles. Nucleic Acids Res. 35, D747–D750 (2007)
McCall, M. N., Bolstad, B. M. & Irizarry, R. A. Frozen robust multiarray analysis (fRMA). Biostatistics 11, 242–253 (2010)
Li, Q., Birkbak, N. J., Győrffy, B., Szallasi, Z. & Eklund, A. C. Jetset: selecting the optimal microarray probe set to represent a gene. BMC Bioinformatics 12, 474 (2011)
Ashburner, M. et al. Gene ontology: tool for the unification of biology. Nature Genet. 25, 25–29 (2000)
Sim, J. & Wright, C. C. The kappa statistic in reliability studies: use, interpretation, and sample size requirements. Phys. Ther. 85, 257–268 (2005)

Acknowledgements

We thank J. Archambault for his insightful comments on the comparative study between experimental protocols used in the large pharmacogenomic studies investigated in this work. The authors would like to thank the investigators of the Cancer Genome Project, the Cancer Cell Line Encyclopedia and the GlaxoSmithKline cell line study who have made their invaluable data available to the scientific community. N.E.-H. was supported by an IRCM doctoral fellowship. A.H.B. was supported by an award from the Klarman Family Foundation and by support from NIH grant CA087969. N.J.B. was funded by The Villum Kann Rasmussen Foundation. J.Q. was supported by grants from the Dr Miriam and Sheldon G. Adelson Medical Research Foundation and from the NCI GAME-ON Cancer Post-GWAS initiative (U19 CA148065-01).

Author information

Andrew H. Beck, Hugo J. W. L. Aerts and John Quackenbush: These authors contributed equally to this work.

Authors and Affiliations

Institut de Recherches Cliniques de Montréal, University of Montreal, Montreal, Quebec, Canada
Benjamin Haibe-Kains & Nehme El-Hachem
Ontario Cancer Institute, Princess Margaret Cancer Centre, University Health Network, Toronto, Ontario M5G 2M9, Canada
Benjamin Haibe-Kains
Department of Systems Biology, Center for Biological Sequence Analysis, Technical University of Denmark, 2800 Kgs. Lyngby, Denmark
Nicolai Juul Birkbak
Department of Pathology, Beth Israel Deaconess Medical Center and Harvard Medical School, Boston, 02215, Massachusetts, USA
Andrew C. Jin & Andrew H. Beck
Department of Biostatistics and Computational Biology and Center for Cancer Computational Biology, Dana-Farber Cancer Institute, Boston, 02215, Massachusetts, USA
Hugo J. W. L. Aerts & John Quackenbush
Department of Radiation Oncology & Radiology, Dana-Farber Cancer Institute, Brigham and Women’s Hospital, Harvard Medical School, Boston, 02215, Massachusetts, USA
Hugo J. W. L. Aerts
Department of Radiation Oncology, Maastricht University, Maastricht 6200 MD, The Netherlands
Hugo J. W. L. Aerts
Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, 02215, Massachusetts, USA
John Quackenbush

Contributions

B.H.-K. conceived the study with major contributions from N.J.B., H.J.W.L.A. and J.Q. B.H.-K. and N.E.-H. collected and curated the gene expression profiles and drug phenotypic data. A.C.J. and A.H.B. collected and curated the mutation data. B.H.-K. performed all the analyses and wrote the code with contributions from N.E.-H. and A.C.J. N.E.-H., A.C.J. and A.H.B. compared the experimental protocols of the pharmacogenomic studies. B.H.-K., A.H.B., H.J.W.L.A. and J.Q. supervised the study. B.H.-K., A.H.B., H.J.W.L.A. and J.Q. wrote the manuscript with contributions from N.E.-H. and N.J.B. All authors discussed the results and commented on the manuscript.

Corresponding author

Correspondence to Benjamin Haibe-Kains.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Additional information

The R code enables one to download and process the pharmacogenomic data and to generate all the results presented in the paper and its Supplementary Information.

Extended data figures and tables

Extended Data Figure 1 Intersection between the pharmacogenomic studies in terms of drugs, cell lines and genes.

a, Venn diagram reporting the number of drugs shared between CGP and CCLE studies. b, Description of the 15 anticancer drugs screened both in CGP and CCLE studies. c, Venn diagram reporting the number of drugs shared between CGP, CCLE and GSK studies. d, Venn diagram reporting the number of cell lines shared by CGP and CCLE studies. e, Number of cell lines for each tissue type among the 471 common to CGP and CCLE studies. f, Venn diagram reporting the number of cell lines shared between CGP, CCLE and GSK studies. g, Venn diagram reporting the number of genes whose presence of mutations was assessed both in CGP and CCLE studies. h, Venn diagram reporting the number of genes whose expression was assessed both in CGP and CCLE studies.

Extended Data Figure 2 Box plot of the correlations of missense mutation profiles between identical cell lines in CGP and CCLE.

Two‐sided Wilcoxon rank‐sum test was used to test whether agreement (Cohen’s κ coefficient) was significantly higher in identical cell lines compared to different cell lines (upper‐right corner).
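As an illustration of the agreement measure named in this legend, Cohen's κ for two mutation profiles of the same cell line could be computed as in the minimal Python sketch below. It assumes each profile is coded per gene as mutated (1) or wild type (0); the function name `cohen_kappa` and the toy profiles are hypothetical and are not taken from the authors' R scripts, which remain the reference implementation.

```python
def cohen_kappa(profile_a, profile_b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(profile_a) != len(profile_b) or len(profile_a) == 0:
        raise ValueError("profiles must be non-empty and of equal length")
    n = len(profile_a)
    labels = set(profile_a) | set(profile_b)
    # Observed agreement: fraction of genes with the same call in both studies.
    p_observed = sum(a == b for a, b in zip(profile_a, profile_b)) / n
    # Chance agreement: product of the marginal label frequencies, summed over labels.
    p_expected = sum(
        (profile_a.count(label) / n) * (profile_b.count(label) / n)
        for label in labels
    )
    if p_expected == 1.0:
        # Both profiles contain one and the same label; kappa is undefined.
        return float("nan")
    return (p_observed - p_expected) / (1.0 - p_expected)


# Toy example (hypothetical calls for eight genes in one cell line).
study_one = [1, 1, 0, 0, 1, 0, 0, 0]
study_two = [1, 1, 0, 0, 0, 0, 0, 1]
kappa = cohen_kappa(study_one, study_two)  # 6/8 observed, 34/64 expected by chance
```

In this toy case six of eight calls agree, yet κ is only 7/15 because most of that agreement is expected by chance when mutations are rare.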

Extended Data Figure 3 Scatter plot reporting the IC₅₀ values of camptothecin for 252 cell lines screened within the CGP project, as measured at the facilities of the Massachusetts General Hospital (MGH) and the Wellcome Trust Sanger Institute (WTSI).

Spearman’s rank correlation coefficient (r_s) is reported in the upper‐left corner. Significance of the Spearman's rank correlation (positive) coefficient is reported as a one‐sided P value.
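To make the statistic in these legends concrete, the sketch below computes Spearman's r_s as the Pearson correlation of average ranks and estimates a one‐sided P value for a positive correlation. The P value here comes from a permutation test, which is a substitute chosen for illustration; this page does not state how the authors computed significance. All function names and the toy values are hypothetical.

```python
import math
import random


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's rank correlation coefficient r_s."""
    return pearson(average_ranks(x), average_ranks(y))


def one_sided_p_value(x, y, n_permutations=2000, seed=0):
    """Permutation estimate of P(r_s >= observed) under no association."""
    rng = random.Random(seed)
    ranks_x, ranks_y = average_ranks(x), average_ranks(y)
    observed = pearson(ranks_x, ranks_y)
    shuffled = ranks_y[:]
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if pearson(ranks_x, shuffled) >= observed - 1e-12:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_permutations + 1)


# Toy sensitivity values for five cell lines measured at two sites (hypothetical).
site_a = [1.0, 2.0, 3.0, 4.0, 5.0]
site_b = [2.0, 1.0, 4.0, 3.0, 5.0]
r_s = spearman(site_a, site_b)
```

Because r_s depends only on ranks, it is unaffected by monotone differences in scale between the two sets of measurements, which is one reason a rank-based statistic suits cross-study comparisons.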

Extended Data Figure 4 Consistency of IC₅₀ values within the range of tested concentrations between CGP and CCLE.

a, Scatter plots reporting the drug sensitivity measurements, which are the IC₅₀ values within the range of tested concentrations (thus excluding extrapolated IC₅₀ in CGP and placeholder values in CCLE) in the 471 cell lines and for each of the 15 drugs investigated both in CGP and CCLE. b, Spearman’s rank correlation coefficient (r_s) for each drug where significance of each correlation coefficient is reported using an asterisk if one‐sided P value <0.05.

Extended Data Figure 5 Consistency of AUC values between CGP and CCLE.

a, Scatter plots reporting the drug sensitivity (AUC) measured in the 471 cell lines and for each of the 15 drugs investigated both in CGP and CCLE. b, Spearman’s rank correlation coefficient (r_s) for each drug where significance of each correlation coefficient is reported using an asterisk if one‐sided P value <0.05.

Extended Data Figure 6 Consistency of AUC‐based gene–drug associations between CGP and CCLE.

a, Scatter plots reporting the gene–drug associations computed with AUC, as quantified by the standardized coefficient of the gene of interest in a linear model controlled for tissue type, in the 471 cell lines and for each of the 15 drugs investigated both in CGP and CCLE. b, Spearman’s rank correlation coefficient (r_s) for each drug where significance of each correlation coefficient is reported using an asterisk if one‐sided P value <0.05.
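The association measure described in this legend can be sketched as follows, under stated assumptions: expression and drug sensitivity are z-scored, tissue type enters the linear model as a categorical covariate, and the coefficient of the gene is obtained by centring both variables within each tissue (equivalent to fitting tissue indicator terms by ordinary least squares). The exact model specification is not given on this page, so the function names and the toy data below are hypothetical.

```python
from collections import defaultdict
from statistics import mean, stdev


def zscore(values):
    """Scale values to zero mean and unit sample standard deviation."""
    centre, spread = mean(values), stdev(values)
    return [(v - centre) / spread for v in values]


def centre_within_groups(values, groups):
    """Subtract from each value the mean of its own group."""
    members = defaultdict(list)
    for value, group in zip(values, groups):
        members[group].append(value)
    group_mean = {group: mean(vals) for group, vals in members.items()}
    return [value - group_mean[group] for value, group in zip(values, groups)]


def standardized_coefficient(expression, sensitivity, tissue):
    """Coefficient of z-scored expression in sensitivity ~ expression + tissue."""
    x = centre_within_groups(zscore(expression), tissue)
    y = centre_within_groups(zscore(sensitivity), tissue)
    sxx = sum(a * a for a in x)
    if sxx == 0:
        raise ValueError("expression does not vary within any tissue")
    # Least-squares slope of the tissue-adjusted sensitivity on expression.
    return sum(a * b for a, b in zip(x, y)) / sxx


# Toy data: sensitivity rises by 2 per unit of expression, with a tissue offset.
expression = [1.0, 2.0, 3.0, 1.0, 2.0, 3.0]
sensitivity = [2.0, 4.0, 6.0, 12.0, 14.0, 16.0]
tissue = ["lung", "lung", "lung", "skin", "skin", "skin"]
beta = standardized_coefficient(expression, sensitivity, tissue)
```

Adjusting for tissue removes the offset between the two groups, so the coefficient reflects only the within-tissue relationship between expression and sensitivity.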

Extended Data Figure 7 Consistency of AUC‐based pathway–drug associations between CGP and CCLE.

a, Scatter plots reporting the pathway–drug associations computed with AUC, as quantified by the standardized coefficient of the gene of interest in a linear model controlled for tissue type, in the 471 cell lines and for each of the 15 drugs investigated both in CGP and CCLE. b, Spearman’s rank correlation coefficient (r_s) for each drug where significance of each correlation coefficient is reported using an asterisk if one‐sided P value <0.05.

Extended Data Figure 8 Consistency of AUC‐based mutation–drug associations between CGP and CCLE.

a, Scatter plots reporting the mutation–drug associations computed with AUC, as quantified by the standardized coefficient of the gene of interest in a linear model controlled for tissue type, in the 471 cell lines and for each of the 15 drugs investigated both in CGP and CCLE. b, Spearman’s rank correlation coefficient (r_s) for each drug where significance of each correlation coefficient is reported using an asterisk if one‐sided P value <0.05.

Extended Data Figure 9 Comparison of drug sensitivity measured in CGP and CCLE with GSK.

a, Scatter plots reporting the drug sensitivity measurements (IC₅₀) of all drugs and cell lines screened both in CGP and GSK data sets (2 drugs in 249 cell lines). b, Scatter plots reporting the drug sensitivity measurements (IC₅₀) of all drugs and cell lines screened both in CCLE and GSK data sets (5 drugs in 231 cell lines). Significance of the Spearman's rank correlation (positive) coefficient is reported as a one‐sided P value.

Extended Data Table 1 Spearman’s rank correlation coefficients and significance for consistency of drug sensitivity, gene–drug and pathway–drug associations for IC₅₀ (a) and AUC (b)

Supplementary information

Supplementary Information

This file contains a list of abbreviations, the instructions to fully reproduce the analysis results from the R scripts, the comparison of pharmacological assays, Supplementary Tables 1-2, Supplementary Figures 1-23 and Supplementary References. (PDF 13904 kb)

Supplementary Data

This file contains Supplementary set 1, R scripts. The archive (zip) contains the R scripts and accompanying files to enable full reproducibility of the analysis results. (ZIP 11045 kb)

Supplementary Data

This zipped file contains Supplementary Data sets 2 and 3. Supplementary File 2, Statistics for the gene-drug-associations for IC₅₀ in CGP reports the gene-drug associations using IC₅₀ as drug sensitivity measure, including the standardized coefficient, its standard error, t statistic, nominal p-value and FDR for the 12,187 genes and 15 drugs screened in CGP. Supplementary File 3, Statistics for the gene-drug-associations for IC₅₀ in CCLE reports the gene-drug associations using IC₅₀ as drug sensitivity measure, including the standardized coefficient, its standard error, t statistic, nominal p-value and FDR for the 12,187 genes and 15 drugs screened in CCLE. (ZIP 25103 kb)

Supplementary Data

This zipped file contains Supplementary Data sets 4-13 and a guide to the data. (ZIP 30820 kb)

About this article

Cite this article

Haibe-Kains, B., El-Hachem, N., Birkbak, N. et al. Inconsistency in large pharmacogenomic studies. Nature 504, 389–393 (2013). https://doi.org/10.1038/nature12831

Received: 15 April 2013
Accepted: 07 November 2013
Published: 27 November 2013
Issue Date: 19 December 2013
DOI: https://doi.org/10.1038/nature12831
