A deep learning framework for drug repurposing via emulating clinical trials on real-world patient data

Liu, Ruoqi; Wei, Lai; Zhang, Ping

doi:10.1038/s42256-020-00276-w

Article
Published: 04 January 2021

Nature Machine Intelligence volume 3, pages 68–75 (2021)

A preprint version of the article is available at arXiv.

Abstract

Drug repurposing is an effective strategy to identify new uses for existing drugs, providing the quickest possible transition from bench to bedside. Real-world data, such as electronic health records and insurance claims, provide information on large cohorts of users for many drugs. Here we present an efficient and easily customized framework for generating and testing multiple candidates for drug repurposing using a retrospective analysis of real-world data. Building upon well-established causal inference and deep learning methods, our framework emulates randomized clinical trials for drugs present in a large-scale medical claims database. We demonstrate our framework on a coronary artery disease cohort of millions of patients. We successfully identify drugs and drug combinations that substantially improve the coronary artery disease outcomes but haven’t been indicated for treating coronary artery disease, paving the way for drug repurposing.

**Fig. 1: Flowchart of overall drug repurposing framework.**

**Fig. 2: Illustration of the deep learning model for predicting treatment probability (or propensity score) that we used to correct confounding from time sequence data (including diagnoses d_t, prescriptions p_t and demographics b_t).**

**Fig. 3: Distribution of estimated ATE of drugs on defined outcomes across the 50 bootstrap samples.**

**Fig. 4: The SMD values of the top 20 well-balanced covariates.**

Data availability

The data we use are from MarketScan Commercial Claims and Encounters (CCAE; more than 100 million patients, from 2012 to 2017). Details of the source data structure and a demo of the preprocessed input data are available at the GitHub repository https://github.com/ruoqi-liu/DeepIPW. Access to the MarketScan data analysed in this manuscript is provided by the Ohio State University. The dataset is available from IBM at https://www.ibm.com/products/marketscan-research-databases.

Code availability

The source code for this paper can be downloaded from the GitHub repository at https://github.com/ruoqi-liu/DeepIPW or the Zenodo repository at https://doi.org/10.5281/zenodo.4079391.

Acknowledgements

This work was funded in part by the National Center for Advancing Translational Research of the National Institutes of Health under award number CTSA Grant UL1TR002733. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

Author information

Authors and Affiliations

Department of Computer Science and Engineering, The Ohio State University, Columbus, OH, USA
Ruoqi Liu & Ping Zhang
Department of Biomedical Informatics, The Ohio State University, Columbus, OH, USA
Lai Wei & Ping Zhang
Translational Data Analytics Institute, The Ohio State University, Columbus, OH, USA
Ping Zhang

Contributions

P.Z. conceived the project. R.L. and P.Z. developed the method. R.L. conducted the experiments. R.L., L.W. and P.Z. analysed the results. R.L., L.W. and P.Z. wrote the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Ping Zhang.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Peer review information: Nature Machine Intelligence thanks Daniel Merk and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 CAD cohort characteristics.

a, Distribution of patients’ total time in the database. b, Distribution of patients’ time before and after the CAD initiation date. c, Growth in the number of patients developing outcomes after the CAD initiation date. d, Gender distribution by age at the CAD initiation date.

Extended Data Fig. 2 Performance comparison of LSTM-IPTW and LR-IPTW using drug candidate: diltiazem (with known CAD indication).

The three panels on the top (a–c) show results obtained from LSTM-IPTW, while the three panels on the bottom (d–f) show results from LR-IPTW. a,d, The absolute SMD of each covariate in the original data (orange triangles) and in the weighted data (blue circles). b,e, The distribution of estimated propensity scores over the user (orange area) and non-user (blue area) cohorts. c,f, The ROC curves for the propensity model (orange), expected value (green) and weighted propensity (blue).
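
The absolute SMD reported in this caption can be illustrated with a short sketch. This is not the authors' code: the function names and the choice of a pooled, population-style variance are assumptions made here for illustration, following the common definition of the standardized mean difference (difference in group means divided by the pooled standard deviation).

```python
import math

def weighted_mean_var(values, weights):
    """Return the weighted mean and the population-style weighted variance."""
    total = sum(weights)
    mean = sum(w * v for v, w in zip(values, weights)) / total
    var = sum(w * (v - mean) ** 2 for v, w in zip(values, weights)) / total
    return mean, var

def abs_smd(treated, control, w_treated=None, w_control=None):
    """Absolute standardized mean difference of a single covariate.

    Without weights this is the SMD in the original data; with
    inverse-probability weights it is the SMD in the weighted data.
    """
    w_treated = w_treated or [1.0] * len(treated)
    w_control = w_control or [1.0] * len(control)
    m1, v1 = weighted_mean_var(treated, w_treated)
    m0, v0 = weighted_mean_var(control, w_control)
    pooled_sd = math.sqrt((v1 + v0) / 2)
    # A covariate that is constant in both groups has no spread to scale by.
    return abs(m1 - m0) / pooled_sd if pooled_sd > 0 else 0.0
```

Calling `abs_smd` once without weights and once with the estimated weights gives the two points plotted per covariate (original versus weighted data).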

Extended Data Fig. 3 Distribution of estimated ATE of drug classes on defined outcomes across the 50 bootstrap samples.

All drug classes shown satisfy two conditions: an adjusted p-value of less than 0.05 and a post unbalanced ratio of less than 2%. Within each boxplot, the central line denotes the median, and the bottom and top edges denote the 25th (Q1) and 75th (Q3) percentiles, respectively. The whiskers extend to 1.5 times the interquartile range.
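
As a rough illustration of how a bootstrap distribution of ATE estimates and its boxplot statistics might be computed, the sketch below uses a normalized inverse-probability-of-treatment-weighted (IPTW) estimator with precomputed propensity scores. Everything here is an assumption for illustration: the function names, the row layout, the plain resampling with replacement, and the use of fence limits for the whiskers are not taken from the article.

```python
import random
import statistics

def iptw_ate(rows):
    """Normalized IPTW estimate of the average treatment effect.

    Each row is (treated, outcome, propensity), with treated in {0, 1}
    and propensity strictly between 0 and 1.
    """
    num1 = den1 = num0 = den0 = 0.0
    for treated, outcome, ps in rows:
        if treated:
            w = 1.0 / ps
            num1 += w * outcome
            den1 += w
        else:
            w = 1.0 / (1.0 - ps)
            num0 += w * outcome
            den0 += w
    return num1 / den1 - num0 / den0

def bootstrap_ates(rows, n_boot=50, seed=0):
    """Estimate the ATE on n_boot resamples drawn with replacement."""
    rng = random.Random(seed)
    ates = []
    while len(ates) < n_boot:
        sample = rng.choices(rows, k=len(rows))
        # Skip degenerate resamples that lack one of the two arms.
        if any(r[0] for r in sample) and any(not r[0] for r in sample):
            ates.append(iptw_ate(sample))
    return ates

def box_summary(values):
    """Quartiles plus the 1.5 x IQR limits used for boxplot whiskers."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return {"q1": q1, "median": median, "q3": q3,
            "whisker_low": q1 - 1.5 * iqr, "whisker_high": q3 + 1.5 * iqr}
```

Passing the output of `bootstrap_ates` to `box_summary` yields the median, Q1, Q3 and whisker limits that a boxplot of this kind displays.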

Extended Data Fig. 4 The list of significant drug classes.

The drug classes are denoted using ATC code and corresponding names.

Extended Data Fig. 5 The estimated treatment effects for CAD over balanced and statistically significant drug combinations.

The drug combinations are ranked by the estimated ATE values.

Extended Data Fig. 6 Performance comparison of proposed method and three pre-clinical methods evaluated by Precision@K.

The values of K are selected from {6, 9}.
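
For readers unfamiliar with the metric, Precision@K is the fraction of the top K ranked candidates that are known positives. The sketch below is a generic illustration; the function name and the placeholder drug identifiers are hypothetical and do not come from the article.

```python
def precision_at_k(ranked_drugs, known_positives, k):
    """Fraction of the top-k ranked drugs that appear in known_positives."""
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = ranked_drugs[:k]
    hits = sum(1 for drug in top_k if drug in known_positives)
    return hits / k

# Hypothetical ranking and ground truth, for illustration only.
ranking = ["drug_a", "drug_b", "drug_c", "drug_d", "drug_e", "drug_f"]
positives = {"drug_a", "drug_c", "drug_f"}
score = precision_at_k(ranking, positives, k=6)
```

Note that this version divides by K even when fewer than K drugs are ranked, which is one of several conventions in use.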

Extended Data Fig. 7 Additional repurposing candidates retrieved under different threshold settings.

The adjusted p-value threshold is changed to 0.15, and the post unbalanced ratio threshold remains the same as in the previous setting (less than 2%).

Extended Data Fig. 8 The definition of user and non-user cohorts.

Index date refers to the first prescription of the trial’s drug (user cohort) or the alternative drug (non-user cohort). The time period before the index date is the baseline period, and the time after the index date is the follow-up period. The patient covariates are collected during the baseline period and the treatment effects are evaluated during the follow-up period.
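
The index-date definition can be made concrete with a small sketch that finds a patient's first prescription of a drug and splits dated records into baseline and follow-up periods. The function names, record layout, and toy records are hypothetical, and assigning events that fall exactly on the index date to the follow-up period is an assumption made here, since the caption does not specify it.

```python
from datetime import date

def find_index_date(prescriptions, drug):
    """Return the date of the first prescription of `drug`, or None."""
    dates = [day for day, name in prescriptions if name == drug]
    return min(dates) if dates else None

def split_by_index_date(events, index_date):
    """Split (date, code) events into baseline and follow-up lists."""
    baseline = [e for e in events if e[0] < index_date]
    # Assumption: events on the index date itself count as follow-up.
    follow_up = [e for e in events if e[0] >= index_date]
    return baseline, follow_up

# Toy records for one hypothetical patient.
prescriptions = [
    (date(2015, 3, 1), "drug_x"),
    (date(2014, 6, 10), "drug_x"),
    (date(2013, 1, 5), "drug_y"),
]
diagnoses = [(date(2014, 1, 2), "code_1"), (date(2014, 9, 30), "code_2")]

index_date = find_index_date(prescriptions, "drug_x")
baseline, follow_up = split_by_index_date(diagnoses, index_date)
```

In this toy case the covariates would be built from `baseline` and the outcomes read from `follow_up`.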

Supplementary information

Supplementary Information

Supplementary Tables 1–6 and Figs. 1 and 2.

About this article

Cite this article

Liu, R., Wei, L. & Zhang, P. A deep learning framework for drug repurposing via emulating clinical trials on real-world patient data. Nat Mach Intell 3, 68–75 (2021). https://doi.org/10.1038/s42256-020-00276-w

Received: 27 February 2020
Accepted: 16 November 2020
Published: 04 January 2021
Issue Date: January 2021
DOI: https://doi.org/10.1038/s42256-020-00276-w
