Abstract
Drug discovery focused on target proteins has been a successful strategy, but many diseases and biological processes lack obvious targets to enable such approaches. Here, to overcome this challenge, we describe a deep learning–based efficacy prediction system (DLEPS) that identifies drug candidates using a change in the gene expression profile in the diseased state as input. DLEPS was trained using chemically induced changes in transcriptional profiles from the L1000 project. We found that the changes in transcriptional profiles for previously unexamined molecules were predicted with a Pearson correlation coefficient of 0.74. We examined three disorders and experimentally tested the top drug candidates in mouse disease models. Validation showed that perillen, chikusetsusaponin IV and trametinib confer disease-relevant impacts against obesity, hyperuricemia and nonalcoholic steatohepatitis, respectively. DLEPS can generate insights into pathogenic mechanisms, and we demonstrate that the MEK–ERK signaling pathway is a target for developing agents against nonalcoholic steatohepatitis. Our findings suggest that DLEPS is an effective tool for drug repurposing and discovery.
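DLEPS ranks candidates by how strongly a molecule's predicted change in transcriptional profile (CTP) opposes a disease signature made of up- and downregulated genes. The exact DLEPS scoring function is defined in the paper's Methods and code, not in this section. The sketch below assumes a Connectivity Map–style Kolmogorov–Smirnov score (Lamb et al., 2006), in which a negative value indicates that the profile reverses the disease signature. The function names `ks_enrichment` and `connectivity_score` are hypothetical and are not part of the DLEPS repository.

```python
from typing import Dict, Sequence


def ks_enrichment(tag_ranks: Sequence[int], n_genes: int) -> float:
    """KS-style enrichment of a gene set in a ranked gene list (CMap 2006 style).

    tag_ranks: 1-based positions of the signature genes in a list of n_genes
    ordered from most upregulated (rank 1) to most downregulated.
    Requires at least one signature gene to be present in the list.
    """
    v = sorted(tag_ranks)
    t = len(v)
    # a: largest excess of tags appearing earlier (higher) than expected
    a = max(j / t - v[j - 1] / n_genes for j in range(1, t + 1))
    # b: largest excess of tags appearing later (lower) than expected
    b = max(v[j - 1] / n_genes - (j - 1) / t for j in range(1, t + 1))
    return a if a > b else -b


def connectivity_score(profile: Dict[str, float],
                       up_genes: Sequence[str],
                       down_genes: Sequence[str]) -> float:
    """Signed score of a CTP against a disease signature.

    Negative values mean the profile pushes disease-upregulated genes down
    and disease-downregulated genes up (i.e. it reverses the signature).
    """
    ranked = sorted(profile, key=profile.get, reverse=True)  # most up first
    rank = {g: i + 1 for i, g in enumerate(ranked)}
    n = len(ranked)
    ks_up = ks_enrichment([rank[g] for g in up_genes if g in rank], n)
    ks_down = ks_enrichment([rank[g] for g in down_genes if g in rank], n)
    # CMap convention: no connectivity when both sets shift the same way
    if (ks_up > 0) == (ks_down > 0):
        return 0.0
    return ks_up - ks_down
```

For example, a predicted profile that places the disease-upregulated genes at the bottom of its ranking and the disease-downregulated genes at the top receives a strongly negative score, which under this assumption would mark it as a candidate.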
Data availability
The training and test data for DLEPS were downloaded from the Gene Expression Omnibus under accession number GSE92742. Browning markers, inflammation and fibrosis markers, and NASH markers were obtained by analyzing ArrayExpress datasets E-GEOD-8044, E-MEXP-980 and E-GEOD-58979, respectively. All gene signatures are listed in Supplementary Table 3. RNA-seq data for WAT from chikusetsusaponin IV-treated and control mice (GSE165171), kidney from perillen-treated and control mice (GSE165173) and liver from trametinib-treated and control mice (GSE165174) are available at the Gene Expression Omnibus. Source data are provided with this paper.
Code availability
The code for DLEPS is available at https://github.com/kekegg/DLEPS. We have also set up a user-friendly online computing interface (https://www.dleps.tech/dleps/index). Commercial use of DLEPS requires a license.
Acknowledgements
We thank Z. Liu from PKU for discussions on structural analysis. This work was supported by the National Key R&D Program of China (2018YFA0900200 to Z.X., 2017YFC1700402 to R.Z.), NSFC (31771519 to Z.X., 81873597 to H.Z., 81870590 to R.Z.) and the Beijing Municipal Natural Science Foundation (5182012 to Z.X.). The HUA and NASH studies were supported by Beijing Gigaceuticals Tech. Co., Ltd.
Author information
Contributions
Z.X. (theory and algorithm); B.G., Miaomiao Gao, J.W., J.L. and R.Z. (obesity); J.Z., J.W., X.W., X.S. and N.Z. (NASH); Mingjing Gao, L.W., Y.A., Z.H. and H.Z. (HUA); Z.X., Miaomiao Gao, Y.Y., W.K. and Z.L. (computation); Z.X., J.Z. and R.Z. wrote the paper; Z.X., R.Z., N.Z. and H.Z. co-supervised the subproject teams. Z.X. supervised all the teams and is the lead contact.
Ethics declarations
Competing interests
J.W., Mingjing Gao and Z.L. work for Beijing Gigaceuticals Tech. Co., Ltd.
Additional information
Peer review information Nature Biotechnology thanks the anonymous reviewers for their contribution to the peer review of this work.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 Statistical analysis of the training data, control statistics and across-genes analysis.
a, Distribution of the number of changed genes for all molecules in the L1000 data. b, Distribution of the mean z-score across all genes for all molecules in the L1000 data. c, As a control, the distribution of the Pearson correlation coefficient r between randomly paired predicted and empirical profiles. d, ROC-like curve of the well-fitted fraction versus the threshold Pearson r for the distribution in c. e, g, Scatter plots of empirical versus predicted changes of individual genes (one gene per subplot) over all molecules (dots in each subplot) in the training set (e) and test set (g). f, h, Distribution of the Pearson correlation coefficient r between predicted and empirical profiles for each gene over molecules in the training set (f) and test set (h).
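The evaluation in Extended Data Fig. 1c, d, f and h comes down to Pearson correlations between predicted and empirical CTPs, computed either per molecule across genes or per gene across molecules, together with a shuffled-pairing control. The following is a minimal pure-Python sketch, assuming profiles are stored as equal-length lists with molecules as rows and genes as columns. The helper names are hypothetical, and "well fitted" is assumed to mean r at or above a chosen threshold.

```python
import random
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def per_molecule_r(pred: List[List[float]], emp: List[List[float]]) -> List[float]:
    """One r per molecule, computed across genes (rows = molecules)."""
    return [pearson(p, e) for p, e in zip(pred, emp)]


def per_gene_r(pred: List[List[float]], emp: List[List[float]]) -> List[float]:
    """One r per gene, computed across molecules (as in panels f and h)."""
    return per_molecule_r([list(c) for c in zip(*pred)],
                          [list(c) for c in zip(*emp)])


def random_pair_r(pred: List[List[float]], emp: List[List[float]],
                  seed: int = 0) -> List[float]:
    """Control (panel c): pair predicted profiles with shuffled empirical ones."""
    rng = random.Random(seed)
    order = list(range(len(emp)))
    rng.shuffle(order)
    return [pearson(p, emp[i]) for p, i in zip(pred, order)]


def well_fitted_fraction(rs: Sequence[float],
                         thresholds: Sequence[float]) -> List[float]:
    """ROC-like curve (panel d): fraction of r values at or above each threshold."""
    return [sum(r >= t for r in rs) / len(rs) for t in thresholds]
```

A plain shuffle can occasionally pair a molecule with its own empirical profile; with thousands of molecules this barely affects the control distribution, but a derangement would be the stricter choice.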
Extended Data Fig. 2 Statistical and structural analysis of DLEPS’ performance.
a, Distribution of the maximum Tanimoto similarity, based on the CDK fingerprint (CDK TS), between each test molecule and all training molecules. b, c, Distribution of the Pearson correlation coefficient r of predicted versus empirical changes of transcriptional profiles (CTPs) for test molecules with CDK TS < 0.4 (b) (mean r = 0.60, peak r = 0.8) and with CDK TS > 0.4 (c) (mean r = 0.79, peak r = 0.93). d, Several well-predicted test molecules (r > 0.74) and their most similar molecules in the training set, indicating that DLEPS can predict CTPs of structurally novel molecules. Maximum common substructures (MCSS) are highlighted in cyan. e, Distribution of the Pearson correlation coefficient r of predicted versus empirical CTPs among selected molecule pairs. Each pair consists of a well-predicted test molecule (r > 0.74, n = 2033 out of 3000) and a structurally similar training molecule (CDK TS > 0.35). The mean Pearson r is 0.50. f, For comparison, the Pearson r for randomly permuted pairs is 0.07. g−i, Similarity versus correlation analysis of molecule pairs. g, Principal component analysis (PCA) of CTPs of test molecule BRD-K70918941 and its most similar molecules in the training set. MCSS are highlighted in cyan for each molecule, and the DLEPS-predicted CTP is highlighted in red. h, i, Heatmaps of the CDK Tanimoto similarity (h) and the correlation coefficient matrix (i) of sampled pairs. j, Scatter plot of CDK TS versus the correlation coefficient of CTPs, indicating that high CDK TS does not necessarily yield high correlation, and vice versa. k, Exemplar fragments that tend to disrupt (top) or retain (bottom) the CTPs, identified from the well-correlated (r > 0.80) and poorly correlated (-0.3 < r < 0.3) pair groups in e.
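Extended Data Fig. 2a–c stratifies test molecules by their maximum Tanimoto similarity to any training molecule. Computing the CDK fingerprints themselves requires a cheminformatics toolkit and is not shown. This sketch assumes each fingerprint is already available as a set of on-bit indices. The function names and toy fingerprints are hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

Fingerprint = FrozenSet[int]  # indices of the "on" bits of a binary fingerprint


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto similarity |A & B| / |A | B| of two bit sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def max_similarity(test_fps: Dict[str, Fingerprint],
                   train_fps: Dict[str, Fingerprint]) -> Dict[str, Tuple[str, float]]:
    """For each test molecule, its most similar training molecule and the TS."""
    out = {}
    for tid, tfp in test_fps.items():
        best_id, best_fp = max(train_fps.items(),
                               key=lambda kv: tanimoto(tfp, kv[1]))
        out[tid] = (best_id, tanimoto(tfp, best_fp))
    return out


def split_by_novelty(max_sims: Dict[str, Tuple[str, float]],
                     cutoff: float = 0.4) -> Tuple[List[str], List[str]]:
    """Partition mirroring panels b (TS < cutoff) and c (TS > cutoff)."""
    novel = [k for k, (_, s) in max_sims.items() if s < cutoff]
    similar = [k for k, (_, s) in max_sims.items() if s > cutoff]
    return novel, similar
```

Following the caption's strict inequalities, a molecule with a maximum TS of exactly 0.4 falls into neither group in this sketch.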
Extended Data Fig. 3 Chikusetsusaponin IV reduced body weight in DIO mice, and results for molecules from the negative set.
a, Increase in body weight (n = 6) of 8-week-old mice housed at 22 °C, fed an HFD and treated with isoginkgetin (3 mg/kg), loureirin B (1 mg/kg), chikusetsusaponin IV (20 mg/kg) or DMSO for 2 weeks. b, Body weight change of DIO mice treated continuously with chikusetsusaponin IV (20 mg/kg) for 5 weeks. The average body weight at day 0 was 55 g in both groups (n = 6). c, d, Daily and cumulative food intake (n = 6) and e, f, physical activity of the mice in Fig. 3g−j. g−i, Body weight (g), increase in body weight (h) and food intake (i) (n = 6) of 8-week-old mice housed at 22 °C, fed an HFD and treated for 2 weeks with four molecules from the negative set (mudanpioside C, 5 mg/kg; syringic acid, 25 mg/kg; agnuside, 5 mg/kg; 13-acetyl-9-dihydrobaccatin III, 10 mg/kg) or DMSO. ** P < 0.01 compared with the model group. All P values were determined by one-way ANOVA. All data are presented as the mean ± s.e.m.
Extended Data Fig. 4 Transcriptional analysis of perillen-treated mice, additional DLEPS analysis and pharmacokinetic analysis of perillen.
a, Blood uric acid (BUA) levels of control mice, HUA model mice and HUA model mice treated with four molecules from the negative set (marbofloxacin, captopril, parecoxib and mupirocin at 20 mg/kg; n = 6). b, Kidney index of normal mice, HUA model mice, HUA model mice treated with perillen at 2.5, 5 and 10 mg/kg, and topiroxostat-treated HUA model mice (n = 6). c−e, Body weight (c), food intake (d) and water intake (e) of mice treated with perillen for 7 days (n = 6). f, Principal component analysis of normal, HUA model and perillen-treated HUA model mice (n = 3). g, Scatter plot of gene expression in HUA model versus non-induced control mice. The color gradient represents dot intensity. h, Scatter plot of gene expression in perillen-treated mice versus HUA model mice. i, Scatter plot of the slopes in h versus those in g (r = -0.23, P < 3e-232). j, GO analysis of genes upregulated in model mice (n = 3). k, GO analysis of genes downregulated in perillen-treated model mice (n = 3). l−n, Additional analysis of the anti-inflammation and fibrosis score using NASH phase IV gene signatures (l) and hepatic steatosis gene signatures (m). The large red dot highlights perillen, indicating that the prediction for perillen is robust to different inflammation/fibrosis gene signatures. n, Scatter plot of the inflammation/fibrosis score in Fig. 4b versus the NASH phase IV score (r = 0.51, P < 2e-238), indicating good correlation between the two scores. o, Chromatograms of perillen. p, Serum concentration-time curves of perillen under four different conditions. * P < 0.05, ** P < 0.01, **** P < 0.0001 compared with the model group. ## P < 0.01, #### P < 0.0001 compared with the normal group (Normal). All P values were determined by two-tailed paired t-test. All data are presented as the mean ± s.e.m.
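Extended Data Fig. 4i compares how each gene changes in the HUA model (g) with how it changes after perillen treatment (h). A negative correlation means treatment tends to push genes back toward the control state. The caption does not define the "slopes", so this sketch assumes they can be approximated by per-gene log2 fold changes of group means with a pseudocount. The helper names and toy data are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Expr = Dict[str, List[float]]  # gene -> replicate expression values


def log2_fc(group: Expr, ref: Expr, pseudo: float = 1.0) -> Dict[str, float]:
    """Per-gene log2 fold change of group means; the pseudocount avoids log(0)."""
    fc = {}
    for g in group.keys() & ref.keys():
        m_group = sum(group[g]) / len(group[g])
        m_ref = sum(ref[g]) / len(ref[g])
        fc[g] = math.log2((m_group + pseudo) / (m_ref + pseudo))
    return fc


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def reversal_r(normal: Expr, model: Expr, treated: Expr) -> float:
    """Correlate model-vs-normal changes with treated-vs-model changes.

    A negative value means treatment tends to undo the model-induced changes.
    """
    disease = log2_fc(model, normal)  # analogue of panel g
    drug = log2_fc(treated, model)    # analogue of panel h
    genes = sorted(disease.keys() & drug.keys())
    return pearson([disease[g] for g in genes], [drug[g] for g in genes])
```

Because the model group appears in both fold changes in this sketch, measurement noise in that group alone induces some negative correlation, so a permutation or independent-replicate control is advisable before interpreting r.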
Extended Data Fig. 5 Histological, serum and TUNEL analysis of MCD model mice treated with candidate molecules.
8-week-old mice were housed at 22 °C, fed MCD diets for 2 weeks and then treated by i.p. injection for 14 days with positively predicted compounds, normilin (6 mg/kg), lupenone (2 or 6 mg/kg), telmisartan (10 mg/kg), bendroflumethiazide (1.5 mg/kg), GI02002 (10 mg/kg) or ravoxertinib (1 mg/kg) (a), or with negatively predicted compounds, butoconazole (10 mg/kg), benfotiamine (10 mg/kg), menatetrenone (2.5 mg/kg) or phenacetin (70 mg/kg), with GI02002 (10 mg/kg) as a positive control and vehicle (0.5% CMC-Na containing 3% DMSO) (b−d). a, H&E (hematoxylin and eosin) staining of liver (3 replicate mice). b, Serum ALT and AST levels (n = 6 in the MCD, butoconazole and phenacetin groups and n = 7 in the other groups). The P values for ALT in each group compared with the model group were 0.8374, 0.4412, 0.5640, 0.1975 and 0.0002, respectively; those for AST were 0.6609, 0.5452, 0.1093, 0.8002 and 0.0002, respectively. c, Serum CHO and TG levels (n = 6 in the MCD, butoconazole and phenacetin groups and n = 7 in the other groups). The P values for CHO in each group compared with the model group were 0.1014, 0.1176, 0.0958, 0.0909 and 0.0177, respectively; those for TG were 0.8872, 0.5317, 0.4414, 0.2618 and 0.9238, respectively. d, H&E staining (top row; scale bar, 50 μm) and Oil Red staining (bottom row; scale bar, 100 μm) of liver (3 replicate mice). e, Representative images of TUNEL staining (3 replicate mice; P < 0.0001 for the model group compared with the normal group and P < 0.0001 for the trametinib group compared with the model group). Scale bar, 200 μm. All P values were determined by two-tailed paired t-test. * P < 0.05, *** P < 0.001, **** P < 0.0001 compared with the model group (MCD). #### P < 0.0001 compared with the normal group (Normal). All data are presented as the mean ± s.e.m.
Extended Data Fig. 6 Supporting data for the HFD + HF/G experiments and transcriptional analysis of trametinib-treated mice.
a, Schematic of the trametinib administration protocol in HFD + HF/G-fed mice. b, Representative images of livers from the different groups. c, Liver index of the different groups (n = 5 in the normal group, n = 7 in the HFD + HF/G group, n = 6 in the trametinib group). d, Mean body weight of the different groups after treatment (n = 5 in the normal group, n = 7 in the HFD + HF/G group, n = 7 in the trametinib group). e, f, Food intake (e, n = 7 for each group) and body weight (f) of the different groups during drug administration. g, Schematic of the trametinib administration protocol in MCD mice. KEGG and GO enrichment analysis of genes restored by trametinib treatment. * P < 0.05 compared with the model group. # P < 0.05, #### P < 0.0001 compared with the normal group (Normal). All P values were determined by one-way ANOVA. All data are presented as the mean ± s.e.m.
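The KEGG and GO enrichment analysis in Extended Data Fig. 6 starts from "restored" genes, which the caption does not define. This sketch assumes a restored gene is one whose log2 fold change in the model versus normal group exceeds a cutoff in one direction and whose change in the trametinib-treated versus model group exceeds the same cutoff in the opposite direction. The cutoff of 1.0, the function names and the use of a one-sided hypergeometric over-representation test are all assumptions, not the authors' documented procedure.

```python
from math import comb
from typing import Dict, Set


def restored_genes(disease_fc: Dict[str, float], drug_fc: Dict[str, float],
                   cutoff: float = 1.0) -> Set[str]:
    """Genes shifted by the model beyond +/-cutoff (log2) and pushed back the
    other way by treatment beyond the same cutoff (cutoff is hypothetical)."""
    out = set()
    for g, d in disease_fc.items():
        t = drug_fc.get(g, 0.0)
        if (d >= cutoff and t <= -cutoff) or (d <= -cutoff and t >= cutoff):
            out.add(g)
    return out


def hypergeom_p(hits: int, term_size: int, query_size: int, universe: int) -> float:
    """One-sided over-representation P value, P(X >= hits), where
    X ~ Hypergeometric(universe, term_size, query_size).

    hits: restored genes annotated to the term; term_size: annotated genes in
    the universe; query_size: number of restored genes; universe: all genes.
    """
    upper = min(term_size, query_size)
    total = comb(universe, query_size)
    tail = sum(comb(term_size, i) * comb(universe - term_size, query_size - i)
               for i in range(hits, upper + 1))
    return tail / total
```

In practice the P values for all tested terms would also be corrected for multiple testing before reporting enriched pathways.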
Supplementary information
Supplementary Information
Supplementary Note.
Supplementary Table 1
Fragment analysis of well and poorly correlated CTPs of molecule pairs.
Supplementary Table 2
Small-molecule lists for libraries D3680 and D11294 and for ERK inhibitors.
Supplementary Table 3
Up/downregulated gene list for obesity, HUA and NASH.
Supplementary Table 4
Pharmacokinetic parameters of perillen.
Supplementary Table 5
Primers for quantitative PCR analysis.
Source data
Source Data Fig. 3
Statistical source data.
Source Data Fig. 4
Statistical source data.
Source Data Fig. 5
Statistical source data.
Source Data Fig. 6
Statistical source data.
Source Data Extended Data Fig. 3
Statistical source data.
Source Data Extended Data Fig. 4
Statistical source data.
Source Data Extended Data Fig. 5
Statistical source data.
Source Data Extended Data Fig. 6
Statistical source data.
About this article
Cite this article
Zhu, J., Wang, J., Wang, X. et al. Prediction of drug efficacy from transcriptional profiles with deep learning. Nat Biotechnol 39, 1444–1452 (2021). https://doi.org/10.1038/s41587-021-00946-z
DOI: https://doi.org/10.1038/s41587-021-00946-z