Abstract
Drug discovery focused on target proteins has been a successful strategy, but many diseases and biological processes lack obvious targets to enable such approaches. Here, to overcome this challenge, we describe a deep learning–based efficacy prediction system (DLEPS) that identifies drug candidates using a change in the gene expression profile in the diseased state as input. DLEPS was trained using chemically induced changes in transcriptional profiles from the L1000 project. We found that the changes in transcriptional profiles for previously unexamined molecules were predicted with a Pearson correlation coefficient of 0.74. We examined three disorders and experimentally tested the top drug candidates in mouse disease models. Validation showed that chikusetsusaponin IV, perillen and trametinib confer disease-relevant impacts against obesity, hyperuricemia and nonalcoholic steatohepatitis, respectively. DLEPS can generate insights into pathogenic mechanisms, and we demonstrate that the MEK–ERK signaling pathway is a target for developing agents against nonalcoholic steatohepatitis. Our findings suggest that DLEPS is an effective tool for drug repurposing and discovery.
Data availability
The training and test data for DLEPS were downloaded from the Gene Expression Omnibus under accession number GSE92742. Browning markers, inflammation and fibrosis markers, and NASH markers were analyzed from ArrayExpress datasets with accession numbers E-GEOD-8044, E-MEXP-980 and E-GEOD-58979, respectively. All gene signatures are listed in Supplementary Table 3. RNA-seq data for WAT from mice treated with chikusetsusaponin IV and control mice (GSE165171), kidney from perillen-treated and control mice (GSE165173) and liver from trametinib-treated and control mice (GSE165174) are available at the Gene Expression Omnibus. Source data are provided with this paper.
Code availability
The code for DLEPS is available at https://github.com/kekegg/DLEPS. We have also set up a user-friendly online computing interface (https://www.dleps.tech/dleps/index). Commercial use of DLEPS requires a license.
Acknowledgements
We thank Z. Liu from PKU for discussions on structural analysis. This work was supported by the National Key R&D Program of China (2018YFA0900200 to Z.X., 2017YFC1700402 to R.Z.), NSFC (31771519 to Z.X., 81873597 to H.Z., 81870590 to R.Z.) and the Beijing Municipal Natural Science Foundation (5182012 to Z.X.). The HUA and NASH studies were supported by Beijing Gigaceuticals Tech. Co., Ltd.
Ethics declarations
Competing interests
J.W., Mingjing Gao and Z.L. work for Beijing Gigaceuticals Tech. Co., Ltd.
Extended data
Extended Data Fig. 1 Statistical analysis of the training data, control statistics and across-genes analysis.
a, Distribution of the number of changed genes for all molecules in the L1000 data. b, Distribution of the mean z-score across all genes for all molecules in the L1000 data. c, As a control, the distribution of the Pearson correlation coefficient r between randomly paired predicted and empirical profiles. d, ROC-like curve of the well-fitted fraction versus the threshold Pearson r for the distribution in c. e, g, Scatter plots of empirical versus predicted changes of one gene (each subplot) over all molecules (dots in each subplot) in the training set (e) and the test set (g). f, h, Distribution of the Pearson correlation coefficient r between predicted and empirical profiles for genes over molecules in the training set (f) and the test set (h).
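The control in panels c and d can be illustrated with a minimal sketch. It assumes each profile is a plain list of per-gene values and that predicted and empirical profiles are stored in the same molecule order; the function names (`pearson_r`, `matched_and_shuffled_r`, `well_fitted_fraction`) are hypothetical and are not taken from the DLEPS code base.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return float("nan")  # r is undefined for a constant profile
    return cov / (sd_x * sd_y)


def matched_and_shuffled_r(predicted, empirical, seed=0):
    """Per-molecule r for matched pairs and for randomly re-paired profiles.

    Assumption: predicted[i] and empirical[i] describe the same molecule.
    A plain shuffle can leave a few molecules paired with themselves.
    """
    matched = [pearson_r(p, e) for p, e in zip(predicted, empirical)]
    order = list(range(len(empirical)))
    random.Random(seed).shuffle(order)
    shuffled = [pearson_r(p, empirical[j]) for p, j in zip(predicted, order)]
    return matched, shuffled


def well_fitted_fraction(r_values, threshold):
    """Fraction of profiles whose r exceeds the threshold (one curve point)."""
    return sum(r > threshold for r in r_values) / len(r_values)
```

Evaluating `well_fitted_fraction` over a grid of thresholds for the matched and the shuffled values gives two curves of the kind described in panel d; the gap between them indicates how far the predictions exceed chance pairing.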
Extended Data Fig. 2 Statistical and structural analysis of DLEPS’ performance.
a, Distribution of the maximum Tanimoto similarity based on the CDK fingerprint (CDK TS) of each test molecule when compared with all training molecules. b, c, Distribution of the Pearson correlation coefficient r of predicted versus empirical changes of transcriptional profiles (CTPs) of test molecules with CDK TS < 0.4 (b) (mean r = 0.60, peak r = 0.8) and with CDK TS > 0.4 (c) (mean r = 0.79, peak r = 0.93). d, A few well-predicted test molecules (r > 0.74) and their most similar molecules in the training set, indicating that DLEPS is capable of predicting CTPs of structurally novel molecules. The maximum common substructures (MCSS) are highlighted in cyan. e, Distribution of the Pearson correlation coefficient r of predicted versus empirical CTPs among selected molecule pairs. One molecule in each pair is from the well-predicted test set (r > 0.74, n = 2033 out of 3000) and the other is a structurally similar molecule from the training set, with CDK TS > 0.35. The mean Pearson r is 0.50. f, For comparison, the Pearson r for randomly permuted pairs is 0.07. g−i, Similarity versus correlation analysis of molecule pairs. g, Principal component analysis (PCA) of CTPs of test molecule BRD-K70918941 and its most similar molecules in the training set. MCSS are highlighted in cyan for each molecule. The DLEPS-predicted CTP is highlighted in red. h, i, Heatmaps of the CDK Tanimoto similarity (h) and the correlation coefficient matrix (i) of sampled pairs. j, Scatter plot of CDK TS versus correlation coefficient of CTPs, indicating that a high CDK TS does not necessarily yield a high correlation and vice versa. k, Exemplar fragments that tend to disrupt (upper) or retain (bottom) the CTPs, analyzed from the well-correlated (r > 0.80) and poorly correlated (-0.3 < r < 0.3) pair groups in e.
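The similarity measure behind panels a−c can be sketched as follows. This is an illustration only: it assumes each fingerprint has already been computed elsewhere and is represented as a set of on-bit indices, it does not call CDK, and the function names (`tanimoto`, `max_similarity_to_training`, `split_by_novelty`) are hypothetical. The 0.4 default mirrors the cutoff quoted in the caption.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    a, b = set(fp_a), set(fp_b)
    union = len(a | b)
    if union == 0:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / union


def max_similarity_to_training(test_fp, training_fps):
    """Highest Tanimoto similarity between one test molecule and any training molecule."""
    return max(tanimoto(test_fp, fp) for fp in training_fps)


def split_by_novelty(max_similarities, cutoff=0.4):
    """Split test molecule ids into structurally novel and structurally similar groups.

    max_similarities maps a molecule id to its maximum similarity to the training set.
    Values exactly at the cutoff are placed in the similar group (an assumption).
    """
    novel = [m for m, s in max_similarities.items() if s < cutoff]
    similar = [m for m, s in max_similarities.items() if s >= cutoff]
    return novel, similar
```

Computing the Pearson r between predicted and empirical CTPs separately for the two groups returned by `split_by_novelty` would reproduce the kind of comparison shown in panels b and c.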
Extended Data Fig. 3 Chikusetsusaponin IV reduced body weight in DIO mice and results of molecules from negative set.
a, Increase in body weight (n = 6) for 8-week-old mice that were housed at 22 °C, fed a HFD and treated with Isoginkgetin (3 mg/kg), Loureirin B (1 mg/kg), Chikusetsusaponin IV (20 mg/kg) or DMSO for 2 weeks. b, Body weight change of DIO mice that were treated with Chikusetsusaponin IV (20 mg/kg) continuously for 5 weeks. The average body weight at day 0 was 55 g in both groups (n = 6). c−f, Daily and cumulative food intake (c, d; n = 6) and physical activity (e, f) for mice in Fig. 3g−j. g−i, Body weight and food intake for molecules from the negative set. Body weight (g), increase in body weight (h) and food intake (i) (n = 6) for 8-week-old mice that were housed at 22 °C, fed a HFD and treated with four molecules from the negative set (Mudanpioside C 5 mg/kg, Syringic acid 25 mg/kg, Agnuside 5 mg/kg, 13-acetyl-9-Dihydrobaccatin-III 10 mg/kg) or DMSO for 2 weeks. ** P < 0.01 compared with model group. All P values were determined by one-way ANOVA. All data are presented as the mean ± sem.
Extended Data Fig. 4 Transcriptional analysis of perillen-treated mice, extra DLEPS analysis and pharmacokinetic analysis of perillen.
a, Blood uric acid (BUA) levels of control mice, HUA model mice and HUA model mice treated with four molecules from the negative set (Marbofloxacin, Captopril, Parecoxib and Mupirocin at 20 mg/kg, n = 6). b, Kidney index of normal mice, HUA model mice, perillen-treated HUA model mice at 2.5, 5 and 10 mg/kg and topiroxostat-treated HUA model mice (n = 6). c−e, Body weight (c), food intake (d) and water intake (e) of mice treated with perillen for 7 days (n = 6). f, Principal component analysis of normal, HUA model and perillen-treated HUA model mice (n = 3). g, Scatter plot of gene expression in HUA model versus non-induced control mice. The color gradient represents dot intensity. h, Scatter plot of gene expression in perillen-treated mice versus that of HUA model mice. i, Scatter plot of slopes in h versus those in g (r = -0.23, P < 3e-232). j, GO analysis of upregulated genes in model mice (n = 3). k, GO analysis of downregulated genes in perillen-treated model mice (n = 3). l−n, Extra analysis of the anti-inflammation and fibrosis score using NASH phase IV gene signatures (l) and hepatic steatosis gene signatures (m). The big red dot highlights perillen, indicating that the prediction of perillen is robust to various inflammation/fibrosis gene signatures. n, Scatter plot of the inflammation/fibrosis score in Fig. 4b versus the NASH phase IV score (r = 0.51, P < 2e-238), indicating a good correlation between these two scores. o, Chromatograms of perillen. p, Serum concentration-time curves of perillen under four different conditions. * P < 0.05, ** P < 0.01, **** P < 0.0001 compared with model group. ## P < 0.01, #### P < 0.0001 compared with normal group (Normal). All P values were determined by two-tailed paired t-test. All data are presented as the mean ± sem.
Extended Data Fig. 5 Histological, serum and TUNEL analysis of molecule-treated MCD model mice.
Eight-week-old mice were housed at 22 °C, received MCD diets for two weeks and were then treated with positively predicted compounds: Normilin (6 mg/kg), Lupenone (2 or 6 mg/kg), Telmisartan (10 mg/kg), Bendroflumethiazide (1.5 mg/kg), GI02002 (10 mg/kg), Ravoxertinib (1 mg/kg) in a, and with negatively predicted compounds: Butoconazole (10 mg/kg), Benfotiamine (10 mg/kg), Menatetrenone (2.5 mg/kg), Phenacetin (70 mg/kg), GI02002 (10 mg/kg, positive control) or vehicle (0.5% CMC-Na containing 3% DMSO) in b−d, by i.p. injection for 14 days. a, H&E (hematoxylin and eosin) staining of liver (3 mice replicates). b, Serum ALT and AST levels (n = 6 in the MCD, Butoconazole and Phenacetin groups, and n = 7 in the other groups. The P values of ALT in each group compared with the model group were 0.8374, 0.4412, 0.5640, 0.1975 and 0.0002, respectively. The P values of AST in each group compared with the model group were 0.6609, 0.5452, 0.1093, 0.8002 and 0.0002, respectively). c, Serum CHO and TG levels (n = 6 in the MCD, Butoconazole and Phenacetin groups, and n = 7 in the other groups. The P values of CHO in each group compared with the model group were 0.1014, 0.1176, 0.0958, 0.0909 and 0.0177, respectively. The P values of TG in each group compared with the model group were 0.8872, 0.5317, 0.4414, 0.2618 and 0.9238, respectively). d, H&E staining of liver (upper row, 3 mice replicates). Scale bar indicates 50 μm. Oil-Red staining of liver (bottom row, 3 mice replicates). Scale bar indicates 100 μm. e, Representative images of TUNEL staining (3 mice replicates. The P value for the model group compared with the normal group was < 0.0001, and the P value for the Trametinib group compared with the model group was < 0.0001). Scale bar indicates 200 μm. All P values were determined by two-tailed paired t-test. * P < 0.05, *** P < 0.001, **** P < 0.0001 compared with model group (MCD). #### P < 0.0001 compared with normal group (Normal). All data are presented as the mean ± sem.
Extended Data Fig. 6 Supporting figures for HFD + HF/G experiments and transcriptional analysis of Trametinib-treated mice.
a, Schematic of the administration protocol for Trametinib in HFD + HF/G-fed mice. b, Representative images of livers from the different groups. c, Liver index for the different groups (n = 5 in normal group, n = 7 in HFD + HF/G group, n = 6 in Trametinib group). d, Mean body weight of the different groups after treatment (n = 5 in normal group, n = 7 in HFD + HF/G group, n = 7 in Trametinib group). e, f, Food intake (e, n = 7 for each group) and body weight (f) in the different groups during drug administration. g, Schematic of the administration protocol for Trametinib in MCD mice. KEGG and GO enrichment analysis of genes restored by Trametinib treatment. * P < 0.05 compared with model group. # P < 0.05, #### P < 0.0001 compared with normal group (Normal). All P values were determined by one-way ANOVA. All data are presented as the mean ± sem.
