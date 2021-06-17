a, Distribution of the maximum Tanimoto similarity, based on CDK fingerprints (CDK TS), of each test molecule compared against all training molecules. b, c, Distribution of the Pearson correlation coefficient r between predicted and empirical changes of transcriptional profiles (CTPs) for test molecules with CDK TS < 0.4 (b) (mean r = 0.60, peak r = 0.8) and with CDK TS > 0.4 (c) (mean r = 0.79, peak r = 0.93). d, A few well-predicted test molecules (r > 0.74) and their most similar molecules in the training set, indicating that DLEPS is capable of predicting the CTPs of structurally novel molecules. The Maximum Common Sub-Structures (MCSS) are highlighted in cyan. e, Distribution of the Pearson correlation coefficient r between the empirical CTPs of selected molecule pairs. One molecule in each pair is from the well-predicted test set (r > 0.74, n = 2033 out of 3000) and the other is a structurally similar molecule from the training set, with CDK TS > 0.35. The mean Pearson r is 0.50. f, As a comparison, the Pearson r for randomly permuted pairs is 0.07. g−i, Similarity versus correlation analysis of molecule pairs. g, Principal component analysis (PCA) of the CTPs of test molecule BRD-K70918941 and its most similar molecules in the training set. The MCSS is highlighted in cyan for each molecule. The DLEPS-predicted CTP is highlighted in red. h, i, Heatmaps of the CDK Tanimoto similarity (h) and the correlation coefficient matrix (i) of sampled pairs. j, Scatter plot of CDK TS versus the correlation coefficient of CTPs, indicating that a high CDK TS does not necessarily yield a high correlation, and vice versa. k, Exemplar fragments that tend to disrupt (upper) or retain (bottom) the CTPs, derived from the well (r > 0.80) and poorly (-0.3 < r < 0.3) correlated pair groups in e.
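The two quantities used throughout panels a−c can be illustrated with a minimal sketch: the maximum Tanimoto similarity of a test molecule against a training set, and the Pearson r between a predicted and an empirical profile. This is not the authors' pipeline. The fingerprints here are hypothetical toy sets of on-bit indices rather than real CDK fingerprints, the function names are invented for illustration, and the handling of a similarity of exactly 0.4 is an assumption, since the caption only specifies < 0.4 and > 0.4.

```python
from math import sqrt


def tanimoto(fp_a, fp_b):
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    a, b = set(fp_a), set(fp_b)
    union = len(a | b)
    if union == 0:
        return 0.0  # assumed convention for two empty fingerprints
    return len(a & b) / union


def max_train_similarity(test_fp, train_fps):
    """Highest Tanimoto similarity between one test molecule and any training molecule."""
    return max(tanimoto(test_fp, fp) for fp in train_fps)


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    var_y = sum((yi - mean_y) ** 2 for yi in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical toy fingerprints (on-bit indices), not real CDK output.
train_fps = [{1, 2, 3, 4}, {2, 3, 5}, {7, 8}]
test_fp = {1, 2, 3, 6}

max_ts = max_train_similarity(test_fp, train_fps)

# Assumption: a value of exactly 0.4 is placed in the lower group here.
group = "CDK TS > 0.4" if max_ts > 0.4 else "CDK TS < 0.4"

# Hypothetical predicted and empirical expression changes for one molecule.
predicted = [0.1, -0.4, 0.8, 0.3]
empirical = [0.2, -0.5, 0.9, 0.1]
r = pearson_r(predicted, empirical)

print(f"max CDK TS = {max_ts:.2f} ({group}), Pearson r = {r:.2f}")
```

Repeating this per test molecule and binning the resulting r values by similarity group would give distributions of the kind shown in panels b and c. With real data, a cheminformatics toolkit would supply the fingerprints.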