Integrating gene expression and clinical data to identify drug repurposing candidates for hyperlipidemia and hypertension

Discovering novel uses for existing drugs, through drug repurposing, can reduce the time, costs, and risk of failure associated with new drug development. However, prioritizing drug repurposing candidates for downstream studies remains challenging. Here, we present a high-throughput approach to identify and validate drug repurposing candidates. This approach integrates human gene expression, drug perturbation, and clinical data from publicly available resources. We apply this approach to find drug repurposing candidates for two diseases, hyperlipidemia and hypertension. We screen >21,000 compounds and replicate ten approved drugs. We also identify 25 (seven for hyperlipidemia, eighteen for hypertension) drugs approved for other indications with therapeutic effects on clinically relevant biomarkers. For five of these drugs, the therapeutic effects are replicated in the All of Us Research Program database. We anticipate our approach will enable researchers to integrate multiple publicly available datasets to identify high priority drug repurposing opportunities for human diseases.


Description of Additional Supplementary Files
File Name: Supplementary Data 1 Description: Known disease-associated genes found in disease gene expression signatures. Columns are disease, gene, target_name, gene_expression_change, source, interaction_type, drug_name, and drug_type.
File Name: Supplementary Data 2 Description: S-PrediXcan estimated transcriptomic signatures used to query iLINCS for hyperlipidemia drug repurposing candidates, K = 50. Columns are disease, gene, zscore, and pval. P values are from two-tailed association (elastic net) tests between S-PrediXcan predicted gene expression variation and LDL-C levels. Abbreviations: iLINCS: Integrative Library of Integrated Network-based Cellular Signatures; LDL-C: low-density lipoprotein cholesterol.
File Name: Supplementary Data 3 Description: S-PrediXcan estimated transcriptomic signatures used to query iLINCS for hyperlipidemia drug repurposing candidates, FDR (q < 0.05). Columns are disease, gene, zscore, pval, and qval. P values are from two-tailed association (elastic net) tests between S-PrediXcan predicted gene expression variation and LDL-C levels. Abbreviations: iLINCS: Integrative Library of Integrated Network-based Cellular Signatures; LDL-C: low-density lipoprotein cholesterol.
File Name: Supplementary Data 4 Description: S-PrediXcan estimated transcriptomic signatures used to query iLINCS for hypertension drug repurposing candidates, K = 50. Columns are disease, gene, zscore, and pval. P values are from two-tailed association (elastic net) tests between S-MultiXcan predicted gene expression variation and SBP readings. Abbreviations: iLINCS: Integrative Library of Integrated Network-based Cellular Signatures; SBP: systolic blood pressure.
File Name: Supplementary Data 5 Description: S-PrediXcan estimated transcriptomic signatures used to query iLINCS for hypertension drug repurposing candidates, FDR (q < 0.05). Columns are disease, gene, zscore, pval, and qval. P values are from two-tailed association (elastic net) tests between S-MultiXcan predicted gene expression variation and SBP readings. Abbreviations: iLINCS: Integrative Library of Integrated Network-based Cellular Signatures; SBP: systolic blood pressure.
File Name: Supplementary Data 6 Description: Aggregated iLINCS drug repurposing candidate list for hyperlipidemia. Columns are disease, signatureid, drug, concentration, tissue, time, concordance, pval. P values are from two-tailed weighted Pearson correlation tests between S-PrediXcan predicted gene expression levels and drug-perturbation induced gene expression changes from iLINCS. Abbreviations: iLINCS: Integrative Library of Integrated Network-based Cellular Signatures.
File Name: Supplementary Data 7 Description: Aggregated iLINCS drug repurposing candidate list for hypertension. Columns are disease, drug, correlation, zscore, pval. P values are from two-tailed Empirical Bayes weighted t-tests. Abbreviations: iLINCS: Integrative Library of Integrated Network-based Cellular Signatures.
File Name: Supplementary Data 8 Description: Cohort selection numbers. Columns are: "Source", "Drug", "Disease", "Exposed to drug repurposing candidate in outpatient setting and had ≥1 outpatient biomarker measurements within one year after index date", "In the EHR, did not have evidence for continued exposure to drug repurposing candidate after 30 day induction period", "At start of observation period, was not ≥ 18 y or < 90 y", "Did not have ≥1 outpatient biomarker measurements during both baseline and treatment periods", "Exposed to known FDA-approved drug for disease of interest (with exception if drug being tested is FDA-approved for disease of interest)", and "Final clinical validation cohort".
-"pct_black": Percentage of cohort who were black.
-"obs_period_length_median": Observation period length, days; median.
-"obs_period_length_iqr": Observation period length, days; interquartile range.
-"treatment_period_length_median": Treatment period length, days; median.
-"treatment_period_length_iqr": Treatment period length, days; interquartile range.
-"*": If there were fewer than twenty individuals in "n_female", "n_white", or "n_black", then values in all three columns and their associated percentages were suppressed to protect individual privacy.
Column definitions: -"source": Source of data, either Vanderbilt (i.e., VUMC SD) or All of Us.
-"drug": Name of drug tested in clinical validation study.
-"pval": Two-tailed P values were calculated using Wilcoxon signed rank tests to identify statistically significant differences between baseline and treatment periods.
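The paired comparison behind the "pval" column can be illustrated with a minimal sketch. It is not the authors' implementation: it uses the large-sample normal approximation to the Wilcoxon signed rank test, drops zero differences, and assigns average ranks to ties. The function name and the biomarker values in the usage note are hypothetical.

```python
import math


def wilcoxon_signed_rank(baseline, treatment):
    """Two-tailed Wilcoxon signed rank test (normal approximation).

    Returns (w_plus, p_value), where w_plus is the sum of ranks of the
    positive differences (treatment - baseline). Zero differences are
    dropped; tied absolute differences share their average rank, and the
    variance is corrected for ties.
    """
    if len(baseline) != len(treatment):
        raise ValueError("paired samples must have equal length")

    diffs = [t - b for b, t in zip(baseline, treatment) if t != b]
    n = len(diffs)
    if n == 0:
        return 0.0, 1.0  # no non-zero differences: nothing to test

    # Rank absolute differences, averaging ranks within tied groups.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        t = j - i + 1
        tie_term += t**3 - t
        i = j + 1

    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4
    var = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48
    if var <= 0:
        return w_plus, 1.0
    z = (w_plus - mean) / math.sqrt(var)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return w_plus, p_value
```

For example, with made-up median systolic blood pressure values per individual, `wilcoxon_signed_rank(baseline_sbp, treatment_sbp)` returns the rank-sum statistic and a two-tailed P value. For small cohorts an exact test, as provided by standard statistics packages, would be preferable to this approximation.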
Column definitions: -"source": Source of data, either Vanderbilt (i.e., VUMC SD) or All of Us.
-"drug": Name of drug tested in clinical validation study.
-"n": Total number of individuals in cohort.
-"baseline_count": Number of individuals in cohort with comorbidity during baseline period.
-"treatment_count": Number of individuals in cohort with comorbidity during treatment period.
-If there were fewer than twenty individuals in either the "baseline_count" or "treatment_count" column, then the statistics were not reported.
Column definitions: -"source": Source of data, either Vanderbilt (i.e., VUMC SD) or All of Us.
-"drug": Name of drug tested in clinical validation study.
-"biomarker": Identity of biomarker, either LDL Cholesterol or Systolic Blood Pressure.
-"n": Number of total individuals in cohort.
-"sig_50" and "sig_fdr": "yes" means that the drug was found using the K = 50 genes and commonly used FDR metric (q < 0.05) to generate disease gene signatures, and "no" means that the drug was not found using the respective approach.
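The two signature definitions behind "sig_50" and "sig_fdr" can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: the FDR metric is assumed to be the Benjamini-Hochberg procedure, the K = 50 genes are assumed to be those with the smallest P values, and the function names and the row layout (dicts with gene, zscore, and pval keys, mirroring the supplementary data columns) are hypothetical.

```python
def benjamini_hochberg(pvals):
    """Return Benjamini-Hochberg adjusted q values in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    qvals = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotone q values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvals[idx] * m / rank)
        qvals[idx] = running_min
    return qvals


def select_signatures(rows, k=50, q_cutoff=0.05):
    """Build the top-K and FDR-filtered gene signatures.

    rows: list of dicts with keys "gene", "zscore", and "pval".
    Returns (top_k, fdr), both sorted by ascending P value, with a
    "qval" key added to every row.
    """
    qvals = benjamini_hochberg([r["pval"] for r in rows])
    annotated = [dict(r, qval=q) for r, q in zip(rows, qvals)]
    by_p = sorted(annotated, key=lambda r: r["pval"])
    top_k = by_p[:k]                                  # assumed: ranked by P value
    fdr = [r for r in by_p if r["qval"] < q_cutoff]   # assumed: BH q < cutoff
    return top_k, fdr
```

A drug would then be flagged "yes" in "sig_50" if it is recovered when querying with the `top_k` signature, and "yes" in "sig_fdr" if it is recovered with the `fdr` signature.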