Central nervous system (CNS) disorders are the leading cause of disability worldwide. A substantial body of research is dedicated to understanding the molecular, cellular, and neurobiological underpinnings of CNS disorders, but these lines of inquiry have not been well integrated for drug repurposing. The present challenges for drug discovery for brain disorders, as summarized by the Institute of Medicine’s Forum on Neuroscience and Nervous System Disorders, can be broadly categorized into two domains: regulatory affairs and knowledge discovery [1]. In the regulatory domain, the failure rate of late-stage clinical trials for neuropsychiatric disorders is disproportionately high. According to the Food and Drug Administration (2013), only 2 of 27 (7%) recently approved drugs had indications for CNS disorders [1, 2]. Overall, drug development for CNS disorders is associated with a high risk of failure, an inordinate time to reach the market, and financial costs that outweigh the incentive of treating a large population in need. As a result, large pharmaceutical companies have been withdrawing investment from the development of therapeutics for CNS disorders. This regulatory challenge is tightly coupled to the second barrier to drug discovery, the knowledge domain. Given the complexity of the brain, the pathogenesis and etiology of CNS disorders are not well characterized: there is a lack of biomarkers, molecular targets, and appropriate models, whether in vivo or in vitro, that fully recapitulate the disorders in question [3]. While the promise of modern neuroscience is still being realized, genome-based technologies and large consortia of “omics” data provide an opportunity to strengthen therapeutic options for CNS disorders, and in particular for psychiatric diseases, through drug repurposing approaches.

Drug repurposing is the practice of identifying (Fig. 1) and redeveloping a known drug for a new indication (Table 1). Because the repurposed drug has already been approved by the Food and Drug Administration (FDA), the economic burden of testing for tolerability, safety, side effects, and efficacy, and of the regulatory affairs associated with phase I trials, is diminished, making repurposing a cost-effective approach. Whereas a new therapeutic would need extensive characterization of these parameters, a repurposed drug needs to be characterized only for the new indication of interest [4, 5]. Combining drug repurposing with more sophisticated characterization of receptors, pathways, and effectors in the CNS provides an efficient opportunity to identify unexplored disease mechanisms and to address the knowledge-domain shortfall described by the Institute of Medicine [1].

Fig. 1: Chronology of drug repurposing approaches.

The thickness of each circle depicts the relative expansion of each approach in terms of the number of structure or signature profiles available and the potential to identify new candidates. Various data integration approaches can be used to analyze structure- and signature-based data to identify new therapeutic indications and to support knowledge mining efforts.

Table 1 Glossary of relevant bioinformatic terms.

While appealing, repurposing faces some significant regulatory and economic barriers. At the regulatory level, a repurposed drug needs a utility patent covering a method of use (MOU) or a composition of matter (COM) related to the discovery [6]. MOU patents cover products that change the mode of drug delivery or drugs applied to a new disease, whereas COM patents cover a wider array of modifications, including changes in the active ingredient of the drug, structural changes, novel formulations that change dosing, new routes of active-ingredient delivery, and combination therapies with repurposed compounds [7,8,9]. At the economic level, academic researchers, for example, may secure non-dilutive grants (in which no equity is shared) from government agencies [10]. In contrast, companies investing in a repurposed drug typically rely on highly competitive, scarce, and dilutive investment from venture capitalists and/or large pharmaceutical companies [11]. In this scenario, if the drug is generic or has limited patent life, the financial outlook for developing a new indication may be poor, discouraging venture capital from investing in repurposing ventures [10]. Collaborative initiatives between industry leaders and academic researchers that combine expertise, data, and resources to reduce the risk of drug development are therefore required [12].

There is promise for new pharmacologic therapies for a number of CNS disorders, and the source of these new therapies may already be present in the field [13]. For example, within psychiatric medicine, a recent study suggested that drugs originally designed to treat a psychiatric disorder may be repurposed for other CNS disorders [13]. In total, 77% of source drugs (i.e., new drugs approved by the FDA) originally indicated for a psychiatric disorder have been repurposed three or more times [13]. Repurposing for Alzheimer’s disease has occurred most often, followed by substance use disorders, bipolar disorder, depression, neuropathy, multiple sclerosis, and schizophrenia. Such repurposing efforts have been largely empirical (Table 2), if not accidental, suggesting that a single drug can have multiple targets and cross over into different systems for seemingly disparate conditions. This promiscuity has been exploited to bolster drug-repurposing efforts in psychiatry.

Table 2 Empirically repurposed drugs for CNS disorders.

Examples of empiric drug repurposing

The most successful examples of empiric drug repurposing in psychiatry include the repurposing of valproic acid and ketamine (Table 2). Valproic acid has anticonvulsant properties, and shortly after its initial characterization for epilepsy, a therapeutic indication was established for bipolar disorder [14, 15]. The FDA approved divalproex sodium, a valproic acid derivative, for the manic phase of bipolar disorder [16, 17]. Though evidence suggests that valproic acid derivatives are more tolerable than other bipolar medications such as carbamazepine or olanzapine [18], they are not currently first-line drugs because of potential side effects and adverse effects, including teratogenicity, weight gain, hyperandrogenism, and tremor [19,20,21]. Besides its repurposing for bipolar disorder, valproic acid is still commonly used as an antiepileptic [22].

Ketamine, a dissociative anesthetic, has been repurposed for its rapid antidepressant effects [23]. Esketamine, an enantiomer of ketamine, was recently approved by the FDA as a nasal spray for treatment-resistant depression [14, 15]. Because of its serious side effects, including dissociation and abuse liability, the prescription of esketamine is strictly regulated [24]. Nevertheless, various open-label case studies and randomized controlled trials have indicated that ketamine is an effective therapeutic for treatment-resistant depression [25]. This is an important group to target, as 20% of individuals who receive standard pharmacological therapy for depression do not respond and are therefore considered treatment resistant [26, 27].

Major depressive disorder (MDD) is largely considered a disorder of monoaminergic dysregulation [28]. However, ketamine was originally classified as an NMDA-subtype glutamate receptor antagonist, which led to investigation of glutamatergic mechanisms in MDD [29]. More recent evidence suggests that ketamine’s antidepressant effects may depend in part on opioid-receptor signaling [30, 31]. In the context of drug repurposing, ketamine’s repurposed use has prompted a rethinking of the standard neurochemical hypotheses of MDD, an example of knowledge discovery following empirical drug repurposing.

Examples of bioinformatics-based drug repurposing

In the past five years, with the advent of omics technologies, more sophisticated methods have been used to identify drugs for repurposing in psychiatric medicine [32]. One promising approach involves transcriptional profiling of cell lines after pharmacologic exposure, generating a drug signature that provides mechanistic clues for the downstream effects of the drug. A repository of such signatures can be readily used to repurpose drugs for new indications. Two notable examples using this approach pertaining to depression [33] and Alzheimer’s dementia (AD) [34] are discussed below.

With the goal of identifying new MDD treatments, human hippocampal neural progenitor cells were treated with escitalopram, a selective serotonin reuptake inhibitor, or nortriptyline, a tricyclic antidepressant [33]. Changes in cellular transcriptomes were characterized and used to develop an “antidepressant” transcriptional signature [35, 36]. Using transcriptomic drug-repurposing approaches, the following candidates were identified: clomipramine (a tricyclic antidepressant), W-7 (an intracellular calmodulin antagonist), and vorinostat (a histone deacetylase inhibitor). The identification of clomipramine, an established antidepressant, supports the validity of this methodology, while the other two candidates provide possible new leads and mechanisms of action that may be explored for the treatment of MDD.

Following mining of the Gene Expression Omnibus (GEO) database for transcriptomic signatures of AD from postmortem brain and animal models, bioinformatics approaches were used to identify compounds with therapeutic potential [34]. Treating induced pluripotent stem cells differentiated into adult cortical neurons with candidate compounds from these analyses implicated pathways related to bioenergetics. These results support a role for impaired mitochondria in the pathogenesis of AD, yielding candidate drugs that may be exploited as therapeutic agents for AD [37, 38].

The past success of drug repurposing in psychiatry, both through empiric observation (Table 2) and bioinformatics approaches, encourages greater investment in high-throughput, systematic approaches for drug discovery. Transcriptomic approaches, for instance, can be implemented with increasing ease due to their low cost, public availability of datasets, and ability to incorporate advanced network and systems biology approaches for causal ontology-based drug repurposing. Thus, in this review we emphasize transcriptional profiling as a major new development for drug repurposing efforts. We also provide an overview of different approaches, resources, and data integration methods utilized to repurpose drugs that may be particularly relevant for psychiatric disorders. Present limitations and other potential avenues where drug-repurposing approaches can be deployed will also be discussed.

Approaches for drug repurposing

Omics data can be used for systematic in silico repurposing through structure- and/or signature-based methodologies. Structure-based methods take advantage of resolved (or modeled) 3D structures of relevant protein targets and aim to identify potential drug-like modulators by assessing shape complementarity and strength of binding between the target and ligands [39]. Signature-based methods use the transcriptomic “fingerprints” of disease states and animal models, as well as the effects of drugs on in vitro substrates, including organoids and cell lines [40]. Use of structure-based approaches dates back to the development of X-ray crystallography and nuclear magnetic resonance (NMR) spectroscopy, tools that determine the two- or three-dimensional structures of small molecules and their biological targets. These structure-based technologies have been a cornerstone of drug repurposing in medicinal chemistry [41]. In contrast, transcriptomic signature-based drug discovery is a more recent approach that classifies drugs based on their transcriptomic signatures [42]. Both approaches largely rely on public databases of drug structures and transcriptional signatures. These data may inform drug repurposing by incorporating information on precise structure-function and signature-function relationships. In this section, we discuss these two in silico methods, with specific emphasis on the signature-based approach and the development of data science methods for its advancement. These approaches and their respective tools are summarized in Table 3.

Table 3 Glossary of databases and software applications.

Structure-based approaches

Structure-based approaches to repurposing rely on the principle of shape complementarity between the target protein and the candidate molecule, and thus require that the 3D structure of the target protein be resolved, either experimentally by X-ray crystallography or NMR spectroscopy or computationally by techniques such as homology modeling [41]. With rapid advances in structural biology, 3D structures for most druggable proteins are available from the Protein Data Bank [43] or from databases of pre-computed models, such as ModBase [44]. Computational docking and virtual screening approaches can then estimate the shape complementarity and binding affinity between the target and a ligand [45]. A typical docking simulation is performed for a large number of candidate compounds in order to rank them by predicted binding affinity and select top hits for further assessment and validation. This approach can accelerate drug repurposing by rapidly identifying lead candidates from libraries of drug molecules, such as the National Cancer Institute (NCI) Drug Dictionary [46], the Library of Integrated Network-Based Cellular Signatures (LINCS) [47], and ZINC (a non-commercial database of commercially available compounds for virtual screening) [48]. Because screening large libraries is computationally demanding, docking programs are typically run on a computational cluster or distributed computing platform, within virtual screening pipelines integrated with cheminformatic analyses.

Conceptually, docking simulations involve two main components: sampling and scoring [49, 50]. Sampling algorithms are used to find plausible conformations of the receptor-ligand complex, while scoring functions are required to estimate relative binding affinities and rank ligand poses (conformations of the receptor-ligand complex) [51]. This search through the space of possible conformations of the receptor-ligand complex can be computationally expensive and involves using various optimization techniques, such as a Monte Carlo simulation, simulated annealing, or genetic algorithms [52]. In order to provide the basis for scoring and ranking, atomic force fields and simplified solvation potentials are typically combined into empirical scoring functions that introduce many approximations to describe both intra- and inter-molecular interactions in the system, as well as to estimate the strength of interactions between the ligand and receptor [53]. As a result, different scoring functions may introduce distinct biases that have to be taken into account when selecting one of the available docking methods [54].
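The division of labor between sampling and scoring can be illustrated with a deliberately toy sketch. Here the “pose” is reduced to a single 2D position of the ligand, the scoring function is an invented harmonic well (centered on a hypothetical pocket at `POCKET`) plus a small repulsive bump standing in for a steric clash, and simulated annealing, one of the optimization techniques named above, performs the sampling. None of this corresponds to a real force field or to any particular docking package.

```python
import math
import random

# Hypothetical binding-pocket centre; the "pose" is just the ligand's (x, y) position.
POCKET = (1.5, -0.5)

def score(pose):
    """Toy scoring function (lower = better): a harmonic well at the pocket plus a
    small repulsive bump standing in for a steric clash. Not a real force field."""
    x, y = pose
    well = (x - POCKET[0]) ** 2 + (y - POCKET[1]) ** 2
    clash = 0.5 * math.exp(-((x - 0.5) ** 2 + y ** 2) / 0.1)
    return well + clash

def anneal(start, steps=5000, t_start=2.0, t_end=1e-3, step_size=0.3, seed=0):
    """Sampling: Metropolis Monte Carlo moves under a geometrically cooled temperature."""
    rng = random.Random(seed)
    current, current_e = start, score(start)
    best, best_e = current, current_e
    for i in range(steps):
        t = t_start * (t_end / t_start) ** (i / (steps - 1))
        trial = (current[0] + rng.gauss(0, step_size), current[1] + rng.gauss(0, step_size))
        trial_e = score(trial)
        # Always accept downhill moves; accept uphill moves with Boltzmann probability.
        if trial_e < current_e or rng.random() < math.exp(-(trial_e - current_e) / t):
            current, current_e = trial, trial_e
            if current_e < best_e:
                best, best_e = current, current_e
    return best, best_e

best_pose, best_score = anneal(start=(-3.0, 3.0))
print(f"best pose ({best_pose[0]:.2f}, {best_pose[1]:.2f}), score {best_score:.3f}")
```

The high starting temperature lets the search cross the clash region early, while the low final temperature makes it settle into the well; real docking codes apply the same idea over many more degrees of freedom (translation, rotation, torsions).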

Stimulated by both methodological advances and rapid changes in computing architectures, docking methods have improved considerably over the last decade. Many software packages are now available for computational docking of small molecules that may be used for docking simulations and drug repurposing, including AutoDock [55], DOCK [56], Glide [57], and RosettaDock [58]. Benchmarking of docking packages suggests that no single method consistently outperforms the others [59]. Therefore, different targets may require different combinations of methods, potentially enhanced by re-scoring approaches, including those using transcriptional signature-based data, to reduce the substantial false-positive and false-negative rates observed in docking studies [60]. A consensus approach that combines multiple programs offers a viable strategy to more reliably identify candidate drugs that truly bind specific targets [59].
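One simple way to realize such a consensus is rank averaging: each program’s scores are converted to within-program ranks, and a compound’s consensus rank is its mean rank across programs. The sketch below uses invented program names and scores; mean-rank aggregation is only one of several possible consensus schemes and is not taken from the cited benchmarks.

```python
# Hypothetical predicted binding scores (kcal/mol; more negative = stronger binding)
# from three unnamed docking programs for five placeholder compounds.
scores = {
    "program_A": {"cmpd1": -9.1, "cmpd2": -7.4, "cmpd3": -8.8, "cmpd4": -6.0, "cmpd5": -8.0},
    "program_B": {"cmpd1": -8.2, "cmpd2": -8.5, "cmpd3": -9.0, "cmpd4": -5.5, "cmpd5": -7.1},
    "program_C": {"cmpd1": -10.3, "cmpd2": -6.9, "cmpd3": -9.7, "cmpd4": -7.8, "cmpd5": -7.5},
}

def ranks(program_scores):
    """Rank compounds within one program: rank 1 = most negative (best) score."""
    ordered = sorted(program_scores, key=program_scores.get)
    return {cmpd: i + 1 for i, cmpd in enumerate(ordered)}

def consensus(all_scores, top_k=2):
    """Return the top_k compounds by mean rank across programs (lower is better)."""
    per_program = [ranks(s) for s in all_scores.values()]
    compounds = per_program[0].keys()
    mean_rank = {c: sum(r[c] for r in per_program) / len(per_program) for c in compounds}
    return sorted(mean_rank.items(), key=lambda kv: kv[1])[:top_k]

print(consensus(scores))
```

In this toy example, cmpd2 is ranked second by one program but drops out of the consensus top hits because the other two programs rank it poorly, which is the kind of single-program artifact a consensus is meant to suppress.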

Examples of structure-based approaches

Many drugs have been repurposed for CNS disorders using virtual screening and molecular docking-based approaches [61]. Notably, in most instances the repurposed drugs interact with multiple drug targets (polypharmacology) [3]. Interestingly, aminergic G protein-coupled receptors (for example, dopamine and serotonin receptors) are more promiscuous and have been the target of most structure-based drug repurposing [3]. Successful examples include ropinirole, a D2 receptor agonist originally developed for hypertension and repurposed for Parkinson’s disease [62, 63], and mecamylamine, a nicotinic receptor antagonist originally developed for hypertension and repurposed for depression [64,65,66].

Signature-based approaches

Disease- or drug-mediated alterations in mRNA expression can be used to define unique molecular signatures (Table 1) [40]. A compilation of molecular signatures allows drugs to be selected for a particular disease signature by way of the signature reversion principle, which assumes that if a drug-induced transcriptional signature is similar or opposite to a disease signature, then that drug may mimic or reverse the disease phenotype, respectively [67].
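The signature reversion principle can be sketched as a correlation between a disease signature and each drug signature over their shared genes. The gene names, fold changes, and the ±0.5 cutoff below are hypothetical, and Pearson correlation is used only for simplicity; production tools such as those built on LINCS use rank-based connectivity scores instead.

```python
from math import sqrt

def pearson(a, b):
    """Pearson correlation of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / sqrt(va * vb)

# Hypothetical log2 fold changes: disease vs. control, and drug vs. vehicle.
disease = {"GENE1": 2.0, "GENE2": 1.5, "GENE3": -1.0, "GENE4": -2.2, "GENE5": 0.3}
drugs = {
    "drug_X": {"GENE1": -1.8, "GENE2": -1.2, "GENE3": 0.9, "GENE4": 2.0, "GENE5": -0.1},
    "drug_Y": {"GENE1": 1.6, "GENE2": 1.1, "GENE3": -0.7, "GENE4": -1.9, "GENE5": 0.2},
}

def classify(disease_sig, drug_sig, cutoff=0.5):
    """Apply the signature reversion principle to one drug (cutoff is an arbitrary choice)."""
    genes = sorted(set(disease_sig) & set(drug_sig))
    r = pearson([disease_sig[g] for g in genes], [drug_sig[g] for g in genes])
    if r <= -cutoff:
        return r, "reverses disease signature (therapeutic candidate)"
    if r >= cutoff:
        return r, "mimics disease signature"
    return r, "no clear connection"

for name, sig in drugs.items():
    r, verdict = classify(disease, sig)
    print(f"{name}: r = {r:+.2f}, {verdict}")
```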

Disease-associated transcriptomic data are either generated in-house by laboratories or readily available from public repositories such as the National Center for Biotechnology Information (NCBI), the European Bioinformatics Institute (EBI), or the DNA Data Bank of Japan (DDBJ). However, data on drug-induced transcriptomic responses are not readily available, and their generation is complicated by several factors. On the clinical side, there is a paucity of systematic studies testing the effects of drugs and doses in human subjects. In addition, available omics datasets often lack information on the drug treatments that subjects received. On the pre-clinical and experimental side, generating transcriptomic profiles for millions of molecules carries a huge cost, and the lack of standardized animal models and doses in which drugs are tested precludes meaningful analyses. Circumventing these issues requires a systematic, cost-effective approach and a repository to generate, assemble, and analyze such data across drugs.

The Broad Institute first piloted a gene expression profile compendium of pharmacological perturbagens, leading to the generation of the Connectivity Map (CMap) database (Table 1) [68]. Initially, CMap used Affymetrix GeneChip microarrays to generate transcriptome-wide signatures for 164 small molecules tested on four cancer cell lines (MCF7, PC3, HL60, and SKMEL5) [42]. To scale up the workflow and more deeply characterize a given perturbagen’s cellular signature, a new assay, the L1000, was developed [69]. The assay uses optically addressed microspheres and a flow-cytometric detection system to measure the expression of 978 landmark genes (nearly 1000, hence the name L1000) along with 80 control genes whose expression is invariant across experiments [69].

The landmark genes provide a reduced representation of the full transcriptome and were derived from an analysis of 12,031 gene expression profiles available in GEO. The L1000 landmark genes were used to infer the expression of 11,350 other genes of the remaining transcriptome [70], and the inferred profiles were highly similar to those generated by RNA-seq. The assay reduced the cost of generating a transcriptomic profile to approximately two dollars per sample and propelled the expansion of signature generation [70]. Now part of the Library of Integrated Network-Based Cellular Signatures (LINCS), the resource contains over one million L1000 profiles for more than 19,000 small molecules in more than 100 cell lines. Each small molecule is profiled in triplicate following treatment for 6 to 24 h [70]. Approximately 2,500 of the ~19,000 small molecules are FDA-approved drugs with known mechanisms of action, protein targets, pharmacological classifications, and clinical indications; these were profiled in nine cancer cell lines (Table 4). Data from these molecules, called “Touchstone,” are used as a reference to functionally annotate users’ query signatures by correlating implicated genes and pathways. The remaining, uncharacterized small molecules profiled in >100 cell lines are termed “Discovery” datasets and may be annotated by connecting them with the Touchstone dataset. More L1000 transcriptomic data are actively being generated by six different LINCS centers [71], as are phosphoprotein signatures from the same cell culture studies (called P100) [47].
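The idea behind inferring unmeasured genes from landmark genes can be shown in miniature with a linear model fitted on reference profiles. This is a sketch under stated assumptions: a least-squares linear model is assumed for illustration, the data are synthetic (two “landmark” genes instead of 978, and a target gene constructed as an exact linear function of them), and the production LINCS inference pipeline is not reproduced.

```python
def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_ols(X, y):
    """Ordinary least squares via the normal equations (X'X) w = X'y, with an intercept."""
    rows = [[1.0] + list(x) for x in X]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)

def predict(w, x):
    """Infer the target gene's expression from a new landmark-gene measurement."""
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))

# Hypothetical reference profiles: two landmark genes and one target gene
# (synthetic, noise-free: target = 0.5 + 2 * landmark1 - 1 * landmark2).
train_X = [(1.0, 2.0), (2.0, 0.5), (3.0, 1.0), (0.5, 3.0), (4.0, 2.5)]
train_y = [0.5 + 2 * a - b for a, b in train_X]

weights = fit_ols(train_X, train_y)
print([round(w, 3) for w in weights])
print(round(predict(weights, (2.5, 1.5)), 3))
```

With real data the relationship is noisy, which is why inferred genes vary in how well they are recovered.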

Table 4 Primary cell lines in the LINCS database.

The standard approach for using LINCS resources involves querying the drug signatures with disease- or phenotype-associated differentially expressed genes. Connectivity (or similarity) scores are then computed with nonparametric, rank-based pattern-matching algorithms [72], which identify a disease-inducing or therapeutic drug based on drug-disease similarity or dissimilarity, respectively. This connectivity score reflects the extent to which a drug modifies disease-associated genes, without necessarily considering the magnitude of those changes. The scores are also normalized to account for global differences across cell and perturbation types. In addition, to compare the observed normalized connectivity score with all others in the LINCS repository, a percentile score ranging from −100 to +100 (called tau) is calculated. Because the reference used to calculate the percentile is fixed, tau scores can be compared across many queries. In such an instance, a connection with a significant p-value but a low tau suggests a promiscuous drug with non-unique connections, whereas a high tau suggests a drug more specific to the disease signature. Several other approaches [73, 74] and tools [75, 76] exist to evaluate drug-drug and drug-disease similarity. A well-conceived article by Zhou et al. provides a systematic evaluation of a variety of these approaches [77].
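A minimal sketch of a rank-based connectivity score, following the original unweighted Connectivity Map scheme, together with a simplified tau, is shown below. The current LINCS pipeline uses a weighted score and the normalization described above, both omitted here; the gene names, the ranked drug profile, and the reference distribution of scores are all hypothetical.

```python
def ks_score(tag_genes, ranked_genes):
    """KS-style enrichment of a gene set in a drug's ranked list (rank 1 = most up-regulated).
    Positive when the set sits near the top, negative when it sits near the bottom."""
    n = len(ranked_genes)
    position = {g: i + 1 for i, g in enumerate(ranked_genes)}
    v = sorted(position[g] for g in tag_genes if g in position)
    t = len(v)
    a = max(j / t - v[j - 1] / n for j in range(1, t + 1))
    b = max(v[j - 1] / n - (j - 1) / t for j in range(1, t + 1))
    return a if a > b else -b

def connectivity(up, down, ranked_genes):
    """Raw connectivity: 0 if the up and down sets shift in the same direction,
    otherwise ks_up - ks_down (negative = drug opposes the disease signature)."""
    ks_up = ks_score(up, ranked_genes)
    ks_down = ks_score(down, ranked_genes)
    if (ks_up > 0) == (ks_down > 0):
        return 0.0
    return ks_up - ks_down

def tau(score, reference_scores):
    """Signed percentile of |score| within a fixed reference set, in [-100, 100]."""
    below = sum(1 for r in reference_scores if abs(r) < abs(score))
    sign = (score > 0) - (score < 0)
    return sign * 100.0 * below / len(reference_scores)

# Hypothetical drug profile: 10 genes ranked from most up- to most down-regulated.
ranked = [f"G{i}" for i in range(1, 11)]
disease_up, disease_down = {"G9", "G10"}, {"G1", "G2"}

s = connectivity(disease_up, disease_down, ranked)
# Hypothetical fixed background of scores used as the percentile reference.
reference = [0.2, -0.5, 1.0, -1.2, 0.3, 1.8, -0.1, 0.9, -1.9, 0.4]
print(f"connectivity = {s:.2f}, tau = {tau(s, reference):.0f}")
```

Here the drug pushes the disease-up genes to the bottom of its ranked list and the disease-down genes to the top, so the score is negative (a reversal); tau then places its magnitude against the fixed reference, which is what makes tau comparable across queries.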

Finally, compared with its chemical structure, the transcriptomic signature of a drug is highly variable. This variability is inherent to the current method, in which drugs are tested in various cell types, usually with three or more biological replicates. To overcome this challenge, as a final step in signature-based data generation, the LINCS consortium now generates consensus transcriptional signatures, which are consistent (and thus comparable) across different molecules.
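One way to collapse replicates into a consensus signature is to down-weight replicates that disagree with the others. The sketch below is in the spirit of a replicate-weighted (moderated) z-score: each replicate is weighted by its summed Spearman correlation with the other replicates, floored at a small minimum, and the weights are normalized to sum to 1. The floor value and all data are assumptions for illustration, not the consortium’s exact procedure.

```python
from math import sqrt

def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))

def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman(a, b):
    return pearson(average_ranks(a), average_ranks(b))

def consensus_signature(replicates, min_weight=0.01):
    """Weight each replicate by its summed Spearman correlation with the others
    (floored at min_weight), normalize weights to 1, and average gene-wise."""
    k = len(replicates)
    raw = [max(sum(spearman(replicates[i], replicates[j]) for j in range(k) if j != i),
               min_weight) for i in range(k)]
    weights = [w / sum(raw) for w in raw]
    n_genes = len(replicates[0])
    consensus = [sum(weights[i] * replicates[i][g] for i in range(k)) for g in range(n_genes)]
    return consensus, weights

# Hypothetical z-scores for 5 genes in three replicates; replicate 3 is discordant.
reps = [
    [2.0, 1.0, 0.0, -1.0, -2.0],
    [1.8, 1.2, 0.1, -0.9, -2.1],
    [-1.0, 2.0, -2.0, 1.5, 0.0],
]
signature, weights = consensus_signature(reps)
print([round(w, 3) for w in weights])
print([round(z, 2) for z in signature])
```

The discordant third replicate receives almost no weight, so the consensus closely follows the two replicates that agree.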

Examples of signature-based approaches

Many drugs have been proposed for repurposing in CNS disorders using signature-based approaches. Examples include camptothecin (promotes apoptosis), chlorambucil (promotes apoptosis), flupenthixol (dopamine receptor antagonist), valdecoxib (anti-inflammatory), and nimesulide (inhibits prostaglandin synthesis) for bipolar disorder [78]; cefuroxime (bactericidal activity), cyproterone (blocks androgen receptors), metrizamide (iodine-based radiocontrast agent), trimethadione (reduces T-type calcium currents, stabilizes neuronal membranes), and vorinostat (histone deacetylase inhibitor) for Alzheimer’s disease [79]; and terreic acid (a tyrosine kinase inhibitor) and pergolide (dopamine and serotonin receptor agonist) against binge drinking [80]. Notably, all of these studies used the Connectivity Map approach, and many of the candidates were cancer drugs.

Limitations of signature-based approaches

Signature-based drug repurposing has some limitations. At the data generation level, the L1000 approach, though cost effective, has limited transcriptome coverage [70]. Data imputation methods are used to predict the remaining transcripts, but the estimates are reliable for only ~50% of the transcriptome [70]. In this regard, Perturb-seq, an assay combining clustered regularly interspaced short palindromic repeats (CRISPR)-based perturbation with single-cell RNA-seq, should be considered as an alternative [81]. Compared with L1000, Perturb-seq provides greater transcriptome coverage across large numbers of perturbations at reduced cost. In addition, the LINCS database consists primarily of cancer cell lines, limiting its use for CNS disorders. To address this, NeuroLINCS has been established as a center generating signatures from patient-derived induced pluripotent stem cells (iPSCs), with a focus on neurons and CNS disorders [71]. For example, multi-omics datasets, including transcriptomic, proteomic, imaging, and epigenomic signatures of iPSCs from patients with amyotrophic lateral sclerosis (ALS), Alzheimer’s disease, and spinal muscular atrophy (SMA), are available in NeuroLINCS.

Next steps: informed repurposing using data integration

The success of any drug development or repurposing approach is contingent upon its potential to characterize the phenotype of interest and elucidate a mechanism of action. Because signature-based approaches are grounded in gene expression, they can make full use of expanding omics-based technologies, data analysis, and integration workflows. For instance, knowledge from genome-wide association studies (GWAS), disease biomarkers, and biological pathways associated with a disease state can be incorporated to filter for precise and causal drug candidates. In addition to omics-based filtering, current drug repurposing approaches integrate both structure- and signature-based approaches with advanced data mining and machine learning methods. In the following section, we discuss efforts toward more informed drug repurposing.

GWAS based gene signatures for repurposing

Single nucleotide polymorphisms (SNPs) are the most common type of genetic variation in the human genome [82]. SNPs are single base-pair changes that occur mostly in non-coding regions of the genome and may contribute biologically or functionally to disease states. SNPs may impart disease risk by changing the affinity of a transcription factor for its DNA binding site, the stability of a transcript, or the amino acid sequence of the translated protein [83]. Neuropsychiatric disorders have diverse risk alleles, where genomic variations, including SNPs, confer susceptibility to developing the disorder [84]. GWAS compare SNP allele frequencies between cases and controls to characterize disease-associated variation. Notably, individual SNP risk alleles confer only very small increases in risk for psychiatric disorders [85]. Despite this limitation, genomic variants provide information on the genetic and biological underpinnings of a disease and represent a potentially powerful approach to target causal genes and gene products for drug repurposing [86].
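The case-control comparison for a single SNP can be sketched as an allelic 2×2 chi-square test. The allele counts below are hypothetical and chosen to mimic a small effect; real GWAS pipelines often use logistic regression with covariates instead, and the 5×10⁻⁸ genome-wide threshold is the widely used convention.

```python
from math import erfc, sqrt

def allelic_test(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Allelic 2x2 test for one SNP: Pearson chi-square (1 df), p-value, and odds ratio."""
    observed = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    total = case_alt + case_ref + ctrl_alt + ctrl_ref
    row_sums = [sum(row) for row in observed]
    col_sums = [case_alt + ctrl_alt, case_ref + ctrl_ref]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (observed[i][j] - expected) ** 2 / expected
    p_value = erfc(sqrt(chi2 / 2))  # upper tail of a chi-square with 1 df
    odds_ratio = (case_alt * ctrl_ref) / (case_ref * ctrl_alt)
    return chi2, p_value, odds_ratio

# Hypothetical SNP: 10,000 cases and 10,000 controls (20,000 alleles per group);
# risk-allele frequency 0.32 in cases vs. 0.30 in controls.
chi2, p, odds = allelic_test(6400, 13600, 6000, 14000)
GENOME_WIDE = 5e-8  # conventional genome-wide significance threshold
print(f"chi2 = {chi2:.2f}, p = {p:.1e}, OR = {odds:.3f}, "
      f"genome-wide significant: {p < GENOME_WIDE}")
```

Even with 20,000 participants, an odds ratio near 1.1 yields a p-value around 10⁻⁵, far short of genome-wide significance, which illustrates why risk alleles with very small effects require very large samples.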

Several studies have used GWAS results as a disease signature to repurpose drugs for an array of neuropsychiatric disorders (Table 5). For example, a meta-analysis of 796 GWAS identified 991 genes [86, 87]. Of these, 21% (212/991) were considered targetable by small molecules and 47% (469/991) were considered biopharmable, that is, annotated as having a peptide product or containing a transmembrane domain. This gene list was enriched for both categories relative to the entire genome, in which 17% of genes are small-molecule drug targets and 38% are biopharmable. This study provides a strong rationale for GWAS-based drug discovery and repurposing pipelines [86].
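The enrichment can be checked with only the numbers quoted above, using a one-sided binomial test that treats the genome-wide proportions as fixed background rates. This is a simplification chosen for illustration (the GWAS genes are themselves part of the genome), and the original study may have used a different test.

```python
from math import exp, lgamma, log

def binom_sf(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), with each term computed in log space."""
    def log_pmf(i):
        return (lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
                + i * log(p) + (n - i) * log(1 - p))
    return sum(exp(log_pmf(i)) for i in range(k, n + 1))

N_GENES = 991
results = {}
for label, hits, genome_rate in [("small-molecule targetable", 212, 0.17),
                                 ("biopharmable", 469, 0.38)]:
    observed = hits / N_GENES
    results[label] = binom_sf(hits, N_GENES, genome_rate)
    print(f"{label}: {observed:.1%} vs {genome_rate:.0%} genome-wide, "
          f"fold = {observed / genome_rate:.2f}, one-sided p = {results[label]:.1e}")
```

Both categories show a modest (about 1.25-fold) but statistically clear enrichment over the genome-wide rates.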

Table 5 GWAS-identified drugs which may be repurposed for each indicated disorder.

Another GWAS-based study focused on known genetic risk for Alzheimer’s dementia (AD) [88]. In this study, a Bonferroni-corrected list of risk loci and associated genes was cross-referenced against three “gene-to-drug” databases: the Kyoto Encyclopedia of Genes and Genomes (KEGG), DrugBank, and the Drug Repurposing Hub. The results confirmed the ability of current AD-approved drugs to target the known genes and also ascertained whether these drugs target other, newly categorized genes [88].
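The filter-then-cross-reference workflow can be sketched as follows. The SNP identifiers, genes, p-values, and drug mappings are hypothetical placeholders; the dictionary merely stands in for lookups in KEGG, DrugBank, or the Drug Repurposing Hub, and no real database API is called. Dividing α = 0.05 by roughly one million independent tests gives the conventional 5×10⁻⁸ genome-wide threshold.

```python
# Hypothetical GWAS summary results: (lead SNP, mapped gene, p-value).
gwas_hits = [
    ("rs0001", "GENE_A", 3e-12),
    ("rs0002", "GENE_B", 4e-8),
    ("rs0003", "GENE_C", 2e-9),
    ("rs0004", "GENE_D", 1e-5),
]
# Hypothetical gene-to-drug mapping standing in for database lookups.
gene_to_drugs = {
    "GENE_A": {"drug_1", "drug_2"},
    "GENE_C": {"drug_3"},
    "GENE_D": {"drug_4"},
}

def bonferroni_hits(hits, n_tests, alpha=0.05):
    """Keep associations whose p-value passes the Bonferroni-corrected threshold."""
    threshold = alpha / n_tests
    return [h for h in hits if h[2] < threshold], threshold

def druggable(hits, mapping):
    """Cross-reference significant genes against the gene-to-drug mapping."""
    return {gene: sorted(mapping[gene]) for _, gene, _ in hits if gene in mapping}

significant, thr = bonferroni_hits(gwas_hits, n_tests=1_000_000)
print(f"threshold = {thr:.0e}")
print(druggable(significant, gene_to_drugs))
```

GENE_D has a known drug but fails the corrected threshold, while GENE_B passes but has no mapped drug; only genes satisfying both conditions become repurposing leads.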

Large consortium and disease-specific databases have been instrumental in GWAS [89], and recent studies have combined these resources with drug repurposing resources to go beyond disease-disease similarity and predict druggable mechanisms for different psychiatric disorders [90]. In a cross-disorder analysis using data from the Psychiatric Genomics Consortium and other disease-specific consortia, expression of genes associated with seven psychiatric disorders, including AD, major depressive disorder (MDD), and schizophrenia, was imputed from GWAS summary statistics. Potential pharmacologic therapies were then identified using genome-wide expression profiles of various cell lines treated with drugs from the Connectivity Map (CMap) database. Going beyond the identification of drugs, GWAS-based MDD data from the Psychiatric Genomics Consortium were integrated with tissue expression data and the perturbagen signatures of different drugs to examine gene and drug-target interactions. This integrative approach highlighted genes associated with drug resistance in MDD.

GWAS and transcriptomic data can also be integrated using network approaches to predict target drugs [91]. Here, network approaches help filter for genes that are highly correlated with the disease. This was demonstrated in schizophrenia: transcriptomic variants of schizophrenia were first determined using the Genotype-Tissue Expression (GTEx) database. Next, weighted gene co-expression network analysis (WGCNA) was used to identify gene modules (clusters of highly co-expressed genes) highly correlated with the schizophrenia phenotype [91]. The identified module was enriched in pathways associated with the synapse and with calcium/calmodulin-dependent kinase inhibitors, voltage-gated sodium channel blockers, and glucocorticoids. Using drug repurposing approaches, a SHANK3 inhibitor was identified as a drug targeting these module-specific pathways [91].
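The first steps of WGCNA can be sketched in plain Python: an unsigned soft-threshold adjacency |cor|^β between genes, each gene’s connectivity (sum of its adjacencies), and gene significance (absolute correlation with the phenotype). Module detection via topological overlap and hierarchical clustering is omitted; β = 6 is an assumed, commonly used value for unsigned networks, and the expression data are hypothetical.

```python
from math import sqrt

def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))

def soft_adjacency(expr, beta=6):
    """Unsigned soft-threshold adjacency a_ij = |cor(x_i, x_j)|**beta; self-edges set
    to 0 so that row sums give connectivity to other genes only."""
    genes = list(expr)
    return {g: {h: 0.0 if g == h else abs(pearson(expr[g], expr[h])) ** beta
                for h in genes}
            for g in genes}

def connectivity(adj):
    """Whole-network connectivity k_i = sum over j != i of a_ij."""
    return {g: sum(row.values()) for g, row in adj.items()}

def gene_significance(expr, trait):
    """Absolute correlation of each gene with the phenotype."""
    return {g: abs(pearson(x, trait)) for g, x in expr.items()}

# Hypothetical expression for 6 samples (3 controls, then 3 cases).
trait = [0, 0, 0, 1, 1, 1]
expr = {
    "G1": [1.0, 1.2, 0.9, 2.1, 2.3, 2.0],
    "G2": [0.8, 1.1, 0.7, 1.9, 2.2, 2.1],
    "G3": [1.5, 0.2, 1.1, 0.4, 1.6, 0.9],
}
k = connectivity(soft_adjacency(expr))
gs = gene_significance(expr, trait)
for g in expr:
    print(f"{g}: connectivity = {k[g]:.3f}, gene significance = {gs[g]:.3f}")
```

Raising correlations to a power suppresses weak, likely noisy correlations (G3’s near-zero links vanish) while preserving strong co-expression (G1 with G2), so genes that are both highly connected and highly trait-correlated stand out.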

A criticism of GWAS is that they may identify SNPs with spurious (or “passenger”) associations with the disease or disorder within a region of the genome instead of pinpointing bona fide associations [92, 93]. Because of the large number of individuals and SNPs tested, these studies must correct for multiple testing. Although corrections such as false discovery rate control reduce the risk of false positives, they simultaneously raise the threshold at which an allele or SNP with a true but small effect size can be detected. Beyond these statistical challenges, GWAS are intrinsically limited biologically, because the ability to detect a given SNP depends on its effect size and its frequency in the population. The limitations of GWAS can be ameliorated by integrating GWAS hits with network- and pathway-informed databases, ultimately narrowing in on “druggable” targets. In addition, several techniques permit identification of causal genes. Technologies such as trans-ethnic GWAS (studies prioritizing candidate genes across diverse populations) [94], copy number variant analysis (identifying variations in the number of gene copies) [95], quantitative trait loci analyses (mapping complex phenotypes to a chromosomal locus) [96], imputed gene expression profiles [97], and epigenomic methods improve identification of causal disease genes [98], narrowing down “druggable” targets [99]. These methods are typically deployed in conjunction with network- and pathway-informed approaches.
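The trade-off between multiple-testing corrections can be sketched by comparing Bonferroni correction with the Benjamini-Hochberg step-up procedure for false discovery rate control on the same set of p-values. The p-values below are hypothetical.

```python
def bonferroni(pvals, alpha=0.05):
    """Reject H0 when p <= alpha / m (controls the family-wise error rate)."""
    m = len(pvals)
    return [p <= alpha / m for p in pvals]

def benjamini_hochberg(pvals, q=0.05):
    """Step-up BH procedure: reject the k smallest p-values, where k is the largest
    rank with p_(k) <= k * q / m (controls the false discovery rate)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank * q / m:
            k = rank
    reject = [False] * m
    for i in order[:k]:
        reject[i] = True
    return reject

pvals = [0.0001, 0.004, 0.012, 0.019, 0.03, 0.2, 0.45, 0.6, 0.8, 0.95]
print("Bonferroni rejections:", sum(bonferroni(pvals)))
print("Benjamini-Hochberg rejections:", sum(benjamini_hochberg(pvals)))
```

Both procedures demand more than the nominal p < 0.05 (five of these p-values pass that bar), but FDR control is less conservative than Bonferroni, so it retains more of the modest signals that true small-effect variants tend to produce.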

Data driven drug repurposing

Network-based approaches to drug discovery and repurposing largely build upon the guilt-by-association principle, which assumes that drugs lying close together in a network share gene expression signatures and are likely to have related functions [100, 101]. Thus, the efficacy of an uncharacterized drug may be inferred from its proximity to drugs with known function in a defined network. To date, this approach has been widely applied only to FDA-approved drugs by examining drug-disease interaction networks. However, the approach has significant limitations, as it typically centers on a small number of interactions whose functional properties cannot be extrapolated to the rest of the network. In effect, it captures only outliers (representing a few strong connections) whose function is not necessarily generalizable [102]. To mitigate this limitation, signature-based pathway information [103] and data from other networks (including drug–drug, target–target, and disease–disease) may be integrated to create a more heterogeneous network [104].
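A minimal guilt-by-association sketch is a weighted vote: an uncharacterized drug inherits the indication carried by the greatest total edge weight among its annotated neighbors. The network, drug names, and indications below are hypothetical, and weighted neighbor voting is only one of the simplest such schemes.

```python
# Hypothetical weighted drug-drug similarity network (undirected edges).
edges = {
    ("drug_A", "drug_B"): 0.9,
    ("drug_A", "drug_C"): 0.7,
    ("drug_A", "drug_D"): 0.2,
    ("drug_B", "drug_C"): 0.6,
    ("drug_D", "drug_E"): 0.8,
}
# Hypothetical known indications; drug_A is the uncharacterized query.
known_indication = {"drug_B": "depression", "drug_C": "depression",
                    "drug_D": "epilepsy", "drug_E": "epilepsy"}

def neighbours(node):
    """Yield (neighbour, weight) pairs from the undirected edge list."""
    for (u, v), w in edges.items():
        if u == node:
            yield v, w
        elif v == node:
            yield u, w

def infer_indication(drug):
    """Weighted vote over the labels of directly connected, annotated drugs."""
    votes = {}
    for other, w in neighbours(drug):
        label = known_indication.get(other)
        if label is not None:
            votes[label] = votes.get(label, 0.0) + w
    if not votes:
        return None, votes
    return max(votes, key=votes.get), votes

label, votes = infer_indication("drug_A")
print(label, votes)
```

The prediction rests entirely on two strong edges, which illustrates the limitation noted above: inference driven by a few strong connections may not generalize to the rest of the network.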

Pathways or ontologies group functionally related genes and summarize their interactions. Pathways have been used extensively to explore cross-talk between biological processes, gene–gene interactions, molecular mechanisms underlying a disease, and disease causality. For example, a network-based approach was combined with disease-associated pathways in a platform called CauseNet [105]. CauseNet mimics a manual pathway analysis for drug repurposing using a multilayer network that links drugs to targets, targets to pathways, pathways to genes, and genes to diseases. The likelihood of transitioning from one layer to the next is learned using statistical methods, and a novel indication for a drug is predicted by maximum likelihood estimation. In a cross-validation test, the approach showed high performance (AUC = 0.859) in predicting novel indications for known repurposed drugs [105].
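CauseNet learns its transition probabilities from data. The sketch below skips that learning step and shows only how existing layer-to-layer transition matrices can be composed into a drug-to-disease score, assuming NumPy. The matrices are invented. Summing over all paths, which is what the matrix product does, is one reasonable aggregation and not necessarily CauseNet's exact scoring.

```python
import numpy as np

def chain_scores(*transitions):
    """Compose layer-to-layer transition matrices (drug -> target -> pathway -> gene -> disease).

    Each matrix is row-stochastic: row i gives the probability of moving from node i
    to each node of the next layer. The product sums over every intermediate path."""
    score = np.asarray(transitions[0], dtype=float)
    for step in transitions[1:]:
        score = score @ np.asarray(step, dtype=float)
    return score

def rank_indications(scores, drug_index):
    """Disease indices ordered from most to least likely for one drug."""
    return np.argsort(-scores[drug_index], kind="stable").tolist()

# Toy matrices: 2 drugs, 2 targets, 2 pathways, 2 genes, 2 diseases (all invented).
drug_target = [[1.0, 0.0], [0.5, 0.5]]
target_pathway = [[1.0, 0.0], [0.0, 1.0]]
pathway_gene = [[0.5, 0.5], [0.0, 1.0]]
gene_disease = [[1.0, 0.0], [0.0, 1.0]]
scores = chain_scores(drug_target, target_pathway, pathway_gene, gene_disease)
print(scores)                       # row for drug 1 is [0.25, 0.75]
print(rank_indications(scores, 1))  # [1, 0]
```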

Using a similar strategy, a heterogeneous network was created by combining information from disease–disease, drug–drug, and target–target networks [106]. This approach outperformed similar methods that used only drug–drug similarity and permits prediction of drug–target and drug–disease relationships. It was recently extended with a bi-random walk algorithm to predict novel indications of existing drugs [107]. The algorithm improves the predicted drug–disease similarity in two steps. First, it adjusts weak and uninformative drug–disease similarities using a randomly permuted correlation analysis. Second, after this adjustment, two drugs are considered similar based on the number of neighboring drugs they share rather than on correlation. The approach outperformed other network-based algorithms (AUC = 0.91) [107]. A recent study deployed these approaches. Using a heterogeneous network of protein–protein interactions and the drug–protein interactome, it identified cromoglicic acid (a mast cell stabilizer with anti-inflammatory use), acetazolamide (a carbonic anhydrase inhibitor), and cinnarizine (which blocks L- and T-type voltage-gated calcium channels and binds to the D2 dopamine receptor) as potential therapeutics for schizophrenia. The findings were validated by integrating clinical information on the suggested drugs [108].
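The bi-random walk idea can be sketched as an iteration that averages one step on the drug–drug similarity network with one step on the disease–disease similarity network, restarting toward the known drug–disease associations. This is a simplified, generic version written for illustration, assuming NumPy. It omits the similarity-adjustment steps of the published method [107], and the restart weight `alpha` and the equal step counts are arbitrary choices.

```python
import numpy as np

def normalize(sim):
    """Symmetric normalisation D^-1/2 S D^-1/2 (assumes every row sum is positive)."""
    d = sim.sum(axis=1)
    inv_sqrt = 1.0 / np.sqrt(d)
    return sim * inv_sqrt[:, None] * inv_sqrt[None, :]

def bi_random_walk(drug_sim, disease_sim, assoc, alpha=0.3, tol=1e-10, max_iter=1000):
    """Propagate known drug-disease associations over both similarity networks."""
    m_drug = normalize(np.asarray(drug_sim, dtype=float))
    m_dis = normalize(np.asarray(disease_sim, dtype=float))
    a = np.asarray(assoc, dtype=float)
    a = a / a.sum()                    # known associations as the restart distribution
    r = a.copy()
    for _ in range(max_iter):
        left = m_drug @ r              # one step on the drug similarity network
        right = r @ m_dis              # one step on the disease similarity network
        r_next = alpha * (left + right) / 2 + (1 - alpha) * a
        if np.abs(r_next - r).sum() < tol:
            return r_next
        r = r_next
    return r

# Invented example: drug 1 has no known indication but resembles drug 0.
drug_sim = [[1.0, 0.9, 0.1], [0.9, 1.0, 0.1], [0.1, 0.1, 1.0]]
disease_sim = [[1.0, 0.2], [0.2, 1.0]]
assoc = [[1, 0], [0, 0], [0, 1]]
scores = bi_random_walk(drug_sim, disease_sim, assoc)
print(scores[1])  # drug 1 scores higher for disease 0 than for disease 1
```

The symmetric normalization keeps each walk step bounded, so with `alpha` below 1 the iteration converges to a fixed point.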

Ongoing work is also merging data from structure- and signature-based approaches to increase efficiency and selectivity. For example, a database of 4296 compounds was created using signature profiles derived from 60 human tumor cell lines [109]. The pairwise similarity of the signatures was computed, and correlations between compounds greater than or equal to 0.75 were considered robust. Next, the chemical similarity of these compounds was computed, and compounds with similar structural and signature profiles were identified as possibly sharing the same molecular target, suggesting new indications for repurposed compounds. A related attempt to integrate signature- and structure-based approaches generated 62 vectors per compound for 147 drugs that inhibit cruzipain, a parasitic cysteine protease [110]. The model trained on these vectors was then used to predict cruzipain-inhibiting activity for 5000 compounds from the Merck Index (12th edition) database [111]. Docking simulations of the compounds predicted by the model were then performed to assess their bioactivity against the cruzipain protein. Such integration of signature and structural approaches is a promising advance for drug repurposing.
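A minimal sketch of the two-filter idea keeps compound pairs whose signatures correlate at r ≥ 0.75 (the cutoff reported above) and whose structures are also similar. The sketch makes several assumptions: Pearson correlation for signature similarity, fingerprints represented as sets of "on" bit indices, and a Tanimoto cutoff of 0.5 chosen only for illustration. Real pipelines would compute fingerprints with a cheminformatics toolkit. All data are invented.

```python
from itertools import combinations

import numpy as np

def tanimoto(fp_a, fp_b):
    """Tanimoto similarity of two binary fingerprints given as sets of 'on' bit indices."""
    union = len(fp_a | fp_b)
    return len(fp_a & fp_b) / union if union else 0.0

def shared_target_candidates(signatures, fingerprints, sig_cutoff=0.75, struct_cutoff=0.5):
    """Compound pairs similar in BOTH response signature and chemical structure."""
    names = list(signatures)
    corr = np.corrcoef(np.array([signatures[n] for n in names], dtype=float))  # Pearson
    hits = []
    for i, j in combinations(range(len(names)), 2):
        if corr[i, j] < sig_cutoff:
            continue                               # signatures not robustly correlated
        t = tanimoto(fingerprints[names[i]], fingerprints[names[j]])
        if t >= struct_cutoff:
            hits.append((names[i], names[j], round(float(corr[i, j]), 3), round(t, 3)))
    return hits

# Invented signatures (e.g. growth inhibition across cell lines) and fingerprints.
signatures = {"cpd1": [1, 2, 3, 4, 5], "cpd2": [2, 4, 6, 8, 10], "cpd3": [5, 4, 3, 2, 1]}
fingerprints = {"cpd1": {1, 2, 3, 4}, "cpd2": {1, 2, 3, 5}, "cpd3": {7, 8}}
print(shared_target_candidates(signatures, fingerprints))  # only the cpd1-cpd2 pair
```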

Text mining to filter relevant information

The structural and transcriptomic signatures of a drug may suggest hundreds of compounds with similar effects. In such instances, filtering or enriching for the most promising candidates can be challenging. Combining critical information regarding a drug’s family, pharmacology, toxicity, protein targets, and targeted pathways with structural and signature-based similarity may enhance the drug-repurposing workflow [112]. Text mining combines information from thousands of documents and articles to extract new meaning and possible answers to complex questions [113]. Text mining has been used extensively in the medical field and in conjunction with data-driven drug repurposing approaches [114]. A typical biological text-mining effort involves four steps: (1) information retrieval, including parsing of relevant information from large data sources; (2) biological named entity recognition, identifying valuable biological concepts using controlled vocabularies; (3) biological information extraction; and (4) biological knowledge discovery, which involves extracting useful biological information and constructing a knowledge graph, a compilation of interlinked descriptions of objects [115].
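The four steps can be illustrated with a toy pipeline that uses only the Python standard library. Dictionary lookup stands in for named entity recognition, and sentence-level co-mention stands in for relation extraction. Both are deliberate simplifications, and the vocabulary and sentences are invented.

```python
import re
from collections import Counter

def recognise_entities(text, vocabulary):
    """Step 2, dictionary-based NER: return (term, type) pairs whose term occurs as a whole word."""
    lowered = text.lower()
    return {(term, kind) for term, kind in vocabulary.items()
            if re.search(r"\b" + re.escape(term.lower()) + r"\b", lowered)}

def build_knowledge_graph(documents, vocabulary):
    """Steps 3-4, crudely: treat drug-protein co-mention in a document as a candidate
    relation and accumulate the counts as weighted edges of a knowledge graph."""
    edges = Counter()
    for doc in documents:                       # step 1 (retrieval) is assumed already done
        entities = recognise_entities(doc, vocabulary)
        drugs = sorted(t for t, kind in entities if kind == "drug")
        proteins = sorted(t for t, kind in entities if kind == "protein")
        for drug in drugs:
            for protein in proteins:
                edges[(drug, protein)] += 1
    return edges

# Invented sentences and a tiny controlled vocabulary, for illustration only.
vocabulary = {"diltiazem": "drug", "quinidine": "drug", "APP": "protein", "BACE1": "protein"}
documents = ["Diltiazem altered APP processing in this hypothetical assay.",
             "Quinidine and diltiazem were screened against BACE1.",
             "No vocabulary terms appear in this sentence."]
print(build_knowledge_graph(documents, vocabulary))
```

Whole-word matching prevents short terms such as "APP" from matching inside words like "appear".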

Text mining was recently combined with network analysis to explore disease–protein and drug–protein relationships [116]. A protein–protein interaction network was curated from proteins known to be implicated in Alzheimer’s disease (AD), and the proteins in this network were used as queries to text mine drug-related information from PubMed abstracts [116]. This query returned 1249 possible AD-related drugs. Drug–target similarity scores were then calculated to assess biological relevance. The approach identified diltiazem and quinidine, prescribed for hypertension and cardiac arrhythmias, respectively, as possible drugs of interest for treating AD. This work highlights the potential of text mining to identify previously unknown patterns and to generate hypotheses from accumulating text sources and databases.
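The source does not specify how the drug–target similarity scores were computed. One common alternative for checking whether a text-mined drug is biologically relevant is to test whether the drug's known targets overlap the disease protein network more than expected by chance. The sketch below does this with a hypergeometric tail probability in standard-library Python. The protein names and the size of the protein universe are placeholders.

```python
from math import comb

def overlap_p_value(drug_targets, network, universe_size):
    """Hypergeometric tail P(X >= k): chance that a drug with n targets, drawn from
    `universe_size` proteins, hits at least k proteins of the disease network."""
    k = len(drug_targets & network)
    n, big_k = len(drug_targets), len(network)
    total = comb(universe_size, n)
    tail = sum(comb(big_k, i) * comb(universe_size - big_k, n - i)
               for i in range(k, min(n, big_k) + 1))
    return k, tail / total

# Invented example: a 10-protein universe, 5 of which form the disease network.
network = {"P1", "P2", "P3", "P4", "P5"}
print(overlap_p_value({"P1", "P2"}, network, 10))   # overlap 2, p = 10/45
print(overlap_p_value({"P9", "P10"}, network, 10))  # overlap 0, p = 1.0
```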

Beyond traditional drug repurposing: inclusion of diverse omics datasets

Signature-based drug repurposing for a given disorder may generate surprising and novel insights into disease pathology. While acquiring new knowledge of disorders is crucial for medicine and the scientific community, such techniques also improve our understanding of the drugs used to treat them. As efforts to intelligently repurpose drugs expand, they may also yield new knowledge about drug classifications, adverse drug reactions, and mechanisms of action, as well as the genetic predispositions that drive drug sensitivity and resistance. Although few published examples exploit these repurposing domains, in this section we discuss how diverse omics datasets may be utilized for repurposing.

Drug classification

The Anatomical Therapeutic Chemical (ATC) system, created by the World Health Organization (WHO), classifies drugs into groups based on the organ system they target, as well as their chemical, pharmacological, and therapeutic properties [117]. Such a classification serves as a tool to improve drug utilization and development. Transcriptomic signatures of drugs may also be used to develop drug classifications. A novel machine learning-based drug classifier was recently developed that focuses primarily on drug characteristics and classes extracted from the ATC, instead of relying on drug–disease similarity [118]. The classifier predicts a drug’s class from its average similarity to other drugs, using three features: gene expression signatures, chemical structure, and known common targets calculated from human protein–protein interaction networks. Each feature by itself performed poorly. However, accuracy increased to 78% when all three features were combined to predict the ATC class of 281 drugs. The method may be even more powerful than this accuracy suggests. Subsequent analyses indicated that the “misclassified” drugs (i.e., the remaining 22%) are more accurately conceptualized as “reclassified.” In other words, the classifier goes beyond simple validation of the ATC system and can suggest new drug classifications that may prove more useful than the original paradigm. It assigned most of these drugs to new therapeutic classes, consistent with several literature reports. In a similar study, chemical–chemical similarity and interaction were used as features to predict the ATC class of 3833 drugs. This approach achieved 73% accuracy, substantially higher than the 7% expected by chance [119].
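The average-similarity idea can be sketched as follows, assuming NumPy. Each feature contributes a drug-by-drug similarity matrix. The matrices are averaged with equal weights, which is an assumption; the published classifier may weight or learn its combination differently. A drug is then assigned to the class whose members it most resembles on average. The matrices are invented, and the ATC codes serve only as illustrative labels.

```python
import numpy as np

def predict_class(query, sims, classes):
    """Predict the class of drug `query` from its average similarity to each class's members.

    sims: list of (n x n) similarity matrices, one per feature (expression, structure, targets).
    classes: dict mapping drug index -> class label for labelled drugs.
    """
    combined = np.mean([np.asarray(s, dtype=float) for s in sims], axis=0)  # equal weights
    scores = {}
    for label in set(classes.values()):
        members = [i for i, c in classes.items() if c == label and i != query]  # leave query out
        if members:
            scores[label] = float(combined[query, members].mean())
    return max(scores, key=scores.get), scores

# Invented similarity matrices for 4 drugs; ATC codes used only as illustrative labels.
classes = {0: "N05", 1: "N05", 2: "C08", 3: "C08"}
s_expr = [[1, .9, .1, .2], [.9, 1, .1, .1], [.1, .1, 1, .8], [.2, .1, .8, 1]]
s_struct = [row[:] for row in s_expr]
s_target = [[1, .3, .6, .6], [.3, 1, .2, .2], [.6, .2, 1, .5], [.6, .2, .5, 1]]
print(predict_class(0, [s_target], classes)[0])                    # C08 from one feature
print(predict_class(0, [s_expr, s_struct, s_target], classes)[0])  # N05 once combined
```

In this toy case, the target-based feature alone points to the wrong class, while the combination recovers the right one. This mirrors the observation that the individual features perform poorly but their combination performs well.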

The ATC system also has important limitations. Its drug coverage is incomplete, and its hierarchy of 14 main anatomical groups, each subdivided across five levels, occasionally produces singleton classes containing only one drug [120]. In addition, classifying a new drug with the ATC is a tedious process that requires researchers to submit a formal classification request to the World Health Organization [121]. Data-driven drug classification using drug signatures could improve the utility and accuracy of the ATC and related databases.

Adverse drug effects

An adverse drug reaction (ADR) can be defined as a harmful reaction to a drug that causes injury and has a clear causal link to the drug being administered [122,123,124]. ADRs are a major concern for both drug development and public health, and failure to identify them can lead to significant morbidity and economic loss. However, ADRs may also help in understanding drug–disease phenotype connections. Predicting ADRs has typically been treated as a binary classification problem, in which the chemical and biological properties of drugs are used to predict the presence or absence of adverse effects. Taking advantage of the possible connection between ADRs and drug–disease phenotypes, SEP-L1000, a machine learning-based classifier, was developed to predict ADRs for over 20,000 small molecules in the LINCS database [125]. Developing the classifier involved gathering drug-associated data (features) from multiple resources including, but not limited to, structures from PubChem [126], signatures from LINCS, and side effect data from the Drug Side Effect Resource [127] and PharmGKB (a resource for ADRs of FDA-approved drugs) [128]. To prioritize the most predictive features for each ADR class, feature selection was performed using a regularized logistic regression model. After the top 50 predictive features were selected, classification algorithms were trained for each ADR class [129]. Importantly, benchmark metrics showed that the gene expression signature was the best predictive feature, and it was further used to associate each drug’s ADRs with gene ontology-based pathway networks. This novel approach performs better than target-based binary classification for predicting ADRs and is scalable. Further, incorporating gene ontology into the model may provide new insights into the mechanisms underlying a drug’s ADRs.
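The feature-selection step can be sketched as fitting a regularized logistic regression for one ADR class and keeping the features with the largest absolute coefficients. This sketch assumes an L2 (ridge) penalty fitted by plain gradient descent with NumPy. The source does not state which penalty SEP-L1000 used, and an L1 penalty is an equally common choice. Features are standardized so that coefficient magnitudes are comparable, and the data are simulated.

```python
import numpy as np

def l2_logistic_regression(x, y, lam=1.0, lr=0.1, n_iter=2000):
    """Fit L2-regularised logistic regression by batch gradient descent."""
    n, p = x.shape
    w = np.zeros(p)
    b = 0.0
    for _ in range(n_iter):
        prob = 1.0 / (1.0 + np.exp(-(x @ w + b)))     # predicted P(ADR present)
        grad_w = x.T @ (prob - y) / n + lam * w / n   # log-loss gradient plus ridge penalty
        grad_b = np.mean(prob - y)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def top_k_features(x, y, k=50, lam=1.0):
    """Rank features by absolute coefficient (on standardised inputs) and keep the top k."""
    xs = (x - x.mean(axis=0)) / (x.std(axis=0) + 1e-12)
    w, _ = l2_logistic_regression(xs, y, lam=lam)
    return np.argsort(-np.abs(w))[:k]

# Simulated data: 400 drugs x 20 features; only features 3 and 7 drive the ADR label.
rng = np.random.default_rng(0)
x = rng.normal(size=(400, 20))
y = (x[:, 3] - 2 * x[:, 7] + 0.1 * rng.normal(size=400) > 0).astype(float)
print(top_k_features(x, y, k=2))  # the two informative features, 3 and 7
```

A separate classifier would then be trained on the selected features for each ADR class. That step is omitted here.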

Drug mode of action

Mode of action (MOA) refers to the specific drug–target interaction through which a drug’s pharmacological effect is produced [130]. Besides traditional structure-based molecular docking approaches to identifying drug–target interactions, newer and faster computational modeling approaches that leverage a drug’s transcriptomic signatures are being developed [131]. For example, a Bayesian machine learning approach, BANDIT (Bayesian ANalysis to determine Drug Interaction Targets), was developed to identify drug targets. BANDIT combines LINCS resources with data on drug structure, growth inhibition, side effects, known targets, and bioassay results, and integrating these predictors improves drug target prediction over similar approaches [132]. The method calculates a pairwise similarity score for each predictor across all drug pairs, comparing pairs known to share a target with pairs not known to share one. A Kolmogorov–Smirnov test was used to assess how well each data type separates these two groups of pairs. Structural similarity, bioassay, and growth inhibition data showed the strongest separation, while transcriptional and adverse effect data showed the weakest. Benchmarking showed that BANDIT had an overall accuracy of ~90% in predicting a drug’s mode of action. Determining MOA in silico with high accuracy allows additional characterization without relying purely on biochemical bench work, and may accelerate the drug-development pipeline.
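The Kolmogorov–Smirnov comparison described above can be sketched directly. For one data type, collect similarity scores for drug pairs known to share a target and for pairs not known to share one, then measure the largest gap between their empirical distribution functions. A larger D means that the data type separates the two groups better. This NumPy-only version returns just the statistic; `scipy.stats.ks_2samp` computes the same statistic together with a p-value. The scores are invented.

```python
import numpy as np

def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov D: largest gap between the two empirical CDFs."""
    a = np.sort(np.asarray(sample_a, dtype=float))
    b = np.sort(np.asarray(sample_b, dtype=float))
    grid = np.concatenate([a, b])                          # every observed value
    cdf_a = np.searchsorted(a, grid, side="right") / a.size
    cdf_b = np.searchsorted(b, grid, side="right") / b.size
    return float(np.max(np.abs(cdf_a - cdf_b)))

# Invented similarity scores for drug pairs with and without a shared target.
shared = [0.81, 0.77, 0.92, 0.68, 0.85]
not_shared = [0.12, 0.35, 0.22, 0.71, 0.18]
print(ks_statistic(shared, not_shared))  # large D: this data type separates the groups well
```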

Drug resistance

Drug resistance, a decrease in a therapeutic’s effectiveness in treating a disease, is one of the primary obstacles facing drug design and discovery today. In cancer, somatic mutations have considerable impact on drug resistance [133]. To address this concern, expression-based variant-impact phenotyping was developed. This approach compares the transcriptional signatures of wild-type and mutant alleles of the same gene to deduce the functional role(s) of a specific mutation [134]. It also separates somatic mutations by their impact on drug resistance into consequential mutations, called drivers, and non-consequential mutations, called passengers, enabling omics-scale determination of the contribution of somatic mutations to cellular functions. It follows that, for successful drug repurposing, the gene expression profiling of a drug should distinguish the contribution of somatic mutations to perturbagen signatures. As LINCS accumulates more molecule-specific transcriptional signatures, including signatures associated with gene variants, the ability to develop drugs that anticipate and overcome resistance will improve. Recent work highlights this approach.
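A heavily simplified sketch of the comparison behind expression-based variant-impact phenotyping, assuming NumPy, compares two quantities: how well mutant-allele signatures correlate with wild-type signatures, and how well wild-type replicates correlate with each other. The published method is considerably more involved and uses formal statistical tests. The single `margin` cutoff and the "impactful"/"neutral" labels here are hypothetical stand-ins for the driver/passenger distinction, and the signatures are simulated.

```python
from itertools import combinations

import numpy as np

def mean_pairwise_corr(profiles):
    """Mean Pearson correlation over all replicate pairs of one allele."""
    return float(np.mean([np.corrcoef(a, b)[0, 1] for a, b in combinations(profiles, 2)]))

def call_variant_impact(wt_reps, mut_reps, margin=0.2):
    """Crude impact call: does the mutant's signature diverge from wild type more than
    wild-type replicates diverge from each other? `margin` is an arbitrary threshold."""
    wt_self = mean_pairwise_corr(wt_reps)
    cross = float(np.mean([np.corrcoef(w, m)[0, 1] for w in wt_reps for m in mut_reps]))
    return "impactful" if wt_self - cross > margin else "neutral"

def replicates(base, rng, n=3, noise=0.1):
    """Simulate n noisy replicate signatures around a base profile."""
    return [base + noise * rng.normal(size=base.size) for _ in range(n)]

# Simulated replicate signatures over 100 genes.
rng = np.random.default_rng(1)
wt_base, other_base = rng.normal(size=100), rng.normal(size=100)
wt = replicates(wt_base, rng)
print(call_variant_impact(wt, replicates(wt_base, rng)))     # mutant behaves like wild type
print(call_variant_impact(wt, replicates(other_base, rng)))  # mutant signature diverges
```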

LINCS resources and transcriptional profiles of patient biopsies taken before, during, and after relapse were used to assess the variable efficacy of targeted therapies, including MEK and BRAF inhibitors, which have been suboptimal in treating these tumors [70]. Comparing mutations in the tumors with existing perturbagen, knockdown, and overexpression signatures from LINCS cancer cell lines revealed a strong negative correlation with MAP kinase signaling, suggesting re-activation of the MAPK pathway in these patients during relapse [70]. Although the contribution of somatic mutations to psychiatric disorders is not well characterized, this method could be leveraged to study genomic variants associated with treatment-resistant depression, schizophrenia, or other psychiatric disorders, highlighting an opportunity to develop more effective pharmacotherapies.
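Comparisons like this one typically rank reference signatures by how strongly they anticorrelate with a query signature. The sketch below, assuming NumPy, uses Spearman rank correlation over a shared gene list. The signature names are hypothetical, and ties are broken by position rather than averaged, which is adequate for continuous z-scores. LINCS tools use their own connectivity statistics rather than this exact score.

```python
import numpy as np

def ranks(values):
    """Rank-transform a vector (ties broken by position; adequate for continuous z-scores)."""
    order = np.argsort(values, kind="stable")
    r = np.empty(len(values))
    r[order] = np.arange(len(values))
    return r

def spearman(a, b):
    """Spearman correlation as the Pearson correlation of ranks."""
    return float(np.corrcoef(ranks(a), ranks(b))[0, 1])

def rank_by_anticorrelation(query, library):
    """Order library signatures from most negatively to most positively correlated."""
    scores = {name: spearman(query, sig) for name, sig in library.items()}
    return sorted(scores.items(), key=lambda kv: kv[1])

# Invented 5-gene signatures: a patient profile and three reference perturbations.
query = [1, 2, 3, 4, 5]
library = {"opposite": [5, 4, 3, 2, 1], "same": [10, 20, 30, 40, 50], "mixed": [2, 1, 4, 3, 5]}
print(rank_by_anticorrelation(query, library))  # ordered: opposite, mixed, same
```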

Drug permeability

Permeability across membranes, which is influenced by lipophilicity, size, charge, molecular weight, and hydrogen bonding capacity, is a key parameter governing a drug’s absorption, distribution, and elimination, including its passage across the blood–brain barrier [135]. Permeability can be estimated from the available 2D or 3D structure of a drug [136]. One alternative approach uses the LINCS L1000 landmark genes. Based on the principle that a more permeable compound will tend to induce larger changes in transcript expression, the expression levels of the L1000 genes can serve as a proxy for the cellular permeability of a compound or drug in the LINCS database. Accordingly, a transcriptional activity score (TAS), a proxy for a molecule’s cellular permeability and activity, is estimated for each compound in the LINCS database. TAS ranges between 0 and 1. It is the geometric mean of signature strength (the number of landmark genes differentially expressed with an absolute z-score > 2 for a given compound) and replicate correlation (the correlation between biological replicates of a compound’s L1000 profile), normalized by the number of L1000 genes. So far, the use of TAS has been limited to drug repurposing efforts involving antimicrobial agents [70, 137], and its ability to predict passage across the complex blood–brain barrier is uncertain. However, because TAS is derived from gene signatures, it may be linked to ontologies and used as a predictor in complex models of drug activity.
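The TAS definition above can be written as a short function. As we understand the Connectivity Map documentation, TAS = sqrt(SS × max(CC, 0) / 978). Here SS is the signature strength, CC is the replicate correlation (taken as the 75th percentile of pairwise Spearman correlations among replicates), and 978 is the number of L1000 landmark genes. Treat the percentile choice and the clipping at zero as our reading of that documentation rather than a verified specification. The data below are simulated, and NumPy is assumed.

```python
from itertools import combinations

import numpy as np

def spearman(a, b):
    """Spearman correlation via rank transform (no tie averaging; fine for continuous data)."""
    ra = np.argsort(np.argsort(a, kind="stable"), kind="stable")
    rb = np.argsort(np.argsort(b, kind="stable"), kind="stable")
    return float(np.corrcoef(ra, rb)[0, 1])

def transcriptional_activity_score(consensus_z, replicates):
    """TAS = sqrt(signature strength * max(replicate correlation, 0) / number of landmark genes)."""
    z = np.asarray(consensus_z, dtype=float)
    signature_strength = int(np.sum(np.abs(z) > 2))   # landmark genes with |z| > 2
    corrs = [spearman(a, b) for a, b in combinations(replicates, 2)]
    replicate_corr = float(np.percentile(corrs, 75))  # 75th percentile across replicate pairs
    return float(np.sqrt(signature_strength * max(replicate_corr, 0.0) / z.size))

# Simulated compound: 489 of 978 landmark genes strongly changed, perfectly reproducible replicates.
rng = np.random.default_rng(0)
z = np.zeros(978)
z[:489] = 3.0
profile = rng.normal(size=978)
print(transcriptional_activity_score(z, [profile, profile, profile]))  # about 0.707
```

Because the score multiplies strength by reproducibility, a compound with many changed genes but unreproducible replicates (or reproducible replicates but few changed genes) still scores low.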

Generation of seed gene knockdown signatures

Studies investigating the pathophysiology of CNS disorders using postmortem brain or model systems often focus on one or a few candidate genes. Such hypothesis- or candidate-driven research often fails to account for the complexity of CNS diseases and the heterogeneity of the underlying biological processes. However, so-called “seed genes,” chosen based on findings for specific candidate genes or pathways, can serve as an entry point for bioinformatic analyses, connecting candidate-based studies with drug repurposing. The disease-associated expression changes of a small number of seed genes can be combined with the corresponding LINCS knockdown and/or overexpression signatures. The LINCS perturbagen database can then be queried for compounds that reverse or mimic consensus “seed gene” signatures. This approach rests on the assumption that seed gene-specific signatures reflect disease-associated changes at the network level [138].
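The seed-gene query can be sketched in two steps, assuming NumPy. First, average the knockdown signatures of the seed genes into a consensus. Second, rank compounds by how strongly their signatures point in the opposite direction. Cosine similarity and the -0.5 cutoff are illustrative choices, not the statistic used in the study [138], and the gene signatures and compound names are invented.

```python
import numpy as np

def consensus_signature(knockdown_sigs):
    """Average the knockdown signatures of several seed genes into one consensus vector."""
    return np.mean(np.asarray(knockdown_sigs, dtype=float), axis=0)

def cosine(a, b):
    """Cosine similarity between two signature vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def reversal_candidates(consensus, compound_sigs, cutoff=-0.5):
    """Compounds whose signature points opposite to the consensus (cosine <= cutoff),
    most strongly reversing first."""
    scored = [(name, cosine(consensus, np.asarray(sig, dtype=float)))
              for name, sig in compound_sigs.items()]
    return sorted((s for s in scored if s[1] <= cutoff), key=lambda s: s[1])

# Invented 4-gene signatures for two seed-gene knockdowns and three compounds.
knockdowns = [[1, 1, -1, 0], [1, 0, -1, 1]]
compounds = {"reverser": [-1, -0.5, 1, -0.5], "mimic": [1, 0.5, -1, 0.5],
             "unrelated": [0.5, -1, 0, 0]}
consensus = consensus_signature(knockdowns)
print(reversal_candidates(consensus, compounds))  # only "reverser" passes the cutoff
```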

A recent study highlights the potential of the seed gene approach. Knockdown signatures of glycolytic genes whose expression is decreased in pyramidal neurons in schizophrenia were used to generate gene clusters to probe the LINCS database [32, 139, 140]. Using LINCS, drugs and compounds were identified that reversed the consensus seed gene clusters for the implicated glycolytic genes. Pioglitazone showed significant reversal of these bioenergetic profiles. It is a member of the thiazolidinedione drug family and a synthetic ligand of the nuclear receptor peroxisome proliferator-activated receptor gamma (PPARγ). Pioglitazone is a well-characterized regulator of bioenergetic function, including lipid homeostasis, adipocyte differentiation, and insulin sensitivity, and it also increases expression of glucose transporters. In a confirmation study, decreased levels of the glucose transporters GLUT1 and GLUT3 were found in schizophrenia as well as in the Grin1KD mouse, an animal model of developmental disorders [138]. Treating Grin1KD mice with pioglitazone for 1 week improved executive function. From start to finish, this work illustrates the potential for repurposing FDA-approved drugs: starting with candidate or seed genes, followed by bioinformatics-driven identification of drug candidates, and finally testing in animal models [138].

Validation and benchmarking

At the data integration level, most studies are still exploratory, and results are limited to FDA-approved drugs with known mechanisms of action [132]. Efforts to expand these methods to yet unclassified and untested small molecules from compound libraries are ongoing, but validation of biological significance remains incomplete. A typical validation step may involve in silico exploration or animal studies. While comparison across studies and species is a common practice for in silico approaches, the most rigorous in silico validation could use large-scale, patient-level routine health care data involving the repositioned drug [141]. Because of their scale, these data allow detection of small differences and adjustment for covariates to minimize confounding, making them ideal for validating a repositioned drug [141]. However, due to restricted availability, only a few studies have incorporated patient-level data to validate repurposing. Numerous studies have used animal models for experimental validation of repurposed drugs. However, animal models for most diseases are either unavailable or do not recapitulate the disease in question. Novel in vitro organ-on-chip [142] or organoid-based [143] drug screening techniques may be integrated with present in silico approaches to circumvent some of these limitations.

Future research directions

Historically, many causal theories of psychiatric disorders were based on or informed by unintended discoveries of medications with efficacy for a specific disorder. For instance, the use of monoamine oxidase inhibitors contributed to the monoamine hypothesis of depression [29, 144,145,146]. Recent work highlights the promise of this concept. Converging evidence from animal models and studies of postmortem brain indicates changes in bioenergetic function in schizophrenia. This work is complicated by the potent metabolic effects of antipsychotic medications, which make it difficult to separate genetic risk, adverse drug response, and disease pathophysiology. In the example described above, a modulator of glucose uptake (pioglitazone) was identified by a bioinformatics approach (transcriptional profiling in the LINCS database) as reversing the disease signature. This finding supports the bioenergetic dysfunction hypothesis for severe mental illness [32, 147]. These examples highlight how drug repurposing can expand or drive understanding of pathophysiology.

A recurrent theme in the signature-based repurposing and knowledge discovery approaches described above is integration, of both datasets and bioinformatic tools. A feature (or predictor) defined by a particular approach has, by itself, limited power to predict the outcome (a new indication or new knowledge). Combining an ensemble of approaches with computational power and statistical innovation has shown promising results [77]. In this regard, several future avenues should be considered. First, with the recent expansion of transcriptomic technologies to cellular resolution, data-centric repurposing approaches can now integrate single-cell transcriptomic data to refine pathophysiological mechanisms to the cellular or circuit level [148, 149]. Second, lead drugs from signature-based repurposing may be used as scaffolds to design or optimize new molecules using chemoinformatics approaches [150]. Third, transcriptomic signatures of drugs can be associated with gene ontology, enabling pathway-specific drug repurposing. Fourth, by integrating the knowledge described above about each drug’s adverse reactions, a safety profile of a repositioned drug may be developed. Taken together, integration of datasets, bioinformatics tools, and platforms offers a promising avenue for advancing drug discovery via repurposing for CNS disorders.

Computational drug repurposing and precision medicine

Pharmacogenomics, an emerging field that studies the genomic influences on an individual’s response to drugs, highlights inter-individual differences in drug response and contributes to developing precision medicine: customized treatment decisions tailored to an individual’s genetic profile [151]. Pharmacogenomics represents an early attempt at precision medicine in CNS disorders [152]. However, contradictory study outcomes have limited its impact in the clinical setting [153, 154]. For instance, while some studies have associated antidepressant treatment outcomes with common genetic variants in patients with major depressive disorder [155], others failed to predict therapeutic outcomes [156]. We posit that considering only genetic variability is not sufficient for tailoring drug choices in precision medicine platforms, because of the phenotypic variability observed even with highly penetrant genomic variants. The DISC mutation is an instructive example, as carriers within the same family cohort may have a range of diverse psychiatric syndromes [157].

It follows that a disease phenotype may be regulated at multiple levels beyond the genetic code, including the transcriptome, proteome, metabolome, and kinome [158]. Ideally, precision medicine would be guided by complete or enriched information from these different levels of gene expression and regulation. With the emergence of omics technologies for multiple levels of expression and regulation, the opportunity now exists to integrate and leverage these data to individualize clinical care. Currently, providers often customize clinical care via off-label prescribing, an ad hoc approach that lacks rigor and is often supported only by anecdotal experience. However, genomic and RNA sequencing data are being used for state-of-the-art, individualized drug repositioning [76] (see also Talevi [159] and Andreas [160] for psychiatric disorders). In oncology, for example, treatment is now often guided by genomic profiles from cancer cell lines, solid tumors, leukemias, and stem cells [161].

Improving precision medicine in psychiatric disorders

In contrast, brain tissue from living patients is rarely obtainable in psychiatry, limiting access to the source of the pathophysiology for non-malignant CNS disorders. However, induced pluripotent stem cells (iPSCs), organoids, and other primary tissue sources (such as exosomes from living patients) are providing important new substrates that may be exploited for bioinformatic analyses [162]. For example, iPSCs from a schizophrenia subject with the DISC mutation were used for an informatics-based study of kinase activity, identifying drug candidates that may reverse the cellular signatures found in these cells [163]. It is now possible to conceive of a pipeline in which patient-derived stem cells are used to generate neuronal, glial, or mixed cultures, as well as brain organoids, that may be interrogated across omics platforms, yielding candidate drugs that reverse the disease signature.


In this review, we present the current state of drug repurposing as it applies to CNS disorders, and more specifically to psychiatric illnesses. Signature-based approaches using transcriptional profiling represent an important advance for drug repurposing, and integrating traditional approaches with this accessible technique shows promise for advancing the field. Finally, there is considerable promise for precision drug repurposing in psychiatric disorders, offering new avenues for translational research that connects big data analytics with affected patients.

Funding and disclosure

This work was supported by NIMH MH107487 and MH121102. The authors have nothing to disclose.