Main

The novel SARS-CoV-2 (severe acute respiratory syndrome coronavirus 2) pathogen has infected around 60 million people and caused more than a million deaths worldwide (https://covid19.who.int/; as of November 2020). As a result, there is a need to find treatments that can be applied immediately to reduce mortality or morbidity.

Repurposing existing drugs is a rapid and effective way to provide such treatments by identifying new uses for drugs that have well-established pharmacological and safety profiles1. Many drugs used to treat different diseases have already been successfully repurposed and approved for new indications2. While repurposing can be conducted at any point in drug development, its potential is greatest for drugs that are already approved3. In the case of the COVID-19 pandemic, it is a fast and cost-efficient approach to identify novel treatments4.

Recent studies have increasingly employed computational methods to systematically predict new drug targets or drug repurposing candidates. In contrast to experimental high-throughput screening, in silico approaches are faster, lower-cost, and can serve as an initial filtering step for evaluating thousands of compounds. Thus, they are useful for prioritizing drugs that warrant further evaluation and experimental validation. This requires the application of suitable algorithmic approaches to identify mechanisms relevant or specific to the disease4.

This Review discusses current in silico drug repurposing efforts for COVID-19, followed by a discussion of the lessons learned from different perspectives (from data resources to the quality of predictions) and a proposed unified strategy to improve the response in potential future outbreaks. The covered studies employed standard drug repurposing workflows and data-driven algorithms.

As new studies are published almost every day, it is not possible to provide a broad and comprehensive overview of all repurposing studies. Hence, this Review focuses on the computational methods for drug repurposing, their application, availability and feasibility in a selection of studies (peer-reviewed and preprint) chosen to cover a wide variety of methods. It is worth noting that most of these studies are not considered successful clinically. Nevertheless, it is important to properly evaluate and improve the predictive power of in silico approaches that are capable of utilizing information from existing drugs as well as host and virus biology, even with limited availability of data on the novel emerging pathogen. This promotes a rapid and practical response to infection and therefore improves success in future pandemics, particularly in tackling the rise in infection cases at the early stages of the pandemic or ahead of vaccine development.

Data resources

Besides experimental datasets, the rapid availability of resources that integrate different data types is crucial in a pandemic. Sharing data accelerates research, as computational methods depend on high-quality datasets, and experimental labs do not need to collect the information on their own. The large number of resources used in COVID-19 drug repurposing studies has shown that data can be quickly generated and gathered through strong community efforts. This section presents a selection of data resources used in the reviewed studies to describe the resource types that accelerated computational drug repurposing approaches: most of them are general data resources that were already established before the pandemic but that have been extended with COVID-19 or SARS-CoV-2-specific data. The resources used in the reviewed studies are listed in Supplementary Table 1. A list of COVID-19 specific data resources that were not used in the reviewed studies but may become relevant in the future is given in Supplementary Table 2.

Molecular data resources

All molecular data used in the reviewed publications were extracted from already established, general data resources that were quickly extended with SARS-CoV-2-specific data. Resources such as GenBank5, the GISAID initiative6, or UniProt7 provide genomic/proteomic sequence information about hosts and SARS-CoV-2. Structural resources collecting information about proteins, such as the Protein Data Bank (PDB)8, were extended by various SARS-CoV-2-specific proteins. Finally, transcriptome resources that collect gene expression data were used in several COVID-19 drug repurposing approaches. For instance, the Genotype-Tissue Expression (GTEx)9 program offers insights into tissue-specific gene expression. Expression in lung tissues is of high interest in COVID-19 drug repurposing research and was often integrated in computational models or studies. Other resources, such as the LINCS L1000 database10, profile gene expression changes under certain drug treatment conditions and were used to identify drugs with reverse expression profiles to the samples infected with SARS-CoV-2.

Network and interaction resources

Protein–protein interaction (PPI) networks enable visualization and analyses of the interactions between either host or virus proteins and other host proteins. Furthermore, PPI networks allow for particular adaptation and search strategies (for example, edge filtering) and can be connected to drug resources. Gordon et al.11 identified 332 high-confidence virus–host interactions between SARS-CoV-2 and human proteins. This dataset was the only newly created and exclusively SARS-CoV-2-related resource used in the publications reviewed here. VirHostNet12,13, a virus–host PPI resource that already existed before the 2019/2020 SARS-CoV-2 outbreak, was expanded with 167 new SARS-CoV-2 interactions. In contrast to virus–host PPIs, host PPIs are not virus specific. All resources that were used in the reviewed studies were already available before the pandemic but have since been widely used in COVID-19 drug repurposing approaches14,15. Besides molecular networks, knowledge graphs, such as the Global Network of Biomedical Relationships (GNBR)16, have demonstrated their utility for drug repurposing. These networks comprise various types of biological relationships assembled from literature and were integrated into COVID-19 drug repurposing approaches17.

Drug and trial resources

Drug databases that already existed before the pandemic and that are continuously extended with newly developed drugs were used to connect the results of different approaches to potential drugs. A widely used drug database is DrugBank18, with more than 13,000 drug entries of approved and in-trial drugs, including drug targets. On the other hand, ChEMBL19 and ZINC1520 contain millions of compounds that exhibit drug-like properties.

Drug repurposing approaches also benefited from trial databases as they can be used to validate whether the predicted drugs are already in trial or have not yet been evaluated. Examples of such resources are the EU Clinical Trials Register (https://www.clinicaltrialsregister.eu/) and ClinicalTrials.gov (https://clinicaltrials.gov/). The latter contains more than 350,000 research studies from 219 countries.

Drug repurposing studies

Various clinical, experimental and computational drug repurposing efforts have been rapidly mobilized to prioritize compounds and identify promising drug candidates for the SARS-CoV-2 pandemic. In this section, we examine a selection of studies representing the different computational approaches to identify potential new targets and repurposable drugs for COVID-19.

Virus-targeting approaches

Virus-targeting approaches mostly rely on structure-based drug screening methods, which take the three-dimensional structures of target proteins to predict affinities or interaction energies of known chemical compounds to the proteins (Fig. 1). These methods were mainly used to identify candidate drugs that target viral proteins, so we refer to them as virus-targeting approaches, although they can also be applied to host proteins. Two main methodological workflows were applied, namely, structure-based21 and deep-learning (DL)-based drug screening. Here, we describe these methods and compare 23 COVID-19 drug repurposing studies22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44.

Fig. 1: Workflows of virus-targeting computational drug repurposing approaches.

The input data consist of protein structure information (experimental or predicted) and chemical structure of drugs from public databases. Two analysis workflows can be applied: standard analysis consisting of docking followed by molecular dynamics (MD) simulations and DL-based analysis. Finally, the output data of both approaches generally consist of a ranking of drugs based on their (predicted) docking scores. The drugs can be further evaluated by whether or not they are in clinical trials.

Structure-based drug screening

The first step for structure-based screening is the selection of the drug library and the target protein. For COVID-19, the intuitive candidates for targeting virus proteins were antivirals. Thus, many studies limited their search to these. The number of screened drugs ranged from 3 (ref. 37) to 123 antiviral drugs33. Broader studies, such as that by Chen et al.26, combined compounds from the KEGG (Kyoto Encyclopedia of Genes and Genomes) and DrugBank databases to screen 7,173 drugs.

The other crucial step is the selection of the target protein and its corresponding three-dimensional structure (experimental or predicted). Wu et al.40 performed screening on 19 encoded proteins of the virus. By comparison, most other studies focused on the 3CLpro, envelope (E), spike, RNA polymerase and methyltransferase proteins.

Virtual screening of the drug libraries utilized established software, such as Autodock45 and Glide46. Candidate drugs were selected using respective scoring methods, followed by validations with molecular dynamics simulations30,37.

Most drugs were predicted for 3CLpro (Supplementary Table 3), which was also the focus of most studies (17 studies), followed by RdRp and PLpro. For 3CLpro, the predictions ranged from 2 (ref. 29) to 27 (ref. 40) drugs per study. The 5 most frequently predicted drugs were ritonavir (8 studies), lopinavir (6 studies), nelfinavir, remdesivir and saquinavir (5 studies each). However, 99 of the candidate drugs were only predicted in 1 study, showing a high variability in the resulting candidate sets. Interestingly, the studies that screened full databases also predicted antiviral drugs as top scorers (Supplementary Table 4). Of the 23 studies, 10 have not yet been peer-reviewed, which we discuss in the section on ‘A unified drug repurposing strategy’.

DL-based repurposing strategies

DL models can predict binding affinities or docking scores and have shown advantages over conventional docking protocols. While standard docking protocols are limited to millions of compounds47, DL approaches can analyze billions of chemical compounds. This allows them to be applied to whole databases, which increases the diversity of the tested compounds and the likelihood of finding unconventional compounds47. Furthermore, they are capable of processing more (physico-)chemical features48 and can find features related to a non-favorable docking47. However, most of these methods require datasets for training, which often come from real docking simulations; thus, the performance of many DL-based approaches still relies on the accuracy of the docking software used for training.

Ton et al.42 developed DeepDocking47, which utilizes quantitative structure–activity relationship models trained to predict docking scores of compounds targeting the SARS-CoV-2 3CLpro protein. It requires fewer docking pipelines, since it performs docking only on subsets of compounds and can produce a reduced list of compounds, which is also enriched in potential top hits.

Nguyen et al.49 developed the method MathDL, which feeds low-dimensional mathematical representations of the drug–target protein complex structures to DL algorithms to predict binding energies of drug–protein complexes. For SARS-CoV-2, the authors used experimental binding affinity data from SARS-CoV ligand–3CLpro complexes from PDBbind and SARS-CoV protease inhibitors as training data to predict binding energies of DrugBank compounds for SARS-CoV-2 3CLpro (ref. 50). The method does not depend on docking software.

Beck et al.44 developed a DL-based drug-target interaction prediction model, named Molecule Transformer-Drug Target Interaction. It utilizes simplified molecular-input line-entry system (SMILES)51 representations for drugs and protein sequences as input for training and predicts affinities. For SARS-CoV-2, the model was trained on commercially available antiviral drugs and viral target proteins. Antiviral drugs already used against SARS-CoV-2 were found among the candidate drugs identified.

Host-targeting approaches

Host-targeting approaches involve identifying potential drugs that interfere with host mechanisms that contribute to viral pathogenesis, which also makes them less prone to drug resistance52,53. In addition, SARS-CoV-2 infections can trigger a hyper-reactive immune response characterized by the excessive release of pro-inflammatory cytokines and chemokines54. Thus, drugs that modulate the host immune response can benefit critically ill patients with COVID-19 by targeting specific dysregulated pathways54,55,56.

Signature-based approaches

Signature-based approaches primarily utilize transcriptome datasets from samples infected with SARS-CoV-2 or closely related human coronaviruses to identify candidate drugs through connectivity mapping (Fig. 2), a well-established approach that relies on finding drug-induced expression signatures exhibiting reverse profiles to a disease signature57,58. Several studies adopted this as a primary method for identifying new therapeutics for COVID-19. Loganathan et al.59 performed differential expression analysis of virus-infected cells and extracted consistently dysregulated genes in infected conditions. They were used to query the Connectivity Map database58 for drug perturbation profiles exhibiting anti-correlated expression signatures. A modified approach was implemented by Jia et al.60, wherein expression data from infected and healthy individuals were used as input to a pathway-guided drug repurposing framework. They identified disease co-expression clusters and performed enrichment analyses prior to reverse signature matching60.
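The reverse-signature idea can be illustrated with a minimal sketch. It assumes that the disease signature and each drug signature are given as gene-to-log-fold-change dictionaries and scores each drug by the Pearson correlation over the shared genes. This is a simplified stand-in: the Connectivity Map uses a rank-based enrichment statistic, and the gene names, drug names and values below are invented for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equally long lists; 0.0 if either is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def reversal_ranking(disease_sig, drug_sigs):
    """Rank drugs from most anti-correlated to most correlated with the disease."""
    scores = {}
    for drug, sig in drug_sigs.items():
        genes = sorted(set(disease_sig) & set(sig))
        if len(genes) < 2:
            continue  # too few shared genes to compute a correlation
        scores[drug] = pearson([disease_sig[g] for g in genes],
                               [sig[g] for g in genes])
    return sorted(scores.items(), key=lambda kv: kv[1])


# Hypothetical log fold changes (infected versus control, treated versus untreated).
disease = {"G1": 2.0, "G2": 1.5, "G3": -1.0, "G4": -2.0}
drugs = {
    "drug_reverser": {"G1": -1.8, "G2": -1.2, "G3": 0.9, "G4": 2.1},
    "drug_mimic": {"G1": 1.9, "G2": 1.4, "G3": -1.1, "G4": -1.7},
    "drug_flat": {"G1": 0.5, "G2": 0.5, "G3": 0.5, "G4": 0.5},
}
ranking = reversal_ranking(disease, drugs)
```

In this toy input, the drug whose profile opposes the disease signature is ranked first, which is the kind of candidate that signature matching aims to surface.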

Fig. 2: Workflows of host-targeting computational drug repurposing approaches.

Signature-based methods involve finding drug-induced expression profiles that exhibit reverse patterns to the coronavirus disease signature. Network-based approaches typically assemble heterogeneous networks from diverse data types, including gene–disease associations or drug–target associations. Algorithms such as network proximity, random walk/diffusion-based methods, or network enrichment are then employed. Some studies combined them with machine-learning-based methods, particularly autoencoders and graph convolutional networks. The outputs can be ranked lists of host targets or drug candidates.

Network-based approaches

The general network-based approach applied in drug repurposing studies on COVID-19 integrates multiple data sources, including virus–host interactions, PPIs, co-expression networks, functional associations or drug–target interactions (Fig. 2). Network-based algorithms or topology measures are applied to the assembled networks to identify relevant host protein targets or regions of the host interactome that can be targeted.

Multiple studies implement random-walk-based algorithms as the primary method to identify new putative drug targets. Law et al.61 implemented several algorithms on a virus–host interactome to identify additional SARS-CoV-2 interactors. Similarly, but focusing on a specific context, Messina et al.63 explored the pathogenic mechanisms triggered by the coronavirus spike protein, which is established as the main mediator of viral entry into host cells62, using data from three closely related coronaviruses. They implemented a random walk algorithm on assembled molecular networks using the spike protein as seed to identify relevant targets for COVID-1963. In addition, CoVex64 implemented TrustRank65, a variant of the PageRank66 algorithm, to propagate scores from user-defined seeds to the other host proteins and rank host drug targets.
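As a rough illustration of seed-based score propagation, the following sketch implements a random walk with restart on a small undirected network. It is not the implementation used in any of the cited studies; the restart probability, the convergence tolerance, the node names and the edges are all assumptions chosen for the example, and the graph is assumed to have no isolated nodes.

```python
def random_walk_with_restart(adj, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - r) * W p + r * p0, where W spreads each node's score
    evenly over its neighbors and p0 is uniform over the seed nodes."""
    nodes = sorted(adj)
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {n: restart * p0[n] for n in nodes}
        for u in nodes:
            if not adj[u]:
                continue  # isolated nodes pass on no score
            share = (1.0 - restart) * p[u] / len(adj[u])
            for v in adj[u]:
                nxt[v] += share
        delta = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if delta < tol:
            break
    return p


# Hypothetical toy interactome: one viral seed protein and four host proteins.
adj = {
    "viral_seed": {"host_A", "host_B"},
    "host_A": {"viral_seed", "host_C"},
    "host_B": {"viral_seed"},
    "host_C": {"host_A", "host_D"},
    "host_D": {"host_C"},
}
seeds = {"viral_seed"}
scores = random_walk_with_restart(adj, seeds)
host_ranking = sorted((n for n in adj if n not in seeds), key=lambda n: -scores[n])
```

Host proteins closer to the seed, or better connected to it, receive higher scores, so the ranked list can serve as a prioritization of candidate host targets.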

Network proximity relies on the principle that a drug can be effective if it targets proteins within the neighborhood of disease-associated proteins in the interactome67. Zhou et al.68 utilized this concept to compute the network proximity measure between drug targets and coronavirus-associated proteins in the human interactome. They also used the ‘complementary exposure’ pattern, which is based on the shortest distance between targets of two drugs predicted by network proximity, to identify potential drug combinations to treat COVID-19 patients68.
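One commonly used proximity measure is the 'closest' distance, the average over a drug's targets of the shortest-path distance to the nearest disease-associated protein. The sketch below computes only this raw distance on an unweighted network; published analyses typically also compare it against randomized protein sets to obtain a z-score, which is omitted here. The network and the protein and drug sets are hypothetical.

```python
from collections import deque


def bfs_distances(adj, source):
    """Shortest-path lengths (in edges) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def closest_distance(adj, disease_proteins, drug_targets):
    """Mean, over drug targets, of the distance to the nearest disease protein."""
    total = 0
    for target in drug_targets:
        dist = bfs_distances(adj, target)
        reachable = [dist[s] for s in disease_proteins if s in dist]
        if not reachable:
            return float("inf")  # target is disconnected from all disease proteins
        total += min(reachable)
    return total / len(drug_targets)


# Hypothetical linear interactome P1 - P2 - P3 - P4 - P5.
adj = {
    "P1": {"P2"},
    "P2": {"P1", "P3"},
    "P3": {"P2", "P4"},
    "P4": {"P3", "P5"},
    "P5": {"P4"},
}
disease = {"P1", "P2"}
near = closest_distance(adj, disease, {"P2", "P3"})  # distances 0 and 1
far = closest_distance(adj, disease, {"P5"})         # distance 3
```

A smaller value indicates that the drug's targets lie closer to the disease-associated proteins in the network.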

Several studies combined multiple network-based strategies to predict drug candidates. Gysi et al.69 characterized and extracted a COVID-19 disease module using experimentally determined SARS-CoV-2 interactors. They performed network-based analyses accounting for tissue specificity and potential disease comorbidities. They employed a multi-modal approach to the virus–host interactome integrating network proximity, diffusion state distance and graph convolutional networks (GCNs) to identify drugs that can perturb the activity of host proteins associated with the COVID-19 disease module. The final drug list was obtained by rank aggregation from the different pipelines69.
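To make the rank aggregation step concrete, the following sketch combines ranked drug lists with a simple Borda count. This is a generic stand-in and not necessarily the aggregation procedure used by Gysi et al.; the pipeline names and drug identifiers are hypothetical.

```python
def borda_aggregate(rankings):
    """Aggregate ranked lists (best first) by Borda count.

    An item at position i in a list earns n - i points, where n is the number
    of distinct items overall; items missing from a list earn nothing from it.
    Ties are broken alphabetically.
    """
    universe = set().union(*rankings)
    n = len(universe)
    points = {item: 0 for item in universe}
    for ranking in rankings:
        for position, item in enumerate(ranking):
            points[item] += n - position
    return sorted(universe, key=lambda item: (-points[item], item))


# Hypothetical outputs of three prediction pipelines.
proximity = ["drug_a", "drug_b", "drug_c"]
diffusion = ["drug_b", "drug_a", "drug_d"]
gcn = ["drug_b", "drug_c", "drug_a"]
consensus = borda_aggregate([proximity, diffusion, gcn])
```

Drugs that rank consistently high across pipelines rise to the top of the consensus list, whereas a drug supported by a single pipeline falls toward the bottom.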

CoVex64 is a web platform for exploring SARS-CoV and SARS-CoV-2 virus–host–drug interactomes64. Users can predict drug targets and drug candidates using several graph analysis methods that allow custom seed proteins as input. For instance, KeyPathwayMiner70 is a network enrichment tool that identifies condition-specific subnetworks by extracting a maximally connected subnetwork from the host interactome starting from the seeds. CoVex also implements a weighted multi-Steiner tree method that aggregates several non-unique approximations of Steiner trees, which are subnetworks of minimum cost connecting the set of seeds, into a single subnetwork.
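The Steiner tree idea can be sketched with a greedy shortest-path heuristic on an unweighted network: starting from one seed, the nearest remaining seed is repeatedly attached to the growing tree through a shortest path. This yields a single approximation and is not the weighted multi-Steiner tree implementation in CoVex, which aggregates several such approximations; the network and seed names below are hypothetical.

```python
from collections import deque


def steiner_subnetwork(adj, seeds):
    """Greedy heuristic: grow a tree by attaching the nearest remaining seed."""
    seeds = sorted(seeds)
    tree_nodes = {seeds[0]}
    tree_edges = set()
    remaining = set(seeds[1:])
    while remaining:
        # Multi-source breadth-first search starting from all current tree nodes.
        parent = {n: None for n in tree_nodes}
        queue = deque(sorted(tree_nodes))
        hit = None
        while queue and hit is None:
            u = queue.popleft()
            for v in sorted(adj[u]):
                if v not in parent:
                    parent[v] = u
                    if v in remaining:
                        hit = v
                        break
                    queue.append(v)
        if hit is None:
            raise ValueError("seeds are not all connected")
        # Walk back from the new seed to the tree, adding the path.
        node = hit
        while parent[node] is not None:
            tree_edges.add(frozenset((node, parent[node])))
            tree_nodes.add(node)
            node = parent[node]
        remaining.discard(hit)
    return tree_nodes, tree_edges


# Hypothetical network: hub H links all seeds; S1 - X - Y - S2 is a longer detour.
adj = {
    "S1": {"H", "X"},
    "S2": {"H", "Y"},
    "S3": {"H"},
    "H": {"S1", "S2", "S3"},
    "X": {"S1", "Y"},
    "Y": {"X", "S2"},
}
nodes, edges = steiner_subnetwork(adj, {"S1", "S2", "S3"})
```

The non-seed nodes of the resulting subnetwork (here the hub H) are the connector proteins that such methods propose as candidate targets.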

Other studies additionally utilize machine learning to predict drug candidates against SARS-CoV-2. Belyaeva et al.71 implemented a hybrid approach between signature matching and network-based methods. Using autoencoders, they learned feature embeddings for drugs using drug-induced expression profiles to identify drugs exhibiting reverse profiles to the SARS-CoV-2 infection signature. Steiner tree and causal network discovery algorithms were then used to extract the mechanisms mediated by both SARS-CoV-2 and aging71. Ge et al.72 constructed a virus-related knowledge graph and employed a GCN algorithm. The list of drug candidates was further filtered for existing evidence of antiviral activities through text mining72. Similarly, Zeng et al.17 assembled a large-scale knowledge graph derived from PubMed articles. A GCN model was then applied to learn low-dimensional embeddings of the nodes and edges17.

Lessons learned

In the following, we examine the quality and potential of the reviewed data resources and computational methods in order to improve the response in future pandemics.

Data resources

The availability of molecular datasets is a precondition to develop drug repurposing methods quickly. Besides that, network-based resources were a large driver in drug repurposing. However, a large portion of the publications are based on only a few primary resources, which always induces the risk of bias or measurement errors. In addition, the only type of molecular interaction network used was PPI. Still, high confidence PPIs are needed since, for instance, none of the approaches included structure data. In the future, other network types, such as gene regulatory networks, should be considered. Other data resources, such as off-label data for drugs, should also be integrated in drug repurposing studies.

Finally, existing drug and trial resources were widely used for developing the drug repurposing pipelines. However, we observed no standardization in trial resources, making it hard to analyze trials for certain drugs due to different names, different spellings, or typing errors. Standardization is usually implemented for drug resources (for example, DrugBank), but some drugs undergoing trials could not be found in the databases. Keeping the resources up to date and interconnected should be a focus and will enhance accessibility.

Computational predictions

Assessing the quality of predictions is challenging, since many studies are not peer-reviewed, do not perform experimental evaluation, or rely on clinical trial databases. We examined the quality of predictions by determining the overlap between the final candidate drug lists from the individual studies and the drugs undergoing clinical trials from ClinicalTrials.gov (https://clinicaltrials.gov/) and Biorender (https://biorender.com/covid-vaccine-tracker) databases. In addition, we provide supplementary in vitro screening data, such as IC50 values for viral targets and inhibition indices from cell culture studies for SARS-CoV-2 (Supplementary Data 1). Our effort to compile these data shows that a substantial number of predictions have not been experimentally tested.
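The overlap computation described above can be sketched as follows. The normalization step is an assumption made for this example: it only lowercases names and strips punctuation and whitespace, so it does not resolve synonyms, brand names or typing errors, for which curated identifiers would be needed. The drug lists are invented.

```python
import re


def normalize(name):
    """Lowercase a drug name and drop every non-alphanumeric character."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def trial_overlap(predicted, in_trials):
    """Return the shared normalized names and the fraction of predictions in trials."""
    pred = {normalize(d) for d in predicted}
    trial = {normalize(d) for d in in_trials}
    shared = pred & trial
    fraction = len(shared) / len(pred) if pred else 0.0
    return sorted(shared), fraction


# Hypothetical candidate list and trial registry extract with inconsistent spelling.
predicted = ["Remdesivir", "lopinavir", "drug_x", "Drug-Y"]
in_trials = ["remdesivir ", " Lopinavir", "hydroxy-chloroquine"]
shared, fraction = trial_overlap(predicted, in_trials)
```

Here two of the four predicted names match a trial entry after normalization, giving an overlap fraction of 0.5.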

Evaluating virus-targeting approaches

We identified 53 drugs predicted with docking simulations that are undergoing current trials (Supplementary Table 5). Wu et al.40 identified most of the drugs (36 drugs); however, these drugs were predicted for multiple viral proteins (for example, chlorhexidine for 11 and methotrexate for 6 different viral proteins). This indicates that their approach did not yield specific and feasible candidates. After excluding this study, the remaining drugs were only predicted for one specific protein each, except for chloroquine (3CLpro and PLpro) and remdesivir (3CLpro and RdRp). The top five drugs in clinical trials, which were predicted by docking simulations using the 3CLpro main protease, were predicted by 14.3% (darunavir), 19.0% (remdesivir), and 23.8% (lopinavir, nelfinavir, ritonavir) of the total number of included docking studies (Supplementary Table 6), showing that for each drug, the majority of studies were not able to predict them. Similar drugs were identified by the DL approach of Beck et al.44, who identified ritonavir, lopinavir and remdesivir, which are being tested in multiple clinical trials. However, these antiviral drugs have not yet shown well-defined results in patients. For ritonavir/lopinavir, only four trials are completed73,74,75,76 and preliminary results suggest no difference in the outcome after treatment77,78,79. Further investigation is required80. For remdesivir, some trials have been completed and the preliminary results in patients81,82,83 and human cell lines84 showed that it could be effective in treating SARS-CoV-2 infection.

Antiviral drugs are always the top hits among a large selection of drugs from databases, indicating high accuracy of the methods. These drugs are good candidates for experimental screening or clinical trials, independently of how reliable the computational predictions are. More interesting candidates are the additional drugs identified by these approaches; however, little experimental validation is available for these drugs and the majority of them do not enter clinical trials. A similar situation is observed in the emerging field of DL approaches, where most studies focused on demonstrating the accuracy of their predictions and developing benchmarking datasets85,86. DL and docking simulation-based approaches are promising tools to identify repurposable drugs given their capacity to deliver results in a short time. While a standard workflow is already established for docking simulations, DL-based approaches might robustly deliver testable candidate drugs. However, docking studies in particular were rarely peer reviewed, found very different candidate sets and partially used different scores for evaluation and ranking. This makes it necessary to validate these results by systematic comparisons of experiments.

Evaluating host-targeting approaches

Host-targeting approaches typically involve integration and analysis of multiple omics types and employ data-driven network-based methods; thus, a major limitation is the lack of gold-standard datasets and the scarcity of data from the MERS-CoV (Middle East respiratory syndrome coronavirus) and SARS-CoV outbreaks. Prior to the availability of sufficient SARS-CoV-2-specific data, earlier studies utilized preliminary data or augmented the analyses using data from closely related viruses. While the quality of the predictions is highly data-dependent, continued generation of SARS-CoV-2-specific omics data and pending results on clinical studies are expected to improve the predictions. Clinical expert knowledge remains crucial for filtering the drug predictions based on criteria such as toxicity and pharmacological properties. However, the efficacy of these candidate drugs in trial remains to be established and firm conclusions cannot be made because of the limited data availability.

The degree of overlap with drugs in clinical trials was generally low (Supplementary Tables 7 and 8), but more than half of the drugs (26 out of 41) predicted by an ensemble method primarily based on knowledge graphs17 are also undergoing clinical trials. While it should be noted that the drugs registered for clinical trials were also used as their validation set at the time of writing, more of their predicted drugs were registered for clinical trials later on. We also noted several drugs that were predicted by both signature-based and network-based approaches and thus warranted further examination (Supplementary Table 9). Ribavirin was predicted by four out of six studies17,60,69,71, thereby providing a mechanistic basis for its predicted efficacy. Methotrexate, which is indicated for rheumatoid arthritis, was also predicted by three studies17,68,69.

It is worth noting that several predicted compounds are currently used to treat critically ill COVID-19 patients. An example is dexamethasone (predicted by one signature-based60 and two network-based studies17,69), which was supported by the RECOVERY trial87. Hydrocortisone (predicted by three studies17,68,69) has also demonstrated efficacy for critically ill patients88. Dexamethasone and hydrocortisone are corticosteroids that act by modulating an overactive immune response, which is typically observed in severely ill COVID-19 patients.

Notably, drugs reaching advanced phases in clinical trials were not selected based on in silico predictions, but were repurposed based on clinical experience with the previous SARS or MERS outbreaks89 and selected based on known effects in alleviating disease symptoms. Furthermore, the predictions were not followed up by experimental validation in the majority of the studies reviewed. This translational gap between computational efforts for drug repurposing and clinical application is a major and widely recognized bottleneck in drug repurposing and medicine in general. Results from systematic validation efforts will also be important for identifying the algorithms and datasets that are specifically suitable for drug repurposing in the COVID-19 context. Given the urgency of identifying effective therapies in a pandemic, close collaboration between clinicians, experimental biologists and computational biologists is expected to address this gap.

A unified drug repurposing strategy

Although overlaps between computationally predicted drug repurposing and clinical trials exist, there are no indications that clinical trials were conducted based on computational predictions, despite their promising potential. For future pandemics, computational tools should be able to deliver promising sets of candidates, which could then be validated in trials or screenings. Therefore, a unified strategy is necessary. In the following, we identify important issues and discuss potential solutions to make computational drug repurposing more effective.

Availability of standardized data

Newly developed methods often rely on the same data types (Fig. 3a). The fast generation of different kinds of data in future disease outbreaks is a key initial step. Notable examples are the interaction data from Gordon et al.11 and the publication of the 3CLpro90 structure, which were both used by many subsequent studies. However, experimental replication of datasets obtained from different laboratories and the integration of different data types are crucial to increase robustness and require improvement.

Fig. 3: Proposed elements of a unified drug repurposing strategy.

a, Availability of standardized data. b, Accessible workflows for computational predictions. c, Combination of predictions from different methods. d, Feedback from clinical experts of drug candidate sets and screening parameters. e, Validation of predicted drugs with different approaches.

Tool accessibility

Despite their large variety, computational tools and software have so far been of limited practical use to clinical researchers during the COVID-19 pandemic (Fig. 3b). For virus-targeting therapies, docking pipelines remain stable and a large amount of software has been developed; however, their corresponding outputs showed wide variability depending on the algorithm used, lowering comparability (standardization problem). For host-targeting therapies, the in silico pipelines are more methodologically diverse and several strategies were developed to target specific biological contexts. However, the general availability of computational tools and software in the context of the COVID-19 pandemic has been highly limited. Tool accessibility allows researchers to run custom analyses using the developed algorithms (for example, on newly available data). This will help non-computational scientists to use these tools and continue with validation routines, avoiding many preprint manuscripts that are never validated and consequently accelerating research.

Consolidation of predictions

Results from different approaches were not entirely integrated. In structure-based repurposing approaches, candidate drugs obtained from different docking tools or homology modeling methods could be consolidated to provide an ensemble of repurposable drugs (Fig. 3b). For host-targeting therapies, one study used rank aggregation to integrate results from different algorithms69. Another study derived the final predictions by combining the output of their model with results from gene set enrichment and expert knowledge68. While it should be noted that the drugs in clinical trials were used to develop the methods, these two studies predicted the highest proportion of overlaps with drugs being tested in clinical trials. The latter shows the potential of ensemble approaches, which are well known to output more robust results91,92. Consolidation of multiple approaches could significantly increase confidence for repurposing candidates and guide clinical researchers through the drug selection process. This requires a streamlined solution, considering tool accessibility and standardization, as in a standardized database that stores drug candidate predictions enabling meta-analyses.

Combinatorial treatment development

Computationally identifying synergistic drug combinations is an underexplored domain that could provide highly valuable information to augment clinical decision-making, since drug combinations have been demonstrated to be more effective than monotherapies91,92 (Fig. 3c). So far, targeting of viral and host proteins has been performed independently. There is a lack of methods that aim to find complementary drug groups while considering side effects. Combining drugs from both the virus- and host-targeting categories is a promising strategy: it blocks the viral and host molecular machinery required for SARS-CoV-2 entry into cells and disrupts the host pathways involved in disease progression, in combination with inhibitors of viral replication. While thousands of compounds can be evaluated in vitro90, combinatorial validations are considerably more challenging. Predicted combinatorial treatments could drastically reduce the search space for subsequent in vitro validation. Existing screening databases such as the NIH OpenData portal93 or the ReFRAME library94 have been used only sparsely, and their potential has not been exhausted. By extending them with in silico predictions, they could link in silico and in vitro research and help identify promising combinatorial treatments. Furthermore, screening results help verify computational predictions. Especially for docking simulations, model predictions and parameters can be easily released in a standardized format, which can be evaluated by experimental researchers. For host-targeting therapies, the study of Zhou et al.68 is an example of a combinatorial approach. Furthermore, several trials are registered for combination therapies that include candidate drugs from both categories; of these, ten drugs were included in the predictions from the reviewed studies (Supplementary Table 10). However, these trials are either in the recruitment phase or have reported only limited results; thus, data regarding the effectiveness of these drugs remain inconclusive.
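To make the search-space argument concrete, the sketch below compares the number of unordered pairs in an unfiltered compound library with the number of virus-targeting/host-targeting pairs that remain after in silico shortlisting. The library size, the shortlists and the function names are hypothetical and serve only as an illustration.

```python
from itertools import product
from math import comb


def all_pairs(n_compounds):
    """Number of unordered two-drug combinations among n_compounds."""
    return comb(n_compounds, 2)


def cross_category_pairs(virus_targeting, host_targeting):
    """Pairs combining one virus-targeting and one host-targeting candidate.

    A drug that appears in both shortlists is not paired with itself.
    """
    return [(v, h) for v, h in product(virus_targeting, host_targeting) if v != h]


# Hypothetical shortlists produced by in silico prioritization.
virus_shortlist = ["viral_inhibitor_1", "viral_inhibitor_2"]
host_shortlist = ["host_modulator_1", "host_modulator_2", "host_modulator_3"]

print(all_pairs(1000))  # 499500 pairwise combinations in a 1,000-compound library
print(len(cross_category_pairs(virus_shortlist, host_shortlist)))  # 6 prioritized pairs
```

Testing several concentrations per pair multiplies both counts by the same factor, so the relative reduction achieved by shortlisting is unchanged.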

Expert knowledge

Limited understanding of the complex biological mechanisms underlying COVID-19 has required expert knowledge or manual curation at certain stages of the workflow, either during protein or pathway selection or during filtering of drug predictions (Fig. 3d). Expert vetting is mainly intended to uncover inconsistent or contradictory results while still allowing the identification of new predictions, and it can be crucial for filtering candidate drug lists for possible adverse side effects. To illustrate this, the antimalarial drug (hydroxy)chloroquine raised concerns regarding its potential toxicity. Chlorhexidine was identified by a docking-based study40 as a potential drug targeting SARS-CoV-2 proteins; however, chlorhexidine is a widely used disinfectant whose mechanism of action is not SARS-CoV-2-specific, and it is approved for topical or dental application only95. Consequently, the use of expert knowledge for careful evaluation of potential repurposable drugs would have helped to allocate limited experimental and computational resources to safe and effective drugs that have greater potential for widespread application. Close collaboration between computational and clinical researchers is therefore crucial, because computational approaches are still limited by the available side effect data and annotations of drug actions on their targets.

Validation strategies

Drug repurposing studies usually validate the computational models by constructing their own ‘ground truth’; this can include data from in vitro screening of predicted compounds, in vivo experiments using animal models, ongoing clinical trials, electronic health records, literature mining or expert knowledge96 (Fig. 3e). Thus, there is considerable heterogeneity in the sources of these standards, but efforts are ongoing to address this. For instance, newly released databases, such as the NIH’s OpenData portal93, collect and continuously update SARS-CoV-2 in vitro screening data for thousands of compounds and other SARS-CoV-2-related assays. We encourage future studies to utilize such resources for further validation or filtering of in silico predictions. However, except for one study69, no direct follow-up experimental validation has been performed in the drug repurposing efforts for COVID-19. In the reviewed studies, validation was implemented through several strategies. Some studies performed signature matching of drug profiles or gene set enrichment analysis17 to provide evidence of potential effectiveness69,72. Others evaluated the performance of their pipelines using the drugs undergoing clinical trials for COVID-1917,69 or experimental results from in vitro drug screening69. However, an extensive list of candidate drugs remains experimentally unvalidated; thus, systematic validation of candidate drugs would be required to provide a landscape of the accuracy of methods. Since this is infeasible in practice, combining the predictions with expert knowledge becomes even more important.
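One simple way to compare ranked predictions against a reference set, such as drugs in clinical trials or hits from an in vitro screen, is sketched below using precision among the top-k predictions and a hypergeometric test for the overlap. This is a generic illustration under stated assumptions, not the evaluation protocol of any reviewed study; the function names and all inputs are hypothetical.

```python
from math import comb


def precision_at_k(ranked_predictions, reference_set, k):
    """Fraction of the top-k predicted drugs that appear in the reference set."""
    top = ranked_predictions[:k]
    if not top:
        return 0.0
    return sum(drug in reference_set for drug in top) / len(top)


def hypergeom_pvalue(n_universe, n_reference, n_selected, n_overlap):
    """P(overlap >= n_overlap) if n_selected drugs were drawn at random
    from a universe of n_universe drugs containing n_reference reference drugs."""
    upper = min(n_reference, n_selected)
    favourable = sum(
        comb(n_reference, i) * comb(n_universe - n_reference, n_selected - i)
        for i in range(n_overlap, upper + 1)
    )
    return favourable / comb(n_universe, n_selected)


# Hypothetical example: 4 ranked predictions, 2 reference drugs, universe of 10 drugs.
predictions = ["drug_A", "drug_B", "drug_C", "drug_D"]
reference = {"drug_A", "drug_C"}
print(precision_at_k(predictions, reference, k=2))  # 0.5
print(hypergeom_pvalue(n_universe=10, n_reference=2, n_selected=4, n_overlap=2))
```

Because the reference sets themselves are heterogeneous and incomplete, a low overlap does not necessarily mean that a prediction is wrong; such metrics are best read as a relative comparison between methods evaluated on the same reference.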

The strategy proposed in this work has the potential to address the gaps in previous studies and is intended to serve as a guideline for computational drug repurposing to accelerate research, promote standardization, and enable a faster and more precise response to future pandemics.