Background & Summary

Coronaviruses are enveloped viruses carrying 27 to 31 kb single-stranded positive-sense RNA genomes encoding structural and accessory proteins. Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is a respiratory infection that was first recorded in December 2019 and declared as a pandemic in March 2020. Since the start of 2020, there has been almost 5.5 M deaths worldwide with at least 307 Mio assessed infection cases1.

Repurposing of known drugs offers a cost and time-effective alternative to the classical antiviral drug discovery approach, which can take many years and involves high costs. Combining high-performance computing and experimental biology is a useful route for drug repurposing efforts in rare, orphan and challenging diseases2 as well as health emergencies3. The high number of repurposing-based clinical trials currently active worldwide are part of an unprecedented effort in fighting SARS-CoV-24. In line with these efforts, EXSCALATE4COV (E4C) is an EU H2020-funded emergency project which uses high-performance computing and screening infrastructures to identify clinical candidates and progress mono- or combination therapies against SARS-CoV-2 viral infections5. Already in the context of the SARS-CoV-2 pandemic, E4C’s supercomputing platform6,7 and high-throughput screening assays have been applied to identify modulators of viral protein function across a panel of targets8,9,10.

The dataset presented in this study was generated from the phenotypic screening of a large scale repurposing collections totalling ca. 8702 compounds (Fig. 1). The primary efficacy assay readout involved fluorescence imaging measurements of compound induced anti-SARS-CoV-2 cytopathic effects on EGFP expressing African green monkey kidney cells (VeroE6-EGFP)11. Following primary screening, hit profiling and cytotoxicity assessment, 110 compounds had IC50 values below 20 µM and 46 of these have a safety index (ratio of cytotoxic to anti-cytopathic effects) greater than 2 (Workflow shown in Fig. 2). Our results confirmed known, and identified novel repurposed candidates against SARS-CoV-2, including raloxifene (a marketed estrogen receptor antagonist) which, following orthogonal validation studies and independent confirmation by other groups12, has been progressed to clinical investigations against COVID-1913.

Fig. 1
figure 1

Comparison of the compound collection. Venn diagram showing overlap across the three compound collections in the screened set.

Fig. 2
figure 2

High throughput repurposing screening. (a) Workflow and number of compounds at each stage of screening cascade. (b) Z’ factor versus screening order for plates across the 3 experimental phases. All values Z’ > 0.5. (c) Duplicate compound potencies in hit profiling. IC50 values for compounds present in both the Fraunhofer (X axis) and Dompe or EU-OPENSCREEN (Y axis) collections (R2 = 0.81). (d) Distribution of the primary target of screened compounds (lhs) and 110 confirmed hit compounds (rhs). Explanation of keys: “Channel” gathers all cellular channels comprising metal channels and efflux pumps; “DNA-RNA” comprises all cellular DNA/RNA-dependent enzymes; “Enzymes” gathers a large set of metabolic enzymes involved in cellular metabolism/catabolism mechanisms like de-novo syntheses and/or oxidative or proteolytic processing of non-peptidic substrates; “GPCR” comprises G-Protein coupled receptors; “NHR” stands for Nuclear Hormone Receptors; “Proteases” is self-explaining and “Other” categorizes all the cellular proteins classes not previously listed (e.g. glycosylative enzymes, farnesyltransferase and similar).


Compound screening set

The screened set was composed of three compound libraries from different sources (Fig. 1). Firstly, the Fraunhofer Repurposing Library contains 5632 compounds including 3,400 compounds that have reached clinical use across 600 indications, as well as 1582 preclinical compounds with varying degrees of validation. This library was assembled by an external partner (SPECS; Netherlands) to mirror a set originally established by the Broad Institute. A curated database is available listing the compounds, indications, primary targets (where known) and mechanism of action, as well as analysis tools that can help to determine the mechanism of action and target. Secondly, the EU-OPENSCREEN Probe Library is a collection of 2500 compounds sampled from launched, clinical and preclinical studies aimed to cover multiple therapeutically relevant cellular mechanisms14. Finally, the Dompe’ Farmaceutici S.p.A. proprietary library is a collection of ca 700 candidate drugs which have completed at least phase I clinical trials. All compounds (Table 2) were quality controlled by liquid chromatography/ mass spectrometry (LC/MS) for purity and identity (minimum purity > 90%). The compounds were stored at a concentration of 10 mM in 100% DMSO at –20 °C.

Cell and virus culture

The African green monkey kidney cell line (Vero E6) was previously engineered to constitutively express GFP15. Cells were maintained in Dulbecco’s modified Eagle’s medium (DMEM; Gibco) supplemented with 10% v/v foetal calf serum (FCS; Biowest), 0.075% Sodium Bicarbonate (7.5% solution, Gibco) and 1x Pen-strep (Gibco) and kept under 5% CO2 on 37 °C. Assay medium contained 2% FCS. SARS-CoV-2 strain BetaCov/Belgium/GHB-03021/2020 recovered from a nasopharyngeal swab taken from an asymptomatic patient returning from Wuhan, China in the beginning of February 2020 was sequenced on a MinION platform (Oxford Nanopore). After serial passaging on Huh7 and Vero E6 cells, infectious content of the virus stock was determined by titration on Vero E6 cells using the Spearman-Kärber method. All virus-related work was carried out in certified, high-containment biosafety level-3 facilities in the Rega Institute at KU Leuven.

Primary anti-cytopathicity assay development: 384-well plates

The primary assay protocol was optimised with respect to cell seeding, viral MOI and pharmacological response. Cells were seeded at 4000 and 8000 cells/well in 384-well plates. The following day, cells were incubated with the control compounds and the virus at different MOI (0.01 and 0.001). A panel of control compounds was tested to evaluate the pharmacological performance of the assay. Compounds tested were: chloroquine, hydroxychloroquine, and loperamide at a starting concentration of 50 µM; lopinavir and remdevisir at a starting concentration of 100 µM. A dose response curve was achieved with serial dilution at seven different concentration points following a half-log dilution schema. Remdevisir was the most active reference compound under all experimental conditions16, (Fig. 3) and was selected as the positive control used at 20 µM final concentration in the primary assay. Positive control remdesivir showed IC50 comparable to literature10 as depicted in Fig. 4.

Fig. 3
figure 3

Control compound profiling. Five control compounds were tested: chloroquine (light-blue); hydroxychloroquine (orange bars); loperamide (grey); lopinavir (yellow) and remdesivir (blue). The IC50 was measured and reported on the y-axis at different days of incubation. The days of incubation (d) and the number of cells seeded (4000 or 8000) were reported on the x-axis. Data are reported as the mean.

Fig. 4
figure 4

Remdesivir curve fitting example. Performance of positive control. %Confluence versus compound concentration for Remdesivir, IC50 = 1.7 µM, in accord with literature10.

Hit identification and compound profiling

To measure inhibition of the SARS-CoV-2 cytopathic effect, 384-well imaging plates (Greiner #781092) were spotted with test compounds and controls (16 positive and 16 negative control wells per plate) using an acoustic dispenser (Echo, Labcyte) to yield 10 µM final test compound concentration at 0.1% vol/vol DMSO. For concentration response studies, eight concentrations of a semi-log dilution series from 33 µM to 10 nM at 0.33% DMSO were used. One day before infection (Day -1), test plates were equilibrated to room temperature and 30 µL of Vero E6 EGFP cells were added at 8,000 cells/well. On the day of infection (Day 0), plates were transported to the CAPS-IT robotic system for the addition of virus (MOI = 0.001) using a liquid handler (EVO 100, Tecan) to a final volume of 60 µl, and incubated at 37 °C, 5% CO2 for 4 days. Plates were then imaged on an Arrayscan XTI, (Thermofisher). Parallel assessments of the underlying cytotoxicities of the compounds were performed as described above in dose response studies, but without virus infection and using sodium-selenite (20 µM final) as the intra-plate positive cytotoxicity control.

Image acquisition and analysis

At day four post-infection, the GFP signal was captured using wide field fluorescence imaging by exciting at 485–20 nm and emitting with the BGRFRN filter set. A 5 X objective captured 80% of the well bottom area in the 384 plate. The optimal exposure time was determined based on fluorescence intensity and was set as 0.023 seconds. A 2 × 2 binning was used and autofocus plane count was reduced to increase image acquisition speed. An image analysis protocol was developed in-house by using the SpotDetector bioapplication (Cellomics, Thermofisher). After background reduction on the raw image files, a fixed fluorescent intensity threshold was determined for the identification of fluorescent cells and their morphologoical parameters were then determined. The two most relevant extracted parameters describing the anticytopathic effects of compounds cells were: i) the number of fluorescent cells in each well (NumberOfCells); and ii) the area of each well covered by fluorescence cells (CellAreaMean).

Test compound results were normalised relative to the corresponding intra-plate controls. For cytopathicity experiments the positive control (100% inhibition of virus induced cytopathicity) were 16 remdesivir (20 µM) containing wells per 384 well plate in column 24. For cytotoxity experiments the positive control (100% cytotoxicity) were 16 sodium-selenite (20 µM) containing wells in column 24. The negative controls (0% effect) for both cytotoxicity and anti-cytopathic experiments were 16 wells with DMSO (0.1% vol/vol) in column 23. The normalised value of “CellAreaMean” was termed “% Confluence” whilst the normalised value of “NumberOfCells” was termed “% Inhibition”. Although, it might be expected that number of cells in each well would scale linearly with the area of the well covered by the cells, compound exposure can also induce changes in cell morphology and dimensions due to poly-pharmacological effects unrelated to any anti-viral properties. Therefore, whilst both parameters are of interest and are made available for re-analysis in the raw data sets provided, the parameter “% Confluence” was used for the purposes of reporting compound effects in ChEMBL and in subsequent compound selection and prioritisation.

Large scale data analysis for Primary and Hit Profiling studies was performed in two ways. The first method used commercial software (ActivityBase, IDBS, Version in a procedure which was aligned with in-house data management policies. Hit profiling dose response data (% Confluence versus compound concentration) were fitted to 4-parameter logistic functions to give the IC50 for the anti-cytopathic effect or the CC50 for the cytoxicity effects in the absence of virus. Assay quality was assessed using the Z′ factor calculation (Fig. 2B) with Z’ factor > 0.5 as the threshold for assay acceptance17. The cytotoxity index was defined as CC50/IC50. A higher CI value indicates a wider window between the anti-cytopathic effects and possible underlying in-vitro toxicity. Individuals seeking to reanalyse the data may not have access to the ActivityBase software, therefore, a second method was established in the KNIME environment to calculate IC50 values from the raw data. The workflow is deposited in github repository and replicates the ActivityBase analysis18.

Data Records

The analysed data and raw data have been made available for download and reuse (Table 1). An overview of all analysed Primary Screening and Hit Profiling data in the ChEMBL repository can be found in the document report card CHEMBL4495565, whilst individual data sets are available for viewing and download from four assay report cards (Table 1). Raw data files for Primary and Hit Profiling have also been made available by the ChEMBL administrators on a FTP server and through the Figshare platform. The variables present in the tables of ChEMBL entries and the raw data files are defined in Table 2.

Table 1 Description and location of data records.
Table 2 List of files and variables contained46,56. 2a: Metadata information for ChEMBL document report card CHEMBL449556557. 2b: Data file for primary and hit profiling raw data (20201217_primary_PS_HP.xlsx), variables and descriptions. Table 2b contains the information of the Primary Screen sub-table49. A KNIME workflow provided in the code availability section generated the curve fit metrics17. 2c: Data file for primary and hit profiling raw data (20201217_primary_PS_HP.xlsx), variables and descriptions. Table 2c contains the information of the Hit Profiling sub-table49. 2d: Data file for primary and hit profiling raw data (20201217_primary_PS_HP.xlsx), variables and descriptions. Table 2d contains the information of the Hit Profiling Fit Results sub-table49. 2e: Data file for activity information (ACTIVITY.tsv), variables and descriptions49. 2f: Data file for assay information (ASSAY.tsv), variables and descriptions49. 2g: Data file for compound record lists (COMPOUND_RECORD.xlsx), variables and descriptions49. 2h: Data file for Dose_Response_raw_data (ACTIVITY_SUPP.tsv), variables and descriptions49. 2i: Data file for Activity map (ACTIVITY_SUPP.MAP.tsv), variables and descriptions49. 2j: Data file for assay reference text (REFERENCE.tsv), variables and descriptions49.

Technical Validation

Screening Assay

The primary screen resulted in 158 hits (% confluence > 14%) (CHEMBL4495565 – Fig. 2a). At all phases (Primary, Profiling and Cytotoxicity) the Z’ exceeded > 0.5, indicating acceptable assay quality (Fig. 2b). Sources of compounds from the three different libraries used in the screening campaign are shown in Table 3. Primary hit compounds were cherry-picked and progressed to dose response profiling and cytotoxicity assessment. In profiling studies, identical compounds present in different libraries (Fig. 1) showed consistent potency (Fig. 2c,R2 = 0.81), suggesting good assay reproducibility with respect to compound origin. Some 110 compounds gave IC50 values below 20 µM and were classified as confirmed hits. From the 110 compounds, 66 have a SI > 2 (Supp. Table S1 and Fig. 2a) and of this group 18 compounds are either marketed drugs or in advanced clinical trials (Fig. 2a). (CHEMBL449565). Fig. 2d shows the distribution of primary therapeutic targets annotated for the screened collection (lhs) and the 110 confirmed hits (rhs). No changes in the targets of the confirmed hits relative to those of the screened compound set were observed.

Table 3 Source of compounds from the three different libraries used in the screening campaign.

Assay properties and influence on hit compound identification

SARS-CoV infects and replicates more efficiently in some cell types, such as VeroE6, FRhK4, Caco2, LLCMK2, compared to Calu3 and Hek293T cells, while there is a very low efficiency of replication in U251 and MDCK cells under the same multiplicity of infection. Therefore, cell model selection in SARS-COV-2 phenotypic screening studies is important, and recent reports have shown differences in compound potencies depending which cell line has been used in the primary assay19,20,21,22,23,24,25,26,27. The cell line used in this study, Vero E-6 kidney epithelial cells from the African green monkey28, has been extensively used for SARS-CoV-like virus studies29,30,31,32,33. In those models, cell viability and virus titre were usually verified after 3–5 days post infection (p.i.)27,34 and in our assay the incubation of compounds for a period of 4 days p.i. resulted in a robust readout. Nevertheless, it has already been reported that for the same active drug, infection with different virus MOI may result in variable safety index values35, suggesting that cytotoxicity analyses in all studies could be somewhat limited. It should also be noted that the VeroE6-EGFP cell line used for this study is not sensitive to ACE inhibitors or some SARS-CoV active drugs such as ribavirin and glycyrrhizin36,37,38,39, which are the subject of ongoing clinical trials We cannot exclude, therefore, that alternative experimental setups may lead to partially different hit populations, as others have observed19. Nevertheless, the GFP reporter line presents an opportunity to perform fast, automated and homogeneous high-throughput screening, with a high signal-to-noise ratio and low variability and is fit-for-purpose for hit identification studies.

Within the confirmed compounds with IC50 < 20 µM, some 70% modulate the intracellular signalling pathways. Notable groups are the inhibitors of the Growth Factor Receptors (PDGFR) (like masitinib or tandutinib), Dihydrofolate Reductase (DHFR) (trimetrexate) and Estrogen Receptor Modulator (clomiphene and raloxifene). In addition, many show common protein targets such as the Phosphatidylinositol 3-Kinase (PI3K) (VPS34-IN-1) or mTOR protein (VE-822) two key elements of pro-survival signalling. Finally, where a therapeutic indication was annotated, the majority of the compounds were associated with cancer and anti-infective (antifungal and anti-malaria) therapy. These observations suggest that drugs associated with cell survival and growth may be an optimal choice for antiviral therapies for SARS-CoV-2, if adequate safety and exposure/efficacy can be achieved. Comparing the hit population in this dataset with reported studies show five compounds (amodiaquine, ciclesonide, eltrombopag, loperamide, niclosamide) that are also reported by Jeon et al., who screened 3000 FDA approved drugs against Vero cells15. Similarly, two (amodiaquine, chlorpromazine) overlap with a set of 20 inhibitors identified by Weston29, also in Vero cells. The common compounds have an antifungal, antimalarial or anticancer activity, suggesting again that drugs for these indications may contain the most promising antiviral compounds. Among compounds selected for further studies, raloxifene was prioritised as it has been found active in an independent phenotypic assays of SARS CoV-2 viral infections in VERO-E6 cells40 and against coronavirus OCR43 in LLC-MK2 cells41. This compound can inhibit RNA replication42 and related estrogenic receptor modulators have been found to be active against in-vitro viral infections43. Interestingly, there have been evidence from other datasets44 that several SERMS, either agonist or antagonists, are active at SARS-CoV 2 viral entry level and we are also exploring this and other possible mechanisms of action for this class of compounds.

Usage Notes

We suggest that other similar screening data sets are available from the Covid-19 portal45, the NCATS databases46 and a newly developed collection called “The COVID-19 Drug and Gene Set Library”47.