Cytopathic SARS-CoV-2 screening on VERO-E6 cells in a large-scale repurposing effort

Worldwide, there are intensive efforts to identify repurposed drugs as potential therapies against SARS-CoV-2 infection and the associated COVID-19 disease. To date, the anti-inflammatory drug dexamethasone and (to a lesser extent) the RNA-polymerase inhibitor remdesivir have been shown to be effective in reducing patient mortality and time to recovery, respectively. Here, we report the results of a phenotypic screening campaign within an EU-funded project (H2020-EXSCALATE4COV) aimed at extending the repertoire of anti-COVID therapeutics through repurposing of available compounds and highlighting compounds with new mechanisms of action against viral infection. We screened 8702 molecules from different repurposing libraries and identified 110 compounds with an anti-cytopathic IC50 < 20 µM. Of these, 18 have a safety index greater than 2 and are also marketed drugs, making them suitable for further study as potential therapies against COVID-19. Our results support the idea that a systematic approach to repurposing is a valid strategy to accelerate the necessary drug discovery process.

www.nature.com/scientificdata
(c) Duplicate compound potencies in hit profiling. IC50 values for compounds present in both the Fraunhofer (X axis) and Dompe or EU-OPENSCREEN (Y axis) collections (R² = 0.81). (d) Distribution of the primary target of screened compounds (lhs) and 110 confirmed hit compounds (rhs). Explanation of keys: "Channel" gathers all cellular channels comprising metal channels and efflux pumps; "DNA-RNA" comprises all cellular DNA/RNA-dependent enzymes; "Enzymes" gathers a large set of metabolic enzymes involved in cellular metabolism/catabolism mechanisms like de-novo syntheses and/or oxidative or proteolytic processing of non-peptidic substrates; "GPCR" comprises G-Protein coupled receptors; "NHR" stands for Nuclear Hormone Receptors; "Proteases" is self-explanatory and "Other" categorizes all the cellular protein classes not previously listed (e.g. glycosylative enzymes, farnesyltransferase and similar).

Description | Reference
ChEMBL Document Report Card for the complete study | Data available 48
Single concentration primary screen for anti-cytopathic effect of compounds (Confluence, %) in ChEMBL DB | Data available 49
Hit profiling results for compound anti-cytopathic effect (IC50) in ChEMBL DB | Data available 48
Hit profiling results for compound cytotoxic effect (CC50) in ChEMBL DB | Data available 50
Derived cytotoxicity index results (CI) in ChEMBL DB | Data available 51
Data from the primary screen and hit profiling deposited on ChEMBL FTP server | All screening results in separate files 52

Value | various | Result value
Workflow | dimensionless | Identifier of workflow used to generate result. It can be "KNIME" or "ActivityBase"
Unit | dimensionless | Identifier of result unit. It can be "micromolar", "percent", "dimensionless"
ACT_ID | dimensionless | Activity index text collating type of activity measured and progressive index to compound (ACT_INH_001 = primary screen inhibition at fixed concentration for compound 1)

performance of the assay. Compounds tested were chloroquine, hydroxychloroquine and loperamide at a starting concentration of 50 µM, and lopinavir and remdesivir at a starting concentration of 100 µM. Dose-response curves were obtained by serial dilution over seven concentration points following a half-log dilution scheme. Remdesivir was the most active reference compound under all experimental conditions 16 (Fig. 3) and was selected as the positive control, used at 20 µM final concentration in the primary assay. The remdesivir positive control showed an IC50 comparable to the literature value 10, as depicted in Fig. 4.

Hit identification and compound profiling.
To measure inhibition of the SARS-CoV-2 cytopathic effect, 384-well imaging plates (Greiner #781092) were spotted with test compounds and controls (16 positive and 16 negative control wells per plate) using an acoustic dispenser (Echo, Labcyte) to yield 10 µM final test compound concentration at 0.1% vol/vol DMSO. For concentration response studies, eight concentrations of a semi-log dilution series from 33 µM to 10 nM at 0.33% DMSO were used. One day before infection (Day -1), test plates were equilibrated to room temperature and 30 µL of Vero E6 EGFP cells were added at 8,000 cells/well. On the day of infection (Day 0), plates were transported to the CAPS-IT robotic system for the addition of virus (MOI = 0.001) using a liquid handler (EVO 100, Tecan) to a final volume of 60 µL, and incubated at 37 °C, 5% CO2 for 4 days. Plates were then imaged on an Arrayscan XTI (Thermofisher). Parallel assessments of the underlying cytotoxicities of the compounds were performed as described above in dose response studies, but without virus infection and using sodium-selenite (20 µM final) as the intra-plate positive cytotoxicity control.
Table 2b contains the information of the Primary Screen sub-table 49. A KNIME workflow provided in the code availability section generated the curve fit metrics 17. 2c: Data file for primary and hit profiling raw data (20201217_primary_PS_HP.xlsx), variables and descriptions.
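As a rough check on the plate set-up described in this subsection, the following sketch computes the nominal concentrations of the dilution series and the nominal number of infectious units per well. It assumes that "semi-log" means a constant dilution factor of 10^0.5 (about 3.16) per step and that the MOI refers to the seeded cell number; the function names are illustrative and are not part of the study's workflows.

```python
import math

def half_log_series(top_um, n_points):
    """Return n_points concentrations (µM), each 10**0.5-fold below the previous."""
    step = math.sqrt(10.0)  # assumed half-log dilution factor (~3.16)
    return [top_um / step ** i for i in range(n_points)]

def infectious_units_per_well(cells_per_well, moi):
    """Nominal infectious units per well for a given multiplicity of infection."""
    return cells_per_well * moi

# Hit profiling: 33 µM top concentration, eight points
series = half_log_series(33.0, 8)
# 8,000 seeded cells per well at MOI = 0.001 (cell growth before infection ignored)
units = infectious_units_per_well(8000, 0.001)

print([round(c, 4) for c in series])
print(units)
```

Under these assumptions the lowest concentration is about 0.0104 µM, in line with the stated 10 nM, and the nominal inoculum is about 8 infectious units per well.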
Image acquisition and analysis. At day four post-infection, the GFP signal was captured using wide-field fluorescence imaging by exciting at 485-20 nm and collecting emission with the BGRFRN filter set. A 5× objective captured 80% of the well bottom area in the 384-well plate. The optimal exposure time was determined based on fluorescence intensity and was set to 0.023 seconds. A 2 × 2 binning was used and the autofocus plane count was reduced to increase image acquisition speed. An image analysis protocol was developed in-house using the SpotDetector bioapplication (Cellomics, Thermofisher). After background reduction on the raw image files, a fixed fluorescence intensity threshold was set for the identification of fluorescent cells, and their morphological parameters were then determined. The two most relevant extracted parameters describing the anti-cytopathic effects of compounds on cells were: i) the number of fluorescent cells in each well (NumberOfCells); and ii) the area of each well covered by fluorescent cells (CellAreaMean).
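The internals of the SpotDetector bioapplication are not described here, so the following is only a generic illustration of how a fixed intensity threshold can yield an object count and a covered-area measure. It uses plain thresholding with 4-connected component labelling on a toy image; the function name, the toy pixel values and the threshold are hypothetical, and background reduction is omitted.

```python
def segment(image, threshold):
    """Count 4-connected objects above threshold and the fraction of pixels they cover."""
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    n_objects = 0
    covered = 0
    for r in range(rows):
        for c in range(cols):
            if image[r][c] <= threshold or seen[r][c]:
                continue
            # New object found: flood-fill all connected bright pixels
            n_objects += 1
            stack = [(r, c)]
            seen[r][c] = True
            while stack:
                y, x = stack.pop()
                covered += 1
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx] and image[ny][nx] > threshold):
                        seen[ny][nx] = True
                        stack.append((ny, nx))
    return n_objects, covered / (rows * cols)

# Toy 4 x 5 "well image" with two bright objects (hypothetical intensities)
image = [
    [0, 0, 0, 0, 0],
    [0, 9, 9, 0, 0],
    [0, 9, 0, 0, 8],
    [0, 0, 0, 0, 8],
]
n_cells, area_fraction = segment(image, threshold=5)
print(n_cells, area_fraction)
```

In this sketch the object count plays the role of NumberOfCells and the covered fraction is a loose analogue of the area-based readout; the actual parameter definitions are those of the instrument software.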
Test compound results were normalised relative to the corresponding intra-plate controls. For cytopathicity experiments, the positive controls (100% inhibition of virus-induced cytopathicity) were 16 remdesivir (20 µM) containing wells per 384-well plate in column 24. For cytotoxicity experiments, the positive controls (100% cytotoxicity) were 16 sodium-selenite (20 µM) containing wells in column 24. The negative controls (0% effect) for both cytotoxicity and anti-cytopathic experiments were 16 wells with DMSO (0.1% vol/vol) in column 23. The normalised value of "CellAreaMean" was termed "% Confluence" whilst the normalised value of "NumberOfCells" was termed "% Inhibition". Although it might be expected that the number of cells in each well would scale linearly with the area of the well covered by the cells, compound exposure can also induce changes in cell morphology and dimensions due to poly-pharmacological effects unrelated to any anti-viral properties. Therefore, whilst both parameters are of interest and are made available for re-analysis in the raw data sets provided, the parameter "% Confluence" was used for the purposes of reporting compound effects in ChEMBL and in subsequent compound selection and prioritisation.
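The exact normalisation formula is not spelled out in the text. A minimal sketch, assuming the usual two-point scaling in which the mean of the negative controls maps to 0% and the mean of the positive controls maps to 100%, is shown below. The function name and all well readings are hypothetical and are not taken from the deposited data.

```python
from statistics import mean

def percent_effect(raw, neg_controls, pos_controls):
    """Scale a raw well value so that the negative-control mean is 0% and the positive-control mean is 100%."""
    neg = mean(neg_controls)
    pos = mean(pos_controls)
    if pos == neg:
        raise ValueError("controls are indistinguishable; cannot normalise")
    return 100.0 * (raw - neg) / (pos - neg)

# Hypothetical CellAreaMean readings for one plate
dmso_wells = [100.0, 120.0, 110.0, 90.0]          # column 23, 0% effect
remdesivir_wells = [900.0, 950.0, 1000.0, 950.0]  # column 24, 100% inhibition

confluence = percent_effect(527.5, dmso_wells, remdesivir_wells)
print(round(confluence, 1))
```

The same function would apply to the cytotoxicity plates by passing the sodium-selenite wells as the positive controls, since the scaling does not depend on which control gives the higher raw signal.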
Large-scale data analysis for Primary and Hit Profiling studies was performed in two ways. The first method used commercial software (ActivityBase, IDBS, Version 8.0.5.4) in a procedure aligned with in-house data management policies. Hit profiling dose response data (% Confluence versus compound concentration) were fitted to 4-parameter logistic functions to give the IC50 for the anti-cytopathic effect or the CC50 for the cytotoxic effect in the absence of virus. Assay quality was assessed using the Z' factor calculation (Fig. 2b), with Z' factor > 0.5 as the threshold for assay acceptance 17. The cytotoxicity index (CI) was defined as CC50/IC50. A higher CI value indicates a wider window between the anti-cytopathic effects and possible underlying in-vitro toxicity. Because individuals seeking to reanalyse the data may not have access to the ActivityBase software, a second method was established in the KNIME environment to calculate IC50 values from the raw data. The workflow is deposited in a GitHub repository and replicates the ActivityBase analysis 18.
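For readers who want to see the shape of the calculation without ActivityBase or KNIME, the sketch below evaluates a 4-parameter logistic model and derives the CC50/IC50 ratio. It is a simplified stand-in rather than the deposited workflow: the top and bottom plateaus are held fixed at 100% and 0%, and only the IC50 and Hill slope are found by a coarse grid search instead of a full least-squares fit. All names and the synthetic data are assumptions made for illustration.

```python
def four_pl(conc, bottom, top, ic50, hill):
    """4-parameter logistic: response rises from bottom to top as conc increases."""
    return bottom + (top - bottom) / (1.0 + (ic50 / conc) ** hill)

def fit_ic50(concs, responses, bottom=0.0, top=100.0):
    """Coarse grid search for IC50 and Hill slope with bottom and top held fixed."""
    best = None
    for k in range(-200, 201):  # log10(IC50 in µM) from -2 to 2 in 0.01 steps
        ic50 = 10 ** (k / 100.0)
        for hill in (0.5, 0.75, 1.0, 1.5, 2.0, 3.0):
            sse = sum((four_pl(c, bottom, top, ic50, hill) - r) ** 2
                      for c, r in zip(concs, responses))
            if best is None or sse < best[0]:
                best = (sse, ic50, hill)
    return best[1], best[2]

def cytotoxicity_index(cc50, ic50):
    """CI = CC50 / IC50, as defined in the text."""
    return cc50 / ic50

# Synthetic % Confluence data generated from a known curve (IC50 = 1 µM, Hill = 1)
concs = [33.0 / (10 ** 0.5) ** i for i in range(8)]
responses = [four_pl(c, 0.0, 100.0, 1.0, 1.0) for c in concs]

ic50, hill = fit_ic50(concs, responses)
ci = cytotoxicity_index(cc50=20.0, ic50=ic50)
print(ic50, hill, ci)
```

A production re-analysis would fit all four parameters with a proper optimiser and report fit quality; the grid search is used here only to keep the example dependency-free.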

Data Records
The analysed data and raw data have been made available for download and reuse (Table 1). An overview of all analysed Primary Screening and Hit Profiling data in the ChEMBL repository can be found in the document report card CHEMBL4495565, whilst individual data sets are available for viewing and download from four assay report cards (Table 1). Raw data files for Primary and Hit Profiling have also been made available by the ChEMBL administrators on an FTP server and through the Figshare platform. The variables present in the tables of ChEMBL entries and the raw data files are defined in Table 2.

Technical Validation
Screening Assay. The primary screen resulted in 158 hits (% confluence > 14%) (CHEMBL4495565, Fig. 2a). At all phases (Primary, Profiling and Cytotoxicity) the Z' factor exceeded 0.5, indicating acceptable assay quality (Fig. 2b). Sources of compounds from the three different libraries used in the screening campaign are shown in Table 3. Primary hit compounds were cherry-picked and progressed to dose response profiling and cytotoxicity assessment. In profiling studies, identical compounds present in different libraries (Fig. 1) showed consistent potency (Fig. 2c, R² = 0.81), suggesting good assay reproducibility with respect to compound origin. Some 110 compounds gave IC50 values below 20 µM and were classified as confirmed hits. Of these 110 compounds, 66 have an SI > 2 (Supp. Table S1 and Fig. 2a), and of this group 18 compounds are either marketed drugs or in advanced clinical trials (Fig. 2a) (CHEMBL449565). Fig. 2d shows the distribution of primary therapeutic targets annotated for the screened collection (lhs) and the 110 confirmed hits (rhs). No shift in the distribution of targets among the confirmed hits, relative to the screened compound set, was observed.
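The acceptance and triage criteria used in this section can be sketched as follows. The Z' factor is computed with the standard definition, 1 - 3(SD_pos + SD_neg)/|mean_pos - mean_neg|. The triage function assumes that the SI quoted here is the same CC50/IC50 ratio defined as the cytotoxicity index in the Methods. The control values, function names and labels are hypothetical.

```python
from statistics import mean, stdev

def z_prime(pos_controls, neg_controls):
    """Z' = 1 - 3 * (sd_pos + sd_neg) / |mean_pos - mean_neg|."""
    window = abs(mean(pos_controls) - mean(neg_controls))
    if window == 0:
        raise ValueError("control means are identical; Z' is undefined")
    return 1.0 - 3.0 * (stdev(pos_controls) + stdev(neg_controls)) / window

def classify(ic50_um, cc50_um, ic50_cutoff=20.0, index_cutoff=2.0):
    """Apply the thresholds from the text: IC50 < 20 µM, then CC50/IC50 > 2."""
    if ic50_um >= ic50_cutoff:
        return "not confirmed"
    if cc50_um / ic50_um > index_cutoff:
        return "confirmed, index > 2"
    return "confirmed"

# Hypothetical normalised control wells from one plate
pos = [98.0, 100.0, 102.0, 100.0]
neg = [0.0, 2.0, -2.0, 0.0]

z = z_prime(pos, neg)
print(round(z, 3), z > 0.5)
print(classify(5.0, 50.0), "|", classify(15.0, 20.0), "|", classify(25.0, 100.0))
```

With these example controls the plate would pass the Z' > 0.5 acceptance threshold; real plates use 16 wells per control rather than the four shown.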

Assay properties and influence on hit compound identification. SARS-CoV infects and replicates more efficiently in some cell types, such as VeroE6, FRhK4, Caco2 and LLCMK2, than in Calu3 and Hek293T cells, while replication efficiency is very low in U251 and MDCK cells under the same multiplicity of infection. Therefore, cell model selection in SARS-CoV-2 phenotypic screening studies is important, and recent reports have shown differences in compound potencies depending on which cell line was used in the primary assay 19-27. The cell line used in this study, Vero E6 kidney epithelial cells from the African green monkey 28, has been extensively used for SARS-CoV-like virus studies 29-33. In those models, cell viability and virus titre were usually verified 3-5 days post infection (p.i.) 27,34, and in our assay the incubation of compounds for a period of 4 days p.i. resulted in a robust readout. Nevertheless, it has already been reported that, for the same active drug, infection at different virus MOI may result in variable safety index values 35, suggesting that cytotoxicity analyses in all studies could be somewhat limited. It should also be noted that the VeroE6-EGFP cell line used for this study is not sensitive to ACE inhibitors or some SARS-CoV active drugs such as ribavirin and glycyrrhizin 36-39, which are the subject of ongoing clinical trials. We cannot exclude, therefore, that alternative experimental setups may lead to partially different hit populations, as others have observed 19. Nevertheless, the GFP reporter line presents an opportunity to perform fast, automated and homogeneous high-throughput screening, with a high signal-to-noise ratio and low variability, and is fit-for-purpose for hit identification studies.
Within the confirmed compounds with IC50 < 20 µM, some 70% modulate intracellular signalling pathways. Notable groups are the inhibitors of Growth Factor Receptors (PDGFR) (such as masitinib or tandutinib), Dihydrofolate Reductase (DHFR) (trimetrexate) and Estrogen Receptor Modulators (clomiphene and raloxifene). In addition, many share common protein targets such as Phosphatidylinositol 3-Kinase (PI3K) (VPS34-IN-1) or mTOR protein (VE-822), two key elements of pro-survival signalling. Finally, where a therapeutic indication was annotated, the majority of the compounds were associated with cancer and anti-infective (antifungal and antimalarial) therapy. These observations suggest that drugs associated with cell survival and growth may be an optimal choice for antiviral therapies for SARS-CoV-2, if adequate safety and exposure/efficacy can be achieved. Comparing the hit population in this dataset with reported studies shows five compounds (amodiaquine, ciclesonide, eltrombopag, loperamide, niclosamide) that are also reported by Jeon et al., who screened 3000 FDA approved drugs in Vero cells 15. Similarly, two (amodiaquine, chlorpromazine) overlap with a set of 20 inhibitors identified by Weston et al. 29, also in Vero cells. The common compounds have antifungal, antimalarial or anticancer activity, suggesting again that drugs for these indications may contain the most promising antiviral compounds. Among compounds selected for further studies, raloxifene was prioritised as it has been found active in an independent phenotypic assay of SARS-CoV-2 viral infection in VERO-E6 cells 40 and against coronavirus OC43 in LLC-MK2 cells 41. This compound can inhibit RNA replication 42, and related estrogen receptor modulators have been found to be active against in-vitro viral infections 43.
Interestingly, there is evidence from other datasets 44 that several SERMs, either agonists or antagonists, are active at the level of SARS-CoV-2 viral entry, and we are also exploring this and other possible mechanisms of action for this class of compounds.

Usage Notes
Other similar screening data sets are available from the Covid-19 portal 45, the NCATS databases 46 and a newly developed collection called "The COVID-19 Drug and Gene Set Library" 47.

Code availability
We developed a more comprehensive open-source KNIME workflow for the curve fitting calculations 17.