SARS-CoV-2 CPE assay

Vero E6 cells (ATCC Vero E6 CRL-1586) previously selected for high angiotensin-converting enzyme 2 (ACE2) expression34 (grown in EMEM, 10% FBS and 1% penicillin–streptomycin) were cultured in T175 flasks and passaged at 95% confluency. Cells were washed once with PBS and dissociated from the flask using TrypLE. Cells were counted before seeding. A CPE assay previously used to measure antiviral effects against SARS-CoV35 was adapted for performance in 384-well plates to measure CPE of SARS-CoV-2 with the following modifications. Cells, harvested and suspended at 160,000 cells per ml in MEM/1% PSG/1% HEPES supplemented with 2% HI FBS, were batch inoculated with SARS-CoV-2 (USA_WA1/2020) at a multiplicity of infection (MOI) of approximately 0.002, which resulted in approximately 5% cell viability 72 h after infection. Compound solutions in dimethyl sulfoxide (DMSO) were acoustically dispensed into assay-ready plates (ARPs) as three-point 1:5 titrations (or eight-point 1:3 titrations for the confirmation screen). ARPs were stored at −20 °C and shipped to a Biosafety Level (BSL)-3 facility (Southern Research Institute) for the CPE assay. At the BSL-3 facility, ARPs were brought to room temperature and 5 µl of assay medium was dispensed to all wells. A 25-μl aliquot of virus-inoculated cells (4,000 Vero E6 cells per well) was then added to each well in columns 3–24. The wells in columns 23 and 24 contained virus-infected cells only (no compound treatment). A 25-μl aliquot of uninfected cells was added to columns 1 and 2 of each plate for the cell-only (no virus) controls. After incubating plates at 37 °C with 5% CO2 and 90% humidity for 72 h, 30 μl of CellTiter-Glo (Promega) was added to each well. After incubation at room temperature for 10 min, the plates were sealed with a clear cover and surface decontaminated, and luminescence was read using a PerkinElmer Envision plate reader to measure cell viability.
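
The seeding arithmetic and 384-well layout described above can be checked with a short sketch. The function names are illustrative, and the assignment of columns 3–22 to compound-treated wells is inferred from the text rather than stated in it.

```python
# Sketch of the seeding arithmetic and 384-well layout; names are illustrative.
CELLS_PER_ML = 160_000   # suspension density from the text
ALIQUOT_UL = 25          # volume of cell suspension added per well


def cells_per_well(cells_per_ml: float, aliquot_ul: float) -> float:
    """Cells delivered in one aliquot (1 ml = 1,000 ul)."""
    return cells_per_ml * aliquot_ul / 1000.0


def well_role(column: int) -> str:
    """Role of a well given its 1-based column on a 24-column plate."""
    if not 1 <= column <= 24:
        raise ValueError("a 384-well plate has columns 1-24")
    if column <= 2:
        return "cells only (no virus)"
    if column >= 23:
        return "infected cells, no compound"
    # Assumption: columns 3-22 are the compound-treated wells.
    return "infected cells + compound"


print(cells_per_well(CELLS_PER_ML, ALIQUOT_UL))  # 4000.0
```

The result matches the 4,000 cells per well quoted for the 25-μl aliquot.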

NS1 TR-FRET assay

HEK293 cells were maintained in EMEM with 10% FBS and 1% penicillin–streptomycin (Gibco, cat. no. 15140–122). Cells were seeded at 1,000 cells per 3 µl per well in white 1,536-well plates and incubated at 37 °C with 5% CO2 overnight. Compound dilutions were added to cells at 23 nl per well and incubated for 1 h, followed by addition of 2 µl per well of the prototypic ZIKV strain MR766 solution to cells (MOI = 0.5). After incubation at 37 °C for 24 h, 2.5 µl per well of a detection reagent mixture of two labeled anti-ZIKV NS1 antibodies was added to assay plates. Time-resolved fluorescence resonance energy transfer signals were measured using an Envision plate reader (PerkinElmer). Compounds were tested as seven-point 1:5 titrations in the primary screen and 11-point 1:3 titrations in triplicate in the confirmation screen. Data were normalized using control wells without ZIKV as the negative control (0% NS1) and wells with ZIKV as the positive control (100% NS1).
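
As an illustration of the titration formats named above, the following sketch generates a serial dilution series. The top concentration of 50 µM is a hypothetical placeholder, because the text does not state the starting concentration.

```python
def titration(top_conc: float, n_points: int, fold: float) -> list[float]:
    """Concentrations for an n-point 1:fold serial dilution, highest first."""
    if n_points < 1 or fold <= 1:
        raise ValueError("need n_points >= 1 and fold > 1")
    return [top_conc / fold ** i for i in range(n_points)]


# Hypothetical top concentration (uM); not given in the text.
TOP_UM = 50.0
primary = titration(TOP_UM, 7, 5)    # seven-point 1:5 (primary screen)
confirm = titration(TOP_UM, 11, 3)   # 11-point 1:3 (confirmation screen)
print(len(primary), len(confirm))
```

Regardless of the starting concentration, a seven-point 1:5 series spans 5^6 = 15,625-fold and an 11-point 1:3 series spans 3^10 = 59,049-fold.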

ATP content assay for cell viability and compound cytotoxicity

Cells were seeded in 1,536-well assay plates and incubated for 16 h at 37 °C with 5% CO2. Test compounds dissolved in DMSO were added to assay plates at a volume of 23 nl per well by an automated pintool workstation (Wako Automation). Compounds were incubated with cells for 48 h at 37 °C with 5% CO2. ATPlite, the ATP monitoring reagent (PerkinElmer), was then transferred to the assay plates and incubated for 15 min at room temperature. The resulting luminescence was measured using the PHERAstar FSX plate reader (BMG LABTECH). Data were normalized using wells without cells as the control for 100% cell killing and cell-containing wells treated with DMSO as the control for full cell viability (0% cell killing).
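
The control-based normalization can be sketched as follows. Averaging the control wells with the arithmetic mean is an assumption, as the text does not say how control wells were summarized, and the function name is illustrative.

```python
from statistics import mean


def percent_cell_killing(signal, no_cell_wells, dmso_wells):
    """Scale raw luminescence so DMSO wells read 0% and no-cell wells 100% killing."""
    full_viability = mean(dmso_wells)    # 0% cell killing
    full_killing = mean(no_cell_wells)   # 100% cell killing
    span = full_viability - full_killing
    if span == 0:
        raise ValueError("control wells are indistinguishable")
    return 100.0 * (full_viability - signal) / span
```

For example, with made-up control means of 1,000 (DMSO) and 0 (no cells), a well reading 500 maps to 50% cell killing.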

EBOV-eGFP infection assay

As described previously15,20, Vero E6 cells were maintained in DMEM (HyClone) supplemented with 10% FBS (Sigma-Aldrich). The following Ebola virus was used: Ebola virus NML/H.sapiens-lab/COD/1976/Mayinga-eGFP-p3 (EBOV/May-eGFP) (derived from an Ebola virus, family Filoviridae, genus Ebolavirus, species Zaire ebolavirus, GenBank accession no. NC_002549). All work with infectious virus was performed in the BSL-4 facility at the National Microbiology Laboratory of the Public Health Agency of Canada in the Canadian Science Centre for Human and Animal Health. All procedures were conducted in accordance with international protocols appropriate for this level of biosafety. The toxicity of compounds was evaluated in Vero E6 cells by using the PrestoBlue cell viability reagent, which is a resazurin dye-based assay (Life Technologies). Cells were plated, allowed to adhere overnight and then treated with various compound concentrations for 2 h. Control cells received an equivalent volume of 10% DMSO only. PrestoBlue cell viability reagent was added according to the manufacturer’s protocol. Viability was determined by comparing fluorescence readings of treated cells to those of untreated controls.

3CLpro enzyme assay and counter-screen18

SARS-CoV-2 3CLpro, a sensitive internally quenched fluorogenic substrate and assay buffer were obtained from BPS Bioscience. The enzyme was expressed in an Escherichia coli expression system and has a molecular weight of 34 kDa. The peptide substrate is a 14-amino-acid sequence (KTSAVLQSGFRKME) with Dabcyl and Edans attached at its N- and C-termini, respectively. The reaction buffer is composed of 20 mM Tris-HCl (pH 7.3), 100 mM NaCl, 1 mM EDTA, 0.01% BSA and 1 mM DTT. The 3CLpro enzyme assay was carried out in 1,536-well black, medium-binding microplates (Greiner Bio-One), with a total volume of 4 µl that includes 2 µl of 2× enzyme (50 nM) in reaction buffer and 2 µl of 2× substrate (20 µM). The experiment was conducted at room temperature. In brief, 2 µl per well of enzyme was first added to a 1,536-well plate. Compounds in DMSO were then transferred at 23 nl per well with an automated pintool workstation. The compounds and enzyme were incubated for 30 min at room temperature. Afterwards, 2 µl per well of substrate was dispensed into the assay plate, followed by 1-h incubation for the enzyme reaction. The fluorescence intensity was measured on a PHERAstar FSX plate reader (BMG LABTECH) with excitation = 340 nm/emission = 460 nm. A counter-screen assay to eliminate fluorescence-quenching compounds was carried out by dispensing 4 μl of substrate containing the fluorescent Edans fragment, SGFRKME-Edans, into 1,536-well assay plates in the absence of enzyme. Compounds were pin transferred at 23 nl per well, and the fluorescence signal was read. Compounds were tested as 11-point 1:3 titrations in duplicate for both the enzyme assay and the counter-screen.

PP entry assay in 1,536-well format

Cell line and cell culture

A HEK293 cell line with stable expression of human ACE2 (HEK293-ACE2) was generated by Codex BioSolutions36. In short, Expi293F cells (Thermo Fisher Scientific) were seeded into a six-well plate at 70–80% confluency. For each well, the cells were transfected with 2.5 µg of pCMV_ACE2_IRES_puromycin plasmid (Codex BioSolutions) using Lipofectamine 3000 (Thermo Fisher Scientific). Twenty-four hours later, the cells were dissociated with trypsin and transferred into 100-mm dishes. The cells were selected with 1 µg ml−1 of puromycin for 2–3 weeks. Single colonies were picked into 24-well plates containing 1 ml of DMEM and 10% FBS supplemented with 1 µg ml−1 of puromycin. Western blotting with an ACE2-specific antibody was performed to screen for ACE2-expressing clones. The positive clones were further confirmed with the SARS-CoV2-S PP entry assay.

PP generation

SARS-CoV2-S PP, VSV-G PP and delEnv (bald) PP were custom produced by Codex BioSolutions following previously reported methods with a murine leukemia virus pseudotyping system25,26. The SARS-CoV2-S construct with Wuhan-Hu-1 sequence (BEI no. NR-52420) was C-terminally truncated by 19 amino acids to reduce endoplasmic reticulum retention27 for pseudotyping.

PP entry assay

HEK293-ACE2 cells were seeded in white, solid-bottom, 1,536-well microplates (Greiner Bio-One) at 2,000 cells per well in 2 µl per well of medium and incubated at 37 °C with 5% CO2 overnight (~16 h). Compounds were titrated 1:3 in DMSO and dispensed via pintool at 23 nl per well to assay plates. Cells were incubated with test articles for 1 h at 37 °C with 5% CO2, before 2 µl per well of PP was added. The plates were then spinoculated by centrifugation at 1,500 r.p.m. (453g) for 45 min and incubated for 48 h at 37 °C in 5% CO2 to allow cell entry of PP and expression of luciferase reporter. After the incubation, the supernatant was removed with gentle centrifugation using a Blue Washer (BlueCatBio). Then, 4 µl per well of Bright-Glo Luciferase detection reagent (Promega) was added to assay plates and incubated for 5 min at room temperature. The luminescence signal was measured using a PHERAstar plate reader (BMG LABTECH). Compounds were tested as 11-point 1:3 titrations in duplicate. Data were normalized with wells containing PPs as 100% and wells containing control delEnv PP (no spike protein) as 0%.
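
The relationship between the quoted rotor speed and g-force can be checked with the standard conversion RCF = 1.118 × 10^−5 × r × (r.p.m.)^2, with r in cm. The rotor radius is not given in the text; the value below is back-calculated from the quoted 1,500 r.p.m. and 453g and should be read as an inference only.

```python
RCF_CONSTANT = 1.118e-5  # standard factor for radius in cm and speed in r.p.m.


def rcf(rpm: float, radius_cm: float) -> float:
    """Relative centrifugal force (x g) from rotor speed and radius."""
    return RCF_CONSTANT * radius_cm * rpm ** 2


def radius_for(rcf_g: float, rpm: float) -> float:
    """Rotor radius (cm) implied by a given g-force and speed."""
    return rcf_g / (RCF_CONSTANT * rpm ** 2)


implied_radius = radius_for(453, 1500)   # inferred, not stated in the text
print(round(implied_radius, 1))          # 18.0
```

A radius of about 18 cm is plausible for a swinging-bucket plate rotor, so the two quoted values are mutually consistent.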

GFP-LC3 high-content assay

As previously described28, GFP-LC3 mouse embryonic fibroblast cells (provided by Wen-Xing Ding from the University of Kansas Medical Center) were dispensed at 800 cells per 5 μl per well in 1,536-well tissue culture-treated black/clear-bottom, collagen-coated plates (Corning) using a Flying Reagent Dispenser (Aurora Discovery). The assay plates with cells were incubated at 37 °C with 5% CO2 for 5 h, followed by the addition of 23 nl of compound or control, chloroquine diphosphate, into the assay wells using a Wako Pintool station. After 18-h incubation at 37 °C with 5% CO2, the cells were fixed with 4% (vol/vol) paraformaldehyde (EMS), and nuclei were stained with Hoechst 33342 (Invitrogen) for 30 min at room temperature. After washing twice with PBS using a Blue Washer, the assay plates were imaged for GFP-LC3 puncta formation using an Operetta CLS (PerkinElmer) through a ×20 objective in confocal format. EGFP channel (excitation 460–490 nm/emission 500–550 nm) and DAPI (excitation 355–385 nm/emission 430–500 nm) were used to measure the fluorescence intensities. Images were acquired from each well for one center field (around 25% of a single well area in a 1,536-well plate) and analyzed with Operetta Harmony 4.6 software. The compartment analysis algorithm was used to identify the nuclei, apply a cytoplasmic mask and quantitate GFP spots in the GFP channel. A nuclear mask was generated from DAPI-stained nuclei. Autophagosomal membrane-associated GFP-LC3 (puncta) was detected as GFP-fluorescent vesicular objects that exceeded a threshold defined by untreated cells and that were located exclusively in the cytoplasmic area. Data were expressed as three output parameters: % of positive cells, total spot area − mean per well, and relative spot intensity − mean per well. Compounds were tested as 11-point 1:3 titrations in triplicate.

In vitro assay and structure data

qHTS data generated on the NPC from the CPE assay (https://opendata.ncats.nih.gov/covid19/index.html), as well as compounds reported as active from recent anti-SARS-CoV-2 repurposing screens37,38,39 and drugs proposed by the scientific community as potential COVID-19 therapies40,41,42,43, were used to train the SARS-CoV-2 models. The detailed qHTS data analysis process, including data normalization, correction, classification of concentration response curves and activity assignment, was described previously44. Briefly, concentration response curves were fit to a four-parameter Hill equation yielding half-maximal inhibitory concentration (IC50) and maximal response (efficacy) values3,45. From the CPE assay, compounds that showed a concentration-dependent response with >30% efficacy were considered active. Other compounds were considered inactive. Literature-reported anti-SARS-CoV-2 compounds were considered active.
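
A minimal sketch of the curve model and the CPE activity rule is given below. The parameterization of the Hill equation is one common form and is an assumption here; the exact fitting and curve-classification procedure is the one described in the cited references, not this code.

```python
def hill(conc, bottom, top, ic50, slope):
    """Four-parameter Hill response at concentration conc (> 0, same units as ic50)."""
    return bottom + (top - bottom) / (1.0 + (ic50 / conc) ** slope)


def cpe_activity(efficacy_pct: float, concentration_dependent: bool) -> int:
    """1 = active, 0 = inactive, using the >30% efficacy rule from the text."""
    return int(concentration_dependent and efficacy_pct > 30.0)


# At conc == ic50 the response is halfway between bottom and top.
print(hill(1.0, 0.0, 100.0, 1.0, 1.0))  # 50.0
```

In this form, `top - bottom` corresponds to the efficacy and `ic50` to the concentration giving half of that response.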

qHTS data generated in house at NCATS were used to train the models for ZIKV NS1. NS1 activity data15 were generated in qHTS format on three bioactive collections: the LOPAC (1,280 compounds), the NPC (2,816 approved and investigational drugs)5 and the Mechanism Interrogation Plate (1,866 cancer drugs with known mechanism of action)46. Compounds that showed inhibition in both the ratio and 615-nm readouts were considered active. Compounds that were inactive in the ratio readout were considered inactive. Other compounds were considered inconclusive and excluded from modeling. An NCATS in-house collection, the Genesis library, of ~90,000 diverse compounds, was also screened for NS1 activity at a single concentration (14 µM). From these results, compounds that showed >30% inhibition in both the ratio and 615-nm readouts were considered active, and other compounds were considered inactive.
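
The two NS1 labeling rules can be written as small functions. How each readout is judged to show inhibition or to be inactive in qHTS format is not specified here, so those calls are taken as boolean inputs; all names are illustrative.

```python
from typing import Optional


def ns1_qhts_label(ratio_inhibits: bool, ch615_inhibits: bool,
                   ratio_inactive: bool) -> Optional[int]:
    """qHTS rule: 1 = active, 0 = inactive, None = inconclusive (excluded)."""
    if ratio_inhibits and ch615_inhibits:
        return 1
    if ratio_inactive:
        return 0
    return None


def ns1_single_conc_label(ratio_inhib_pct: float, ch615_inhib_pct: float,
                          cutoff: float = 30.0) -> int:
    """Single-concentration rule: active only if both readouts exceed the cutoff."""
    return int(ratio_inhib_pct > cutoff and ch615_inhib_pct > cutoff)
```

Note that the single-concentration rule has no inconclusive class: every Genesis compound is labeled either active or inactive.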

The activity data on ~2,600 drugs screened in an EBOV assay from a literature report were used to train the EBOV activity models16. These compounds were mapped to 2,065 unique compounds in the NCATS compound library. The anti-EBOV activities (active or inactive) of these compounds were assigned according to the literature report16. All compounds and their assay activities (1 = active, 0 = inactive) used to train the SARS-CoV-2, ZIKV NS1 and EBOV models are provided as Supplementary Data 6.

A subset of the compounds in the bioactive collections, NPC and LOPAC in particular, were screened in nearly all the assays available at NCATS. Two NCATS in-house diverse compound libraries (Sytravon, which contains ~44,000 compounds, and Genesis, which contains ~90,000 compounds), as well as a subset (~100,000 compounds) of the other NCATS bioactive libraries and a large diverse compound library (MLS), were also screened in subpanels of the NCATS assay portfolio. The bioactive compound activity profiles in the assays that also screened the Sytravon (130 readouts), Genesis (39 readouts) or MLS (225 readouts) library were used to train and test the activity-based models (BABM-S, BABM-G or BABM-M). Structure fingerprints were generated for all compounds using the ChemoTyper47 for the SBMs. Structure data on all the compounds with target activity data available were used to train and test the SBM. The compositions of these datasets are summarized in Supplementary Table 2, and the different types of models based on these datasets are summarized below and illustrated in Fig. 1c. The assay activity-based models (BABM-S, BABM-G and BABM-M) and the activity–structure combined models (CM-S, CM-G and CM-M) were applied to predict the target activity of the compounds with activity profiles available from the Sytravon/Genesis/MLS assays (Fig. 1). In the CMs, the activity profile and the structure fingerprint were concatenated to form a new fingerprint for each compound. The SBM was applied to predict the target activity of all ~600,000 compounds in the NCATS compound library. For activity-based models, only compounds that showed activity in at least 10% of the Sytravon, Genesis or MLS assay panel were kept for analyses. Here, the definition of ‘active’ is not as strict as what would normally be considered as a ‘hit’ for lead identification. Any type of concentration-dependent activity observed, regardless of potency or efficacy, was labeled as ‘active’. As such, compounds that showed activities in multiple assays are not compounds that were deemed ‘promiscuous’ in the traditional sense.
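
The fingerprint concatenation used in the CMs and the 10% activity filter can be sketched as follows. Representing both fingerprints as lists of 0/1 values, and counting the 10% over panel readouts, are assumptions made for illustration.

```python
def combined_fingerprint(activity_profile, structure_bits):
    """Concatenate the assay activity profile and the structure fingerprint."""
    return list(activity_profile) + list(structure_bits)


def passes_activity_filter(activity_profile, min_fraction=0.10):
    """Keep a compound only if it is active in at least min_fraction of the panel."""
    if not activity_profile:
        return False
    return sum(activity_profile) / len(activity_profile) >= min_fraction


# Toy example: a 10-readout profile with one active readout (exactly 10%).
profile = [1] + [0] * 9
print(passes_activity_filter(profile), len(combined_fingerprint(profile, [0, 1, 1])))
```

With the real data, a CM-S fingerprint would therefore have 130 activity positions followed by the structure bits.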

Models built on different training datasets

| Assay data source/model type | Chemical structure | Assay activity | Activity and structure combined |
| --- | --- | --- | --- |
| MLS | SBM | BABM-M | CM-M |
| Sytravon | SBM | BABM-S | CM-S |
| Genesis | SBM | BABM-G | CM-G |

Modeling

The WFS method previously developed at NCATS48 was applied to construct the models. Briefly, WFS is a two-step scoring algorithm. In the first step, a two-tailed Fisher’s exact test is used to determine the significance of enrichment for each feature in the active compounds compared to inactive compounds, and a P value is calculated for all the features present in the dataset. For structure data, the feature value was set to 1 for compounds containing that structural feature and to 0 for compounds that do not have that feature. For assay activity data, each assay readout was treated as a feature, and the feature value was set to 1 for ‘active’ compounds and to 0 for inactive compounds. If a feature is less frequent in the active compound set than the inactive compound set, then its P value is set to 1. These P values form what we call a ‘comprehensive’ feature fingerprint. In the second step, this fingerprint is used to score each compound for its active potential according to Equation (1), where P_i is the P value for feature i; C is the set of all features present in a compound; M is the set of features encoded in the ‘comprehensive’ feature fingerprint (that is, features present in at least one active compound); N is the number of features in the subscripted set (N_{C−M}, features of the compound that are not in M; N_{M∩C}, features shared by the compound and M); and α is the weighting factor, which is set to 1 in all the models described here so that all assay features and structure features are treated equally. A high WFS score indicates a strong potential to be active.

$$\mathrm{WFS}=\frac{\sum \log ({P}_{i})}{\min (\log ({P}_{i}))\times (\alpha {N}_{C-M}+{N}_{M\cap C})}$$ (1)
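
The two WFS steps can be illustrated with a minimal, standard-library sketch. It is not the NCATS implementation: the choice to sum log(P_i) over the features in M∩C and to take the minimum over all features in M is an assumption about Equation (1), and the feature names in the toy data are hypothetical. Both the active and inactive sets must be non-empty.

```python
from math import comb, log


def fisher_two_tailed(a, b, c, d):
    """Two-tailed Fisher's exact P value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, p)


def feature_p_values(actives, inactives):
    """Step 1: P value for every feature found in at least one active (the set M)."""
    p_values = {}
    for f in set().union(*actives):
        a = sum(f in cpd for cpd in actives)
        c = sum(f in cpd for cpd in inactives)
        b, d = len(actives) - a, len(inactives) - c
        less_frequent = a / len(actives) < c / len(inactives)
        p_values[f] = 1.0 if less_frequent else fisher_two_tailed(a, b, c, d)
    return p_values


def wfs(compound, p_values, alpha=1.0):
    """Step 2: score one compound (a set of features) following Equation (1)."""
    if not p_values:
        return 0.0
    shared = [f for f in compound if f in p_values]   # M ∩ C
    n_outside = len(compound) - len(shared)           # C − M
    log_min = min(log(p) for p in p_values.values())  # most significant feature
    weight = alpha * n_outside + len(shared)
    if log_min == 0 or weight == 0:
        return 0.0
    return sum(log(p_values[f]) for f in shared) / (log_min * weight)


# Toy data: each compound is a set of hypothetical feature names.
actives = [{"a", "b"}, {"a"}, {"a", "c"}]
inactives = [{"c"}, {"b", "c"}, {"d"}, {"c"}]
p = feature_p_values(actives, inactives)
print(round(wfs({"a"}, p), 3), round(wfs({"a", "d"}, p), 3))  # 1.0 0.5
```

In the toy data, a compound carrying only the most enriched feature scores 1, and adding a feature never seen in an active compound halves the score when α = 1.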

For each model, compounds were randomly split into two groups of approximately equal sizes, one used for training and the other for testing. Model performance was assessed by the area under the receiver operating characteristic curve (AUC-ROC), where the ROC curve is a plot of sensitivity [TP/(TP + FN)] versus 1 − specificity (specificity = TN/(TN + FP))49; TP, FN, TN and FP denote true positives, false negatives, true negatives and false positives, respectively. A perfect model would have an AUC-ROC of 1, whereas an AUC-ROC of 0.5 indicates a random classifier. The random data split, model training and testing were repeated ten times to evaluate the robustness of the models, and the average AUC-ROC values were calculated for each model. For external experimental validation of models, model performance was measured by the PPV (PPV = TP/(TP + FP)). Statistical significance was determined by the two-tailed Fisher’s exact test comparing model PPV with the active rate in the training dataset for the corresponding target being modeled.
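
The evaluation scheme can be sketched with the standard library alone. The AUC-ROC is computed here through its rank interpretation (the probability that a randomly chosen active outscores a randomly chosen inactive), which equals the area under the ROC curve; the fixed random seed and the function names are assumptions for reproducibility of the illustration.

```python
import random


def auc_roc(scores, labels):
    """AUC-ROC as P(active outscores inactive); ties count one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one active and one inactive compound")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def ppv(tp: int, fp: int) -> float:
    """Positive predictive value, TP / (TP + FP)."""
    return tp / (tp + fp)


def random_half_splits(items, n_repeats=10, seed=0):
    """Yield n_repeats random (training, test) splits of roughly equal size."""
    rng = random.Random(seed)  # seed is an illustrative choice
    for _ in range(n_repeats):
        shuffled = list(items)
        rng.shuffle(shuffled)
        half = len(shuffled) // 2
        yield shuffled[:half], shuffled[half:]


print(auc_roc([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0]))  # 1.0
```

Averaging `auc_roc` over the ten test sets produced by `random_half_splits` gives the per-model summary value described in the text.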

Selection of model predicted actives

Models with AUC-ROC > 0.75 were considered for compound selection. WFS score cutoff values for model-predicted actives were determined using the ROC curves where both sensitivity and specificity were optimized. Only compounds that scored higher than the cutoff values were considered candidates for follow-up selection. Owing to the limitations of different assays and resources, for each target we selected compounds with the largest possible structure diversity that could fit into one 1,536-well plate for experimental validation. When the candidate pool was much larger than the target number of compounds, the candidates were narrowed down based on structure type. For this purpose, the entire NCATS in-house compound library was clustered based on structure similarity (729-bit ChemoTyper47 fingerprints) using the self-organizing map algorithm50. From the clusters that contain model-predicted actives, a fraction of the active compounds was selected from each cluster based on the WFS score and the number of models that predicted the compound as active. Because the EBOV assay could test only ~100 compounds, the anti-EBOV candidates were manually inspected and narrowed down further based on literature reports, structure novelty and absorption, distribution, metabolism and excretion properties. In most cases, the selection was driven by the availability of physical samples. All compounds that met the WFS score cutoff from a model were selected when fewer than 1,408 compounds had physical samples available for cherry-picking. The SARS-CoV-2 CPE assay (live virus) could be run in only 384-well format. Limited by the testing space available and physical sample availability, only 311 model-predicted compounds were selected for experimental confirmation in the SARS-CoV-2 live virus assay.
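
One way to choose a score cutoff that balances sensitivity and specificity is to maximize Youden's J (sensitivity + specificity − 1) along the ROC curve. The text does not name the optimization criterion, so the use of Youden's J below is an assumption, as is the function name. The labels must include at least one active and one inactive compound.

```python
def best_cutoff(scores, labels):
    """Return (cutoff, sensitivity, specificity) maximizing sens + spec - 1.

    Compounds scoring strictly above the cutoff are predicted active.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best = None
    for cut in sorted(set(scores)):
        tp = sum(s > cut and y == 1 for s, y in zip(scores, labels))
        tn = sum(s <= cut and y == 0 for s, y in zip(scores, labels))
        sens, spec = tp / n_pos, tn / n_neg
        j = sens + spec - 1
        if best is None or j > best[0]:
            best = (j, cut, sens, spec)
    return best[1:]


print(best_cutoff([0.1, 0.2, 0.8, 0.9], [0, 0, 1, 1]))  # (0.2, 1.0, 1.0)
```

The strict inequality mirrors the statement that only compounds scoring higher than the cutoff were considered candidates.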

Statistical analysis and illustrations

Principal component analysis (PCA) was performed in R version 3.4.3. The first three principal components (PC1, PC2 and PC3) were calculated based on the 729-bit ChemoTyper fingerprints. Three-dimensional PCA plots were generated using the first three PCs in TIBCO Spotfire version 7.11.1. Concentration response curve plots were generated using GraphPad Prism 8, with IC50 values calculated using a three-parameter logistic regression.

Reporting Summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.