A normalized drug response metric improves accuracy and consistency of drug sensitivity quantification in cell-based screening

Accurate and robust quantification of drug effects is crucial for identifying pharmaceutically actionable cancer vulnerabilities. Current cell viability-based measurements often lead to biased response estimates due to varying cell growth rates and experimental artifacts, including background noise and cell seeding discrepancies, that explain part of the inconsistency in high-throughput screening results. To address these limitations, we developed an improved drug scoring model, normalized drug response (NDR), which accounts for differences in cell growth rates and experimental noise, and considers both positive and negative control conditions to characterize drug-induced effects. We demonstrate an improved performance of NDR compared to existing metrics in assessing drug responses of cancer cells in various culture models. Notably, NDR reliably differentiates a wider spectrum of drug behavior, including lethal, growth-inhibitory and growth-stimulatory modes, based on a single viability readout. The method will therefore substantially reduce the time and resources required in cell-based drug sensitivity screening.


INTRODUCTION
Cell-based compound profiling plays an important role both in basic biomedical research and in drug discovery. The availability of a wide range of approved and investigational compounds provides an exciting opportunity for systematic drug positioning and repurposing applications, where cellular screening based on phenotypic readouts has become crucial in establishing novel therapeutic strategies against cancers [1][2][3][4]. Quantitative assessment of drug efficacies in such large-scale screening efforts is often based on dose-response measurement datasets, where hundreds or thousands of compounds are profiled at several concentrations in a cohort of cancer samples or cell types.
Single parameters or summary metrics based on the end-point dose-response curves are commonly used to score drug responses in high-throughput studies 2,5-8. However, because they depend on the end-point measurement, these metrics are bound to show systematic differences when applied to different cell types. For instance, fast-growing cells exhibit different response patterns than slow-growing ones, and this difference may be driven by the cell state bias rather than the actual selective drug response. In addition, variations in culture conditions and seeding density also contribute to differences in drug sensitivity measurements 9.
In the seminal NCI-60 tumor cell line screening project 10,11, multiple parameters, such as half-growth inhibition (GI50), total growth inhibition (TGI), and half-lethal concentration (LC50), have been applied to control for the varying growth rates of cells under normal conditions. Recently, a growth rate-based metric (GR) was developed to take into account the variable rate of dividing cells 12. These approaches are solely based on absorbance/fluorescence differences between drug-treated wells and negative controls, whereas they neglect the information about background noise that can be extracted from the positive control.
Variability in background noise typically occurs due to artifacts in the assay or differences in the measurement system, in addition to the seeding differences, signal bleed-through, or other experimental factors, and therefore needs to be considered for accurate and consistent drug effect scoring. The normalized percent inhibition (PI) metric uses end-point readouts of the positive control as a proxy for such background noise to quantify the variability between measurements 2,13,14 . This metric, however, does not model the dynamic changes that occur from the start of an experiment after treatment. Along with the drug-treated condition and negative control, the positive control readouts can also vary over time and across experimental conditions, which might partly explain the inconsistencies observed in large-scale drug response profiling 8,9,15,16 . Therefore, there is a need for a quantification model that normalizes for the effects of background noise that may vary between measurements and includes model parameters that can be easily interpreted in terms of the experimental and biological factors.
To address these limitations, we devised an improved normalized drug response (NDR) metric that models the growth rates not only in the drug-treated cells but also in both negative and positive control conditions to capture a wide spectrum of drug effects. The metric makes use of both the start and end-point of a drug experiment to model the dynamics of experimental variability and background noise across various measurement setups. In this study, we show that, compared to the other metrics, NDR significantly improves the consistency across measurements and it reliably captures the different classes of drug behavior. Further, based on its improved drug-response curve fitting in various cell growth rates or tissue origins, NDR shows better reproducibility than the existing metrics. We further introduced a summary score (DSSNDR), and show how it improves the accuracy of drug effect classification. The application of this metric to classify drug-responses based on a single viability readout and using a relatively simple measurement setup should make it useful especially in large-scale drug screening efforts.

Development and benchmarking of the NDR metric
To tackle the experimental challenges posed by high-throughput screening, including assay-dependent background noise and uneven cell seeding, we devised the normalized drug response (NDR) metric, which is based on the differences in signals measured at the start and the end-point of an experiment (Fig. 1a). The unique aspect of the NDR is that it also models the dynamic behavior of the positive control, which reflects the sources of experimental variability.
To systematically assess the performance of NDR, we simulated its outcomes under a fixed set of control conditions in various growth rates mimicking the drug-treated conditions (Fig. 1b).
More specifically, we first calculated fold changes for drug-treated and control conditions at a specific time point (here, 80 h), and then used these fold changes to calculate the NDR-based drug response estimates. We found that the NDR metric captures a wider spectrum of possible drug-induced effects, ranging from complete cell death to growth-stimulatory effect (different shades of gray in Fig. 1c).

Figure 1:
A schematic representation of the NDR metric in various drug-treated and control conditions, assuming that the negative control has no effect and the positive control is 100% lethal. (a) Dynamic change in readout under three simulated settings that reflect the negative control condition (green), drug-treated condition (blue), and positive control condition (red). The expected positive control corresponds to the ideal scenario in which the readings of the positive control stay at 0, whereas the observed positive control corresponds to the real scenarios in which the readings of the positive control are often non-zero. Computation of NDR is demonstrated at a specific time point (t = 80 h). (b) Dynamic change in readout under simulation settings that reflect various drug-induced growth rates. The lower growth rates correspond to highly effective drugs or drug concentrations (dark shades of blue). Green and red traces show the readouts from negative and positive control conditions, respectively. (c) NDR metric computed for different drug-induced growth rates under fixed positive and negative control conditions. The spectrum of drug-induced effects as captured by the NDR is illustrated in different shades of gray.
To further investigate how the NDR metric performs under multiple experimental drug-treated conditions (Fig. 1b), the growth rate of the negative control was kept constant while the positive control background was varied to mimic differences in measurement setups (Fig. 2a). For comparison, we also calculated the PI-based and GR-based responses. We found that the PI and NDR responses vary accordingly (Fig. 2b and 2d), indicating that the same readouts in the drug-treated condition can lead to different responses, depending on the readouts of the positive control. However, PI covered a narrower spectrum than NDR. On the other hand, the GR metric did not capture these changes in the positive control (Fig. 2c), hence ignoring an important aspect of drug profiling assays. To study the performance of the NDR in cells with distinct growth characteristics, we next kept the positive control background constant while the negative control values were altered to mimic differently growing cells (Fig. 2e). We found that the PI responses were very sensitive to such changes in the negative control (Fig. 2f). In contrast, even though both the GR and NDR reasonably accounted for the changes in the negative control (Fig. 2g and 2h), NDR remained more stable, especially in the simulated slow growth conditions (Fig. 2h). In both simulated conditions, the NDR metric captured the wide spectrum of drug effects, even in cells with slower division time.
These improvements are due to the capability of NDR to account for the differences in the positive control.

NDR improves consistency in large-scale drug screening
To investigate the behavior of NDR in drug profiling experiments, we screened MCF-7 and MDA-MB-231 cells in two biological replicate experiments, each with two plates containing 131 oncology drugs in five different concentrations (Supplementary Table S1; see Methods). Since the preparation of single cell suspension for MCF-7 is technically challenging and often compromises its uniform seeding, there were marked differences in the distributions of luminescence intensity readings (RealTime-Glo, Promega) at the start of measurement both within and between the two biological replicate drug screens ( Supplementary Fig. 1).
The NDR, GR and PI-based responses were computed for all the wells across the four plates in both cell lines separately. Figure 3 shows the consistency of NDR between replicates for a plate containing the same drugs as an example in MCF-7 cells. To assess the consistency across replicates, we calculated the absolute difference between the response levels at the corresponding wells of each replicate. The distribution of such differences with the NDR metric is closer to zero compared to those of PI and GR (Fig. 3b; p < 0.005, Wilcoxon rank sum test), implying its improved consistency over replicates. Finally, based on the Z'-factor 17 as a quality control measure (Fig. 3c), we conclude that the drug response quantification using NDR effectively reduces technical differences between the measurements, and thus improves the consistency.
We next investigated the consistency of NDR for different cell seeding densities, using MIA PaCa-2 cells seeded differently at the baseline (before treatments). Notably, we found that the NDR responses were more consistent between two drug profiling experiments, in which 250 and 750 cells were seeded per well at the beginning of the experiment (see Supplementary Fig. 3). To further investigate the behavior of NDR in drug profiling experiments at various end time points, we calculated the difference between the response levels in Pa02C cells screened against 131 oncology drugs at 4 different time points, namely 20 h, 28 h, 51 h and 72 h (see Methods). We found that the distribution of NDR-based differences was closer to zero, compared to those of the PI and GR metrics (p < 0.005, Wilcoxon rank sum test; Supplementary Fig. 4), implying an improved consistency of NDR over multiple time points.
Figure 3:
A consistent replicate experiment is expected to result in an absolute difference close to zero. The NDR distribution is closer to zero compared to the PI and GR metrics. The NDR distribution differs significantly from the PI and GR distributions (p < 0.005; Kolmogorov-Smirnov test of equality of distributions). (c) Z'-factor for each plate and each replicate experiment. A high-quality assay is expected to have a Z'-factor above 0.5 (dotted horizontal line).
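The two consistency checks described above can be sketched in a few lines. The example below is written in Python rather than the R used for the analyses, and it is only an illustration: the function names and the control readouts are hypothetical, and the Z'-factor is taken in its standard form, 1 − 3(σ_pos + σ_neg)/|μ_pos − μ_neg|.

```python
from statistics import mean, stdev


def replicate_differences(rep1, rep2):
    """Absolute well-wise difference between two replicate response vectors."""
    if len(rep1) != len(rep2):
        raise ValueError("replicates must cover the same wells")
    return [abs(a - b) for a, b in zip(rep1, rep2)]


def z_prime(neg_ctrl, pos_ctrl):
    """Standard Z'-factor computed from negative and positive control wells."""
    separation = abs(mean(pos_ctrl) - mean(neg_ctrl))
    return 1.0 - 3.0 * (stdev(pos_ctrl) + stdev(neg_ctrl)) / separation


# Hypothetical control readouts from one plate (arbitrary luminescence units).
neg_wells = [100.0, 104.0, 96.0, 100.0]
pos_wells = [10.0, 12.0, 8.0, 10.0]
plate_quality = z_prime(neg_wells, pos_wells)

# Hypothetical responses of the same wells in two replicate screens.
differences = replicate_differences([0.9, 0.5, -0.2], [1.0, 0.3, -0.2])
```

A plate would pass the quality threshold mentioned in the caption when `plate_quality` exceeds 0.5, and a consistent pair of replicates would give `differences` close to zero.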

NDR captures both the toxicity and viability readouts
To further examine the broader behavior of the NDR in drug profiling experiments, we screened 4 additional cancer cell lines against the same set of 131 oncology drugs. The 3 breast cancer cell lines, MDA-MB-231, MDA-MB-361, and HDQ-P1, are known to have different metabolic activity that mimics their doubling times 7,18. The pancreatic cancer cell line, MIA PaCa-2, was chosen to represent a different tissue type 19. Furthermore, all these cell lines have been extensively profiled as disease models in chemo-sensitivity studies [20][21][22][23][24], providing additional information to validate our findings. In agreement with the reported metabolic activity of these cell lines, we observed marked differences in the fold changes of the readouts in the control conditions (Supplementary Fig. 5).
To validate the reliability of the drug response results, we also measured an independent cytotoxicity end-point readout (CellTox Green, Promega). For all the cell lines tested, NDR at each drug concentration decreased with the increasing toxicity readout for most of the drugs, suggesting that the NDR relates closely to the toxicity measurements. As expected, the average NDR-based viability was negatively correlated with the average toxicity readout (p < 0.005; Fig. 4a).
The results from 4 representative compounds with different mechanisms of action and differential response across the 5 cell lines illustrate the resemblance of NDR with toxicity readouts. Omacetaxine, a protein synthesis inhibitor, was toxic in all cell lines except for MCF-7. This selective behavior was missed by the PI-based readout. Furthermore, NDR was also able to capture the cytostatic behavior of omacetaxine against MCF-7 cells, which could be inferred from neither the PI-based viability nor the toxicity measurements. Tipifarnib, a farnesyltransferase inhibitor, was largely non-toxic to all the cell lines, which was clearly reflected with the NDR but not the PI readout. Finally, pevonedistat, a NEDD8 activating enzyme inhibitor, was cytotoxic only to the MDA-MB-361 cells, which was reflected in the NDR readout. In this case, NDR also revealed a growth-stimulatory (enhanced metabolic-activity) effect in the HDQ-P1 cell line. Based on these results, we conclude that NDR captures the strikingly different behavior of different drugs.

NDR improves drug response quantification in large scale screening
To quantify the drug response for each drug, we generated dose-response curves based on NDR at 5 concentrations using the drc R package 25. Based on visual inspection of the dose-response curves, we found a consistent improvement in the curve fitting across all 5 cell lines when compared to the GR- and PI-based responses (exemplified in Fig. 5c). To quantify the curve-fitting behavior, we calculated the root mean squared distance (RMSD) between the observed and estimated dose-response curves for all the drugs with non-zero response in at least 1 of the 5 concentrations (illustrated in Supplementary Fig. 7).
The average RMSD calculated using NDR was lower in the fast-growing cell lines (MIA PaCa-2, MDA-MB-231 and MCF-7), when compared to both the GR and PI-based responses (p < 0.005, Wilcoxon rank sum test; Fig. 5a). Notably, in the slow-growing cell lines (HDQ-P1 and MDA-MB-361), the GR normalization led to increased RMSD values compared to those obtained from the NDR and PI. The simple PI performs better than NDR in the MDA-MB-361 cells, which is the slowest growing cell line among the cell lines tested here. This suggests that for slow-growing or non-dividing cells (during the experimental time), even PI provides adequate responses, provided there are no considerable differences in cell seeding uniformity.
However, the PI normalization for these cells is bound to be less effective in detecting cytostatic effects (see Supplementary file 1).
In dose-response curves, the lowest drug concentration is usually expected to have minimal or no effect, and any deviation from this baseline behavior can eventually bias the drug sensitivity parameters, such as IC50 or EC50 values. We observed that the NDR responses at the lowest drug concentration were consistently closer to the negative control level when compared to the GR and PI-based responses. To quantify this, we computed the distance of the lowest concentration response from the negative control viability value (100 for PI, 1 for GR and NDR), termed the baseline distance (illustrated in Supplementary Fig. 7). The variability of the baseline distance with NDR in the fast-growing cell lines was significantly lower than that obtained using GR and PI (p < 0.005, F-test for difference in variances; Fig. 5b). In the slow-growing cells, GR led to an increased variance of the baseline distances when compared to NDR or PI (see Fig. 5c for representative examples). We note that there were also drugs with low baseline distance but high RMSD values, implying that a lower baseline distance does not necessarily result in lower RMSD values (see Supplementary Fig. 8).
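As a minimal illustration of the baseline distance, the Python sketch below subtracts the negative control level of each metric (100 for PI, 1 for GR and NDR, as stated above) from the response at the lowest concentration and summarizes the spread across drugs with the sample variance. Treating the distance as a signed difference is an assumption, and the function name and the example responses are hypothetical.

```python
from statistics import variance

# Negative control level of each metric, as stated in the text.
NEGATIVE_CONTROL_LEVEL = {"PI": 100.0, "GR": 1.0, "NDR": 1.0}


def baseline_distance(lowest_conc_response, metric):
    """Signed distance of the lowest-concentration response from the control level."""
    return lowest_conc_response - NEGATIVE_CONTROL_LEVEL[metric]


# Hypothetical NDR responses of four drugs at their lowest concentration.
ndr_lowest = [0.98, 1.02, 0.95, 1.01]
distances = [baseline_distance(r, "NDR") for r in ndr_lowest]

# The spread of these distances is what the variance comparison is based on.
spread = variance(distances)
```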
Similar NDR-driven improvements in the RMSD values and baseline distances were also found when analyzing dose-response curves of MDA-MB-231 in two external datasets, one from Cancer Therapeutics Response Portal (CTRPv2) 26,27 and the other from Genomics of Drug Sensitivity in Cancer (GDSC1000) 28 (Supplementary Fig. 9). Furthermore, the NDR metric improved the dose-response curve fittings of 131 drugs screened against freshly extracted mononuclear cells from the bone marrow of an AML patient (Supplementary Fig. 10), demonstrating its benefits also for functional profiling-based precision medicine.

Figure 5:
Improved curve fitting with NDR. (a) RMSD error computed between the estimated and observed dose-response curves obtained using the PI, GR and NDR metrics in the 5 cell lines. Only drugs that showed non-zero values in at least 1 of the 5 concentrations for all the metrics were considered. **p < 0.05 and ***p < 0.005; Wilcoxon rank sum test for the difference in location. (b) Baseline distance from zero response computed at the lowest drug concentration using the PI, GR and NDR metrics in the 5 cell lines. **p < 0.05 and ***p < 0.005; F-test for the difference in variance. (c) Dose-response curves obtained using the PI, GR and NDR metrics for 4 representative drugs that show extreme differences in curve fittings. The representative drugs illustrate both the improvement in curve fittings and the decrease in baseline distances.

NDR-based DSS distinguishes a wide spectrum of drug effects
After confirming that the NDR enables reliable quantification of different drug effects, we computed the NDR-based DSS (DSSNDR; see Methods) that summarizes the dose-response relationships over the whole concentration range into a single response score. As expected, the distributions of DSSNDR showed selective efficacy of only a few drugs in a particular cell line ( Supplementary Fig. 11). We next investigated whether DSSNDR values could be used to uniquely identify the drug-class of all drugs in the screening panel.
As a ground truth, we first classified the effects of the drugs into four groups, namely, lethal, sub-effective, non-effective and growth-stimulatory, based on their fold change of the viability readouts at their highest drug concentration in the 5 cell lines (see Fig. 6a for MDA-MB-361; Methods). The viability readout of this classification showed a good overall agreement with the independent toxicity readout (Fig. 6b). More specifically, at higher concentrations of lethal drugs, the viability readout dropped while the toxicity readout increased. This behavior of the two readouts was weaker for sub-effective drugs that comprise either less toxic (not lethal) or cytostatic drugs. As expected, the toxicity readout barely changed for non-effective drugs and growth-stimulatory drugs. The viability readout, on the other hand, changed negligibly in response to non-effective drugs, but increased at higher concentrations of growth-stimulatory drugs. We further confirmed that the lethal class included drugs that are known to be toxic in these cell lines 20,21,29, for example, proteasome and HDAC inhibitors (see Supplementary Table S2). Furthermore, most of the anti-mitotic and kinase inhibitors were classified as sub-effective (Supplementary Table S2). The lethal drugs were well separated from the sub-effective drugs based on their DSSNDR values (Fig. 6c). Furthermore, most of the non-effective drugs had a DSSNDR close to zero, whereas the growth-stimulatory drugs tended to have a negative DSSNDR. Similar distributions for the 4 drug classes were observed when we merged all the drugs across all the cell lines (Supplementary Fig. 12a). When comparing the NDR-based findings with those computed using the GR and PI metrics, we found that even though the distributions of DSS look similar, the overlap between the distributions of adjacent drug classes is smallest when using NDR (Supplementary Fig. 12).
This suggests that it is possible to reliably infer the effect of a drug solely based on its DSSNDR value, reducing the requirement of other validation experiments.

DISCUSSION
In vitro cell-based drug screening is commonly carried out as an end-point cell count surrogate assay, and such end-point drug response profiling is being widely applied to quantify the sensitivity of drugs in cancer cell lines or patient-derived samples. In these screening efforts, robust dose-response curve fitting is pivotal for defining accurate drug vulnerabilities. Due to the experimental limitations and noise inherent to high-throughput settings, however, it is often difficult to obtain smooth dose-response curves using the existing measures, which results in a significant number of false positive and negative hits. The experimental errors typically originate from inconsistent seeding of cells, or their differing growth rates, as well as from different readouts, among other technical issues. Due to the scale of these profiling experiments and their running costs, it is undesirable and often even impossible to repeat the whole experiment, hence calling for a response metric that effectively normalizes for such experimental errors and reduces the false hit rates.
In this study, we developed and carefully tested a novel NDR metric that reduces the effects of experimental inconsistencies, leads to more accurate dose-response curves, and therefore improves the reliability of drug profiling results. The metric is based on the comparison of the end-point readout with that of the initial state of an experiment in the drug-treated condition, as well as taking into account both negative and positive control conditions. By means of systematic simulations, we first demonstrated how the NDR reliably captures the wide spectrum of drug responses under different control conditions. The other metrics, such as GR, do not account for the positive control condition, and therefore they fail to capture drug responses in slow-dividing cells. This is particularly relevant in experimental models based on primary cells or patient samples, which generally grow slower than established cell lines. In studying such responses, the NDR was able to capture the various drug effects more accurately, as demonstrated by the simulations (Figs. 1 and 2) and in a proof-of-concept ex vivo drug screening experiment on an AML patient sample (Supplementary Fig. 10).
The availability of real-time viability measurement reagents made it possible to test the NDR metric in large-scale drug screening setups. MCF-7 replicate drug screening results suggested that NDR effectively reduces the experimental variability, and thus significantly improves the consistency between different measurements (Fig. 3). This will offer improved solutions to the ongoing debate on the inconsistencies in drug response profiling 8,15,16 . While the existing drug response calculations are prone to the variability between measurements, this improvement is likely to lead to more reliable comparison of drug response quantifications across different samples as well as across different measurement assays. The NDR might also become valuable in 3D-culture models or clonogenic drug screening assays, where uniform cell seeding is crucial.
The reliability of NDR responses computed for 131 drugs across 5 cell lines with different doubling times was further validated by a parallel cell toxicity screen (Fig. 4). The higher consistency of NDR over the other metrics was also evident in the improved drug-response curve fittings (Fig. 5). As an error in a single data point of a dose-response curve fit can result in overestimation of drug response, these results demonstrate that the NDR metric not only improves curve fitting and the baseline quantification of drug effects, but consequently also reduces false hit calls in large-scale screenings. These enhancements, which were also confirmed on the external CTRPv2 and GDSC1000 datasets, highlight its wide applicability in various large-scale screening datasets. The improved results in slow-dividing patient-derived primary cells further support the usage of NDR as an accurate metric in the emerging functional profiling-based personalized medicine applications.
Viability/metabolic-activity measurements are classically used to assess the drug effect in large-scale screenings. Even though metabolic activity is considered as representative of the number of cells, reduction in viability does not always correspond to lethality 21; rather, it may instead represent cytostatic or anti-metabolic effects. Pioneering the concept, we showed here that NDR-based DSS can be used to infer the drug behavior from a single viability measurement.
More specifically, we showed that based on DSSNDR values, one can reliably classify the drugs according to their biological effects: lethal, sub-effective, non-effective and growth-stimulatory ( Fig. 6). This has a significant impact on large-scale high throughput drug profiling efforts as it will notably reduce the cost and time of further validation for cytotoxicity. Moreover, detection of growth-stimulatory drugs is very important in precision medicine as it provides insights into the cellular mechanism of specific cells, tissues or diseases. As drug resistance against monotherapies has directed oncology research towards combinatorial approaches, identifying growth-stimulatory targets will be valuable for deciphering the disease specific resistance driving pathways, and thereby devising novel and effective drug combination strategies.
One of the main limitations of metabolic readout-based viability measurement is its inability to distinguish concurrent cell growth and cell death, since the estimated cell growth with metabolic readout is the sum of growing and dead cells. As a result, metrics implemented for high-throughput settings, such as NDR, capture only the beginning and end of a given treatment period, but not the complex treatment dynamics. This issue can be addressed utilizing time-lapse high-content image-based profiling techniques, such as the drug-induced proliferation (DIP) metric 34. Even though such imaging methods can accurately measure the drug-induced effects, their translation to a high-throughput drug profiling setting still remains a major challenge because of the need for continuous imaging. Furthermore, as the DIP approach involves genetically engineered fluorescently labeled cells, its application to primary cells or patient samples is not straightforward. More recently, a scalable time-lapse analysis of cell death kinetics (STACK) method was introduced to quantify the kinetics of compound-induced cell death at the cell population level 35. However, this method is based on a single control only. In the future, it would therefore be interesting to combine the benefits of NDR with the STACK-based methodology.
Based on the present results, we conclude that NDR accurately portrays a widened spectrum of drug-induced effects, as well as results in improved consistency across different measurement systems in high-throughput drug profiling setting. The calculation of NDR requires only a minor modification in the widely-used experimental setups for high-throughput drug profiling, making the NDR-based drug response quantification broadly feasible and beneficial in a wide range of applications with cell-based chemical screening.

Cell lines
The cell lines used in this study were human breast cancer cell lines MDA-MB-231, MDA-MB-361, HDQ-P1, MCF-7 and pancreatic ductal adenocarcinoma MIA PaCa-2 (details in Supplementary Table S3). All breast cancer cell lines were purchased from ATCC and MIA PaCa-2 was a generous gift from Professor Channing Der (University of North Carolina at Chapel Hill, NC, USA). All cells were maintained in DMEM with 2.2 g/L NaHCO3 (Life Technologies) at 37°C with 5% CO2 in a humidified incubator, according to the provider's instructions.
Table S1
Based on the NDR values, the drug effects can be classified as (see Figure 1):
where the fold change between the readouts at start and end of the measurement is given as:

DSS calculation
Drug sensitivity score (DSS) is a quantitative scoring approach based on the continuous model estimation and interpolation to effectively summarize the complex dose-response relationships 6 .
More specifically, for a normalized drug-response R(x) at concentration x, the integral response I over the dose range that exceeds a given minimum activity level (Amin) is calculated analytically as a continuous function of multiple parameters of the non-linear response model, including its slope at IC50 as well as the top and bottom asymptotes of the response (Rmax and Rmin). Formally, the DSS is computed as
For the DSS-related analyses, we used the DSS R-package freely available at https://bitbucket.org/BhagwanYadav/drug-sensitivity-score-dss-calculation. As the input to the DSS computation R-package, we scaled the PI, GR and NDR metrics, respectively, as:

$$R_{\mathrm{scaled}} = \ldots \times 100$$

$$R_{\mathrm{scaled}} = 0.5 \times (1 - \mathrm{GR}) \times 100$$

$$R_{\mathrm{scaled}} = 0.5 \times (1 - \mathrm{NDR}) \times 100$$

To compute the negative DSS for the drugs that have negative responses R(x) in all the five concentrations tested, we flipped the responses, using 1-R(x) scaling, so that the fitting of the drug-response curves was effectively mirrored. After the DSS values were computed based on the mirrored drug-response curves, we set the DSS value to be negative.
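The GR and NDR scalings quoted above map values in the range −1 to 1 onto a 0 to 100 inhibition scale. The Python sketch below shows that mapping together with a check for the case in which all five scaled responses are negative, which is the condition for computing a negative DSS. It is an illustration only: the function names and example values are hypothetical, the PI scaling is not legible in this text and is therefore left out, and the mirroring and curve-fitting steps of the DSS R-package are not reproduced.

```python
def scale_for_dss(value, metric):
    """Map a GR or NDR value in [-1, 1] onto a 0-100 inhibition scale."""
    if metric not in ("GR", "NDR"):
        raise ValueError("only the GR and NDR scalings are sketched here")
    return 0.5 * (1.0 - value) * 100.0


def all_responses_negative(scaled_responses):
    """True when every tested concentration gives a negative scaled response."""
    return all(r < 0 for r in scaled_responses)


# Hypothetical NDR values of a growth-stimulatory drug at five concentrations.
ndr_values = [1.20, 1.15, 1.10, 1.30, 1.25]
scaled = [scale_for_dss(v, "NDR") for v in ndr_values]
needs_negative_dss = all_responses_negative(scaled)
```

With this scaling, the negative control level (NDR = 1) maps to 0, complete growth inhibition (NDR = 0) to 50, and the lower bound (NDR = −1) to 100.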

Data analysis and statistical tests
All the data analyses and statistical tests were performed in the R statistical programming environment (http://R-project.org). All raw data and summary results, as well as an R function to compute and reproduce the NDR calculation, are available at: https://github.com/abishakGupta/NDR_results.

Statistical analysis
To evaluate the association between two response profiles, we used the Pearson correlation coefficient 30,31. To compute the overlap between two distributions, we used the overlapping coefficient 32 as a point estimate of the overlap between two normal densities.
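Both statistics can be illustrated with a short Python sketch. The overlapping coefficient is taken here in its usual sense, the area under the pointwise minimum of the two densities, and is evaluated by simple numerical integration; this is an assumption about the estimator, since the exact procedure of ref. 32 is not described in the text. The function names and the integration grid are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def overlapping_coefficient(mu1, sd1, mu2, sd2, n_steps=20000):
    """Area under min(f1, f2) for two normal densities (trapezoidal rule)."""
    lo = min(mu1 - 8.0 * sd1, mu2 - 8.0 * sd2)
    hi = max(mu1 + 8.0 * sd1, mu2 + 8.0 * sd2)
    dx = (hi - lo) / n_steps
    total = 0.0
    for i in range(n_steps + 1):
        x = lo + i * dx
        weight = 0.5 if i in (0, n_steps) else 1.0
        total += weight * min(normal_pdf(x, mu1, sd1), normal_pdf(x, mu2, sd2))
    return total * dx
```

For two unit-variance normal densities whose means are two standard deviations apart, `overlapping_coefficient(0, 1, 2, 1)` evaluates to about 0.317, and identical densities give a value of about 1.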

Root mean squared distance (RMSD) calculation
To quantify the goodness of dose-response curve fits, we computed the root mean squared distance (RMSD) between the observed and estimated values of the response curves. We used the conventional formula of RMSD computation given as:

$$\mathrm{RMSD} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(O_i - E_i\right)^2}$$

where $N$ is the number of concentration points, and $O_i$ and $E_i$ are the observed and estimated drug response values at concentration $i$, respectively.
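The conventional RMSD can be written as a short Python function. This is a sketch for illustration; the function name and the example responses are hypothetical, and the estimated values would in practice come from the fitted dose-response curve.

```python
import math


def rmsd(observed, estimated):
    """Root mean squared distance between observed and fitted responses."""
    if len(observed) != len(estimated) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(o - e) ** 2 for o, e in zip(observed, estimated)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical observed and fitted NDR values at five concentrations.
observed_ndr = [1.00, 0.90, 0.60, 0.20, -0.40]
fitted_ndr = [0.98, 0.88, 0.62, 0.15, -0.35]
fit_error = rmsd(observed_ndr, fitted_ndr)
```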

Simulated drug response data
To systematically test the NDR metric performance in a fully-controlled ground-truth setup, we used simulated data of representative drugs, where the control conditions were varied at different realistic rates.
For the first simulation model, we set the growth rate of the negative control to 0.03 h⁻¹, such that the doubling time was ~30 h, and the change rate in the positive control to −0.01 h⁻¹. We set the growth rate of representative drugs to lie in between these rates of the controls. We also added growth rates higher than those in the negative control (with doubling time of ~25 h) to emulate the growth-stimulating effect. We then computed the NDR metric at a specific time point with fold changes of 4 for the negative control, 0.5 for the positive control, and 0.5 to 8 for the drug-treated condition.
For the second simulation model, with the same representative growth rates of drugs, we set the growth rate of the negative control to 0.03 h⁻¹ and let the growth rate of the positive control vary from −0.015 to −0.005 h⁻¹. We then computed the NDR metric at a specific time point with fold changes of 4 for the negative control, 0.4 to 0.8 for the positive control, and 0.5 to 8 for the drug-treated condition.
For the third theoretical model, with the same representative growth rates of drugs, we let the growth rate of the negative control vary from 0.01 to 0.055 h⁻¹ and set the growth rate in the positive control to −0.01 h⁻¹. We then computed the NDR metric at a specific time point with fold changes of 2 to 15 for the negative control, 0.5 for the positive control, and 0.5 to 8 for the drug-treated condition.
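The first simulation setting can be illustrated with the Python sketch below, which evaluates a response score over a grid of drug fold changes for fixed control fold changes of 4 and 0.5. The NDR equation itself is not reproduced in this text, so the functional form used here is an assumption: a GR-style log-ratio normalization that is referenced to the positive control and bounded below at −1, chosen because it gives 1 at the negative control level, 0 when the readout does not change, and values above 1 for growth stimulation, in line with the behavior described for NDR. The function and variable names are hypothetical.

```python
import math


def ndr(fc_drug, fc_neg, fc_pos):
    """Assumed NDR form: log-ratio normalization against both controls.

    fc_* are end/start fold changes of the readout; fc_pos must differ from 1.
    """
    if min(fc_drug, fc_neg, fc_pos) <= 0 or fc_pos == 1.0:
        raise ValueError("fold changes must be positive and fc_pos must not be 1")
    numerator = 1.0 - 2.0 ** (math.log2(fc_drug) / math.log2(fc_pos))
    denominator = 1.0 - 2.0 ** (math.log2(fc_neg) / math.log2(fc_pos))
    return max(-1.0, numerator / denominator)


# Control fold changes of the first simulation setting described above.
FC_NEG_CTRL = 4.0
FC_POS_CTRL = 0.5

# Drug fold changes spanning lethal (0.5) to growth-stimulatory (8) effects.
drug_fold_changes = [0.5, 1.0, 2.0, 4.0, 8.0]
ndr_values = [ndr(fc, FC_NEG_CTRL, FC_POS_CTRL) for fc in drug_fold_changes]

for fc, value in zip(drug_fold_changes, ndr_values):
    print(f"fold change {fc:4.1f} -> NDR {value:+.3f}")
```

Varying `FC_POS_CTRL` from 0.4 to 0.8, or `FC_NEG_CTRL` from 2 to 15, would correspond to the second and third settings, respectively.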

Drug classification
The 131 drugs used in the drug sensitivity and resistance testing (DSRT) assay were classified into four groups, based on the fold change of the viability readouts at the highest drug concentration from the start to the end-point of measurement. The first group of drugs included the ones with a fold change less than 1. The final readout for these drugs is smaller than the readout at start, and hence these drugs are labeled as "lethal". As a second group, the drugs with fold change above 1 and lower than 1 standard deviation (SD) on the lower side of the growth rate in the negative control (DMSO) were labeled as "sub-effective" (Supplementary Fig. 12). This group of drugs is expected to comprise cytostatic as well as less toxic drugs. The third set of drugs is labeled "non-effective", since their fold change was similar to the growth rate in the negative control condition. The final drug group consists of drugs that result in proliferation higher than 1 SD on the higher side of the growth rate in the negative control, and are labeled as "growth-stimulatory".
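The four-group rule can be sketched as a small Python function. The thresholds follow the description above, with the negative control summarized by the mean and SD of its fold change. The handling of a fold change of exactly 1, which the text does not specify, is an assumption, as are the function name and the example numbers.

```python
def classify_drug(fc_drug, neg_ctrl_mean, neg_ctrl_sd):
    """Assign a drug to one of four groups from its fold change at the highest dose."""
    if fc_drug < 1.0:
        return "lethal"  # final readout is below the readout at start
    if fc_drug < neg_ctrl_mean - neg_ctrl_sd:
        # Assumption: a fold change of exactly 1 is also placed in this group.
        return "sub-effective"
    if fc_drug <= neg_ctrl_mean + neg_ctrl_sd:
        return "non-effective"  # within 1 SD of the negative control growth
    return "growth-stimulatory"


# Hypothetical negative control (DMSO) fold change: mean 4.0, SD 0.5.
labels = [classify_drug(fc, 4.0, 0.5) for fc in (0.6, 2.0, 4.2, 5.0)]
```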

NDR calculation on CTRPv2 and GDSC1000 datasets
To test the performance of NDR in independent datasets, we extracted two publicly available raw drug sensitivity screening data, namely Cancer Therapeutics Response Portal (CTRPv2) 26,27 from the Broad Institute and Genomics of Drug Sensitivity in Cancer (GDSC1000) 28,33 datasets from the Sanger Institute. We used MDA-MB-231 cell line data against all drugs and across all concentrations (9 concentrations in GDSC1000 and 16 in CTRPv2).
As measurements at the beginning of the experiments were not available in either dataset, we estimated the starting value based on the fold change (3.2) that was observed in our screens for MDA-MB-231 cells. As this fold change is also representative of the doubling time of MDA-MB-231 7, we assumed that our estimated start data are close to reality. The estimated values were then used in the GR and NDR computation.
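A minimal Python sketch of this back-calculation is given below. It assumes that the start readout is obtained by dividing the end-point negative control readout by the fold change of 3.2, which is one plausible reading of the procedure, and it then evaluates the GR value in its published form, 2^(log2(x_drug/x_0)/log2(x_ctrl/x_0)) − 1. The function names and the readout values are hypothetical.

```python
import math

OBSERVED_FOLD_CHANGE = 3.2  # negative control fold change reported for MDA-MB-231


def estimate_start(neg_ctrl_end, fold_change=OBSERVED_FOLD_CHANGE):
    """Back-calculate the start readout from the end-point negative control."""
    return neg_ctrl_end / fold_change


def gr_value(drug_end, neg_ctrl_end, start):
    """GR value from end-point readouts and an (estimated) start readout."""
    ratio = math.log2(drug_end / start) / math.log2(neg_ctrl_end / start)
    return 2.0 ** ratio - 1.0


# Hypothetical end-point readouts (arbitrary units).
neg_end = 3200.0
start_estimate = estimate_start(neg_end)

gr_no_effect = gr_value(3200.0, neg_end, start_estimate)
gr_cytostatic = gr_value(start_estimate, neg_end, start_estimate)
```

Under these assumptions, a well that matches the negative control gives a GR value of 1, and a well whose readout equals the estimated start value gives 0.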

AUTHOR CONTRIBUTION
A.G. and P.G. conceived this study and wrote the manuscript. A.G. devised the NDR metric and performed the computational analyses. P.G. designed and performed the experiments. K.W. and T.A. supervised the work and critically reviewed and revised the manuscript.