Introduction

Gene expression signatures are widely used in the field of translational medicine to define disease sub-types1, severity2 and predict treatment outcome3. Bridging this technology to early drug discovery was previously proposed years ago4,5 but its prohibitive costs limited this approach. The recent advancement of massively parallel gene expression technologies such as RASL-seq.6, DRUG-seq.7, QIAseq.8,9, PLATE-seq.10, or LINCS L100011 are now transforming the field of compound profiling, enabling larger scale profiling and screening experiments at a more affordable cost12,13,14,15,16,17.

In drug discovery, dose-response experiments enable researchers to compare the efficacy of various compounds to modulate biological processes of interest, finding doses for animal and human experiments and estimating windows to off-target and toxic effects. Multiple statistical methods are reported for the identification of individual genes with a dose dependent effect from dose-response gene expression data18,19,20,21,22,23. However, in the case of multivariate gene expression profiling there are no generally accepted methods to estimate the key pharmacological efficacy variables EC50 (compound concentration of half-maximal activating effect) and IC50 (compound concentration of half-maximal inhibitory effect) from multiparametric readouts.

Connectivity Map (CMap) established the concept that compounds with similar mode of actions (MOAs) are highly similar in their differential expression profiles over many genes4,11,24. We postulate that this concept can be applied for quantifying compound potencies based on compound/pathway specific gene expression signatures. This work aims at defining and comparing several multivariate statistical summaries to enable classical compound potency estimation. In this study, we focus mainly on methods measuring the similarity of gene-signature changes relative to a gene-signature induced by an active control compound, representing a defined phenotype of interest, e.g. a tool compound for a target or pathway of interest. The overall principal relies on assessing the similarity of a compound-induced gene-signature profile relative to the one generated by an active control compound; hence, the AC profile will anchor all other measurements in the form of a global reference.

The different similarity methods explored in this paper differ by their approach to assess either the direction of the effect (as example by the geometric angle (cosine) to the AC; referred as direction-based methods) and / or by how the magnitude of the effect is assessed (e.g. Euclidean distance to the NC, referred as magnitude-based methods). Combined, the two measures quantify the strength and direction of a phenotypic effect (see Fig. 1 and Table 1). Methods referred to as direction&magnitude-based combine both types of information into a single measure.

Figure 1
figure 1

Introduction to gene-signature quantification methods. (a) Within the manuscript, we consider methods measuring the similarity of gene-signature changes relative to an active control (AC) or a neutral control (NC). An AC signature is the gene expression signature which is representative of a phenotype of interest, typically induced by a compound or genetic treatment. The NC signature represents the background state without compound treatment. Different ACs (with different AC sigatures), e.g. representing different pathways might result in different EC50 or IC50 values for measured compounds. The effect of compounds on gene expression can be quantified with multiple approaches relative to AC and NC, e.g. quantifying the norm of the vector relative to NC (vector A = vec_norm in the manuscript), the effect size in the direction of AC ( | A | cos θ = scalar_projection_AC), or the angle between the compound vector and the AC vector(cos θ = cos_AC). The distance to the AC is another option (exemplified by dashed circle around AC). (b) Two main characteristics of signature similarity can be distinguished: similar changes in magnitude or similar changes in the direction of the gene expression. The magnitude can be interpreted biologically as the efficacy, while the direction emphasizes the direction of the change of the phenotype, e.g. different pathways might result in different directions of changes in gene expression. Note that (b) shows an effect in four-gene space and (a) only shows the effects in two-gene space for the sake of a better illustration of our concepts, even though all methods work on high dimensional spaces.

Table 1 Overview over gene-signature quantification methods.

For this study, two well-characterized biological pathways with multiple well-characterized ligands were selected: the beta-adrenergic receptor pathway for which we generated experimental biological data for this manuscript, and the EGFR pathway, which is publicly available through the LINCS L1000 project11. For the beta-adrenergic pathway we used cAMP EC50s as functional orthogonal readout25. For practical purposes, we had to measure a small set of biologically relevant genes, instead of the full transcriptome like in CMap. RNA-seq was used to determine a beta-adrenergic receptor specific gene-signature that was subsequently used to quantify compound potencies on the level of gene expression. The L1000 assay is a panel of ca. 1000 measured genes, which are used to infer the differential gene expression of a total of ca. 13k genes. This allowed us to benchmark our methods using all L1000 genes, and subsets thereof specific for EGFR signaling or cell proliferation. The IC50s calculated from gene expression were compared to compound potencies measuring the inhibition of cell growth rate (GR50)26. The two examples represent very different well understood biological systems with reference readouts upstream (cAMP) and downstream (EGFR) of the gene expression readouts and should therefore represent a good test case for gene expression potency measures.

Our results demonstrate that gene-signature-based compound EC50 and IC50 values estimated with multivariate gene-signatures are highly related to potencies inferred with relevant but independent reference readouts. Therefore, we expect that these methods will find a wide application in gene-signature based assays in the near future. All methods in Table 1 and an EC50 and IC50 fitting method are made available in the R-package mvAC50 on github [https://github.com/Novartis/mvAC50].

Results

Generation of the beta-adrenergic receptor dataset

Vitamin-D3 differentiated THP1 cells were chosen as an experimental model for its sensitivity to beta agonists over a large dynamic range of compound concentrations and the ease of measuring cAMP27. To identify a gene-signature for beta agonists, a series of RNA-seq experiments were performed on THP1 cells sampled at baseline and after four hours stimulation with adrenaline, noradrenaline or isoproterenol.

Genes differentially expressed over all three treatments were identified (absolute log fold change >2 and adjusted p-value <0.05), and prioritized for large fold change and high expression levels, for independent qPCR validation (Supplementary Fig. 1a). Our internal compound screening setup allows us to simultaneously multiplex the measurement of eight genes. Two independent sets of seven genes were defined from 14 qPCR validated genes (Supplementary Table 1, Supplementary Fig. 1b) with the eighth gene per set (TBP) serving as a baseline house keeper gene. For our analysis, we considered the two sets of genes as two independant signatures. Not all of these 14 identified genes produced a detectable signal in the QuantiGene Plex technology due to decreased sensitivity of this method compared to qPCR (Supplementary Fig. 1b). The two sets of genes contain respectively three (CD55, DOCK4, and NR4A1) and five genes (PDE4B, SGK1, THBS1, TOB1 and VEGFA) responding consistently (≥50% of technical replicates of NCs with mean rscore of genes >3 in both biological replicates) to 10 uM of isoproterenol.

Comparison of EC50s from single genes, gene-signatures, and cAMP

A total of 21 beta agonists (Supplementary Table 2) covering a wide range of potencies (<10 pM to ca. 5 uM), were chosen for this study. Other cAMP modulators were also included in this compound set: the histamine receptor H3 antagonist N-alpha-methylhistamine and the adenylyl cyclase activator forskolin. As additional control, we added the beta-1 antagonist CGP-20712A, which, as expected, failed to increase cAMP levels. All compounds were measured in dose-response mode in the cAMP assay and for both gene signatures. An overview of dose-response curves of the genes is shown in Supplementary Fig. 2. The gene-expression data, derived gene-signature scores, and fitted EC50s are presented in Supplementary Tables 3 and 4.

The relationship of EC50 values derived from genes and gene-signatures compared to cAMP-derived EC50s depends on the gene-signature methods used. Representative examples for method classes are shown in Fig. 2a. (all methods and genes are shown in Supplementary Fig. 3). The EC50s derived from direction-based methods cor_p_AC and cos_weight_AC are found almost entirely within a window of one log unit around the cAMP-derived EC50s, which is very close considering the different incubation times and the different locations of the readouts in the adrenergic signaling pathway (gene expression vs cAMP). In contrast, the EC50s derived from gene-signature methods containing magnitude information (scalar_projection_AC and vec_norm) and EC50s from the individual genes NR4A1 and THBS1 are almost all more than one log unit above the cAMP-derived EC50s. The ranking of cAMP potencies is not preserved as well (e.g. Spearman correlation for scalar_projection_AC to cAMP = 0.32). It is important to note that the correlation of gene signatures between compounds does not mean that they have similar EC50s, only that they have overlapping biology at some concentrations.

Figure 2
figure 2

Comparison of EC50s from gene-signatures, single-genes and cAMP for the beta agonists dataset. (a) Example of gene and gene-signature EC50s from representative methods compared to cAMP EC50s. Each point represents a compound. The dashed red lines indicates one log unit above and below the red line of equality. The shown gene and gene-signature EC50s are from signature one, except THBS1 from signature two. The shown data is from replicate two. Axes are log10 transformed. (b) Correlation of gene-signature and single-gene EC50s with cAMP EC50s. (c) PCA of the cAMP, gene, and gene-signature summary methods logged EC50s of all compounds in the dataset. Colors of (a, b, c) are according to the definition in (c). (d). Dose dependent change of the genes in the gene-signature (left panel, with y-axis values > 50 not shown, orange dashed lines at three rscores indicating significant changes from the background), compared with the dose dependent change in gene-signature summary score methods and cAMP for metaproterenol (right panel, boxes colored according concentrations shown in left panel, dashed grey line at 100% activity).

A performance overview of all genes and gene-signature methods is given in Fig. 2b. The similarity between gene or gene-signature derived EC50s with cAMP derived EC50s over all tested compounds is quantified by the Pearson correlation between logged EC50s. Most methods within one method-class performed equally well. While direction&magnitude and magnitude-based methods showed no significant difference to individual genes (TukeyHSD test with p-val <0.05), direction-based methods performed significantly better than the other methods with Pearson correlations ranging between 0.6 and 0.9. All other method classes showed mean Pearson correlations <0.5. The AC_similarity method performed significantly worse relative to others (only negative correlations).

The relationship between all gene-signature methods, single genes and cAMP EC50s is shown by a principle component analysis (PCA) projection of the dataset (Fig. 2c). Each data point represents the vector of logged EC50s calculated by one method (for one replicate and one gene-signature) of all compounds in the dataset, Methods generating similar EC50s are projected close to each other. The PCA projection confirms that direction methods cluster together with the cAMP EC50s, and all EC50s containing magnitude information cluster together (green dots are hard to see but cluster together with blue dots) with single gene EC50s. As mentioned above, the AC_similarity methods are outliers relative to the two major clusters.

Figure 2d visualizes the expression levels of the individual genes over compound concentrations (left panel) and the resulting dose-response curves of derived multivariate EC50 methods (right panel). Increasing concentrations of metaproterenol result in increasing expression of the genes of the gene-signature. While the shape of the gene-signature remains similar to the active control signature (isoproterenol [10 uM], red line), the magnitude of the metaproterenol signature exceeds the AC signature with increasing concentrations (left panel). The observed difference in gene expression magnitude between high concentrations of metaproterenol and the active control signature is only captured by metrics that make use of this information (Fig. 2d, right panel, green line). It is important to note that the difference between methods does not only lead to different maximal effect plateaus of the dose-response curve, but also to different EC50 values of the fitted curves.

The increase in gene expression beyond the active control also explains why AC_similarity methods cannot work in this scenario: the maximum similarity between compounds and AC signature is reached at identical magnitudes of both signatures. Both lower and larger magnitudes result in less similar signatures, resulting in bell shaped curves.

EGFR inhibitors dataset from L1000

For the L1000 EGFR (“Epidermal growth factor receptor”) inhibitor dataset, we selected a set of eight EGFR inhibitors measured in six-point dose-response in MCF10A cells after 3 h and 24 h incubation time11. As reference univariate readout, the corresponding growth rate inhibition GR50 measured after three days was used26. GR50 are the recommended potency measure for cell proliferation inhibition, as they are corrected for the background cell proliferation rate of a cell line26. As the LINC technology reported 12,717 genes, it was possible to test several gene-signatures: (1) a published EGFR signature28, and (2) a published cell proliferation gene-signature3, further referred to by the gene name “Targeting protein for Xklp2” (TPX2). As a third biologically unbiased gene-set, all genes from L1000 were used for comparison. We also investigated the performance of single gene measurements, for which we chose the 20 genes from each of the three signatures with the strongest response to the active control (gefitinib at 3.33 uM). All calculated IC50s are available in Supplementary Table 5.

Like with the beta agonist pathway data, gene-signature IC50s of the EGFR inhibitors corresponded well to the reference GR50s (Fig. 3a for representative readouts, all results in Supplementary Fig. 4-6). Results show a strong influence of the incubation time. At 24 h all shown gene-signature methods over all three gene-signatures resulted in IC50 vs GR50 correlations > = 0.88, except scalar_projection_AC and vec_norm with the TPX2 gene-signature resulting in slightly lower correlations each of 0.68. The individual single gene IC50s at 24 h incubation showed more variance, with Pearson correlations ranging from -0.36 with the TPX2 signature to 0.9 with the EGFR signature. The individual genes from the EGFR signature resulted in the highest median correlation of 0.88. Two very similar median correlations of 0.68 and 0.69 were found for the individual genes of the TPX2 signature and from all L1000 genes, confirming the lower biological relevance for the EGFR pathway of the latter signatures. Even though all three gene-signatures contained individual genes that correlated very well with the GR50s ( > 0.9), all of them also contained genes with correlations to GR50s < 0.5, few even around 0. It is not clear how one could reliably distinguish more relevant from less relevant genes in the absence of another orthogonal reference-readout like the GR50s.

Figure 3
figure 3

Comparison of gene and gene-signature IC50s to growth rate inhibition GR50s of EGFR inhibitors. (a) Pearson correlation of representative methods and 20 individual gene IC50s to GR50s. (b) Comparison of representative EGFR gene-signature IC50s in MCF10A vs. GR50s. The dashed red lines indicate one log unit above and below the red line of equality.

At 3 h incubation time, differences between methods and gene-signatures are more pronounced, showing highest correlations for direction-based methods with the EGFR signature (both above 0.75). Again individual genes show a wide distribution of results ranging from −0.38 for TPX2 to ca. 0.85 for all three gene-sets. Like with the beta agonists, the values of gene-signature IC50s are very close to the values from GR50s and more than 50% of the gene-signature IC50 values are within a one-log-unit window to the GR50s (Fig. 3b).

Discussion

The two main contributions of this work are: (1) the development and validation of an analytical framework for calculating compound potency based on multivariate readouts and (2) the provision of an open-source R-package to facilitate the application of our methods on new data by the scientific community.

With this work, we demonstrate that gene signature IC50s/EC50s are well correlated with compound potencies, both on the causative target (cAMP example) and downstream biological readouts (EGFR example). Therefore, we propose our method as a valid and novel alternative for the prioritization, optimization and development of novel drugs. We foresee our method to be impactful in situations where (1) causative targets are unknown and lead-optimization has to be done against a gene-signature phenotype, (2) in situations where gene-signature potencies are used as supportive information to main target potency assays (e.g. off-target/tox signals), or (3) in situations where the gene-signatures can be used as surrogate for an in vivo response.

The principal of this framework is to first summarize the information contained in multiple-genes into a single value and then pass it into a logistic function for potency estimation. The optimal metrics were selected based on their degree of concordance with compound potencies estimated with standard readouts (cAMP/GR50).

The fact that IC50/EC50 potency measurements are specific to a given biologic process (cAMP, gene expression, cell viability), and not a general property of the compound, is a potential challenge for comparing methods. However, choosing experimental models where gene expression is closely linked to pathway activation provides us confidence in our working model. The conservation of the compounds potency rank-order regardless of using gene expression or standard readouts supports our premise. Indeed, very close potency relationship (Pearson correlations up to 0.9) were observed for reference potency values (cAMP, GR50) upstream (cAMP in the EGFR pathway) and downstream (GR50 cell viability in the EGFR pathway) of the gene expression readout, and independent of very different compound incubation times of readouts. The assessments of optimal methods was not influenced by gene-signature composition. Indeed, all signatures used in this work were previously reported, or constructed independently of the screening datasets.

Of the five methodological classes of metrics: (1) direction-based, (2) distance based (magnitude) to the NC, (3) distance based (magnitude) to the AC, (4) magnitude and direction-based and (5) single genes, results show that magnitude-based methods to the AC clearly underperformed to other methods while direction-based methods performed consistently well in the two explored datasets. We did not find large differences in the performance of the methods within a single method class in these two datasets. Yet we recommend cos_weight_AC for direction-based methods due to its ability to down-weight signal with very small magnitude. To our surprise, adding information about the magnitude of the gene expression did not improve the results.

To this date, there is still very limited data available in the public domain that enables the comparison of multivariate EC50/IC50 with standard readouts, hence it is impossible to generalized current findings to future situations. Nonetheless, with the raise of novel sequencing methods that enable low to medium throughput compound screening based on hundreds to thousands of genes, the need for multivariate potency estimation will be strong.

Finally, our work enables the use of gene-signatures as screening readouts and biomarkers throughout all stages of research from early cell line experiments, to animal models and clinical studies. Using the same readout will in many cases contribute to increased biological relevancy at all stages of the drug discovery process. Similar multiplexed readouts like the data from cell painting or metabolomics29,30 might also benefit from our multiplexed potency methods.

The algorithms and datasets used in this publication are available in the R-package mvAC50 from https://github.com/Novartis/mvAC50.

Methods

THP1 cells

Human promonocytic THP-1 cells (TIB-202, ATCC) were cultured at 37 °C/CO2 in medium (Hepes (72400-054, Life Technologies), with 10% FBS (2-01F16-I, Amimed/Bioconcept), 1% Pen/Strep (15140-122, Life Technologies), 1 mM Sodium Pyruvate (11360-039, Life Technologies), 2mM L-Glutamine (25030-024, Life Technologies), 0.0 mM Mercaptoethanol (31350-010, Life Technologies)). Before compound treatment and for all experiments, the THP1 cells were differentiated with 100 nM Vitamin D3 (Biotrend Chemicals AG, Switzerland, Cat. No. BG0684) for 3 days at 37 °C/CO2.

cAMP HTRF assay

The assay was run using the Cisbio cAMP dynamic 2 Kit (62AM4PEB), in white 384well-plates BioCoat #354661, with 20,000 cells/well in 10 µL/well HBSS/HEPES/IBMX. Isoproterenol [10 uM] was used as active control. Cells were incubated with compounds for 20 min. at 37 °C in HBSS/HEPES, in the presence of the Phosphodiesterase (PDE) inhibitor IBMX. Then, cells were lysed and the amount of generated cAMP was quantified by HTRF (Homogeneous Time Resolved Fluorescence).

Beta agonists gene-signature

RNASeq experiments were done comparing untreated cells with a treatment with isoproterenol, adrenaline or noradrenaline for 4 h in THP1 cells.

qPCR was run in THP1 cells for 4 h incubation time with isoproterenol and formoterol at 1, 10 and 100 nM. Total RNAs were isolated with MagMAX™−96 Total RNA Isolation Kit (Ambion ref#AM1830), and cDNA was made using a cDNA Synthesis Kit (Applied Biosystems™ Ref#4368813) RT-PCRs were performed in 384-well plates on an AB7900HT cycler (Applied Biosystems) using specific TaqMan probes (Applied Biosystems). Housekeeper normalization was done relative to the one of the three genes GAPDH, PPIB or TBP, which had the most similar expression level to the gene of interest, according to our DMSO qPCR data. All measurements were done in quadruplicates.

QuantiGene Plex assay

Gene expression changes were measured using a customized QuantiGene Plex assay (Thermo Fisher Scientific).

Two different eight-gene-signatures were designed (obtained from Thermo Fisher Scientific), as the internal QuantiGene process was set up to handle custom-designed signatures of eight genes. Each of the eight-gene-signatures consisted of seven target genes responding to cAMP and one housekeeper gene (TBP).

Measurements were done in THP1 cells. Compounds were measured in six replicates on the same day on different plates, and the procedure was repeated on another day using three replicates on different plates (referred to as biological replicates in the manuscript).

For the assay, 100,000 cells were seeded in a volume of 20 uL in each well of a 384 well plate (Greiner PP V bottom 781280). Compounds were added in serial dilutions of 1:10 (200 nL volume added per well) with maximal compound concentrations of 100uM. After 4 h incubation, cells were lysed with QuantiGene lysis mixture (10 uL), and after 2 min, stored at −80 °C.

Targeted mRNA transcripts were captured to their respective beads by combining lysis mixture (5 uL), blocking reagent (2 uL), probe mix (1.125 uL), water (11.25 uL), and magnetic beads (0.3 uL; 500 beads/region/uL) and incubated overnight.

Signal amplification via branched DNA is added by sequential hybridization of 2.0 Pre Amplifier biotinylated label probe, and binding with Steptavidin-conjugated Phycoerythrin (SAPE). For this purpose, each 15 uL/well pre-amplifier, amplifier and label probe & SAPE were added after washing followed by 1 h incubation at 50 °C and multitron shaking 300 rpm 1 h.

The amount of RNA in 90 uL of probe was quantified using a Luminex Flexmap 3D instrument (Luminex). The identity of the mRNA is encoded by the hybridized Luminex beads, and the level of SAPE fluorescence is proportional to the amount of mRNA transcripts captured by the respective beads.

QuantiGene Plex data processing

The raw readout of the assay was processed as follows:

  1. 1.

    Fold change = 50 * log2 (mRNA count / median mRNA count for NC well)

  2. 2.

    Rscore = (Fold change for well – median Fold change for NC wells) / MAD (mRNA count for NC wells)

  3. 3.

    HKnorm = Rscore for well – HK_Rscore for well; with HK = housekeeper gene.

L1000 / GR50 dataset

EGFR inhibitors in MCF10A cells were selected as model system, because (1) they showed a strong GR50 dynamic range (Dose-response curves visualization http://www.grcalculator.org/grbrowser/.), and (2) were measured in six concentrations in L1000 (10 uM, 3.33 M, 1.11 uM, 0.37 uM, 0.12 uM, 0.04 uM).

The L1000 data was obtained in two files (GSE70138_Broad_LINCS_Level4_ZSVCINF_mlr12k_n78980x22268_2015-06-30.gct.gz and GSE70138_Broad_LINCS_Level4_ZSVCINF_mlr12k_n115209x22268_2015-12-31.gct.gz) from NCBI GEO (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE70138).

This version of the data contains the changed gene-expression normalized as z-scores relative to the DMSO controls on each plate, a similar normalization procedure to the one performed for the beta-agonists expression data. When multiple probes were measured for the same gene_symbol, the probe with the highest variance was kept, for each timepoint. The gefitinib treatment at 3.33 uM was defined as the active control of the experiment. Compounds, smiles, and inchi_key were downloaded from the LINCS webpage (http://lincs.hms.harvard.edu/db/datasets/20000/).

From the 12,727 genes in the L1000 dataset, two different subsets were selected based on published gene-signatures. An EGFR (entrez gene_id 1956) signature28 (“EGFR_UP.V1_UP” with 193 genes, “EGFR_UP.V1_DN” with 196 genes) was downloaded from msigdb31,32, of which a total of 381 genes could be mapped to the L1000 data. This gene-signature was derived from profiling of MCF-7 cell lines stably overexpressing ligand-activatable EGFR. A TPX2 (entrez gene_id 22974) signature (50 genes, of which 39 could be mapped to L1000) was taken from Farmer et al.3, representing a more general signature for cell proliferation.

The GR50 cell viability potency values after three days compound incubation time were also obtained from the LINCS consortium (http://lincs.hms.harvard.edu/db/datasets/20252/results). To make the data more comparable to the fitted IC50’s from the gene-signatures, compounds with flat GR50 dose-response curves were set to either one log unit above or below the highest or lowest tested concentration, depending whether their fitted GRInf value was larger or smaller than 0.5.

As the files from L1000 and the GR50s contained slightly different compound and cell line names, the names were set all to lowercase and whitespaces and “-“ were removed. Eight known EGFR inhibitors afatinib, neratinib, pelitinib, gefitinib, erlotinib, canertinib, lapatinib, and HG-5-88-01 overlapped between the two datasets. The two EGFR/ERBB2 dual inhibitors neratinib and afatinib were considered as EGFR inhibitors for this study (even though they are annotated as ERBB2 inhibitors in the LINCS nominal target annotation).

Dose-response (DRC) fitting

Four-point parametric logistic fits were calculated with an R function included in the mvAC50 R-package [https://github.com/Novartis/mvAC50]. The fitting algorithm in the R-package was adopted from our in-house HTS analysis software Helios33. The fits were constrained to A0 and Ainf (minimal and maximal fitted activities) between −50% and 500% of the active control effect, respectively, and a hill slope between 0.1 and 10. The IC50s or EC50s were constrained to one log unit above and below the experimentally measured range of concentrations, (for the beta agonists ranging from 0.00001 uM to 100 uM, and for the L1000 data ranging from 0.04 uM to 10 uM)

In the case of constant fits, IC50 or EC50 values one-log unit above or below the range of tested concentrations were assigned to the compounds to be able to use those data points as well in the correlation of calculated potencies to the reference potencies. Depending on whether the Amax of the constant fit was below or above 50%, a potency of either one log unit below or above the tested concentration range was assigned. Fitted AC50s with Ainf values <50% were set to one log unit above the highest tested concentration as well, assuming that the observed effect is not caused by the same mode of action as in the active control.

In parallel to the four-point parametric fit and constant fits, a nonparametric fit was also calculated and compared to the other fits, to allow for more unusual curve shapes, e.g. bell shaped curves. For these fits the reported potency is the concentration at which the fit crosses the line of 50% activity. The decision for the reported fit and potency was done as follows: If the non-parametric fit resulted in r2 < 0.5, the data was considered as not suitable for curve fitting and assigned as constant fit. If the curve had a bell-shape, the nonparametric potency was reported. If parametric fits had r2 < 0.5 or the absolute (amin-amax) <30, a constant fit was reported as well, where amin and amax correspond to A0 and Ainf within the measured concentration range. For the remaining curves (the majority) parametric potencies were reported.

cAMP EC50s were fitted with the same algorithm and settings, to ensure a higher consistency in the data. The fitted cAMP EC50s were in agreement with the fits generated by the biologists who ran the assays. For the GR50 dataset this approach was not feasible, as no raw data was available, and the GR50 algorithm was claimed to be superior to four-point parametric fits of the same data26.