Φ-score: A cell-to-cell phenotypic scoring method for sensitive and selective hit discovery in cell-based assays

Phenotypic screening monitors phenotypic changes induced by perturbations, including those generated by drugs or RNA interference. Currently used methods for scoring screen hits have proven problematic, particularly when applied to physiologically relevant conditions such as low cell numbers or inefficient transfection. Here, we describe the Φ-score, a novel scoring method for the identification of phenotypic modifiers or hits in cell-based screens. Φ-score performance was assessed with simulations, a validation experiment and its application to gene identification in a large-scale RNAi screen. Using robust statistics and a variance model, we demonstrated that the Φ-score showed better sensitivity, selectivity and reproducibility than classical approaches. The improved performance of the Φ-score paves the way for cell-based screening of primary cells, which are often difficult to obtain from patients in sufficient numbers. We also describe a dedicated merging procedure to pool scores from small interfering RNAs targeting the same gene so as to provide improved visualization and hit selection.


Supplementary Materials
Applied model for the Φ-score
Supplementary Fig. 1 graphically summarizes the steps taken to calculate the Φ-score, and the Supplementary software provides an implementation in R (http://www.r-project.org/) together with example data and a tutorial. Let $x_i$ denote the phenotypic value for cell $i$ (for example, the GFP fluorescence of the cell), $r_i$ the rank of $x_i$ within the plate and $N$ the number of cells in the plate. The rank is converted into a normal score by applying the inverse of the Gaussian cumulative distribution, $\Phi^{-1}$. Thus, the normal score for cell $i$ is $z_i = \Phi^{-1}(r_i / N)$, that is:

$$\Phi(z_i) = \frac{r_i}{N}$$

The score for perturbation $p$ (e.g., drug, siRNA, CRISPR/Cas9 or microRNA mimic) is the average of $z_i$ over all $N_p$ cells exposed to this perturbation:

$$\bar{z}_p = \frac{1}{N_p} \sum_{i \in p} z_i$$

We must take into account the variable number of cells in each well to make the comparison of the perturbations meaningful. We can estimate the variance of $\bar{z}_p$ with the formula:

$$\operatorname{Var}(\bar{z}_p) = \frac{\sigma_w^2 \sum_w N_w + \sigma_b^2 \sum_w N_w^2}{\left(\sum_w N_w\right)^2}$$

where $w$ is summed over the wells to which the perturbation had been applied and $N_w$ is the number of cells within each of these wells, so that $N_p = \sum_w N_w$. This formula results from the decomposition of the cell Gaussian scores according to:

$$z_i = \varepsilon_i + \beta_{w(i)} \qquad (1)$$

Here, $\varepsilon_i$ (variance $\sigma_w^2$) accounts for the within-well variation and $\beta_{w(i)}$ (variance $\sigma_b^2$) accounts for the between-well variation, $w(i)$ being the well that contains cell $i$. In other words, under the null hypothesis, two cells within the same well are on average more similar than two cells taken from two different wells. The model accounts for this effect because two cells share the same term $\beta_w$ only if they are in the same well. Thus, $\sigma_w^2$ is the variance within a given well and is estimated for each plate by the average of each well's variance. Equation (1) assumes that $\varepsilon_i$ and $\beta_w$ are orthogonal and identically distributed for neutral perturbations. Because the $z_i$ are normal scores, we have $\sigma_w^2 + \sigma_b^2 = 1$, so that $\sigma_w^2$ is the only free model parameter. Then, the Φ-score for perturbation $p$ is defined by:

$$\Phi_p = \frac{\bar{z}_p}{\sqrt{\operatorname{Var}(\bar{z}_p)}}$$

The Φ-scores are converted to a uniform distribution to obtain P-values according to:

$$u_p = \Phi(\Phi_p)$$

where $\Phi(\cdot)$ without subscript is the Gaussian cumulative distribution, so that a high phenotypic value translates to one and a low phenotypic value translates to zero. When the perturbation reduces the phenotype (e.g., reduced GFP fluorescence), $u_p$ can be interpreted as a P-value. Conversely, when the perturbation increases the phenotype, $1 - u_p$ can be interpreted as a P-value. Because we are in a multiple testing context, the standard Benjamini-Hochberg procedure can be applied to control the false discovery rate.
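The steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' R implementation: the function names are hypothetical, each well is assumed to receive a single perturbation, ties in rank are broken by input order, and the rank is divided by N + 1 rather than N so that the top-ranked cell keeps a finite normal score.

```python
from collections import defaultdict
from statistics import NormalDist, mean, variance

STD_NORMAL = NormalDist()  # standard Gaussian, used for Phi^{-1} and Phi


def normal_scores(values):
    """Rank each cell within the plate and map the rank to a Gaussian quantile."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])  # ties broken by input order
    z = [0.0] * n
    for rank, i in enumerate(order, start=1):
        # Assumption of this sketch: r / (N + 1) instead of r / N (finite top score).
        z[i] = STD_NORMAL.inv_cdf(rank / (n + 1))
    return z


def phi_scores(values, wells, perturbations):
    """Return {perturbation: (phi_score, u)} for the cells of one plate."""
    z = normal_scores(values)
    by_well = defaultdict(list)
    for zi, w in zip(z, wells):
        by_well[w].append(zi)
    # sigma_w^2: average of the per-well variances (single-cell wells are skipped).
    s2_within = min(mean(variance(v) for v in by_well.values() if len(v) > 1), 1.0)
    s2_between = 1.0 - s2_within  # from sigma_w^2 + sigma_b^2 = 1
    wells_of = defaultdict(set)
    for w, p in zip(wells, perturbations):
        wells_of[p].add(w)
    result = {}
    for p, ws in wells_of.items():
        counts = [len(by_well[w]) for w in ws]
        n_p = sum(counts)
        z_bar = sum(sum(by_well[w]) for w in ws) / n_p
        var = (s2_within * n_p + s2_between * sum(c * c for c in counts)) / n_p ** 2
        phi = z_bar / var ** 0.5
        result[p] = (phi, STD_NORMAL.cdf(phi))
    return result


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted P-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest P-value down
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy plate: four wells of three cells; "low" and "high" shift the phenotype.
values = [5, 6, 7, 5.5, 6.5, 7.5, 1, 2, 3, 10, 11, 12]
wells = ["A"] * 3 + ["B"] * 3 + ["C"] * 3 + ["D"] * 3
perturbations = ["neutral"] * 6 + ["low"] * 3 + ["high"] * 3
scores = phi_scores(values, wells, perturbations)
adjusted = benjamini_hochberg([u for _, u in scores.values()])
```

In this toy plate the "low" perturbation receives a negative Φ-score and the "high" perturbation a positive one, while the neutral wells score close to zero. The adjusted values treat $u_p$ as the P-value for a reduction; for an increase, $1 - u_p$ would be passed instead.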

Normalization for the Φ-score
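The Φ-score procedure detailed in the previous paragraph assumes that most of the perturbations have little or no effect on the phenotype of interest. When this assumption no longer holds, we propose a normalization using negative controls so that the negative control score remains close to zero. As a result, the means and variances used to normalize the measurements are computed from the negative controls alone.

Thus, the normal score $z_i$ for cell $i$ is modified into $z_{i,n}$ through:

$$z_{i,n} = \frac{z_i - \mu_c}{\sigma_c}$$

where subscript $i$ stands for cell $i$, subscript $n$ for "normalized", and subscript $c$ for the score restricted to all negative controls in the plate. $\mu_c$ and $\sigma_c$ are the mean and standard deviation of the cell normal scores of the negative controls. Due to this modification, $z_{i,n}$ is only standardized when restricted to the negative controls. The intermediate score per perturbation, $\bar{z}_{p,n}$, is modified as follows:

$$\bar{z}_{p,n} = \frac{1}{N_p} \sum_{i \in p} z_{i,n}$$

where $N_p$ is the number of cells exposed to perturbation $p$. Similarly, $\bar{z}_{w,n}$ is the average for well $w$ of the normalized cellular score $z_{i,n}$. $\sigma_{b,n}^2$ is the variance of these values within the plate over the wells to which the negative controls have been applied:

$$\sigma_{b,n}^2 = \operatorname{Var}_{w \in c}\left(\bar{z}_{w,n}\right)$$

The normalized version of the within-well variance, $\sigma_{w,n}^2$, is the weighted average of the variance in all wells, $\operatorname{Var}(z_{i(w),n})$, of the normalized cellular scores, weighted by the number of cells $N_w$ in each well:

$$\sigma_{w,n}^2 = \frac{\sum_w N_w \operatorname{Var}\left(z_{i(w),n}\right)}{\sum_w N_w} \qquad (4)$$

Finally, the variance of $\bar{z}_{p,n}$ is calculated as follows:

$$\operatorname{Var}(\bar{z}_{p,n}) = \frac{\sigma_{w,n}^2 \sum_w N_w + \sigma_{b,n}^2 \sum_w N_w^2}{\left(\sum_w N_w\right)^2}, \qquad \Phi_{p,n} = \frac{\bar{z}_{p,n}}{\sqrt{\operatorname{Var}(\bar{z}_{p,n})}}$$

where $w$ is summed over the wells to which the perturbation has been applied and $\Phi_{p,n}$ is the normalized Φ-score for perturbation $p$.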
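The normalization can be illustrated with a short Python sketch. It is an assumption-laden outline rather than the authors' code: the function names are hypothetical, population variances are used throughout (the text does not say whether sample or population estimators are intended), and the wells belonging to the negative controls and to the perturbation are passed in explicitly.

```python
from collections import defaultdict
from statistics import mean, pstdev, pvariance


def normalize_to_controls(z, is_control):
    """Centre and scale the normal scores with the negative-control cells only."""
    ctrl = [zi for zi, c in zip(z, is_control) if c]
    mu_c, sigma_c = mean(ctrl), pstdev(ctrl)
    return [(zi - mu_c) / sigma_c for zi in z]


def normalized_phi(z_n, wells, control_wells, perturbation_wells):
    """Normalized Phi-score of the perturbation applied to perturbation_wells."""
    by_well = defaultdict(list)
    for zi, w in zip(z_n, wells):
        by_well[w].append(zi)
    # Within-well variance: per-well variances weighted by the cells per well.
    n_cells = sum(len(v) for v in by_well.values())
    s2_within = sum(len(v) * pvariance(v) for v in by_well.values()) / n_cells
    # Between-well variance: variance of the well means over control wells.
    s2_between = pvariance([mean(by_well[w]) for w in control_wells])
    counts = [len(by_well[w]) for w in perturbation_wells]
    n_p = sum(counts)
    z_bar = sum(sum(by_well[w]) for w in perturbation_wells) / n_p
    var = (s2_within * n_p + s2_between * sum(c * c for c in counts)) / n_p ** 2
    return z_bar / var ** 0.5


# Toy plate: two control wells (c1, c2) and one treated well (t1), two cells each.
z = [0.0, 2.0, 1.0, 3.0, -4.0, -2.0]
wells = ["c1", "c1", "c2", "c2", "t1", "t1"]
is_control = [True, True, True, True, False, False]
z_n = normalize_to_controls(z, is_control)
phi_t1 = normalized_phi(z_n, wells, ["c1", "c2"], ["t1"])
```

After normalization the control cells have mean zero and unit standard deviation by construction, and the treated well, whose cells sit well below the controls, receives a strongly negative normalized score.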

K-score
The K-score is another cell-based score, developed by Knapp et al. (ref. 15) and based on the Kolmogorov-Smirnov test; it is used here for benchmarking with simulations.

Briefly, the K-score calculates an enrichment score $ES_p$ for each perturbation $p$. Two complementary running sums, $S_p$ and $\bar{S}_p$, are first calculated based on $R$ (the ranked cellular phenotypic values). When a cell exposed to the perturbation $p$ is encountered in $R$, $S_p$ is increased by one; in contrast, $\bar{S}_p$ is incremented when a cell is not exposed to $p$.

Given $i$ as a position in the phenotypic ranked list $R$, $N_p$ as the number of cells associated with the perturbation $p$ and $N$ as the total number of cells, for all $i = 1, \dots, N$, we have:

$$S_p(i) = \sum_{j \le i} \mathbb{1}[R_j \in p], \qquad \bar{S}_p(i) = \sum_{j \le i} \mathbb{1}[R_j \notin p], \qquad D_p(i) = \frac{S_p(i)}{N_p} - \frac{\bar{S}_p(i)}{N - N_p}$$

Then, the enrichment score for $p$ is the maximal deviation from zero of $D_p(i)$, the difference between these two running sums once each is scaled by its final value.
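A running-sum enrichment score of this Kolmogorov-Smirnov type can be sketched as follows. The function name is hypothetical, and the scaling of each running sum by its final count is the standard two-sample Kolmogorov-Smirnov convention, assumed here rather than taken from the K-score software; the sign convention (positive when exposed cells accumulate at low phenotypic values) is likewise a choice of this sketch.

```python
def k_score(values, exposed):
    """Signed maximal deviation between the two scaled running sums."""
    n_total = len(values)
    n_p = sum(1 for e in exposed if e)
    if n_p == 0 or n_p == n_total:
        raise ValueError("need both exposed and non-exposed cells")
    order = sorted(range(n_total), key=lambda i: values[i])  # ranked list R
    hits = misses = 0
    best = 0.0
    for i in order:
        if exposed[i]:
            hits += 1      # running sum for cells exposed to the perturbation
        else:
            misses += 1    # complementary running sum
        diff = hits / n_p - misses / (n_total - n_p)
        if abs(diff) > abs(best):
            best = diff
    return best


low_shift = k_score([1, 2, 3, 4, 5, 6], [True, True, True, False, False, False])
high_shift = k_score([1, 2, 3, 4, 5, 6], [False, False, False, True, True, True])
mixed = k_score([1, 2, 3, 4, 5, 6], [True, False, True, False, True, False])
```

When all exposed cells occupy the lowest ranks the score reaches +1, when they occupy the highest ranks it reaches -1, and an interleaved arrangement stays close to zero.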

Merging siRNA scores targeting the same genes
The following section describes the merging procedure of individual siRNA scores targeting the same gene through a simple example. Let $s_1 = -8.5$, $s_2 = +2.1$ and $s_3 = -3.3$ be the scores of three different siRNAs targeting a given gene. First, we build a modified score $\tilde{s}$ using a lower limit (any score between plus and minus the lower limit is set to zero) and an upper limit (any score whose absolute value exceeds the upper limit is set to plus or minus the upper limit). For this study, we arbitrarily chose a lower limit of 3, which is high enough to get rid of small (off-target or spatial) effects, and an upper limit of 6 to avoid predominance of a single siRNA score in the final score. Thus, we obtain $\tilde{s}_1 = -6$, $\tilde{s}_2 = 0$ and $\tilde{s}_3 = -3.3$. Here, two siRNAs out of three share the same phenotype, while the third has no effect. This corresponds to a sum of signs equal to -2 (-1, 0, -1). A bonus is added to separate the merged scores of genes depending on the number of siRNA hits sharing the same phenotype (Supplementary Table 1). Here, a bonus of -3 is added, leading to a merged score $m = -12.3$. Now, let $s_2 = +3.1$. There is no bonus as the sum of the signs is -1, leading to $m = -8.7$. In contrast, if $s_2 = -3.1$, the sum of the signs equals -3; with the bonus the sum becomes -6, leading to $m = -20.9$.
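The merging rule can be written as a short Python sketch. The bonus table below is hypothetical: it only encodes the three cases given in the text (no bonus for a sum of signs of ±1, 3 for ±2 and 6 for ±3) plus an assumed zero bonus for a sum of 0, since Supplementary Table 1 is not reproduced here; the function names are also illustrative.

```python
# Hypothetical bonus table keyed by |sum of signs|, valid for three siRNAs only.
BONUS_BY_SIGN_SUM = {0: 0.0, 1: 0.0, 2: 3.0, 3: 6.0}


def modified_score(score, lower=3.0, upper=6.0):
    """Zero out scores inside the lower limit and cap scores at the upper limit."""
    if abs(score) < lower:
        return 0.0
    return max(-upper, min(upper, score))


def sign(x):
    return (x > 0) - (x < 0)


def merged_score(scores, lower=3.0, upper=6.0):
    """Sum of the modified siRNA scores plus a signed agreement bonus."""
    modified = [modified_score(s, lower, upper) for s in scores]
    sign_sum = sum(sign(m) for m in modified)
    bonus = sign(sign_sum) * BONUS_BY_SIGN_SUM[abs(sign_sum)]
    return sum(modified) + bonus


first = merged_score([-8.5, +2.1, -3.3])
second = merged_score([-8.5, +3.1, -3.3])
third = merged_score([-8.5, -3.1, -3.3])
```

With the limits applied exactly as described, the sketch reproduces the merged score of -12.3 for the first example. For the two variants with a second score of ±3.1 it yields -6.2 and -18.4, whereas the values quoted in the text (-8.7 and -20.9) correspond to summing the uncapped -8.5, so the authors' software may handle the upper limit differently in those cases.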

Ontology enrichment
Due to the multiple testing issues for ontology enrichment (Online methods), we only consider enriched ontologies with P-values lower than $10^{-3}$ when the whole list of ontologies is investigated. To set this threshold, we randomized the merged Φ-scores and Z-scores one hundred times, and recalculated the P-value with Fisher's exact test for each resampling and for each Molecular Function (MF) ontology of positive hits (merged score above 12). Only the ten most significant P-values were kept. As a consequence, the identity of the "hits" (487 for mΦ and 291 for mZ) varies between randomizations while their total number remains constant, which leads to different P-values. The minimum P-value is $5.5 \times 10^{-5}$ for "random" Φ-score ontologies and $2.1 \times 10^{-4}$ for Z-score ontologies, with a median P-value of $10^{-2}$ for the ten most significant P-values for both scores. In comparison, the ten most significant P-values for positive hits (Molecular Function, merged score above 12) ranged between $4.4 \times 10^{-14}$ and $2.2 \times 10^{-9}$ for the Φ-score and between $1.7 \times 10^{-6}$ and $5.2 \times 10^{-5}$ for the Z-score. Supplementary Figure 12 shows the result with only the most significant ontology instead of the first ten. This level of enrichment compared to random picking of the hits demonstrates both the sensitivity and specificity of the scores and the superior performance of the Φ-score.

… negative population (population 1, low signal affected by noise) and a positive population (population 2, high signal also affected by noise). The proportion of positive cells is 60%. The mean and standard deviation are given for each population. tn.mu indicates the probability of transfection (the probability that each cell is affected by the perturbation). ef.mu indicates the efficiency of the perturbation (if not stated, ef.mu = 30%, indicating that the initial fluorescence is multiplied by 0.7).
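A two-population simulation of this kind might be sketched as below. Only the 60% positive fraction, the meaning of tn.mu and the default ef.mu of 30% come from the description above; the population means and standard deviations, the default transfection probability and the function name are placeholders chosen for illustration, and the perturbation is assumed to act on cells of both populations.

```python
import random
from statistics import mean


def simulate_well(n_cells, perturbed, rng, tn_mu=0.8, ef_mu=0.3,
                  positive_fraction=0.6, neg=(100.0, 20.0), pos=(1000.0, 200.0)):
    """Draw fluorescence values for one well.

    neg and pos are hypothetical (mean, sd) pairs for populations 1 and 2.
    """
    cells = []
    for _ in range(n_cells):
        mu, sd = pos if rng.random() < positive_fraction else neg
        signal = rng.gauss(mu, sd)
        # Each cell of a perturbed well is transfected with probability tn_mu,
        # in which case its fluorescence is multiplied by (1 - ef_mu).
        if perturbed and rng.random() < tn_mu:
            signal *= 1.0 - ef_mu
        cells.append(signal)
    return cells


rng = random.Random(0)
control = simulate_well(20000, False, rng)
treated = simulate_well(20000, True, rng, tn_mu=1.0)
ratio = mean(treated) / mean(control)
```

With full transfection (tn_mu = 1.0) the mean fluorescence of the treated well is close to 0.7 times that of the control well, as expected for ef_mu = 30%; lowering tn_mu dilutes the effect, which is the regime in which per-cell scoring is meant to help.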