Supplementary Figure 4: Prediction performance of SplashRNA. | Nature Biotechnology

From: Prediction of potent shRNAs with a sequential classification algorithm

(a) Precision-recall curves on the TILE data set, comparing leave-one-gene-out nested cross-validation predictions from SplashRNA (auPR: 0.696) and SplashmiR-30 (auPR: 0.699) against the alternative prediction tools DSIR (auPR: 0.594), seqScore (auPR: 0.526) and miR_Scan (auPR: 0.449).
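As an illustration of the auPR metric reported in panel (a), the sketch below computes a step-wise area under the precision-recall curve (average precision) from prediction scores and binary potency labels. The function name `average_precision` and the toy data are hypothetical, and the legend does not state which integration method the authors used, so this is only one common way to obtain such a value.

```python
def average_precision(scores, labels):
    """Step-wise area under the precision-recall curve (average precision).

    scores: prediction scores, higher = predicted more potent.
    labels: truthy for potent shRNAs, falsy otherwise.
    Assumption: ties in score are broken by input order.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    n_pos = sum(1 for _, label in ranked if label)
    if n_pos == 0:
        raise ValueError("need at least one positive label")
    hits = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            # Precision at the rank where recall increases.
            total += hits / rank
    return total / n_pos


# Toy data, not taken from the TILE data set.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5]
toy_labels = [1, 0, 1, 1, 0]
toy_ap = average_precision(toy_scores, toy_labels)
```

In a leave-one-gene-out setting, the scores passed in would be the held-out predictions pooled across genes.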

(b) Score distribution of the mRas + hRAS set (DSIR + Sensor rules selected). The green line indicates the threshold (Online Methods, Supplementary Table 1).

(c) Prediction performance comparison of the indicated algorithms on the external mRas + hRAS Sensor data set (Supplementary Table 1). SplashRNA outperformed the other algorithms.

(d) Score distributions of the miR-E and UltramiR data sets. For the miR-E set, the threshold was set to 80 (green line, Online Methods). The UltramiR set represents the distribution of log depletion scores of shRNAs tested in a cell-viability screen (Supplementary Table 1).

(e) SplashRNA- and DSIR-based re-ranking of shERWOOD-selected UltramiR shRNAs targeting essential genes that were tested in a cell-viability screen. X-axis: mean SplashRNA or DSIR score for equally sized groups (purple and blue dots, 20 groups) of 39 shRNAs each. Y-axis: percentage of shRNAs in each group that were potent (Online Methods). SplashRNA and DSIR were compared against the published minimum (Min), median (Med) and maximum (Max) shERWOOD algorithm performance on the same data set (green-brown dots).
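The grouping behind panel (e) can be sketched as follows: shRNAs are ordered by prediction score, split into equally sized groups, and each group contributes one point (mean score, percentage potent). The function name `binned_potency` and the toy data are hypothetical, and the potency labels are assumed to be precomputed as described in the Online Methods.

```python
from statistics import mean


def binned_potency(scores, potent, n_groups):
    """Return one (mean score, percent potent) point per equally sized group."""
    if len(scores) != len(potent):
        raise ValueError("scores and potent must have the same length")
    if n_groups <= 0 or len(scores) % n_groups != 0:
        raise ValueError("number of shRNAs must be divisible by n_groups")
    size = len(scores) // n_groups
    # Order shRNAs from lowest to highest prediction score.
    ranked = sorted(zip(scores, potent), key=lambda pair: pair[0])
    points = []
    for start in range(0, len(ranked), size):
        group = ranked[start:start + size]
        mean_score = mean([score for score, _ in group])
        pct_potent = 100.0 * sum(1 for _, is_potent in group if is_potent) / size
        points.append((mean_score, pct_potent))
    return points


# Toy data: 8 shRNAs in 4 groups of 2 (the figure uses 20 groups of 39).
toy_points = binned_potency(
    scores=[0, 1, 2, 3, 4, 5, 6, 7],
    potent=[0, 0, 0, 1, 0, 1, 1, 1],
    n_groups=4,
)
```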

(f) Retrospective potency prediction of shRNAs from a large-scale essential genes RNAi screen. The biological screen used 20-25 miR-E-like shRNAs per gene to identify essential genes. shRNA potency was quantified by log fold change (Online Methods). For each of the top 50 essential genes, each tested algorithm selected its top and bottom five sequences by prediction score. Log fold changes for all selected shRNAs across the 50 genes were compared. SplashRNA achieved the most significant discrimination between top and bottom predictions (p = 1.8e-11, one-sided Wilcoxon rank sum test). seqScore (p = 2.3e-5) was used to generate the initial library of approximately 25 shRNAs per gene.
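The comparison in panel (f) can be sketched in two steps: select each gene's top and bottom k shRNAs by prediction score, then test whether the pooled top group has lower log fold changes than the pooled bottom group. The names `select_top_bottom` and `rank_sum_less` are hypothetical. The direction of the test (top predictions are more strongly depleted) is an assumption, and the p-value uses a plain normal approximation without tie or continuity correction, so it will not reproduce the published values exactly.

```python
import math


def select_top_bottom(records, k=5):
    """records: iterable of (gene, prediction_score, log_fold_change)."""
    by_gene = {}
    for gene, score, lfc in records:
        by_gene.setdefault(gene, []).append((score, lfc))
    top, bottom = [], []
    for gene, items in by_gene.items():
        if len(items) < 2 * k:
            raise ValueError(f"gene {gene!r} has fewer than {2 * k} shRNAs")
        items.sort(key=lambda item: item[0], reverse=True)
        top.extend(lfc for _, lfc in items[:k])      # highest scores
        bottom.extend(lfc for _, lfc in items[-k:])  # lowest scores
    return top, bottom


def rank_sum_less(x, y):
    """One-sided rank sum test of 'x tends to be lower than y'.

    Returns (U, p), where U counts pairs with x_i > y_j (ties count 0.5).
    Assumption: normal approximation, no tie or continuity correction.
    """
    u = sum(1.0 if xi > yj else 0.5 if xi == yj else 0.0 for xi in x for yj in y)
    n, m = len(x), len(y)
    mu = n * m / 2
    sd = math.sqrt(n * m * (n + m + 1) / 12)
    z = (u - mu) / sd
    p = 0.5 * math.erfc(-z / math.sqrt(2))  # standard normal CDF at z
    return u, p


# Toy data: 2 genes, 4 shRNAs each, k = 2 (the figure uses 50 genes, k = 5).
toy_records = [
    ("A", 4.0, -4.0), ("A", 3.0, -3.0), ("A", 2.0, -1.0), ("A", 1.0, 0.0),
    ("B", 0.9, -5.0), ("B", 0.8, -2.0), ("B", 0.2, -0.5), ("B", 0.1, 0.5),
]
toy_top, toy_bottom = select_top_bottom(toy_records, k=2)
toy_u, toy_p = rank_sum_less(toy_top, toy_bottom)
```

For real data, an established implementation with exact or tie-corrected p-values would be preferable to this approximation.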

(g) Retrospective potency prediction of shRNAs from a large-scale toxin resistance and sensitivity RNAi screen. The biological screen used 25 miR-E-like shRNAs per gene to identify resistance and sensitivity genes. shRNA potency was quantified by log fold change (Online Methods). For each of the top 20 sensitivity genes, each tested algorithm selected its top and bottom five sequences by prediction score. Log fold changes for all selected shRNAs across the 20 genes were compared. SplashRNA was the only algorithm to achieve significant discrimination between the top and bottom predictions at p < 0.01 (p = 4.8e-4, one-sided Wilcoxon rank sum test). Of note, SplashRNA also outperformed the other algorithms when smaller or larger numbers of top sensitivity genes were selected from the biological screen (data not shown). seqScore was used to generate the initial library of approximately 25 shRNAs per gene.