Prediction of potent shRNAs with a sequential classification algorithm

Pelossof, Raphael; Fairchild, Lauren; Huang, Chun-Hao; Widmer, Christian; Sreedharan, Vipin T; Sinha, Nishi; Lai, Dan-Yu; Guan, Yuanzhe; Premsrirut, Prem K; Tschaharganeh, Darjus F; Hoffmann, Thomas; Thapar, Vishal; Xiang, Qing; Garippa, Ralph J; Rätsch, Gunnar; Zuber, Johannes; Lowe, Scott W; Leslie, Christina S; Fellmann, Christof

doi:10.1038/nbt.3807

Brief Communication
Published: 06 March 2017

Raphael Pelossof^1,na1,
Lauren Fairchild^1,2,na1,
Chun-Hao Huang^3,4,
Christian Widmer^1,5,
Vipin T Sreedharan^1,
Nishi Sinha^6,
Dan-Yu Lai^6,
Yuanzhe Guan^6,
Prem K Premsrirut^6,
Darjus F Tschaharganeh^3,
Thomas Hoffmann^7,
Vishal Thapar^3,
Qing Xiang^8,
Ralph J Garippa^8,
Gunnar Rätsch^1,9,
Johannes Zuber^7,
Scott W Lowe^3,4,10,
Christina S Leslie^1 &
…
Christof Fellmann^6,11

Nature Biotechnology volume 35, pages 350–353 (2017)

Abstract

We present SplashRNA, a sequential classifier to predict potent microRNA-based short hairpin RNAs (shRNAs). Trained on published and novel data sets, SplashRNA outperforms previous algorithms and reliably predicts the most efficient shRNAs for a given gene. Combined with an optimized miR-E backbone, >90% of high-scoring SplashRNA predictions trigger >85% protein knockdown when expressed from a single genomic integration. SplashRNA can significantly improve the accuracy of loss-of-function genetics studies and facilitates the generation of compact shRNA libraries.

**Figure 1: Computational modeling of advancements in shRNA technology.**

**Figure 2: Benchmarking SplashRNA prediction performance.**

Accession codes

Accessions

Gene Expression Omnibus

NM_008960

Acknowledgements

We thank J.A. Doudna, G.J. Hannon, L.E. Dow and S.N. Floor for continuous support and valuable discussions. We gratefully acknowledge assistance and support from A. Banito, V. Sridhar, L. Faletti, C.C. Chen and S. Tian. C.F. was supported in part by a K99/R00 Pathway to Independence Award (K99GM118909) from the National Institutes of Health (NIH), National Institute of General Medical Sciences (NIGMS). C.F. is a founder of Mirimus Inc., a company that develops RNAi-based reagents and transgenic mice. This work was also supported in part by grant CA013106 (S.W.L.). S.W.L. is a founder and member of the scientific advisory board of Mirimus Inc., the Geoffrey Beene Chair of Cancer Biology at MSKCC and an investigator of the Howard Hughes Medical Institute. J.Z. is a member of the scientific advisory board, and P.K.P. is a founder and employee of Mirimus Inc. C.S.L. was supported in part by NHGRI U01 grants HG007033 and HG007893 and NCI U01 grant CA164190. A375 cells were a kind gift from Neal Rosen, MSKCC.

Author information

Raphael Pelossof and Lauren Fairchild: These authors contributed equally to this work.

Authors and Affiliations

Computational Biology Program, Memorial Sloan Kettering Cancer Center, New York, New York, USA
Raphael Pelossof, Lauren Fairchild, Christian Widmer, Vipin T Sreedharan, Gunnar Rätsch & Christina S Leslie
Tri-Institutional Training Program in Computational Biology and Medicine, New York, New York, USA
Lauren Fairchild
Department of Cancer Biology and Genetics, Memorial Sloan Kettering Cancer Center, New York, New York, USA
Chun-Hao Huang, Darjus F Tschaharganeh, Vishal Thapar & Scott W Lowe
Cell and Developmental Biology Program, Weill Graduate School of Medical Sciences, Cornell University, New York, New York, USA
Chun-Hao Huang & Scott W Lowe
Department of Computer Science, Machine Learning Group, Berlin Institute of Technology, Berlin, Germany
Christian Widmer
Mirimus Inc., Woodbury, New York, USA
Nishi Sinha, Dan-Yu Lai, Yuanzhe Guan, Prem K Premsrirut & Christof Fellmann
Research Institute of Molecular Pathology, Vienna Biocenter, Vienna, Austria
Thomas Hoffmann & Johannes Zuber
RNAi Core, Memorial Sloan Kettering Cancer Center, New York, New York, USA
Qing Xiang & Ralph J Garippa
Department of Computer Science, ETH Zurich, Zurich, Switzerland
Gunnar Rätsch
Howard Hughes Medical Institute and Memorial Sloan Kettering Cancer Center, New York, New York, USA
Scott W Lowe
Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, California, USA
Christof Fellmann

Contributions

R.P., L.F., C.S.L. and C.F. conceived and designed the study, and developed the data integration framework. R.P., L.F., and C.W. built the algorithm, and carried out the model training and computational validation. C.-H.H., N.S., D.-Y.L., Y.G., P.K.P., D.F.T., T.H., J.Z., S.W.L. and C.F. generated the biological data sets and validated knockdown potency. R.P., L.F., C.W. and V.T.S. built the web page. V.T. and G.R. assisted with study design and advised on algorithmic development. Q.X. and R.J.G. helped with validation of predictions. R.P., L.F., C.-H.H., T.H., J.Z., S.W.L., C.S.L. and C.F. analyzed data and wrote the manuscript.

Corresponding authors

Correspondence to Christina S Leslie or Christof Fellmann.

Ethics declarations

Competing interests

C.F. is a founder of Mirimus Inc., a company that develops RNAi-based reagents and transgenic mice. S.W.L. is a founder and member of the scientific advisory board of Mirimus Inc. J.Z. is a member of the scientific advisory board of Mirimus Inc. P.K.P. is a founder and employee of Mirimus Inc. R.P. and L.F. have filed intellectual property on SplashRNA.

Integrated supplementary information

Supplementary Figure 1 Data set generation.

(a-f) Generation of the M1 (miR-30, 20,400 shRNAs) Sensor assay data set (Supplementary Table 2, Online Methods).

(a) Schematic of our previously published Sensor assay that enables large-scale functional assessment of shRNA potency (Online Methods).

(b) Library complexity over Sensor assay sort cycles. Shown are normalized read numbers (parts per million, ppm) in both duplicates for each shRNA represented within the initial libraries (Vector) and the pools after the indicated sorts (Sort 3, 5).

(c) Correlation of reads per shRNA between the two replicates before sorting (left panel), after Sort 5 (middle panel) and between the initial and endpoint population (right panel; shown for one representative replicate). r, Pearson correlation coefficient.

(d) Correlation of Sensor score and reads per shRNA in the vector libraries, showing that the score is independent of the initial shRNA representation. r, Pearson correlation coefficient.

(e) Enrichment or depletion of 17 control shRNAs after Sort 5. All controls have been used in previous Sensor assays (e.g. TILE, mRas + hRAS) and are classified into a strong, intermediate and weak class according to their knockdown potency assessed by immunoblotting.

(f) Rank correlation of 325 performance control shRNAs. 65 shRNAs per gene targeting mouse Bcl2, Kras, Mcl1, Myc and Trp53 that had previously been tested as part of the TILE data set were chosen as supplemental controls to assess Sensor assay performance for weak, intermediate and strong shRNAs. The individual shRNA ranks between TILE and M1 were highly correlated (325 shRNAs, Spearman rank correlation coefficient rho: 0.63; gene-specific correlation coefficients are also reported), even though the TILE and M1 data sets were generated several years apart, using mostly different equipment, reagents and operators.
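The rank comparison in panel (f) relies on the Spearman rank correlation coefficient. As a minimal sketch of how such a coefficient can be computed from two lists of per-shRNA scores, the following uses average ranks for ties and a Pearson correlation on the ranks. The function names and the toy inputs are illustrative assumptions, not the authors' code.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def spearman_rho(x, y):
    """Spearman rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy example with hypothetical scores for four shRNAs in two assays.
print(spearman_rho([0.1, 0.4, 0.3, 0.9], [1.0, 3.0, 2.0, 8.0]))
```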

(g) Generation of the miR-E reporter assay data set (Supplementary Table 2, Online Methods). Normalized reporter knockdown values of miR-E shRNAs assessed one-by-one in an RNAi reporter assay. The shRNAs were tested in 42 individual batches, each including several control shRNAs for data scaling (miR-E Ren.713, miR-30 Pten.1524) and quality control (miR-E Pten.1523, miR-E Pten.1524). Background fluorescence of the parental chicken cell line (ERC) and maximal fluorescence of the batch-specific reporter cell line (ERC cells expressing the shRNA target reporter) were also measured. All shRNAs were grouped into either a positive or negative class. A threshold value of 80 was chosen as a cutoff, based on the performance of miR-30 Pten.1524 and miR-E Ren.713.

(h) Nucleotide representation of positive shRNAs from the indicated data sets. Shown are the nucleotides one to eight of the guide strand (starting in the center), including the entire seed region. Unbiased TILE (miR-30) set, showing a diversified nucleotide composition (left panel). Preselected M1 (miR-30, DSIR + Sensor rules selected) set, showing a biased nucleotide representation (middle panel). Preselected miR-E + UltramiR set, showing a different nucleotide bias due to the altered shRNA backbone (right panel). More shRNAs starting with a C were found to be potent (compared to TILE, p = 0.002, Fisher’s exact test), indicating less restrictive sequence requirements when using the miR-E backbone.

Supplementary Figure 2 Kernel selection and data integration.

(a) Schematic of the first support vector machine (SVM) classifier that serves to eliminate non-functional sequences and prioritize shRNAs that are likely to be potent.

(b) Schematic of the kernel representation used by SplashRNA. A weighted degree kernel is calculated across the entire guide sequence, while two spectrum kernels are calculated across nucleotides 1-15 and 16-22, respectively.
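To make the kernel representation in panel (b) concrete, here is a minimal sketch of a spectrum kernel and a weighted degree kernel on 22-nt guide sequences. The k-mer length `k`, the maximum `degree`, the standard weighted degree weights, and the unweighted sum used to combine the three kernels are all assumptions for illustration; the source states only which kernels cover which positions.

```python
from collections import Counter


def spectrum_kernel(a: str, b: str, k: int = 3) -> int:
    """Dot product of k-mer count vectors; ignores where a k-mer occurs."""
    kmers_a = Counter(a[i:i + k] for i in range(len(a) - k + 1))
    kmers_b = Counter(b[i:i + k] for i in range(len(b) - k + 1))
    return sum(n * kmers_b[kmer] for kmer, n in kmers_a.items())


def weighted_degree_kernel(a: str, b: str, degree: int = 3) -> float:
    """Count matching substrings at identical positions, for lengths 1..degree."""
    if len(a) != len(b):
        raise ValueError("weighted degree kernel needs equal-length sequences")
    total = 0.0
    for d in range(1, degree + 1):
        # Standard weighting: shorter matches receive larger weights.
        beta = 2.0 * (degree - d + 1) / (degree * (degree + 1))
        matches = sum(a[i:i + d] == b[i:i + d] for i in range(len(a) - d + 1))
        total += beta * matches
    return total


def combined_kernel(a: str, b: str, k: int = 3, degree: int = 3) -> float:
    """Weighted degree kernel over positions 1-22 plus spectrum kernels
    over positions 1-15 and 16-22 (1-based), summed without extra weights
    (the unweighted sum is an assumption)."""
    if len(a) != 22 or len(b) != 22:
        raise ValueError("expected 22-nt guide sequences")
    return (weighted_degree_kernel(a, b, degree)
            + spectrum_kernel(a[:15], b[:15], k)
            + spectrum_kernel(a[15:22], b[15:22], k))


# Two made-up 22-nt guides, for illustration only.
g1 = "UUAGCAUCGAUCGGAUACGUAA"
g2 = "UAAGCAUGGAUCCGAUACGAAU"
print(combined_kernel(g1, g2))
```

The spectrum kernels capture k-mer content regardless of position within their window, whereas the weighted degree kernel rewards matches at the same position, which is why the two are complementary.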

(c) TILE score distribution (Online Methods). We set a potency threshold separating the negative from the positive class at the minimal point between the two modes of the distribution (green line, for thresholds see Supplementary Table 1).
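One simple way to locate a minimal point between two modes is to histogram the scores, pick the two dominant bins, and take the least populated bin between them. The sketch below does this; the bin count, the `min_separation` parameter, and the histogram approach itself are assumptions, since the source does not describe how the minimum was located.

```python
def valley_threshold(scores, n_bins=20, min_separation=3):
    """Return the centre of the least populated histogram bin lying
    between the two dominant modes of a bimodal score distribution."""
    if min_separation < 2:
        raise ValueError("min_separation must be at least 2")
    lo, hi = min(scores), max(scores)
    if hi == lo:
        raise ValueError("scores must not all be identical")
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for s in scores:
        # The maximum score falls into the last bin.
        counts[min(int((s - lo) / width), n_bins - 1)] += 1
    # First mode: the most populated bin.
    first = max(range(n_bins), key=counts.__getitem__)
    # Second mode: the most populated bin sufficiently far from the first.
    candidates = [i for i in range(n_bins) if abs(i - first) >= min_separation]
    if not candidates:
        raise ValueError("no bin is far enough from the first mode")
    second = max(candidates, key=counts.__getitem__)
    left, right = sorted((first, second))
    # Valley: first least populated bin strictly between the two modes.
    valley = min(range(left + 1, right), key=counts.__getitem__)
    return lo + (valley + 0.5) * width


# Toy bimodal scores (hypothetical values).
toy = [0.0, 1.2, 1.4, 1.5, 1.6, 2.5, 5.5, 8.2, 8.4, 8.5, 10.0]
print(valley_threshold(toy, n_bins=10))
```

On noisy data a smoothed density estimate would likely be more robust than raw bin counts; the histogram version is used here only because it is easy to follow.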

(d) Testing multiple kernel combinations in a leave-one-gene-out nested cross-validation setting on the TILE data set showed that the combination of a weighted degree kernel over positions 1-22 and two spectrum kernels at positions 1-15 and 16-22 (All_kernels) yields the best performance. Spec1 is a spectrum kernel over positions 1-15. Spec2 is a spectrum kernel over positions 16-22. Spec1_spec2 is a combination of spec1 and spec2. Wdk is a weighted degree kernel over positions 1-22. Wdk_spec1 is a combination of wdk and spec1. Wdk_spec2 is a combination of wdk and spec2. All_kernels is a combination of wdk, spec1 and spec2.
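Leave-one-gene-out cross-validation holds out all shRNAs targeting one gene at a time, so that a model is never tested on a gene it was trained on. A minimal sketch of the outer and nested (inner) splits follows; the function names are hypothetical, and model fitting and hyperparameter selection are deliberately omitted.

```python
def leave_one_gene_out(genes):
    """Yield (held_out_gene, train_indices, test_indices) for each gene."""
    for held_out in sorted(set(genes)):
        test = [i for i, g in enumerate(genes) if g == held_out]
        train = [i for i, g in enumerate(genes) if g != held_out]
        yield held_out, train, test


def nested_leave_one_gene_out(genes):
    """For each outer fold, also build inner folds on the training genes.
    Inner folds would be used to choose hyperparameters; the outer test
    gene stays untouched until the final evaluation."""
    for held_out, train, test in leave_one_gene_out(genes):
        train_genes = [genes[i] for i in train]
        inner = [
            (g, [train[i] for i in tr], [train[i] for i in te])
            for g, tr, te in leave_one_gene_out(train_genes)
        ]
        yield held_out, train, test, inner


# One gene label per shRNA (toy example).
genes = ["Kras", "Kras", "Myc", "Trp53"]
for fold in leave_one_gene_out(genes):
    print(fold)
```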

(e) M1 score distribution (Supplementary Table 1, Online Methods). Cutoffs (green lines) were calculated by fitting Gaussian distributions to the modes and setting thresholds at 5% false positive rate (FPR) and 5% false negative rate (FNR).
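One plausible reading of the cutoffs in panel (e) is sketched below: fit a Gaussian to each mode, then take the score that only 5% of the negative-mode Gaussian exceeds (5% FPR) and the score that only 5% of the positive-mode Gaussian falls below (5% FNR). How scores were assigned to modes before fitting, and which cutoff bounds which class, are assumptions; here the two modes are passed in as pre-separated samples.

```python
from statistics import NormalDist


def mode_cutoffs(negative_scores, positive_scores, fpr=0.05, fnr=0.05):
    """Fit one Gaussian per mode and return (score_at_fnr, score_at_fpr).

    score_at_fpr: a fraction `fpr` of the fitted negative mode lies above it.
    score_at_fnr: a fraction `fnr` of the fitted positive mode lies below it.
    """
    neg = NormalDist.from_samples(negative_scores)
    pos = NormalDist.from_samples(positive_scores)
    score_at_fpr = neg.inv_cdf(1 - fpr)
    score_at_fnr = pos.inv_cdf(fnr)
    return score_at_fnr, score_at_fpr


# Toy modes: negatives centred on 0, positives centred on 2 (both with SD 1).
low, high = mode_cutoffs([-1.0, 0.0, 1.0], [1.0, 2.0, 3.0])
print(round(low, 3), round(high, 3))
```

When the two modes overlap, the two cutoffs bracket an intermediate band of scores that would be assigned to neither class under this reading.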

(f) Incorporation of M1 positives, negatives or both into the TILE training set was tested in a nested leave-one-gene-out cross-validation setting. Inclusion of M1 negatives deteriorated performance on the TILE data set, whereas inclusion of the M1 positives alone improved performance. Note: TILE+M1pos = Splash_miR-30, the miR-30 classifier.

(g) Score distribution for the shERWOOD miR-30 set (Supplementary Table 1, Online Methods). We set the threshold at an arbitrary cutoff of zero (green line).

(h) Incorporation of M1 positives into the TILE training set improved performance on the external shERWOOD data set. Note: TILE+M1pos = Splash_miR-30, the miR-30 classifier.

Supplementary Figure 3 Calibration of the sequential SVM classifier SplashRNA.

(a) Precision-recall trade-off between the two classifiers Splash_miR-30 and Splash_miR-E. Selection of alpha (α) and theta (θ) hyperparameters leads to varied performance (area under the precision-recall curve, auPR) on the TILE miR-30 (x-axis) and miR-E + UltramiR (y-axis) sets. Each line represents a setting of alpha; points on the line represent distinct theta values. The circle indicates the alpha and theta choices for the final sequential classifier (SplashRNA: α = 0.6, θ = 1.1). The dashed line represents the performance of the convex linear classifier without a threshold at every alpha. Note that the performance of a sequential classifier equals or exceeds that of a linear combination since one can set the threshold (θ) to a small enough value such that all examples are evaluated by both classifiers.
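The role of α and θ in panel (a) can be illustrated with a small sketch of a two-stage classifier. The assumptions here are substantial and are not confirmed by the text above: the first stage is taken to be Splash_miR-30, sequences scoring below θ keep their first-stage score, and the remaining sequences receive the convex combination α·s1 + (1 − α)·s2. Which classifier α weights is likewise an assumption.

```python
import math


def sequential_score(s_first, second_stage, alpha=0.6, theta=1.1):
    """Two-stage score (illustrative sketch, not the published formula).

    s_first:      score from the first classifier (assumed Splash_miR-30).
    second_stage: zero-argument callable returning the second classifier's
                  score; it is only invoked when the first score passes theta.
    """
    if s_first < theta:
        # Early exit: the second classifier is never evaluated.
        return s_first
    return alpha * s_first + (1 - alpha) * second_stage()


calls = []


def second():
    """Hypothetical second-stage scorer that records each evaluation."""
    calls.append(1)
    return 1.0


print(sequential_score(0.5, second))  # below theta, first-stage score only
print(sequential_score(2.0, second))  # above theta, both stages combined
```

Setting `theta=-math.inf` sends every sequence through both stages, which reduces the sequential classifier to the plain convex combination and matches the caption's argument that the sequential form can do no worse.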

(b) Performance on the TILE set, varying the value of theta with alpha set to 0.6. The inset shows a zoom-in of the first 15% of the precision-recall curve.

(c) Performance on the miR-E + UltramiR set, varying the value for theta with alpha set to 0.6.

Supplementary Figure 4 Prediction performance of SplashRNA.

(a) Precision-recall curves on the TILE data set, comparing leave-one-gene-out nested cross-validation predictions from SplashRNA (auPR: 0.696) and Splash_miR-30 (auPR: 0.699) against the alternative prediction tools DSIR (auPR: 0.594), seqScore (auPR: 0.526) and miR_Scan (auPR: 0.449).
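The auPR values quoted in panel (a) summarize a precision-recall curve as a single number. The sketch below computes average precision, one common step-wise estimator of that area; whether the authors used this estimator or a trapezoidal one is not stated, so treat the choice as an assumption. Tied scores are not handled specially.

```python
def average_precision(scores, labels):
    """Mean of the precision values at the rank of each positive example."""
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    n_pos = sum(1 for _, label in ranked if label)
    if n_pos == 0:
        raise ValueError("need at least one positive example")
    hits = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            total += hits / rank  # precision at this recall step
    return total / n_pos


# Toy predictions: positives ranked first and third.
print(average_precision([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0]))
```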

(b) Score distribution of the mRas + hRAS set (DSIR + Sensor rules selected). The green line indicates the threshold (Online Methods, Supplementary Table 1).

(c) Prediction performance comparison of the indicated algorithms on the external mRas + hRAS Sensor data set (Supplementary Table 1). SplashRNA outperformed the other algorithms.

(d) Score distributions of the miR-E and UltramiR data sets. For the miR-E set, the threshold was set to 80 (green line, Online Methods). The UltramiR set represents the distribution of log depletion scores of shRNAs tested in a cell-viability screen (Supplementary Table 1).

(e) SplashRNA- and DSIR-based re-ranking of shERWOOD-selected UltramiR shRNAs targeting essential genes that were tested in a cell-viability screen. X-axis: mean SplashRNA or DSIR score for equally sized groups (purple and blue dots, 20 groups) of 39 shRNAs each. Y-axis: Percent of shRNAs in each group that were potent (Online Methods). SplashRNA and DSIR were compared against the published minimum (Min), median (Med) and maximum (Max) shERWOOD algorithm performance on the same data set (green-brown dots).
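The grouping behind panel (e) can be sketched as follows: sort shRNAs by prediction score, cut the ranking into equally sized groups, and report each group's mean score and percentage of potent shRNAs. The function name and the toy data are illustrative; the potency calls themselves come from the screen and are simply passed in as booleans.

```python
def potency_by_score_group(scores, potent, n_groups):
    """Return a list of (mean_score, percent_potent), one per group,
    ordered from the lowest-scoring to the highest-scoring group."""
    if len(scores) != len(potent):
        raise ValueError("scores and potent must have the same length")
    if len(scores) % n_groups:
        raise ValueError("number of shRNAs must divide evenly into groups")
    ranked = sorted(zip(scores, potent), key=lambda p: p[0])
    size = len(ranked) // n_groups
    summary = []
    for g in range(n_groups):
        chunk = ranked[g * size:(g + 1) * size]
        mean_score = sum(s for s, _ in chunk) / size
        percent_potent = 100.0 * sum(1 for _, p in chunk if p) / size
        summary.append((mean_score, percent_potent))
    return summary


# Toy example: four shRNAs in two groups of two.
print(potency_by_score_group([1, 2, 3, 4], [False, False, True, True], 2))
```

With the numbers given in the caption, the call would use `n_groups=20` on 780 shRNAs, giving 39 per group.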

(f) Retrospective potency prediction of shRNAs from a large-scale essential genes RNAi screen. The biological screen used 20-25 miR-E-like shRNAs per gene to identify essential genes. shRNA potency was quantified by assessing their log fold changes (Online Methods). For each of the top 50 essential genes, all tested algorithms selected their top and bottom five sequences by prediction score. Log fold changes for all selected shRNAs across the 50 genes were compared. SplashRNA achieved the most significant discrimination between top and bottom predictions (p = 1.8e-11, one-sided Wilcoxon rank sum test). seqScore (p = 2.3e-5) was used to generate the initial library of approximately 25 shRNAs per gene.

(g) Retrospective potency prediction of shRNAs from a large-scale toxin resistance and sensitivity RNAi screen. The biological screen used 25 miR-E-like shRNAs per gene to identify resistance and sensitivity genes. shRNA potency was quantified by assessing their log fold changes (Online Methods). For each of the top 20 sensitivity genes, all tested algorithms selected their top and bottom five sequences by prediction score. Log fold changes for all selected shRNAs across the 20 genes were compared. SplashRNA was the only algorithm to achieve significant discrimination between the top and bottom predictions at p < 0.01 (p = 4.8e-4, one-sided Wilcoxon rank sum test). Of note, SplashRNA also outperformed the other algorithms when selecting smaller or larger numbers of top sensitivity genes from the biological screen (data not shown). seqScore was used to generate the initial library of approximately 25 shRNAs per gene.
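The retrospective comparisons in panels (f) and (g) follow the same recipe: per gene, take the five highest- and five lowest-scoring shRNAs, pool their log fold changes across genes, and compare the two pools with a one-sided rank sum test. The sketch below assumes that stronger depletion (a more negative log fold change) indicates a more potent shRNA, uses a normal approximation without tie correction, and skips genes with fewer than 2k shRNAs; none of these details are specified in the captions.

```python
from collections import defaultdict
from math import sqrt
from statistics import NormalDist


def top_bottom_lfc(records, k=5):
    """records: iterable of (gene, prediction_score, log_fold_change).
    Returns pooled log fold changes of the top-k and bottom-k predictions."""
    by_gene = defaultdict(list)
    for gene, score, lfc in records:
        by_gene[gene].append((score, lfc))
    top, bottom = [], []
    for shrnas in by_gene.values():
        if len(shrnas) < 2 * k:
            continue  # too few shRNAs for disjoint top and bottom sets
        ranked = sorted(shrnas, key=lambda p: p[0], reverse=True)
        top.extend(lfc for _, lfc in ranked[:k])
        bottom.extend(lfc for _, lfc in ranked[-k:])
    return top, bottom


def rank_sum_p_less(x, y):
    """One-sided p-value for 'x tends to be smaller than y'
    (Mann-Whitney U, normal approximation, no tie correction)."""
    u = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)
    n1, n2 = len(x), len(y)
    mean = n1 * n2 / 2
    sd = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return NormalDist().cdf((u - mean) / sd)


# Toy screen with one gene and k=2 (hypothetical values).
records = [("g", 0.9, -3.0), ("g", 0.8, -2.5), ("g", 0.2, -0.2), ("g", 0.1, 0.1)]
top, bottom = top_bottom_lfc(records, k=2)
print(top, bottom, rank_sum_p_less(top, bottom))
```

For the small sample sizes in this toy example an exact test would be preferable; the normal approximation is more appropriate for pools of the size described in the captions (250 or 100 shRNAs per pool).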

Supplementary Figure 5 Transcript selection.

(a) Distribution of shRNA potency in functionally distinct transcript regions. Shown is the potency distribution of shRNAs in the unbiased TILE data set that target the 5’UTR, CDS or 3’UTR. Since these shRNAs were evaluated using the Sensor assay, their targets are not subject to alternative cleavage and polyadenylation (ApA) and/or splicing events.

(b) AU content of potent and weak miR-30 shRNAs from the unbiased TILE set. Potent shRNAs tend to have a higher proportion of A/U nucleotides (p < 2.2e-16, two-sided Kolmogorov-Smirnov test).
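AU content as used in panel (b) is simply the fraction of A and U nucleotides in a sequence. A minimal sketch, assuming DNA-style input with T should be treated as U:

```python
def au_fraction(sequence: str) -> float:
    """Fraction of A/U nucleotides; T is counted as U for DNA-style input."""
    seq = sequence.upper().replace("T", "U")
    if not seq:
        raise ValueError("sequence must not be empty")
    return sum(base in "AU" for base in seq) / len(seq)


# A made-up 22-nt guide sequence, for illustration only.
print(au_fraction("UUAGCAUCGAUCGGAUACGUAA"))
```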

(c) AU content of functionally distinct transcript regions in the human genome. Shown are the AU densities in 5’UTR, CDS and 3’UTR.

(d) AU content in mouse transcripts.

(e) Alternative cleavage and polyadenylation (ApA) prevents potent shRNAs from inhibiting their putative target gene. Immunoblotting of Pten in NIH/3T3s transduced at single-copy with LEPG expressing the indicated shRNAs. Nine top predictions targeting the CDS or the 3’UTR after early ApA sites were compared alongside controls for their ability to suppress mouse Pten. Actb was used as loading control.

(f) Comparison of knockdown efficiency and annotation of ApA sites. Shown are potent Pten shRNA predictions and their position (start, end) on the mouse genome (mm9). KD indicates a qualitative degree of the knockdown observed in immunoblotting analyses of NIH/3T3s (e). ApA indicates previously published positions on the mouse genome (mm9) of ApA sites (alternative 3’ ends) identified in NIH/3T3 and mouse ES cells by 3P-Seq. 2P-Seq shows the quantification of transcript expression levels measured by 2P-Seq. All shRNAs and ApA sites are ordered according to their position along the mouse genome.

Supplementary Figure 6 Extensive validation of de novo SplashRNA predictions.

(a-f) Western blot validation of de novo SplashRNA predictions. All shRNAs were expressed using LEPG at single-copy conditions. β-Actin (Actb, ACTB) was used for normalization.

(a) Immunoblotting of Pbrm1 in NIH/3T3s (median KD: 97%, median SplashRNA score: 1.7).

(b) Immunoblotting of Rela in NIH/3T3s (median KD: 90%, median SplashRNA score: 1.1).

(c) Immunoblotting of Bcl2l11 in NIH/3T3s (median KD: 97%, median SplashRNA score: 0.7).

(d) Immunoblotting of Axin1 in NIH/3T3s (median KD: 95%, median SplashRNA score: 1.3).

(e) Schematic of the multiple human NF2 transcript variants. NF2 has nine variants with an intersection of only 198 nucleotides, excluding the 5’UTR, rendering the prediction task especially difficult due to limited sequence space.

(f) Predicting miR-E shRNAs for extremely short transcripts. Immunoblotting of NF2 in A375s transduced with the indicated shRNAs targeting all nine NF2 variants (median KD: 89%, median SplashRNA score: 0.6).

(g) Comparison of SplashRNA and DSIR predictions against CRISPR-Cas9 mediated suppression of Cd9 in mouse embryonic fibroblasts (MEFs). Shown are normalized (relative to the indicated controls) median anti-Cd9-APC fluorescence intensities of RRT-MEFs and CRT-MEFs expressing the indicated shRNAs or sgRNAs (Online Methods). The six top-scoring predictions from DSIR + Sensor rules (DSIR) or SplashRNA (ordered according to their respective scores) were compared to six sgRNA sequences (Supplementary Table 2). *, Cd9.1137 is the top prediction from both algorithms and was plotted twice for clarity. While DSIR predictions triggered Cd9 knockdown with variable efficacy, SplashRNA predictions consistently induced strong Cd9 suppression, closely approaching knockout conditions.

(h) Transfer function of SplashRNA score versus protein knockdown for all 62 de novo predicted shRNAs validated by immunofluorescence (Supplementary Table 2). Green triangles indicate the minimum knockdown for 80% of the predictions for a given SplashRNA score bin. Bins were defined to have a width of 0.5 with the leftmost bin starting at 0.25. For the bin centered on SplashRNA score = 1, 80% of predictions showed at least 86% protein knockdown. The expected knockdown for the top 80% of predictions (e.g. 4/5 shRNAs) increases with the SplashRNA score. Overall, 91% of predictions with a SplashRNA score >1 showed more than 85% protein knockdown.
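The binning in panel (h) can be reproduced with a short sketch: scores are assigned to bins of width 0.5 starting at 0.25 (so one bin is centred on 1), and for each bin the reported value is the knockdown reached by at least 80% of its predictions. Discarding scores below 0.25 and the exact rounding rule for "80% of predictions" are assumptions.

```python
import math
from collections import defaultdict


def bin_index(score, start=0.25, width=0.5):
    """Index of the score bin; bin 0 covers [start, start + width)."""
    return math.floor((score - start) / width)


def min_knockdown_for_fraction(knockdowns, fraction=0.8):
    """Knockdown reached by at least `fraction` of the predictions."""
    ranked = sorted(knockdowns, reverse=True)
    # Small epsilon guards against floating-point error in fraction * n.
    n_keep = max(1, math.ceil(fraction * len(ranked) - 1e-9))
    return ranked[n_keep - 1]


def transfer_function(scores, knockdowns, start=0.25, width=0.5):
    """Map each bin centre to the minimum knockdown of its top 80%."""
    bins = defaultdict(list)
    for score, kd in zip(scores, knockdowns):
        idx = bin_index(score, start, width)
        if idx >= 0:  # scores below `start` are ignored (assumption)
            bins[idx].append(kd)
    return {start + (idx + 0.5) * width: min_knockdown_for_fraction(kds)
            for idx, kds in sorted(bins.items())}


# Toy data: five predictions falling in the bin centred on 1.
print(transfer_function([0.9, 1.0, 1.1, 1.2, 0.8], [95, 90, 88, 86, 40]))
```

In the toy example four of five predictions reach at least 86% knockdown, mirroring the "4/5 shRNAs" reading of the 80% criterion.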

(i) Uncropped images of Pten (Figure 2d) and Bap1 (Figure 2e) western blots, and their respective β-Actin controls. Pten predicted molecular weight (MW): 47 kDa; MW validated by Cell Signaling Technology: 54 kDa. Bap1 predicted MW: 80 kDa; MW validated by Bethyl Laboratories: 80-95 kDa. β-Actin MW validated by Sigma-Aldrich: 42 kDa.

About this article

Cite this article

Pelossof, R., Fairchild, L., Huang, CH. et al. Prediction of potent shRNAs with a sequential classification algorithm. Nat Biotechnol 35, 350–353 (2017). https://doi.org/10.1038/nbt.3807

Received: 21 October 2016
Accepted: 18 January 2017
Published: 06 March 2017
Issue Date: April 2017
DOI: https://doi.org/10.1038/nbt.3807
