Supplementary Figure 1: Data set generation. | Nature Biotechnology

From: Prediction of potent shRNAs with a sequential classification algorithm

(a-f) Generation of the M1 (miR-30, 20,400 shRNAs) Sensor assay data set (Supplementary Table 2, Online Methods).

(a) Schematic of our previously published Sensor assay that enables large-scale functional assessment of shRNA potency (Online Methods).

(b) Library complexity over Sensor assay sort cycles. Shown are normalized read numbers (parts per million, ppm) in both duplicates for each shRNA represented within the initial libraries (Vector) and the pools after the indicated sorts (Sort 3, 5).
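The parts-per-million normalization mentioned in (b) can be illustrated with a minimal sketch. The function name `reads_to_ppm` and the example counts are hypothetical and are not taken from the study; the sketch only assumes that ppm means each shRNA's read count divided by the total reads in that sample, multiplied by one million.

```python
def reads_to_ppm(counts):
    """Scale raw read counts so that the whole sample sums to one million."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample contains no reads")
    return {shrna: 1_000_000 * n / total for shrna, n in counts.items()}


# Hypothetical raw read counts for three shRNAs in one sample.
vector_counts = {"shRNA_A": 500, "shRNA_B": 300, "shRNA_C": 200}
vector_ppm = reads_to_ppm(vector_counts)
```

Normalizing each sample separately makes read numbers comparable between the vector libraries and the sorted pools even when sequencing depth differs.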

(c) Correlation of reads per shRNA between the two replicates before sorting (left panel), after Sort 5 (middle panel) and between the initial and endpoint population (right panel; shown for one representative replicate). r, Pearson correlation coefficient.
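As a reminder of what the reported r measures, the following is a minimal sketch of the Pearson correlation coefficient between two replicates. The helper name `pearson_r` and the example values are hypothetical; whether the read numbers were transformed (for example, log-scaled) before correlation is not stated in the legend, so no transformation is applied here.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squares around the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical ppm values for the same four shRNAs in two replicates.
replicate_1 = [10.0, 20.0, 30.0, 40.0]
replicate_2 = [12.0, 19.0, 33.0, 41.0]
r = pearson_r(replicate_1, replicate_2)
```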

(d) Correlation of Sensor score and reads per shRNA in the vector libraries, showing that the score is independent of the initial shRNA representation. r, Pearson correlation coefficient.

(e) Enrichment or depletion of 17 control shRNAs after Sort 5. All controls have been used in previous Sensor assays (e.g., TILE, mRas + hRAS) and are classified as strong, intermediate or weak according to their knockdown potency as assessed by immunoblotting.

(f) Rank correlation of 325 performance control shRNAs. For each of the mouse genes Bcl2, Kras, Mcl1, Myc and Trp53, 65 shRNAs that had previously been tested as part of the TILE data set were chosen as supplemental controls to assess Sensor assay performance for weak, intermediate and strong shRNAs. The individual shRNA ranks in TILE and M1 were highly correlated (325 shRNAs, Spearman rank correlation coefficient ρ = 0.63; gene-specific correlation coefficients are also reported), even though the TILE and M1 data sets were generated several years apart, using mostly different equipment, reagents and operators.
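The rank comparison in (f) can be sketched as a Spearman correlation, computed here as the Pearson correlation of the ranks with tied values given their average rank. The function names and the example scores are hypothetical; this is a generic illustration of the statistic, not the authors' analysis code.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sxx = sum((a - mean_x) ** 2 for a in rx)
    syy = sum((b - mean_y) ** 2 for b in ry)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when all ranks are tied")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical scores for the same five shRNAs in two data sets.
tile_scores = [0.2, 1.5, 3.1, 0.9, 2.4]
m1_scores = [0.1, 1.1, 2.0, 1.3, 2.9]
rho = spearman_rho(tile_scores, m1_scores)
```

Because only ranks enter the calculation, the statistic tolerates differences in score scale between data sets generated under different conditions.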

(g) Generation of the miR-E reporter assay data set (Supplementary Table 2, Online Methods). Normalized reporter knockdown values of miR-E shRNAs assessed one-by-one in an RNAi reporter assay. The shRNAs were tested in 42 individual batches, each including several control shRNAs for data scaling (miR-E Ren.713, miR-30 Pten.1524) and quality control (miR-E Pten.1523, miR-E Pten.1524). Background fluorescence of the parental chicken cell line (ERC) and maximal fluorescence of the batch-specific reporter cell line (ERC cells expressing the shRNA target reporter) were also measured. Each shRNA was assigned to either a positive or a negative class, using a cutoff of 80 that was chosen based on the performance of miR-30 Pten.1524 and miR-E Ren.713.
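The two-class grouping in (g) can be sketched as a simple threshold rule. Only the cutoff of 80 comes from the legend; the assumptions that higher normalized knockdown values indicate more potent shRNAs and that a value exactly at the cutoff counts as positive, as well as the shRNA names and values, are hypothetical.

```python
CUTOFF = 80.0  # cutoff stated in the legend


def classify_shrna(knockdown, cutoff=CUTOFF):
    """Assign a class label from a normalized reporter knockdown value.

    Assumption: higher values mean stronger knockdown, and a value equal
    to the cutoff is counted as positive.
    """
    return "positive" if knockdown >= cutoff else "negative"


# Hypothetical normalized knockdown values.
knockdown_values = {"shRNA_1": 93.5, "shRNA_2": 80.0, "shRNA_3": 41.2}
classes = {name: classify_shrna(v) for name, v in knockdown_values.items()}
```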

(h) Nucleotide representation of positive shRNAs from the indicated data sets. Shown are nucleotides 1 to 8 of the guide strand (starting in the center), including the entire seed region. Unbiased TILE (miR-30) set, showing a diversified nucleotide composition (left panel). Preselected M1 (miR-30, DSIR + Sensor rules selected) set, showing a biased nucleotide representation (middle panel). Preselected miR-E + UltramiR set, showing a different nucleotide bias due to the altered shRNA backbone (right panel). More shRNAs starting with a C were found to be potent (compared to TILE, p = 0.002, Fisher’s exact test), indicating less restrictive sequence requirements when using the miR-E backbone.
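A comparison like the one in (h) can be framed as a 2x2 contingency table (for example, positive shRNAs starting with C versus not, in two data sets) and tested with Fisher's exact test. The sketch below is a generic two-sided implementation that sums the probabilities of all tables no more likely than the observed one; the function name and the counts are hypothetical, and the underlying counts and sidedness used for the reported p-value are not given in the legend.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def table_prob(x):
        # Hypergeometric probability of x in the top-left cell,
        # with all row and column totals held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    p_observed = table_prob(a)
    low, high = max(0, col1 - row2), min(row1, col1)
    # Sum over every table at most as probable as the observed one;
    # the small relative tolerance guards against float rounding.
    p_value = sum(
        p for p in (table_prob(x) for x in range(low, high + 1))
        if p <= p_observed * (1 + 1e-9)
    )
    return min(1.0, p_value)


# Hypothetical counts: rows are two data sets, columns are
# "guide starts with C" and "guide starts with another nucleotide".
p = fisher_exact_two_sided(3, 1, 1, 3)
```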