(a) Composition of the compact library, in terms of previously measured relative activities in the large-scale screen (dark purple), or predicted relative activities assigned by the CNN model ensemble (light purple). Perfectly matched sgRNAs, which by definition have relative activities of 1.0, comprise 20% of the library but were not included in the histogram. (b) Distribution of mismatch positions and types for singly-mismatched sgRNAs in the compact library, for previously measured (dark purple) and CNN-imputed (light purple) sgRNAs. (c) Heatmap showing the distribution of mutated positions for doubly-mismatched sgRNAs in the compact library. (d) Comparison of growth phenotypes measured in each K562 replicate screen 4 and 7 days post-transduction. Data from Day 7 were used for all subsequent analyses. n = 25,518 sgRNAs; r2 = squared Pearson correlation coefficient. (e) Comparison of growth phenotypes measured in each HeLa replicate screen 6 and 8 days post-transduction. Data from Day 8 were used for all subsequent analyses. n = 25,518 sgRNAs; r2 = squared Pearson correlation coefficient. (f) Comparison of growth phenotypes of original (perfectly matched) sgRNAs in HeLa and K562 cells (γ, expressed as the average of two replicate screens). n = 4,810 sgRNAs; r2 = squared Pearson correlation coefficient. (g) Measured vs. predicted relative activities of CNN-imputed sgRNAs in K562 cells (left) and HeLa cells (right). A small number of points beyond the y-axis limits were excluded to more clearly display the bulk of the distribution. n = 6,147 sgRNAs; r2 = squared Pearson correlation coefficient. (h) Comparison of sgRNA composition and model error for the large-scale and compact libraries. 
The CNN-imputed guides had substantially higher predicted activities than those in the large-scale validation set; higher predicted activity was generally associated with higher model error for both the validation (red) and imputed (blue) sgRNA sets, consistent with the discrepancy in model performance between the two sets. (i) Distribution of the number of intermediate-activity mismatched sgRNAs targeting each gene in the compact library. The number of genes with at least 2 intermediate-activity sgRNAs is indicated above each histogram; sgRNA activities were quantified for 1,907 and 1,442 genes in K562 and HeLa cells, respectively. Note that activities are aggregated here by gene rather than by series, as was done in Supplementary Fig. 2i. (j) Comparison of phenotypes measured in replicate screens after 12 days of growth in the drug screen. n = 25,518 sgRNAs; r2 = squared Pearson correlation coefficient. (k) Comparison of vehicle-treatment (γ) and lovastatin-treatment (τ) growth phenotypes for all sgRNAs in the compact library. Knockdown of HMG-CoA reductase (HMGCR) greatly sensitizes cells to lovastatin, compared to knockdown of other genes such as tubulin (TUBB). n = 25,518 sgRNAs.