SPROUT’s performance in ranking guides within a gene based on predicted repair outcome. (a) Schematic of the guide ranking experiment. For a gene with three potential guides (Guide 1, Guide 2, Guide 3), SPROUT ranks the guides by their likelihood of producing a single-nucleotide insertion (or deletion). In this illustration, the algorithm predicts that Guide 2 produces the largest number of reads with a 1-bp insertion/deletion. (b) Guide ranking performance on T cells. The algorithm was trained on 435 genes and tested on the remaining 108 genes. Kendall's tau (in the range [−1, 1]) measures the rank correlation (higher is better; zero indicates no correlation), “SPROUT (# genes)” indicates the number of genes for which SPROUT predicted exactly the correct ranking across all the guides, and “Rnd Shuffle (# genes)” indicates the number of genes ranked correctly by naïve guessing. (c) Guide ranking performance on HEK293. (d) Guide ranking performance on K562. (e) Guide ranking performance on HCT116. For parts (c), (d), and (e), the model was trained on all T cell genes and tested on 28 genes from these other cell types. (f) For each gene, we order the target sites from the most likely to the least likely to introduce a frame-shift outcome, using SPROUT predictions. The table reports the fraction of genes where SPROUT correctly predicts the top target site, the fraction where SPROUT correctly predicts the complete ordering of all the target sites in the gene, and the correlation between the SPROUT predictions and the experimental validations. The same metrics for random prediction are reported as baselines. Bootstrap means and standard deviations are shown in the table.
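
The ranking metrics named in this caption can be illustrated with a minimal Python sketch. It is not the authors' evaluation code: the function names (`kendall_tau`, `ranking`, `evaluate`, `bootstrap_mean_tau`) and the input layout (one pair of per-guide score lists per gene) are hypothetical, the tau variant shown is tau-a with tied pairs counted as neither concordant nor discordant, and the bootstrap resamples genes with replacement. The caption does not specify any of these details, so they are assumptions.

```python
import random
import statistics
from itertools import combinations


def kendall_tau(predicted, observed):
    """Kendall's tau-a between two equal-length lists of per-guide scores.

    Assumption: tied pairs count as neither concordant nor discordant.
    """
    n = len(predicted)
    if n != len(observed) or n < 2:
        raise ValueError("need two lists of equal length >= 2")
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        sign = (predicted[i] - predicted[j]) * (observed[i] - observed[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


def ranking(scores):
    """Guide indices ordered from the highest score to the lowest."""
    return sorted(range(len(scores)), key=lambda i: -scores[i])


def evaluate(genes, seed=0):
    """Summarize ranking quality over genes.

    `genes` is a list of (predicted_scores, observed_scores) pairs, one per gene.
    Returns the mean tau, the number of genes whose full guide order is predicted
    exactly, and the same count for a random-shuffle baseline.
    """
    rng = random.Random(seed)
    taus, model_exact, shuffle_exact = [], 0, 0
    for predicted, observed in genes:
        truth = ranking(observed)
        taus.append(kendall_tau(predicted, observed))
        model_exact += ranking(predicted) == truth
        # Baseline: a uniformly random ordering of the gene's guides.
        shuffle_exact += rng.sample(range(len(observed)), len(observed)) == truth
    return {
        "mean_tau": statistics.mean(taus),
        "model_exact": model_exact,
        "shuffle_exact": shuffle_exact,
    }


def bootstrap_mean_tau(genes, n_resamples=1000, seed=0):
    """Bootstrap mean and standard deviation of the per-gene mean tau.

    Assumption: genes are resampled with replacement.
    """
    rng = random.Random(seed)
    taus = [kendall_tau(p, o) for p, o in genes]
    means = [statistics.mean(rng.choices(taus, k=len(taus)))
             for _ in range(n_resamples)]
    return statistics.mean(means), statistics.pstdev(means)


if __name__ == "__main__":
    # Toy data: two genes with three guides each (illustrative values only).
    toy_genes = [
        ([0.1, 0.5, 0.3], [10, 40, 20]),  # predicted order matches observed order
        ([0.1, 0.5, 0.3], [10, 20, 40]),  # top two guides are swapped
    ]
    print(evaluate(toy_genes))
    print(bootstrap_mean_tau(toy_genes, n_resamples=200))
```

Counting exact matches separately from tau mirrors the caption's distinction between rank correlation and the number of genes with a fully correct ordering. With only a few guides per gene, a random shuffle matches the true order fairly often, which is why the shuffle count serves as a useful baseline.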