(a) Average fraction of indel mutant reads with insertion in target sites, grouped by the nucleotide type at location −1 (adjacent to the cut site on the 5’ side). Presence of C or G at location −1 is significantly correlated with a higher deletion proportion (p < 0.004, two-sided t-test), and presence of A or T is significantly correlated with a higher insertion proportion (p < 10⁻⁶, two-sided t-test), consistently across all cell types. We show the results for T cells (left, 1,521 sites) and the aggregate results for HEK293, K562 and HCT116 (right, 96 sites). The results in this supplementary figure are normalized for the background distribution of nucleotide types: in computing the average fraction of indel mutant reads with insertion, we divided by the number of occurrences of each nucleotide. As additional controls, we performed the same analysis at the −2, +1 and +2 locations and did not find significant differences in insertion fraction by nucleotide type (p > 0.1, two-sided t-test). (b) Average fraction of indel mutant reads with insertion, conditioned on the nucleotide at position +3 (the last nucleotide before, i.e. immediately 5’ of, the PAM sequence). The presence of A at +3 is correlated with a higher fraction of insertions. The analyses here differ from and complement Fig. 2e. The SPROUT importance scores in Fig. 2e capture the nonlinear model’s overall prediction of the impact of each nucleotide and position. The results here ignore the effects of other positions and plot the conditional insertion fractions. Even though the methods are different, the feature importance scores and the conditional fractions give consistent biological findings. The mean and standard error of the mean (SEM) are shown in the tables.
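The conditional analysis described in this caption can be illustrated with a minimal sketch. It is not the authors' code. It assumes each site is given as a 20-nt protospacer written 5’ to 3’ together with its insertion fraction among indel reads, and that the cut lies 3 nt upstream of the PAM, so that −1 is the 17th nucleotide and +3 is the 20th. The function names, the toy data, and the choice of Welch's unequal-variance form for the t statistic are all assumptions made for illustration.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, stdev

# Assumption: 20-nt protospacer, cut 3 nt upstream of the PAM.
# CUT_INDEX is the 0-based index of the first nucleotide 3' of the cut (offset +1).
CUT_INDEX = 17


def nucleotide_at(protospacer, offset):
    """Return the nucleotide at a signed offset from the cut site (there is no offset 0)."""
    if offset == 0:
        raise ValueError("offset 0 is undefined; use -1 or +1 for the flanking nucleotides")
    index = CUT_INDEX + offset if offset < 0 else CUT_INDEX + offset - 1
    return protospacer[index]


def insertion_fraction_by_nucleotide(sites, offset):
    """Group sites by the nucleotide at `offset`; return {nt: (mean, SEM, n)}.

    Averaging within each group divides by the number of occurrences of that
    nucleotide, which is the background normalization the caption describes.
    """
    groups = defaultdict(list)
    for protospacer, insertion_fraction in sites:
        groups[nucleotide_at(protospacer, offset)].append(insertion_fraction)
    summary = {}
    for nt, values in groups.items():
        # SEM needs at least two observations; report NaN otherwise.
        sem = stdev(values) / sqrt(len(values)) if len(values) > 1 else float("nan")
        summary[nt] = (mean(values), sem, len(values))
    return summary


def welch_t_statistic(a, b):
    """Welch's two-sample t statistic (assumed variant; the caption does not specify one)."""
    var_a = stdev(a) ** 2 / len(a)
    var_b = stdev(b) ** 2 / len(b)
    return (mean(a) - mean(b)) / sqrt(var_a + var_b)


# Hypothetical toy data: (protospacer, insertion fraction among indel reads).
PREFIX = "ACGTACGTACGTACGT"  # 16 nt, so the next character is offset -1
toy_sites = [
    (PREFIX + "AGCA", 0.40),
    (PREFIX + "AGCT", 0.50),
    (PREFIX + "TGCA", 0.30),
    (PREFIX + "CGCA", 0.10),
    (PREFIX + "CGCT", 0.20),
    (PREFIX + "GGCA", 0.10),
]

for nt, (avg, sem, n) in sorted(insertion_fraction_by_nucleotide(toy_sites, -1).items()):
    print(f"-1 = {nt}: mean={avg:.3f} SEM={sem:.3f} n={n}")

at_values = [f for p, f in toy_sites if nucleotide_at(p, -1) in "AT"]
cg_values = [f for p, f in toy_sites if nucleotide_at(p, -1) in "CG"]
print("Welch t (A/T vs C/G):", welch_t_statistic(at_values, cg_values))
```

Passing `offset=3` to the same function gives the panel (b) grouping. Obtaining a two-sided p-value from the t statistic requires the t-distribution CDF, which a statistics library such as SciPy provides (for example, `scipy.stats.ttest_ind`).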