Clonal dynamics limits detection of selection in tumour xenograft CRISPR/Cas9 screens

Transplantable in vivo CRISPR/Cas9 knockout screens, in which cells are edited in vitro and inoculated into mice to form tumours, allow evaluation of gene function in a cancer model that incorporates the multicellular interactions of the tumour microenvironment. To improve our understanding of the key parameters for success with this method, we investigated the choice of cell line, mouse host, tumour harvesting timepoint and guide RNA (gRNA) library size. We found that high gRNA (80–95%) representation was maintained in a HCT116 subline transduced with the GeCKOv2 whole-genome gRNA library and transplanted into NSG mice when tumours were harvested at early (14 d) but not late time points (38–43 d). The decreased representation in older tumours was accompanied by large increases in variance in gRNA read counts, with notable expansion of a small number of random clones in each sample. The variable clonal dynamics resulted in a high level of ‘noise’ that limited the detection of gRNA-based selection. Using simulated datasets derived from our experimental data, we show that considerable reductions in count variance would be achieved with smaller library sizes. Based on our findings, we suggest a pathway to rationally design adequately powered in vivo CRISPR screens for successful evaluation of gene function.


Supplementary Figures
A B     failed in all datasets due to presence of too many zeroes, while the upper quartile method failed frequently when the upper quartile was 0. All normalisation methods were performed using the (simulated) NTC gRNA counts only, but corrected for the total (all gRNA) library size.Normalisation factors were calculated from combined datasets of simulated small and large tumours, the comparator plasmid sample, and 9 HCT116/54C cell counts and 9 samples of gRNA counts from HCT116/54C cells.

Supplementary Tables
The following supplementary tables can be found in the supplementary excel file: Table S1 Details of all gRNA reads for samples in the pilot study Table S2 Details of NTC gRNA reads for samples in the pilot study Table S3 Details of all gRNA reads for samples in the larger study Table S4 Details of NTC gRNA reads for samples in the larger study Table S5 Sample size estimates at power 0.8 in simulated datasets with varying numbers of gRNAs expressed genes or gRNAs) across samples.The non-differential features will be in the centre of the distribution of count ratios.Relative normalisation factors between samples can be robustly estimated from the central non-differential features using a median or trimmed mean of count ratios.
In the MPM method, we defined two possible stages of clipping or trimming outlying counts and ratios when calculating normalisation factors.First, the influence of outlying high counts can be reduced by clipping the top and bottom a% of counts in each sample -at each end, counts above or below the a-th percentile are replaced with the count at the a-th percentile at that end.The motivation is this step is to reduce the influence of large outlying counts and is analogous to 'A' trimming in the TMM method, except all counts are retained through the use of clipping, as opposed to trimming, each end.Although trimming is performed at each end, for highly zero-inflated data, trimming at the lower end often has no effect as the lower a-th count is also zero.
The pairwise log-fold-change (M) between samples i and j is then calculated across gRNAs from a-clipped counts r' gi , with m% of valid ratios discarded from each end to calculate the trimmed mean: This is analogous to 'M' trimming in the TMM method.The maximum of m=0.5 uses the median M value.Note that only gRNAs that have zero counts (before aclipping) in both samples are excluded from this calculation; gRNAs with a zero count in one sample are retained with their a-clipped counts given a pseudocount of c pseudo to avoid taking the log of zero.Although the use of a pseudo-count for zeroes has some weaknesses [3], zero counts are informative in terms of the ordering of two samples -a zero count in one sample is an indication the count is lower in the sample with a zero than the sample without a zero.Excluding all zero counts thus biases the count ratio upwards for samples with many zeroes.Therefore, we prefer to retain zero counts when calculating normalisation factors.The use of small pseudocounts is also consistent with the approximation of zeros as sampling zeros which have a finite but small actual abundance [4].The gRNAs that have zero counts in both samples provide no information about the ordering or ratio of the two samples, and is better to exclude these from the calculation as the tendency would otherwise be to bias the count ratio towards 1 (c pseudo :c pseudo ).
The normalisation factor s i for sample i on a log scale is then calculated as the mean of M values to all other samples: The second term scales the normalisation factors to be relative to arbitrary sample k (i.e.s k = 1).Note in MPM, normalisation factors are estimated from all pairwise ratios, as in GMPR.MPM with a=0, and m=0.5 is similar to GMPR except ratios where only one count is zero are retained in the calculation by setting the zero to a pseudocount; GMPR does not use pseudocounts and excludes all zeroes.Therefore, MPM can be seen as a generalisation of the GMPR method with additional trimming options and the use of a pseudocount.The normalisation factor is a correction factor for library sizes, with normalised library size calculated as: When normalised library sizes are used in place of unnormalised library sizes, log-ncpm becomes a measure of relative counts across samples.As the gRNA library contains 1000 non-targeting controls (NTCs), which can be assumed to be neutral, and thus non-differential on average, the size factors can be calculated using only NTC gRNAs i.e.  ∈ { } to give log-ncpm normalised to neutral gRNAs; note that full library size R is used in all calculations even when r' g is subsetted to NTC gRNAs.

Binomial thinning simulations
To compare normalisation methods, we bootstrapped (sampled with replacement) the gRNA counts from the plasmid sample to a total of 1.6 ×10 13 reads.This bootstrapped data contained a median count 1.2×10 8 per gRNA, necessary to allow individual gRNA counts of up to several million following binomial thinning.
Next, we used binomial thinning using the seqgendiff package in R [5] to generate simulated datasets each containing 10 thinned columns (samples) mimicking small tumour and large tumour samples.In each dataset, a random signal was applied to expand/reduce each count to account for stochastic clonal expansion/reduction, while fixed signals were applied across each row to all columns in a dataset to account for systematic differences in all samples due to underlying cell fitness in NTC gRNAs and guide-specific effects with a general dropout effect.The effect sizes of the fixed signals were varied slightly for each column.The random and fixed signals were randomly sampled from a normal distribution with parameters given in Table S6; separate parameters were used to simulate small tumours and large tumours.In addition, each column was further uniformly thinned to give a certain total number of reads.This uniform thinning causes many of the low counts to drop to zero and effectively zeroinflates the counts.These simulated datasets resembled the counts from our large and small tumour samples in terms of gRNA total counts, representation and count inequality.The key assumption being made in these simulations is that the combined signal applied to NTC gRNAs has a mean of 0 despite potentially large standard deviation that causes outlying high count values accompanied by many zeroes.Applied as uniform thinning for each column.
A total of 100 simulated small tumour and 100 large tumour datasets with 10 samples each were generated.To ascertain the accuracy of each normalisation method tested, we appended the original plasmid counts and the counts of 9 original HCT116/54C cell samples together with 8 simulated small tumours, and three different sets of simulated large tumours (n=10, n=10 and n=8) to generate a combined simulated dataset of 46 columns (samples).Each normalisation method was run for each combined dataset, with normalisation factors calculated from the NTC gRNAs only (apart from the "total count" method) but corrected for total library size.Next, for each simulated sample, the expected log 2 binomial thinning factor for null signals relative to the original plasmid counts (accounting for bootstrap expansion and combined thinning effects) was compared to the calculated log 2 normalisation factor determined by each normalisation method, with the two values expected to be equal.This was performed for a total of 100 random combined datasets to give n=800 (100 dataset × 8 samples) normalisation factor comparisons for simulated small tumours and n=2800 (100 dataset × 28 samples) comparisons for simulated large tumours.As an alternative method for evaluation of normalisation factors, we also compared the normalised size factors (normalised library size relative to comparator sample) to the ratio of summed simulated:comparator counts for gRNAs with null signal in the simulated sample (null signal was defined as the absolute combined random/fixed signals being < 0.05); both evaluation methods produced similar results so only the results of the first are shown.We also generated 100 simulations of large tumours (n=10) with increased uniform thinning to decrease total counts to [4e6, 5e6], as well as n=100 simulated large tumours with no uniform thinning, which increased total counts to the range [4e7, 6e9]; normalisation factors for these simulated tumours were evaluated in a dataset containing only the 10 simulated columns and the comparator (plasmid) sample.

Estimates of level of negative selection
To estimate the overall level of negative selection present in our tumour samples, we used gRNAs targeting known sets of common essential genes (CEG; n=3891 gRNAs) and non-essential genes (NEG; n=5028 gRNAs) from [6].As there was a high prevalence of zero counts in the tumour samples, we combined   S6), while with increased total counts [4e7, 6e9], a larger pseudocount (0.05) tended to produce slightly better results although the influence of the pseudocount was generally reduced in large simulated libraries with fewer zero counts (Figure S6).Nevertheless, a pseudocount of 0.02 generally performed well regardless of library size.
Although we originally expected that clipping/trimming of outlying counts would be beneficial in normalising the high variance/zero-inflated counts of the large tumours, our simulations generally suggested the opposite.A likely explanation is that the use of a log-scale in our MPM calculations greatly reduces the influence of very high counts.A high degree of averaging, first over all valid counts (non-zero in one sample) and then over all pairwise combinations is sufficient to allow reasonable accurate normalisation factors to be calculated despite high count variance.The key determinant in data with many zeroes is that pseudocount used, which can be understood as the unobserved 'average' count where the observed values are zero.A pseudocount of 0.02 worked well in simulated data with counts in the same range as our data, and was reasonably robust to different library sizes.In addition, the pseudocount only tends to have a noticeable effect in samples with many zeroes, such as simulated large tumours, with the large amount of averaging on a log scale sufficient to dampen its effect in simulated small samples.The use of clipping/trimming, conversely, tends to bias the normalisation factors is various ways, and normalisation is generally better using the simple mean on a log-scale.
We compared MPM (a=0, m=0) normalisation to the commonly used TMM and RLE normalisation methods, as well as the GMPR and the straightforward total count and upper quartile normalisation methods (Figure S7).For TMM and RLE, we also tested the methods with zeroes set to pseudocounts values of 0.5 (commonly used) or 0.02 (preferred MPM parameter) prior to running normalisation.Although the use of pseudocounts with these methods is similar to MPM normalisation, in addition to the differences in averaging methods, MPM differs in that pseudocounts are only used when only one sample has a zero count, which is possible since pairwise comparisons are performed; RLE and TMM do not perform all pairwise comparisons and thus cannot selectively exclude pairwise zeroes from the calculation when pseudocounts are used.For small tumours, MPM and RLE performed well with either a pseudocount of 0.02 or 0.5, as did GMPR (Figure S7A).TMM, regardless of pseudocount used tended to produce upwards biased normalisation factors and the total count and upper quartile performed very poorly.In simulated large tumours, MPM with a pseudocount of 0.02 clearly performed better than all other methods, with most of these producing upward biased normalisation factors in the presence of many zero counts.

DFigure
Figure S1: (A) Lorenz curves to show distribution of NTC gRNA read counts from

FigureFigure S3 :
Figure S2: (A) Heat map showing the number of the top 100 gRNAs with the

Figure S4 :
Figure S4: Plots of calculated MPM normalisation factors against expected values

Figure S5 :
Figure S5: Plots of calculated MPM normalisation factors against expected values

Figure S7 :
Figure S7: Plots of calculated normalisation factors against expected values in

Figure S8 :
Figure S8: Plots of log 2 fold-change of common essential genes (CEG) compared gRNAs into bins of 25 gRNAs, with binning performed separately for CEGs and NEGs and only retaining full bins.The binned gRNAs were further subsampled by 80% to give 124 CEG bins and 161 NEG bins.For each sample, we then calculated the mean log 2 fold-change for CEGs vs NEGs as�� 2 ( )� − � 2 ( )�� − �� 2 (   )� − � 2 (   )��,where count is the count of a bin and replaced with a pseudo-value of 0.5 if zero (percentage of zeroes was on average <0.2% in large tumours due to binning, and less than 5% in all samples).This value estimates the fold-change depletion of CEGs compared to NEGs over that observed in the inoculum.The estimate was averaged across 100 simulated (binned/subsampled) datasets and all samples in each group (small or large HCT116/54C tumours).To obtain comparable CEG vs NEG log 2 fold-change values for in vitro samples, the process (inoculum) replaced by the counts of the corresponding cell sample collected at an earlier timepoint.As the periods of culture were different for each cell sample, the average across cell samples was estimated assuming a constant rate of depletion over time by fitting a slope-only linear regression with log2 fold-change for CEGs vs NEGs as the dependent variable and culture period (between collection and the corresponding earlier comparator timepoint) as the explanatory variable.

Table S6
Parameters used for binomial thinning simulations to compare normalisation methods.Where parameters are given as ranges, a value was sampled from a uniform distribution within the range for each column (random signal/fixed effect sizes/total reads) or dataset (fixed signals mean/sd).
larger than they actually were; this property can be seen as the normalisation factors tending to asymptote towards a value determined by the pseudocount.