Accurate determination of CRISPR-mediated gene fitness in transplantable tumours

Eirew, Peter; O’Flanagan, Ciara; Ting, Jerome; Salehi, Sohrab; Brimhall, Jazmine; Wang, Beixi; Biele, Justina; Algara, Teresa; Lee, So Ra; Hoang, Corey; Yap, Damian; McKinney, Steven; Bates, Cherie; Kong, Esther; Lai, Daniel; Beatty, Sean; Andronescu, Mirela; Zaikova, Elena; Funnell, Tyler; Ceglia, Nicholas; Chia, Stephen; Gelmon, Karen; Mar, Colin; Shah, Sohrab; Roth, Andrew; Bouchard-Côté, Alexandre; Aparicio, Samuel

doi:10.1038/s41467-022-31830-2

Download PDF

Article
Open access
Published: 04 August 2022

Accurate determination of CRISPR-mediated gene fitness in transplantable tumours

Nature Communications volume 13, Article number: 4534 (2022) Cite this article

4849 Accesses
1 Citations
12 Altmetric
Metrics details

Subjects

Abstract

Assessing tumour gene fitness in physiologically-relevant model systems is challenging due to biological features of in vivo tumour regeneration, including extreme variations in single cell lineage progeny. Here we develop a reproducible, quantitative approach to pooled genetic perturbation in patient-derived xenografts (PDXs), by encoding single cell output from transplanted CRISPR-transduced cells in combination with a Bayesian hierarchical model. We apply this to 181 PDX transplants from 21 breast cancer patients. We show that uncertainty in fitness estimates depends critically on the number of transplant cell clones and the variability in clone sizes. We use a pathway-directed allelic series to characterize Notch signaling, and quantify TP53 / MDM2 drug-gene conditional fitness in outlier patients. We show that fitness outlier identification can be mirrored by pharmacological perturbation. Overall, we demonstrate that the gene fitness landscape in breast PDXs is dominated by inter-patient differences.

Clonal dynamics limits detection of selection in tumour xenograft CRISPR/Cas9 screens

Article Open access 08 September 2023

Clonal fitness inferred from time-series modelling of single-cell cancer genomes

Article 23 June 2021

Non-genetic determinants of malignant clonal fitness at single-cell resolution

Article 08 December 2021

Introduction

Triple negative breast cancers (TNBC) are a heterogenous group of diseases, defined by the absence of detectable estrogen and progesterone receptors (ESR1, PGR), non-amplified/normally expressed Her2 receptor (ERBB2), and characterized by aggressive clinical behaviour and poor prognosis. Somatic mutations, cell of origin and germline variation contribute to the as yet poorly defined phenotypic variation of TNBC¹. Large-scale sequencing studies have provided the basis for molecular approaches to identify TNBC subsets with similar underlying biology. These include subsets defined by transcript profile, including a basal expression subtype^2,3, by the activity of recurrently dysregulated signaling pathways, by activity and deficiency in DNA repair pathways (notably double strand break repair)⁴, by combined expression and copy number profile^5,6, by clonal diversity and kinetics⁷, and by immune and stromal features in the tumour environment⁸. Insights from these classification approaches are already guiding patient stratification for treatment with novel targeted agents⁹. Nonetheless, the heterogeneity of patient response highlights the need for more direct and quantitative approaches to understand the fitness of genes within the complex genomic and epigenomic backgrounds that characterize these cancers.

Gene fitness estimates can be made at medium to high throughput by quantification of cellular growth phenotypes under defined perturbations, for example treatment with drug panels, transcript inhibition through RNA interference¹⁰ or gene targeting by CRISPR-KO or CRISPRi¹¹. In CRISPR-based approaches, integration of a vector including a single guide RNA (sgRNA) delivers both a targeted genetic perturbation in trans and a heritable genetic marker that can be used to identify the perturbation in the cell’s progeny. This underlies pooled library screening approaches, in which cell populations transduced with diverse libraries of CRISPR guides are jointly propagated under competitive conditions, and the relative growth of subsets bearing each perturbation quantified by next generation sequencing. This approach has been widely applied to cancer cell lines propagated in vitro, identifying genes broadly essential for cell growth and survival, genes with differential fitness in lines of particular origin or mutational background, and combinatorial gene-gene and/or gene-drug interactions^12,13.

Extending these approaches to in vivo models such as patient-derived xenografts (PDX) offers the potential to characterize gene fitness in physiologically-relevant settings and in mutationally and clonally complex tumour backgrounds. However, aspects of xenograft biology pose notable challenges. Firstly, tissue transplantation introduces a cellular survival bottleneck, with tumours representing the clonal progeny of limited numbers of input cells^14,15. This affects statistical strength by putting a ceiling on the number of independent datapoints (clonally-expanding cell populations) sampled. Secondly, there is considerable heterogeneity in the cellular output of transplanted cells that successfully propagate to the final tumour, with relative clone sizes spanning 3-4 orders of magnitude in breast cancer PDX models^16,17. This can add considerable noise when assaying signals from introduced perturbations, since final clone populations sizes compound their effects with the cellular growth potential of each transplanted cell.

A strategy to address heterogeneity of cellular output is to incorporate degenerate cellular barcodes (unique molecular identifiers, UMIs) into experimental vectors. Each transduced starting cell then receives both a functional perturbation via the sgRNA and a genetic marker via the UMI, which are passed to the cell’s clonal progeny^18,19. Statistical approaches informed by joint sgRNA-UMI counts can improve signal detection in the screen, for example by reducing confounding effects of clones of outlying size. The approach has been applied to pooled screens in 3D organoid culture²⁰ and mouse tumour models^21,22,23, which are characterized by heterogenous cellular output.

Extending this concept, we develop here joint UMI and sgRNA-encoding vectors and use these in conjunction with a Bayesian statistical model informed by clone size distribution to make quantitative inference of gene fitness in PDXs. We demonstrate statistical power to resolve in vivo gene fitness differences and driving pathway mechanisms in a diverse set of 181 CRISPR-transduced tissue xenotransplants from 21 different breast PDX lines.

Results

Factors influencing quantitative gene fitness measurements in vivo

We sought to develop a rigorous quantitative approach to estimation of fitness via in vivo population competition experiments. It is well known that the total progeny arising from in vivo proliferation of single tumour cells in primary human transplant experiments is highly variable^16,17. The resulting distributions of clone sizes have a heavy tailed form, in which small numbers of cells may contribute disproportionately to the total population of cells as the system evolves in time. In experiments which rely on sampling from clonal populations of this form, the accuracy and precision of estimates is strongly related to the numbers of clones sampled, with loss of precision when sampling is sparse.

The strength of these effects was evident from a simulation in which clones were randomly sampled repeatedly from a published breast cancer PDX data set¹⁶ (Fig. 1a, left panel). As the sample size falls below ~500–1000 clones, the heavy-tailed size distribution results in increasingly greater uncertainty over the mean clone size (Fig. 1a, right panel, 90% credible interval (CI) of mean estimate). The effects of sparse sampling can be partially offset by re-scaling the extreme outliers (winsorization, Fig. 1a, right panel), however the general form persists.

**Fig. 1: Quantitative assessment of gene fitness in vivo.**

We thus reasoned that rigorous quantification of fitness requires (i) a method of estimating clone size for each perturbation in the population (in this case, for each sgRNA molecule introduced into a single cell during tumour transplantation); (ii) appropriate sizing of the sgRNA library, given the clone numbers and variability in clone size in the tissues/tumours being sampled; (iii) minimizing distortion of clone number and size during experimental manipulation; and (iv) a statistical model for estimating fitness and variance from multiple measurements of a sample population, accounting for the form of the clone size distribution.

Population size encoding of CRISPR-transduced transplant cell progeny through UMI tags

First, to measure sgRNA clone sizes, we introduced a degenerate barcode sequence (unique molecular identifier, UMI)²⁴ into a Cas9-containing lentiCRISPRv2 vector (Figs. 1b and S1a, b). In this arrangement, each vector molecule is coded with a unique sgRNA-UMI sequence that can be read through as a single molecule in sequencing. We verified the compatibility of UMI encoding with CRISPR-sgRNA function. Plasmids containing a 27 nucleotide UMI sequence insert either upstream or downstream of an sgRNA targeting enhanced green fluorescent protein (eGFP) demonstrated comparable silencing efficiency of the target protein as the unmodified vector when transfected into an eGFP-expressing cell line (Fig. S1c). Downstream UMI-containing plasmid packaged efficiently into lentiviruses (typical functional titres 0.5 × 10⁸–3 × 10⁸ infectious units/mL). Pooled CRISPR competitive growth assays with the modified vector carried out in 2D culture in breast cell lines (luminal-type MCF7 and normal-derived 184htert TP53^–/– BRCA2^–/–) showed expected depletion of sgRNAs targeting known essential and growth-promoting genes (e.g. PLK1, PLK4, CHEK1, MTOR, MYC) (Fig. S1d). Together, these indicate that the addition of the UMI sequence is compatible with normal function of the lentivirally-delivered CRISPR-Cas9 system.

Second, a key initial factor to avoid biased starting conditions is the size of the sgRNA library relative to the number of clones and variability in clone size. We estimated that transplants of 1 × 10⁶–1 × 10⁷ cells would yield tumours of ~1 × 10⁴–1 × 10⁶ clones, given reported breast PDX clone-initiating frequencies of 1 per 1 × 10¹–1 × 10⁴ cells^16,17. This suggests that screens employing libraries of ~10–1000 different guides can be adequately powered to achieve clone numbers of 500–1000 per guide.

Based on this, we cloned a library of sgRNAs targeting growth factor receptors, hormone receptors, key members of PI3-kinase, integrin, DNA damage, RAS-MAPK, cell-cycle control, Notch, Wnt, Hedgehog, Hippo, hypoxia and autophagy pathways (168 sgRNAs targeting 56 genes in triplicate plus 24 non-targeting sgRNAs, 192 sgRNAs total, Table S1) into the vector. Targeted amplicon sequencing confirmed a broad joint distribution of sgRNAs and UMIs in the library (Fig. S1e, f).

Details of transduction and PCR sequencing of libraries are in Methods. Briefly, PDX tissue was disaggregated quickly to small organoids, avoiding lengthy enzymatic dissociation processes that can reduce viable cell numbers. Following transduction with virus for four hours ex vivo, tissue was transplanted into recipient mice, allowed to grow tumours of at least 0.1 cm³ in size over a period of weeks, then harvested for DNA extraction (Fig. 1b). For most experiments we used a viral dose of 8–10 × 10⁶ infectious units per transplant (range 2.8–24.6 × 10⁶ infectious units in entire study, Table S2). A PCR amplicon spanning both sgRNA and UMI was sequenced (Illumina MiSeq or NextSeq) to a depth of ~2 × 10⁶ reads per tumour. sgRNA and UMIs were extracted from sequence data, filtered for sequence quality, with all reads with matched sgRNA and UMI vector sequences considered as individual clones of an sgRNA-UMI introduced into a cell. To reduce PCR and sequencing error inflation of sgRNA-UMI clone measurements, a round of in silico clone aggregation was carried out using the UMI-Tools Directional algorithm²⁵.

Heterogeneity of clone size in sgRNA-UMI PDX transplants

To explore the clonal features of transduced PDX tumours, we first enumerated the number of unique sgRNA-UMI clones measured in transplants from breast cancer PDX lines derived from 21 different patients (Fig. S2). We observed a wide range of clone numbers in individual transplants (90% range: 0.006–0.424 × 10⁶ clones per million reads; 33–2210 per sgRNA). This equated to a greater than 10-fold range in the median in different breast PDX lines (Fig. 2a), consistent with variability found in previous analyses of clone initiation in breast PDX^16,17.

**Fig. 2: Transplant clone diversity in breast cancer PDXs.**

Comparison of replicate transplants further indicated that sgRNA-UMI clone number is a reproducible property of PDX lines (Fig. 2a). Within our panel of PDX lines, TNBCs and hormone receptor positive subsets included medium and high clone number examples, whereas clone number was low (fewer that 10⁵ sgRNA-UMIs per 10⁶ reads) in the three hormone receptor negative Her2 positive lines (Fig. 2a).

We reasoned that low sgRNA-UMI clone number in certain PDX lines could reflect heavy-tailed clone size distributions, with sizeable proportions of total tumour cell mass represented by small proportions of transplant clones. We therefore compared sgRNA-UMI clone number with two different measures of diversity; Shannon diversity index (SDI), a composite entropy measure of evenness of clone size distribution and total clone number; and the area under the Lorenz curve plot of cumulative reads against cumulative clones, a function of clone size distribution alone (Fig. 2b, c, Supplementary Methods 1.2). We found clone number to be positively correlated with SDI and with homogeneity in clone size (Fig. 2b), with tumour composition ranging from extremes of large numbers of more evenly-sized clones (high entropy) to small numbers of more heterogeneously-sized clones (low entropy). We did not observe outcomes with large numbers of evenly-sized clones. These relationships were maintained independent of whether cells received a neutral sgRNA or a gene-targeting sgRNA (Fig. 2c), indicating that clone number and size distribution are intrinsic properties of the PDX transplant cellular regenerative process, not solely consequences of introduced gene perturbations.

We found that extended propagation time in vivo led to a decrease in overall diversity (Fig. 2d). To maintain favourable sampling diversity, we therefore harvested tumours at relatively small size (~0.1–0.15 cm³, typically, 3–8 weeks’ growth in vivo) in subsequent competitive fitness experiments. Transplant clone diversity did not show a correlation with the extent of genomic clonal diversity within PDXs, as measured by the number of mutational clusters inferred from bulk WGS by Bayesian population model PyClone-VI²⁶ (Fig. 2e).

Finally, to examine biological features contributing to sgRNA-UMI clone diversity, we used gene set enrichment analysis²⁷ on bulk RNAseq to identify pathways with differential activity in PDX lines with higher compared with lower SDI. Within the Reactome Hallmark pathway set (www.reactome.org), PDX lines with higher diversity were characterized by elevated activation of cell cycle-related pathways, notably G2M checkpoint and E2F targets (Fig. 2f). In contrast, PDX lines with lower diversity showed elevated activation of allograft rejection and inflammatory pathways, notably interleukins and interferons (Fig. 2f). This suggests interaction between transplanted tumour cells and the host innate immune system as a possible mechanism to explain the lower number of engrafting transplant clones in these PDX lines. Elevated cytokine response might also reflect response to endogenous DNA damage in the low sgRNA-UMI diversity subset. However, we did not find a correlation between sgRNA-UMI diversity and the overall burden of somatic single nucleotide variants in PDX lines to support this interpretation (Fig. S3a). Similarly, we did not observe strong correlation with genome-wide mutational signatures²⁸, with only the SBS8 signature²⁹ showing a significantly non-zero regression slope coefficient (p = 0.008) with sgRNA UMI diversity but with low overall coefficient of determination (r² = 0.33) (Fig. S3b). Thus, measured clone diversity in library transplant experiments appears to be driven by a number of biological features, including proliferating-cell content of the transplanted material and activity of immune-pathway cytokines in tumour cells.

Taken together the key elements required for population fitness measurements were identified as, appropriately scaled perturbation library size; short transduction times with minimal tumour manipulation; early harvesting of tumour transplants; and evaluation of intrinsic clone diversity capacity of each PDX.

Fitness inference informed by UMI distributions and a Bayesian hierarchical mixture model

We anticipated that PDX line differences in cellular regeneration properties would translate into differences in the statistical power of pooled CRISPR screens, notably a reduction in power in PDX lines generating tumours comprising fewer clones. To control for PDX specific effects we developed a modeling approach using jointly measured sgRNA-UMI data to infer the fitness impact of sgRNA perturbations introduced synchronously into a PDX. We developed a hierarchical Bayesian model to jointly analyze the observed UMI and sgRNA read count data, which allowed us to quantify the uncertainty in fitness estimates via credible intervals obtained from the posterior distribution (Fig. 3a, Supplementary Methods 1.5). To account for differences in the prevalence of sgRNAs within each library (Fig. S1f), sequence read count data from tumours propagated in mice (termed final counts, D^j_t) are modelled along with read counts from the plasmid vector library used for packaging into lentivirus used to transfect (termed initial counts, I_t). To increase statistical power, we included the ability to jointly analyze multiple replicates within the modeling framework. Winsorization (which here revalues the top 2% of the posterior distribution to the 98th centile point) was used to produce robust estimators that reduce the impact of clones of outlying size (Fig. S4c). The model returns the posterior distribution of the winsorized mean of cellular output as the sgRNA fitness estimator (Fig. 3b, left 3 panels represent replicate transplants from a high diversity PDX line, right panel shows the grouped analysis, Supplementary Data 1). Fitness in this modeling approach is thus a measure of the relative expected cellular output of cells transduced with different guides, after lowering of the impact of outlying large clones through winsorization.

**Fig. 3: Bayesian model for UMI-informed fitness inference in pooled screens.**

We used a Bayesian goodness of fit procedure to evaluate the performance of different generative functions modeling distributions of this type (Supplementary Methods 1.5.6). The procedure measures the percentage of sgRNAs for which observed mean clone size falls within the modeled 90% posterior credible interval (expected to be close to 90% of the sgRNAs if the fit is good). Likelihoods based on mixture models (e.g. sum of two negative binomial distributions with different means and dispersions) resulted in improved goodness-of-fit when applied to datasets representing a range of sgRNA-clone diversity (Figs. 3c and S4a). In contrast, likelihoods based on a single distribution functions (e.g. Poisson, negative binomial, beta negative binomial, Yule-Simon) had lower flexibility to fit to the form of empirical distributions, resulting in lower coverage of observed means within modeled credible intervals. While goodness-of-fit and marginal likelihoods varied with the use of different generative functions (Supplementary Methods 1.5.6), we note that the median Bayesian fitness estimates were broadly similar, with the exception of the poorly-fitting Poisson model (Fig. S4b). For subsequent analysis, we chose the mix-NB model (mixture of two negative binomial distributions). This distribution has defined moments, making model interpretation more straightforward, and also has faster execution time compared with mix-BNB, which has slightly higher marginal likelihood and overall goodness-of-fit. We also note that the better fit of mixture models is consistent with the presence of more than one biologically distinct cell type in the initiating transplants.

For comparison, we also generated sgRNA-UMI-informed fitness estimates and confidence intervals using a frequentist methodology based on a ratio statistic and the Delta method (Supplementary Methods 1.4). While Bayesian and frequentist methodologies generated broadly similar rankings of guide fitness in single data set analyses (Fig. S4b, right panels), the Bayesian approach was better suited to leveraging of replicates in grouped analyses in which data and a subset of the model parameters were shared between multiple data sets from biological replicate transplants (Fig. 3b, right panel; note the narrower posterior distributions compared with single data set distribution in three left panels).

While our modeling approach assumes that each transduced cell receives a single viral integration, it is known that transduction of cell populations with viruses results in a subset of cells receiving two or more integration events. Multiple integration events may lead to some overestimation of clone number, since sequences from the progeny of these cells are counted in more than one clone. Nonetheless, this effect should affect guides uniformly and thus be neutral for relative fitness estimates. We investigated this with a model version in which initial cell transduction frequency is assumed to follow a Poisson distribution (Supplementary Methods 1.5.4). This model showed a negligible increase in marginal likelihood in our data sets relative to the single infection mix-NB model (Supplementary Methods 1.5.7), implying that a multiple infection model does not bring significant benefits in fitting the data. We also carried out transduction experiments in which parallel transplants received viral doses varying over a four-fold range. Fitness estimates were well correlated (r² 0.766–0.808, Fig. S5a), and only small variation in credible interval width was observed over the viral dosing range (median 95% interval 0.219, 0.209, 0.208 at doses of 5, 10, 20 × 10⁶ infectious units). For simplicity, we therefore settled on the single infection model version for subsequent analyses.

We used variants of the signaling library that included negative control guides that are non-genome targeting (no matching sequence in human genome) or non-gene-targeting (matching sequence in the human genome, at least 5 kbp distant from any coding region). Comparison of these offers an estimate of the magnitude of non-target-specific growth inhibition from Cas9-induced genome cutting, since the latter but not the former controls are expected to target any locus. Consistent with this, non-genome-targeting control guides registered higher fitness compared with non-gene-targeting controls (Fig. 5a, ctl and ctl_t, respectively), with a mean difference in net growth of 0.30. A copy number dependent non-gene-specific negative effect on cellular proliferation on CRISPR cutting has been previously documented in cell line studies¹².

We next examined the relationship between clonal diversity in PDX tumours and the power to resolve fitness differences between sgRNA targets. Using the mean 95% Bayesian credible interval width as a proxy for the resolution of a pooled PDX screen, we observed a strong inverse correlation between clonal diversity and resolution (Fig. 3d, left panel). This was also reflected in a greater reproducibility of fitness ranking between replicate transplants from higher diversity compared with lower sgRNA-UMI diversity tumours (Fig. 3d, right panel). The inverse relationship between clone diversity and screen resolution was also seen when reducing UMI numbers by sub-sampling clones from single datasets, or by increasing number by combining biological replicate datasets (Fig. 3e). As a graphical representation of fitness differences resolved at significance between sgRNAs, we generated partially ordered sets (posets), visualized using Hasse diagrams, which show minimal sets of edges connecting sgRNA nodes with non-overlapping 95% Bayesian credible intervals (fitness a < b, and such that there are no c such that a < c < b) (Fig. S6). Here, sampling of fewer clones resulted in progressively simpler graphs resolving fitness differences between fewer sgRNA pairs (Figs. 3f and S6).

Overall, these observations demonstrate that the power to resolve in vivo fitness differences between genes is critically dependent on clone diversity, derived from the number of transplant clones in a regenerated tumour and the heterogeneity in their sizes. These are biological properties of PDXs, varying considerably between lines derived from different patient tumours.

Intrinsic between-patient variation dominates primary breast cancer gene fitness in vivo

Having established Bayesian fitness measures from sgRNA-UMI clones in vivo, we set out to examine sources of variation of single gene fitness among different patient tumours for the functions of PI3-kinase, integrin, DNA damage, RAS-MAPK, cell-cycle control, Notch, Wnt, Hedgehog, Hippo, hypoxia and autophagy pathways represented in the sgRNA library. We conducted multiple technical replicate transplant experiments (n = 3–6) for 15 patients (10 TNBCs) from the 21 patient series (6 excluded due to low sgRNA-UMI diversity), contrasting different passages and anatomical sites of transplant of individual patient tumours.

First, to assess gene fitness in different transplant sites, we compared replicate transplants from 6 different PDX lines grown in separate groups of mice in subcutaneous or inguinal mammary fat pad sites (Fig. 4a). To facilitate comparison between experiments and sgRNA libraries, fitness values were normalized within each dataset such that the 70th centile value among gene-targeting guides was set to a neutral level of 1 (Supplementary Methods 1.6). We applied a stringent set of criteria to identify targeting guides showing statistical and biological significant difference between sites, namely: fitness differential that is non-zero (by 95% posterior credible interval); targeting guides that show statistically significant fitness differential relative to the variability in neutral guides; rank differential that is non-zero (by 95% posterior credible interval); and at least two guides targeting the same gene satisfying these three criteria (Supplementary Methods 1.8). Representative comparison examples are shown in fitness and rank space, with guides satisfying the difference criteria highlighted (Fig. 4a). We did not observe strong site-specific fitness differences, although in four of six patients EGFR exhibited modestly lower fitness (~0.4) in the mammary fat pad site than subcutaneous site (Figs. 4a, left two panels and S7). Related pathway ERBB family receptors exhibited overall neutral differential fitness, with one patient each exhibiting a growth effect for ERBB2 and ERBB3 (Fig. S7). Thus, for all but EGFR, site of transplant had little or no effect on the measured fitness ranks.

**Fig. 4: Between-patient variation dominates in vivo gene fitness.**

Polyclonal primary breast cancers evolve in patients and in PDX transplants³⁰. To assess the variation in fitness with serial transplant passaging, we compared an early passage (2nd–3rd mouse generation since initiating transplant from patient) with a later passage (6–12th) for three PDX lines (intervening time 15, 18 and 23 months in vivo). We observed relative stability in the rank ordering of guides, implying that biological drivers at early passages remain functional over significant periods of time in vivo (Figs. 4b and S8). Notable outliers included TP53 in C1368, a TNBC line in which the all three TP53-targeting guides showed positive fitness at passage 2 (consistent with wild-type TP53 status in this line) and neutral fitness at passage 10 (Fig. 4b, right two panels). We examined mutation status and copy number in pseudobulk clonally merged genomes derived from DLP + scWGS³¹ sequenced at the early and late passages. In spite of passage-related copy number clonal evolution, we did not find somatic mutations in or closely related genes to explain this functional shift. We also noted that guide fitness profiles were well correlated between transplants of the same tumour material harvested after varying propagation periods in vivo (r² 0.759–0.892, Fig. S5b), indicating that time-dependent fitness within a passage generation was not a strong feature in these experiments.

In contrast with the relative stability in fitness profiles between technical replicates, or variations in site and passage, we observed considerably greater fitness variation when comparing PDX lines originating from different patients (Fig. 4c). These comparisons indicated different functional drivers within the TNBC subset of lines, as well as showing the expected differences in hormonal signaling in ER + lines compared with TNBC (Fig. S9). Quantifying the proportion of genes satisfying the multiple difference criteria across the entire data set (5399 pairwise comparisons) showed a progression of increasing fitness variability from technical replicates or site comparisons to passage comparisons to inter-patient comparisons (Fig. 4d, left panel, Supplementary Methods 1.8). A similar ordering was seen using the Spearman rank correlation of guide fitness in the comparisons (Fig. 4d, right panel), with between-patient variation dominating the fitness landscape. Taken together, by far the greatest source of variation in fitness among genes assessed in vivo arises from intrinsic differences between primary breast cancers.

We next assessed the variation in gene and patient fitness of core genes from PI3-kinase, integrin, DNA damage, RAS-MAPK, cell-cycle control, Notch, Wnt, Hedgehog, Hippo, hypoxia and autophagy pathways. In all series, guides targeting genes with known essential biological function (e.g. PLK1, PLK4, CHEK1) registered overall negative fitness (i.e. lower net growth when targeted with CRISPR), though the extent varied between PDX series (Fig. 5a, upper panel). A similar fitness pattern was observed in strong growth-promoting genes such as MYC, MTOR and CDK12. The fitness of these large effect size genes was concordant with those identified as broadly essential in breast cancer cell lines propagated in vitro (Fig. 5a, middle panel shows gene-sensitive proportion of 25 breast cell lines from Cancer DepMap data set³², Fig. S10). Among moderate effect size genes we observed differences in fitness ranks between in vivo (PDX) and in vitro (DepMap) datasets for PDK1 (oxidative phosphorylation), RAF1, CA9 (hypoxia), ALK, SRC, LEF1 (Wnt), YAP1 (Hippo) and FLT1 (VEGF pathway) (Fig. 5a, lower panel). BRCA2, which has near neutral fitness in cell lines, exhibits a more pronounced negative fitness in primary PDXs. Taken together, these indicate functional differences between cell lines (many derived from metastatic and/or post chemotherapy treated tissues) and primary breast cancers grown in vivo.

**Fig. 5: Inter-patient differences dominates the variability in fitness in PDXs.**

Beyond high and moderate fitness genes, many other signaling genes were characterized by neutral fitness in many series, however with notably outlying non-neutral fitness (negative or positive) in specific series. Series-specific gene outliers included MET in C3278 and C1471; PARP1 in C0331; PIK3CA / PIK3CB / PIK3R1 in C1383 and C1471; FGF receptors in C1368, C3037 and C1373; and EGFR/ERBB family receptors in C1379, C1383, C1392 and C1373 (Fig. S9). Among RAS-MAPK pathway genes tested, RAF1 sensitivity was observed in around half of PDX series. We did not observe strong fitness outliers in RAS genes (KRAS, HRAS, NRAS) (Figs. 5a and S9) in primary PDX, however cell lines exhibited greater median RAS pathway negative fitness overall, indicating potential constitutive differences between primary PDX and cell lines in RAS signaling. RAS genes typically gain oncogenic function through hotspot base substitution mutations, which are rare in breast cancer and absent in our PDX panel, though frequent in tumours types such as colon, lung and pancreatic ductal adenocarcinoma.

To exclude large expression or copy number effects on fitness, we next compared fitness measures with transcript expression level and genomic copy number of the targeted genes (Fig. 5b, c). These data were derived from bulk RNAseq and WGS from additional tumours in the same patient PDX series. We noted a number of genes in which fitness outliers occurred in series with highest levels of transcript expression or copy number. These included ESR1 and ERBB2, which are established single gene biomarkers predictive for response to anti-estrogen or HER2-targeting therapy. Furthermore, MDM2 fitness was most negative in series C2553, in which copy number amplification is associated with elevated expression (see section below). For the majority of genes, though, fitness did not show a clear correlation with either transcript expression or copy number. Cell signaling pathways are complex, with regulatory characteristics including functional degeneracy, thresholding, feedback and cross-pathway interaction. It is not surprising therefore that functional readouts such as gene fitness do not in general reduce to simple measures of state, highlighting the utility of making such measurements empirically.

Validation of fitness outliers with allelic series pathway mapping and pharmacological inhibition

To illustrate the reproducibility and validation of primary in vivo fitness measures we adopted the standard approach of constructing a de-novo allelic series, in which a series of guides target multiple gene regions and also different genes within a pathway. In our initial transplant experiments, we observed several PDX lines with sensitivity to APH1A (Fig. 6a). This gene is a member of the gamma-secretase complex required for proteolytic cleavage of Notch receptor. We therefore selected two negative fitness (C0331, C1368) and one neutral (C2271) line for detailed fitness mapping on the Notch pathway. The Notch pathway has been implicated in breast cancer, but defining the fitness of Notch pathway genes requires phenotypic assays of cell-cell contacts in 3D or in vivo. Genome sequencing revealed that the APH1A-sensitive TNBC line C1368 contains a somatic translocation between NOTCH3 and MEMO1P4, resulting in a NOTCH3 transcript missing the PEST domain-containing C-terminus exon (Fig. 6b). The PEST domain of Notch normally negatively self-regulates survival through targeting the Notch receptor protein for degradation. PEST domain deletion or mutation extends the duration of Notch signaling and has been observed in several cancer types³³. We constructed a 128-sgRNA library targeting 26 Notch pathway genes, including 4 sgRNAs targeting the NOTCH3 PEST domain and 4 sgRNAs targeting other exons in the gene (Fig. 6b, c, Table S1). Consistent with the genomic aberration, C1368 was highly sensitive in vivo to sgRNAs targeting non-PEST 5’ regions of NOTCH3, whereas sgRNAs targeting the PEST3 domain were functionally neutral (Fig. 6d, e). Sensitivity to APH1A was mirrored in negative fitness values for multiple other members of the gamma-secretase (NCST, PSEN2, PSENEN) and the ADAM10/17 cleavage complexes (Fig. 6d). Negative fitness was observed to Notch downstream signaling transcription factors HES and HEY and co-activators MAML1 and MAML2, in conjunction with positive fitness in transcriptional co-repressor NCOR2 (Fig. 6d). sgRNAs targeting extra-cellular Notch ligands DLL1/4, DTX1 and JAG1/2 showed neutral fitness (Fig. 6d). These results are consistent with a dominant, cell-autonomous activation from the fusion gene. Comparison with the fitness landscapes in C2271 and C0331 confirmed the heightened relative sensitivity of C1368 to NOTCH3, as well as to the downstream pathway effector HES1 (Fig. 6f, left and middle panels). In contrast the fitness landscapes of C2271 and C0331 differed mostly in extracellular signaling ligands DLL1 and JAG1/2, as well as transcriptional coactivator MAML2 (Fig. 6f, right panel). Taken together, the allelic and pathway fitness measurements reflect qualitative and quantitative differences in Notch signaling, with ligand mediated effects in wild type tumours contrasting with the PEST-domain-lacking NOTCH3-MEMO1P4 translocation product, which dominantly activates pathway signaling in C1368.

**Fig. 6: Notch pathway allelic series highlights driver mechanism in a NOTCH3-rearranged PDX line.**

In a second validation approach, we set out to test whether fitness phenotypes in our pooled screens are predictive of drug sensitivity in specific patient backgrounds. We thus performed short-term toxicity assays using cells dissociated from nine different PDX lines and propagated ex vivo as 3D organoids. We used a panel of seven compounds, targeting MDM2, MTOR/PI3K, FGR receptor signaling and MYC pathways. Drug-response curves were obtained over a 0.01–100 mM concentration range (Fig. 7a). Drug treatments differ considerably in biophysical mechanism of functional inhibition compared with sgRNA/CRISPR. Nevertheless, in spite of differences in time frame (96 h vs 3–10 weeks), system (in vitro culture vs in vivo xenograft) and perturbation modality (drug vs gene editing), we found several examples in which PDX line specific gene fitness outliers in the pooled CRISPR setting (Fig. S9) correspond with drug sensitivity in the organoid drug assays (Fig. 7a, PDX with fitness outliers in the drug target shown as solid dose-response curves in upper panel; labeled in comparisons with half-inhibitory drug dose in lower panel). Notable examples are MDM2 gene fitness with sensitivity to inhibitors Nutlin-3a and Idasanutlin in PDX line C2553, MTOR with inhibitors Rapamycin and Everolimus in C1379, FGFR2 with FGF receptor tyrosine kinase inhibitors Erdafitinib and Infigratinib in C1368 and MYC with BET inhibitor JQ-1 in C1379 (Fig. 7a).

**Fig. 7: Conditional *TP53* fitness in an *MDM2*-amplified PDX line.**

To further demonstrate that in vivo fitness measurements have quantitative value, we probed pharmacologically an MDM2 fitness outlier. The PDX line most sensitive to in vitro pharmacological MDM2 inhibition (C2553) previously showed outlying fitness in MDM2 and close to neutral fitness in TP53 in the in vivo CRISPR screens (Fig. 7b). This TP53 wild-type ER + PDX line contains a 6-copy amplification in MDM2 (Fig. 7c), resulting in the highest transcript expression of this gene among our PDX series (Fig. 5b, rightmost bar in MDM2 panel). MDM2 is a negative regulator of TP53, shortening its survival by targeting for proteasomal degradation. The CRISPR and pharmacological sensitivities are consistent with attenuated TP53 activity resulting from inhibitions by MDM2. We hypothesized that pharmacological MDM2 inhibition would reactivate wildtype TP53, and so cause a shift from neutral to positive fitness in TP53. To test this, we carried out in vivo assays using the signaling library with mice randomized to receive 100 mg/kg Idasanutlin or vehicle control for 14 days, starting from a small tumour size of ~0.05 cm³. As predicted, all three sgRNAs targeting TP53 exhibited a shift towards positive fitness in Idasanutlin-treated compared with vehicle-treated mice, with all three guides satisfying the multiple difference criteria (mean fitness difference 0.27) (Fig. 7d). We also noted a shift under drug treatment towards negative fitness in sgRNAs targeting autophagy-promoting gene ATG4B (two of three guides, mean fitness difference −0.40), consistent with reports that nutlin class drugs can upregulate the autophagy pathway³⁴. Furthermore, hedgehog pathway receptor SMO showed a shift towards lower fitness on drug treatment (two of three guides, mean fitness difference −0.26), consistent with reports that this pathway can up-regulate MDM2 expression³⁵. Together these data provide additional evidence for an epistatic relationship between TP53 and MDM2 in primary breast cancers, and highlight that complex conditional drug-gene interactions can be dissected in PDX using the UMI-encoded fitness approach.

Discussion

Measurements of gene fitness in replicating cells have proven to be a powerful means to understand biological functions in cancer model systems. However, understanding of the parameters that make robust quantification of fitness possible in physiologically growing tumours has been lacking. Importantly, PDXs allow functional study of an extended range of patient genotypes in many tumour types, including breast, where the study of gene fitness has been limited by the difficulty of cell line and long-term organoid generation. This is an especially acute issue for primary breast cancers, the majority of studies having been conducted with metastatic cell lines in vitro. Here we show how UMI-encoded sgRNA perturbation libraries can be used to quantify gene fitness in vivo in breast PDXs. By modeling the number and size distribution of sgRNA-UMI clones in PDX in a hierarchical Bayesian mixture framework, matching library size and length of tumour growth, reproducible estimates of gene fitness can be obtained.

Bayesian modeling of clone size distributions, in combination with winsorization of the distribution tail, has the benefit of reducing potentially confounding influences on fitness estimates of outlying large clones. This is of particular relevance to PDX models, in which the clone size distribution has a heavy-tailed form. Even in less clone-size heterogenous datasets generated from cell line culture, the use of median clone depletion measures has been shown to improve signal in CRISPR screens^18,19. Our modeling approach could also be applied to quantify fitness in organoid cultures, in which clone size heterogeneity along with more limited biomass present a challenge to the library size that can be used quantitatively^23,36,37,38. The approach is also applicable to other pooled perturbation modalities, such as screening with libraries of RNA interference constructs. In common with many CRISPR approaches which measure net proliferative output, the fitness estimates principally reflect cell-autonomous effects of gene perturbation. There may in addition be contributions from cell-cell interaction and from inter-clone competition for limiting resources, which would require additional experimental and modeling approaches to disentangle.

We find that the resolution with which gene fitness can be ascertained is critically dependent on features of the regenerative process underlying xenograft tumour growth, encoded in UMI distributions. Notably, statistical strength is increased in PDX lines in which larger numbers of transplanted cells contribute to xenografts, and in which there is lower heterogeneity in the clonal cellular contribution of these cells. Moreover, we find that these are reproducible properties of individual PDX lines. In practical terms, we can assess sgRNA panels at medium throughput (~100–200 guides in parallel), resolving in vivo gene fitness differences of moderate size (~0.2–0.5) in PDX lines which exhibit medium to high sgRNA-UMI diversity (typically, >500 UMIs per guide, SDI > 9). This level of throughput makes it feasible to carry out unbiased studies, screening several thousand genes spread over a number of libraries, to search for genes with fitness effects in a specific patient tumour genotype. In lower sgRNA-UMI diversity PDX lines, we are powered to detect fitness differences where the gene perturbation phenotype is strong (e.g. known essential genes). In all cases, resolution may be improved by pooling of biological replicates, a key feature of the Bayesian model. Importantly, in many PDX lines this allows addressing biological questions through negative or positive selection approaches. Positive selection, in which the perturbed phenotype can be orders of magnitude greater than neutral, has been successfully applied to in vivo model systems, to uncover novel tumour suppressor mechanisms^21,22,39. In contrast, negative selection phenotypes, such as the reduced growth when a driver gene is inhibited, or conditionally lower gene fitness under drug treatment, have been more challenging to quantify at medium throughput in vivo. The UMI-informed Bayesian framework provides internal measures to assess interpretability and guide the design of suitable powered studies. At the library level, a relatively even representation of sgRNAs is desirable, to avoid wider statistical uncertainty in fitness estimates in low prevalence guides. Modeling of UMI distributions also lowers the potentially confounding impact of clones of outlying size, which may reflect developmental potential of rare cells within tumours, or chance combinatorial phenotypes in cells transduced by more than one guide. The size of gene libraries that can be screened is thus a product of line-intrinsic single cell output and transplant biomass, modified by transfection conditions and length of time in vivo. This approach would feasibly allow screens of most cellular core modules for example, by constructing libraries of ~1000 core genes⁴⁰ and utilizing high diversity lines, with pooling from replicate transplants.

Transcriptome analysis suggests a variety of biological pathways underlie the differential clonal composition. These include elevated activity of cell cycle pathways in high sgRNA-UMI diversity lines, which may be indicative of a progenitor rather than highly differentiated developmental phenotype. In addition, interleukin and interferon cytokine pathways were elevated in low sgRNA-UMI diversity PDX lines, suggestive of a cellular bottleneck related to interaction between certain xenografted tumours and the host innate immune system (which, unlike adaptive immunity, is intact in the NRG/NSG strains used). Alternatively, elevated cytokine expression may reflect endogenous response to high rates of DNA damage, though we did not observe a correlation between sgRNA-UMI diversity and overall mutational burden, nor with genome-wide signatures of DNA damage²⁸, to support this interpretation.

The quantitative assessment of gene fitness in vivo admitted detailed assessment of oncogenic driver mechanisms via allele series and drug-gene fitness. We observed outlying sensitivity to gamma secretase cleavage component APH1A in C1368, a line bearing a NOTCH3-MEMO1P4 translocation resulting in the loss of the auto-regulatory PEST domain. Secondary screening with a Notch pathway-focused library confirmed growth dependence on the canonical Notch pathway, with this translocated Notch product rather than alternative Notch ligands as principal driver. This contrasted with two tumours that differ in ligand mediated Notch pathway signaling. We also observed outlying sensitivity to MDM2 in C2553, a PDX line amplified at this locus. MDM2 is a negative regulator of TP53, with amplification observed in ~4% of human tumours⁴¹. Consistent with this regulatory interaction, we showed that treatment of mice xenografted with this PDX with MDM2 inhibitor Idasanutlin induced a shift to positive TP53 fitness, indicating a reversion towards growth-inhibitory TP53 function. This illustrates the utility of the approach to probe drug-gene interactions in specific tumour genotypes. Furthermore, quantification of gene fitness shifts conditional on drug treatment offers a means to screen for evolutionary double bind steering opportunities, in which phenotypic adaptation to an initial therapy results in collateral sensitivity to a second mode of intervention^42,43.

Notably, the main source of gene fitness variation in the 181 transplant experiments described arises not from biological replicates, variations in site of transplant or differences between passages of tumour, but rather the intrinsic differences between patients. Although site transplants were largely neutral, fitness measurements of EGFR nevertheless exhibited a modest site-of-transplant effect, with greater negative fitness in the mammary fat pad, indicating greater dependence on EGFR signaling in the mammary microenviroment. Among patients, even for high negative fitness genes such as MYC, PLK1, CDK12, CHEK1, the variation in measured gene fitness can be as high as 7–10 fold between patients. The high clinical variability in responses and phenotypes in breast cancer patients is well known, and our results emphasize that intrinsic differences between patients demand the study of gene fitness in many patient tumour backgrounds to identify breast cancer sub-type fitness landscapes for any gene or pathway.

We assessed the gene and patient fitness variation of core genes from PI3-kinase, integrin, DNA damage, RAS-MAPK, cell-cycle control, Notch, Wnt, Hedgehog, Hippo, hypoxia and autophagy pathways. While the relative fitness of large effect size genes such as MYC, CHEK1, PLK1 was conserved with that described in recent metastatic cell line CRISPR screens, some differences were observed in moderate effect size genes, notably PDK1 (oxidative phosphorylation), BRCA2, RAF1, CA9 (hypoxia), ALK, SRC, LEF1 (Wnt), YAP1 (Hippo) and FLT1 (VEGF pathway).

Taken together, we present a method and resource for assessment of tumour gene fitness in vivo, that will be applicable to other tumour types and transplant models, enhancing the understanding of tumour biology and drug responses.

Methods

Biospecimen collection and ethical approval

De-identified tumour tissues from women aged 35–86 undergoing surgery or diagnostic core biopsy were collected with direct informed written consent for inclusion with University of British Columbia (UBC) Research Ethics Board (REB) under approved study protocols: REB-H16-01625 titled B-PRECISE Biobank, REB-H11-01887 titled Predictive models of drug action in breast cancer, REB-H20-00170 titled Linking clonal genomes to tumour evolution and therapeutics, Animal Care certificate REB-A07-0524, Biohazard Approval Certificate B11-0043, McGill University Health Centre (MUHC) REB -SUR99-780⁴⁴ and accordance to Tri-Council Policy Statement: Ethical Conduct for Research Involving Humans-TCPS2. Respect for persons, concern for welfare and justice of all participants have been upheld. Participants did not receive compensation. De-identified tumour tissue is defined as biological material with direct patient identifiers removed and replaced with a code (https://ethics.research.ubc.ca/). Only privacy guardians have a level of access linking any participant’s code to their personal information and maintain the ethical duty of confidentiality. The number of patient-derived xenograft lines used in the study (n = 21) was chosen to ensure coverage of initiating patient tumours with a variety of histological subtypes (HR+, HR + Her2 + , Her2 + , TNBC), site (breast or metastatic) and including treatment-naive and previously therapy-treated at the time of biopsy. Only tumours that successfully engrafted in mice were used, which represents some bias in favour of more aggressive tumours within each histological subtype.

Tissue processing

Primary surgical and core biopsy tissue samples were transported from the operating room on ice in cold 1:1 v/v Dulbecco’s Modified Eagle Medium/Ham’s F12 Medium (DMEM-F12, Corning, NY, USA). Fat was trimmed away using scalpels. A small piece of tumour tissue was removed using scalpels and fixed in 10% formalin buffered saline (Fisher Scientific, Kalamazoo, MI, USA) for histological analysis. Additional small fragments from different portions of the tissue were collected together, flash frozen in liquid nitrogen and stored at −80 °C for nucleic acid extraction. The remaining tissue was minced finely with scalpels, then mechanically disaggregated for one minute using a Stomacher 80 Biomaster (Seward Limited, Worthing, UK) in 1–2 mL cold DMEM-F12. Aliquots from the resulting suspension of cells and organoids were used for xenotransplants.

For routine passaging, xenograft-bearing mice were euthanized when the size of the tumours approached 1 mL in volume, adding together the sizes of individual growths when more than one was present. The tumour material was excised aseptically, then processed as described for primary tissue. Serially transplanted aliquots represented 0.1–0.3 % of the xenograft tumour volume.

For in vivo screen endpoints, tumours grown from lenti-CRISPR-transduced PDX material were harvested at a volume of 0.1–0.2 mL. The excised tumours were weighed, then chopped with scalpels. An aliquot weighing 0.05–0.06 g was flash frozen in liquid nitrogen or on dry ice, then stored at −80 °C to be used for nucleic acid extraction. The remaining tissue was frozen in a 47:47:6 v/v mixture of Dulbecco’s Modified Eagle Medium (DMEM, Corning): fetal calf serum (FCS, Sigma, St Louis, MO, USA): dimethyl sulfoxide (DMSO, Sigma).

Xenografting

Animals

Female immuno-compromised, NOD/SCID/IL2r^−/− (NSG) and NOD/Rag1^−/−Il2r^−/− (NRG) mice were bred and housed at the Animal Resource Centre at the British Columbia Cancer Research Centre. Housing was maintained in a 18–25 °C temperature range and 20–70% humidity range, with a 12 hour daylight cycle (on at 6:00am, off at 6:00 pm). Surgery was carried out on mice between the ages of 5–12 weeks. All experimental procedures were approved by the University of British Columbia Animal Care Committee (protocol numbers A15-0248 and A19-0298).

Subcutaneous transplants

Disaggregated cells and organoids were resuspended in 100–200 mL of a 1:1 v/v mixture of ice-cold DMEM: Matrigel (BD Biosciences, San Jose, CA, USA) and kept on ice until transplantation. Mice were lightly anesthetized with isoflurane, then the cell/organoid suspension was injected under the skin on the flank using a pre-cooled 1 mL syringe and 21 gauge needle.

Mammary fat pad transplants

Mice were anesthetized using isoflurane, and administered Meloxicam non-steroidal anti-inflammatory drug (5 mg/kg). The skin in the flank close to one of the #4 inguinal fat pads was shaved, cleaned with 70% isopropyl alcohol and Hibitaine soap, and a line block of 0.25% bupivacaine hydrochloride (Marcaine, AstraZeneca, Cambridge, UK) applied. The skin was opened by making a 3–5 mm skin incision in the blocked area. The fat pad was eased from the skin using blunt dissection, then exteriorized with forceps. Innocula of organoids suspended in 60–70 µL of a 50:40:10 v/v mixture of cold Matrigel: DMEM: trypan blue (Sigma) were injected into the fat pads using a pre-cooled 1 mL syringe and 21 gauge needle. The blue dye in the medium was used for visual verification that the injected bolus was contained within the fat. The fat pad was returned and the incision closed with 1 or 2 absorbable sutures.

Propagation of CRISPR-transduced tissue

Following transplantation, mice were monitored regularly for the appearance of palpable tumours. Once palpable, tumours were measured three times per week using calipers, and the size estimated using the formula 0.52 × length × width². The animals were euthanized when tumours reached target size of 0.1– 0.2 mL. The institutionally-approved humane endpoint tumour size of 1 mL was not exceeded in any experiment. The tumours were excised aseptically, and transferred to Falcon tubes containing 5 mL ice-cold Hanks Balanced Salt Solution (Corning) supplemented with 2% FCS (HF) before tissue processing.

Generation and use of dual sgRNA-barcode lenti-CRISPR viral vector

Design of barcode and sgRNA pools

Barcode design used a partially degenerate DNA sequence NNATCNNGATSSAAANNGGTNNAACNN, where N = A/C/T/G, S = C/G, randomly incorporated during oligo synthesis (>4 × 10⁶ permutations)²⁴. Forward (NNATCNNGATSSAAANNGGTNNAACNN) and reverse (NNGTTNNACCNNTTTSSATCNNGATNN) barcode sequence oligos pools flanked by EcoRI and NheI restriction sequences were ordered from Integrated DNA Technologies (IDT, Coralville, IA, USA), and mixed together. A second pair of forward and reverse barcode sequence oligo pools flanked by PacI and KpnI restriction sequences was ordered (IDT) and mixed together.

20 nucleotide sgRNA sequences were designed using the CHOPCHOP web tool (version 3, https://chopchop.cbu.uib.no/)⁴⁵ specifying the PAM motif NGG, or selected from the published Avana and Brunello libraries^12,46. Three different sgRNAs were chosen per targeted gene, based on predicted efficiency score⁴⁶, and where possible targeting exons present in all common splice variants. Preference was also given to sgRNAs with 40-70% G-C content, and those previously identified as generating hits in cell lines screens (genomeCRISPR.org). sgRNA libraries also included a set of 20–25 non-targeting controls; either “non-targeting”, with no matching sequences in the human genome (from previously published lists¹³), or “non-gene targeting”, with matching sequences in presumed non-coding regions at least 5k base pairs distant from the nearest gene encoding region. The sgRNA sequences were obtained as complementary forward and reverse oligos from Integrated DNA Technologies, flanked by overhangs for cloning (forward: CACCG preceding sgRNA sequence; reverse: AAAC preceding and C following complementary sgRNA sequence). Individual forward and reverse oligos were mixed, then pooled as libraries for cloning.

Generation of dual barcode-sgRNA lentiCRISPR vector

The lentiCRISPRv2 plasmid (Addgene plasmid # 52961, a gift from Dr Feng Zhang) was initially modified by replacing the puromycin resistance gene with a dsRed-Express2 reporter gene, to enable integration detection by flow cytometry. Barcode and sgRNA pools were then incorporated in successive steps.

To facilitate efficient barcode insertion, we inserted a 60-bp random sequence filler between a pair of restriction sites either closely upstream (PacI/KpnI) or downstream (EcoRI/NheI) of the sgRNA locus in the plasmid. The plasmid was digested with one of the restriction enzymes (PacI or EcoRI), run through an agarose gel and gel purified using Monarch Gel Extraction kit (New England Biolabs, Ipswich, MA, USA) to isolate the digested plasmid backbone. The filler sequence was ligated into the plasmid with T4 ligase. The plasmid was amplified by heat-shock transformation into Stbl3 bacteria, and transformants cultured on Luria-Bertani-ampicillin-agar (LB-amp-agar) solid medium overnight. A single colony was picked and further cultured with LB-amp liquid medium, and plasmid extracted using Monarch Miniprep Kit (New England Biolabs). Presence of the filler was verified with Sanger sequencing. For barcode pool insertion, filler-containing lentiCRISPRv2 plasmids were digested with a restriction enzyme pair (PacI/KpnI or EcoRI/NheI), gel purified, and ligated with a pool of generate dsDNA oligonucleotides containing the barcode pool. Barcoded plasmids were electroporated into ElectroMAX DH10B T1R bacteria according to the manufacturer’s instructions and cultured overnight in 200 ml LB-amp. The barcode-containing plasmid was extracted using PureLink HiPure Plasmid Filter Maxiprep Kit (Thermo Fisher Scientific, Waltham, MA, USA), and frozen for future use as recipient for sgRNA pools.

Pooled sgRNA oligonucleotides were incorporated into BsmBI/Esp3I (Thermo Fisher Scientific) digested lentiCRISPRv2 plasmids at 3:1 insert-to-vector molar ratio with Quick Ligase (New England Bio labs). Ligated vectors were purified with NEB Monarch PCR & DNA Cleanup Kit (New England Biolabs) and subsequently transformed into electrocompetent ElectroMAX DH10B T1R E. coli (Thermo Fisher Scientific) using a 1 mm cuvette in a BioRad GenePulser Xcell System at 1800V, 25 µF and 200 ohms. Electroporated cells were incubated overnight in 200 mL of BD Difco LB broth Miller (Thermo Fisher Scientific) with ampicillin at 37 °C. Overnight culture was collected by spinning in a Beckman Coulter Avanti JXN-26 high-speed centrifuge at 6000 rpm for 10 min and the plasmids were extracted using a PureLink HiPure Plasmid Filter Maxiprep Kit (Thermo Fisher Scientific). Purified vectors were validated by targeted sequencing of guide inserts on Illumina MiSeq 2 × 150bp runs.

Sequence diversity of sgRNA and barcodes was validated by targeted sequencing of a PCR amplicon containing both sgRNA and barcode (as described below in Sequencing Library Preparation). This showed 90th/10th centile guide count ratio of 3.94, 5.27 and 3.01 for libraries Signaling1, Signaling2 and Notch respectively, which are in the expected range for pooled cloning.

Testing for any impact of UMI incorporation on efficacy of CRISPR-mediated knockdown was carried out by cloning an sgRNA targeting eGFP into the lentiCRISPRv2 vector. Then eGFP-expressing HEK293T cells (a clonal line derived from parental HEK293T cells) were transduced with plasmid vector using TransIT-Lenti Transfection Reagent (Mirus Bio, Madison WI, USA) following manufacturer’s instructions. After 8 days’ culture, cells were harvested with trypsin/EDTA and incubated with propidium iodide (Sigma) for identification of viable cells. The knockdown proportion (eGFP low) was determined by flow cytometry (LSRFortessa, BD Biosciences, San Jose, CA, USA) using FlowJo software version 10.7.1 (Beckton Dickinson).

Lentiviral packaging and titration

HEK293T packaging cells (ATCC) were transfected using TransIT-Lenti Transfection Reagent following manufacturer’s instructions. Briefly, 3 plasmids containing sgRNA-barcoded lentiviral vector (20 µg), PAX2 (15 µg) and VSV-G (10 µg) were dissolved in 4.5 mL Opti-MEM Reduced Serum Medium (Thermo Fisher Scientific), and 135 µL transfection reagent added. After 30 minutes the mixture was added dropwise to HEK293T cells at 70–90% confluent culture in a 150 mm dish. Cells were cultured in DMEM supplemented with 10% FCS and 10 mM Hepes (STEMCELL Technologies), with a change to fresh medium the following morning. Virus-containing supernatant was harvested four times, between 24 and 72 h following transduction. Harvests were pooled and pelleted by ultracentrifugation (Beckman Coulter Optima XE-90, spun at 107,000 × g for 90 min), and resuspended in DMEM (~1.2 mL final volume per 6 × 150 mm plates’ culture). Functional viral titre was determined by transducing a limiting dilution series of viral doses into 184hTERT-L9 cells (a clonal line derived from parental 184hTERT cells, a gift from Martha Stampfer, University of Berkeley, CA, USA). Cells were harvested 3 days following transduction, and the proportion of dsRed-expressing cells determined by flow cytometry (LSRFortessa) using FloJo software version 10.7.1. Titre was calculated using Poisson statistics for those dilutions that resulted in 5–25% dsRed+ cells. Typical functional titres of 0.5–3 × 10⁸ infectious units per ml were obtained, comparable with non-UMI-containing vectors.

Ex-vivo lentiviral transduction of PDX-derived cells

Mice were euthanized, then PDX tumours were removed aseptically. The tissue was minced with scalpels and mechanically disaggregated with a Seward Stomacher 80 paddle blender (Seward, Worthing, UK) for 1 minute in cold DMEM. A suspension of small organoids and cells was separated from larger tissue fragments by drawing into a P1000 pipette tip, then transferred to an Eppendorf tube. After centrifugation (5 minutes at 1200 rpm), the size of the pellet was estimated visually from the Eppendorf tube volume scale. The material was then divided into aliquots of suitable size for transplantation into individual mice (each ~0.025–0.030 cm³ pellet volume, typically 3–6 aliquots per originating 1 cm³ excised tumour volume). Red blood cell lysis was performed by incubating in ice-cold 0.8% w/v ammonium chloride solution (STEMCELL Technologies) for 10 minutes and washing with HF. Each aliquot was resuspended in 0.25 ml pre-warmed DMEM supplemented with 5% FCS and lentivirus at the desired dose (typically, 7–10 × 10⁶ infectious units, Table S2). Aliquots were transferred to individual wells of a pre-warmed 24-well non-tissue-culture treated plate and incubated at 37 °C for 4 hours. Following incubation, well contents were transferred to Eppendorf tubes and residual virus removed by washing 3 times with cold HF. Aliquots intended for subcutaneous transplantation were resuspended in 150 µL ice-cold 1:1 v/v mixture of Matrigel: DMEM. Aliquots intended for mammary fat transplantation were resuspended in 65 µL ice-cold 0.45: 0.45: 0.1 v/v mixture of matrigel: DMEM: trypan blue. The aliquots were kept on ice for up to 2 hours awaiting transplantation into mice.

Transduction efficiency was estimated by flow cytometry following in vitro culture. Wells of a 24-well plate were coated by incubating at 37 °C for 1 hour with a 1:60 dilution of collagen I (Sigma) in PBS. Aliquots of virally-transduced tissue were added to coated wells in 1 mL DMEM supplemented with 5% FCS. After 3 days, single cell suspensions were prepared by incubating in trypsin/EDTA (Sigma), then dispase (STEMCELL Technologies) supplemented with DNaseI (Sigma), then passing through a 40 micron filter. Cell pellets were resuspended in ice-cold HF supplemented with 4‘,6-diamidino-2-phenylindole (DAPI, Sigma). Transduced cell proportions were determined by flow cytometry (LSRFortessa) as the fraction of dsRed+ cells within a viable (DAPI-) gate.

Drug assays

Ex vivo cytotoxicity assays in organoid culture

Freshly harvested PDX tissue was dissociated to single cells using a gentleMACS dissociator with human dissociation kit (Miltenyi Biotec, Germany) and incubated at 37 °C with constant agitation. Red blood cells were lysed using ice-cold 0.8% w/v ammonium chloride solution (STEMCELL Technologies). Cells were filtered through a 70 micron filter before being resuspended in 1:1 mix of Matrigel: breast cancer organoid media (composition as specified in⁴⁷). 3000 cells per 20 µL were seeded into each well in a white 384 well plate (Corning) using an Integra Assist Plus automated pipettor. Plates were incubated at 37 °C for 15 minutes to allow Matrigel to solidify before an additional 10 µL breast cancer organoid culture medium was added to each well. 60 µL PBS was added to surrounding wells to minimize evaporation artifacts. Organoids were allowed to form for 3 days, after which 10 µL of 3× concentrated drug diluted in culture medium to account for the cell: Matrigel mix. Organoids were incubated with drugs for 96 hours before being lysed, and viability measured using CellTitreGlo 3D Viability Assay (Promega, Madison, WI, USA) following manufacturer’s instructions, with luminescence measured by microplate reader (SpectraMax 3, Molecular Devices, Sunnyvale, CA, USA). Drug response curves and EC50 concentrations were fitted with the R tidydrc package (a wrapper for the drc package), using four parameter log-logistic function LL.4 (https://cran.r-project.org/web/packages/drc/drc). Organoid PDX-drug assay data were filtered to remove a minority that failed to fit with the LL.4 model (1/63 assays), fit with positive slope (fit parameter b < 0, 1/63 assays), or where the dose-response fit covered a lower than 20% range over the drug doses tested (4/63 assays). Drugs used were Idasanutlin (Medkoo Biosciences, Morrisville, NC, USA), Nutlin-3a (MedChemExpress, Monmouth Junction, NJ, USA), Infigratinib (MedChemExpress), Erdafitinib (MedChemExpress), JQ-1 (MedChemExpress), Rapamycin (Selleck Chemicals, Houston, TX, USA) and Everolimus (Selleck Chemicals).

In vivo drug administration

Vehicle was prepared by dissolving 20 mg/ml Klucel hydroxypropyl cellulose average Mw 100,000 (Sigma), 1 mg/ml Tween-80 (Sigma), 0.9 mg/ml methylparaben (Sigma) and 0.1 mg/ml propylparaben (Sigma) in sterile water, and kept refrigerated. Mice bearing replicate PDX tumours were randomized into control and drug-treated groups when the average tumour size was ~0.05 ml. For 14 consecutive days mice received a dose by oral gavage either of vehicle alone (control group) or 100 mg/kg Idasanutlin (MedKoo Biosciences) suspended in vehicle (drug-treated group). Mice were euthanized on the day of the final drug dose, four hours after drug dosing, and tumours were harvested for processing.

CRISPR screen sequence generation

Sequencing library preparation

Flash frozen xenograft tumour samples were defrosted. Genomic DNA was extracted using the QIAAmp DNA Mini kit (Qiagen), following manufacturer’s protocol for extraction from tissues, with the exception that double the volume of reagents was used prior to loading onto DNA columns (buffers AL and ATL, proteinase K, ethanol). DNA yield was quantified using QUBIT fluorometer and dsDNA BR Assay Kit (Thermo Fisher Scientific). Aliquots containing 8–10 mg DNA were set aside for sequencing library preparation.

Nextera adaptor sequences were appended to the 5’ end of the sgRNA region primers with the addition of single nucleotides to allow downstream barcoded adaptor attachment by PCR. Multiplex PCR was performed using a T100 Thermal Cycler and Platinum Multiplex PCR Master Mix (Bio-Rad Laboratories, Hercules, CA, USA). Enzymatic cleanup of the PCR product was carried out using ExoSAP-IT PCR Product Cleanup Reagent (Thermo Fisher Scientific). Barcode PCR was completed using T100 Thermal Cycler, Nextera XT Index Kit v2 (Bio-Rad Laboratories), and FastStart High Fidelity PCR System (Sigma). The PCR products (approximately 400 bp, including adaptors) were size selected and cleaned using SPRIselect beads (Beckman Coulter) with a 1:1.2 library:beads ratio. The DNA size was analyzed using a High Sensitivity DNA Kit (Agilent Technologies, Santa Clara, CA, USA) on an 2100 Bioanalyzer (Agilent Technologies) and the DNA was quantified using dsDNA BR Assay Kit (Invitrogen, Carslbad, CA, USA) on a Qubit 4 Fluorometer (Invitrogen). Samples were normalized and pooled to create a library and each library was denatured to 2 nM or 4 nM. Sequencing runs pooling up to 12 sample libraries were carried out on a MiSeq benchtop sequencer (Illumina, San Diego, CA, USA, running Illumina Control Software, currently version 4.0.0.1769) according to manufacturer’s protocol for 2 × 151 bp pair-end runs using MiSeq Reagent Kits v2 300 Cycle (Illumina). Sequencing runs pooling 13–47 samples libraries were carried out on a NextSeq sequencer (Illumina) according to manufacturer’s protocol for 2 × 151 bp pair-end runs using NextSeq 500 System Mid-Output Kit 300 Cycle (Illumina).

Extraction of sgRNA and UMI sequences

FASTQ files were first filtered to remove paired reads for which either forward or reverse reads had a lower average phred quality score than 30. The forward read, containing the sgRNA sequence, was trimmed to a substring from 24 to 43 nucleotides downstream from invariant vector sequence TCTTGTGGAAAGGACGAAACACCG. sgRNAs were identified within the substring by exact string matching against the set of known 20 nucleotides targeting sequences in the library used, or failing that a string match with no more than 1 nucleotide mismatch. The reverse read, containing the UMI sequence, was trimmed to a 50 nucleotides substring down stream of invariant vector sequence AGCCTCACTGGCCGTCGTTTTACA. UMI sequences were identified within the substring by exact string matching to the sequence NNGTTNNACCNNTTTSSATCNNGATNN, complementary to the UMI reference, or failing that a string match with no more than 1 nucleotide mismatch. Stringency of FASTQ filtering was relaxed to allow for up to 3 nucleotides mismatches for model selection analysis. To standardize dataset depth, and reduce the confounding effect of duplicate reads in more deeply sequenced datasets, extracted sequences were downsampled to a maximum of 2 × 10⁶ sgRNA-UMI reads. Reads with the same combination of sgRNA and UMI sequence were considered a separate clone, and the numbers of downsized reads matching that sgRNA-UMI sequence the clone size.

Previous studies have documented deviations in UMI sequences from the sequences originally introduced by transduction, presumed to arise through errors in replication, PCR or optical sequence reading²⁵. To adjust for this, we carried out merging of clones with closely similar sgRNA-UMI sequence, using the UMI-Tools Directional algorithm²⁵. For each sgRNA in turn, UMI sequence clones are represented as a network of nodes, with edges drawn connecting pairs of nodes where: (i) UMI sequences differ at exactly 1 position, and (ii) n_a > 2n_b − 1, where n_a and n_b are the larger and smaller clone sizes for the pair. Any cluster of interconnected nodes, in which each node can be connected to any other by series of uninterrupted edges, is then merged into a single clone. The merged clone has size equal to the sum of constituent node clone sizes, and is assigned the UMI of the constituent node with greatest clone size. Comparison of downstream fitness estimates made with or without pre-processing with UMI-Tools indicated that clone merging has a very small effect on median fitness estimates but results in somewhat wider dispersions consistent with the implied reduction in number of independent datapoints modeled.

Whole genome and transcriptome sequence data analysis

Sequence data generation

For each patient PDX line in the study, tissue aliquots were collected from (i) a sample PDX tumour; (ii) the patient tumour that initiated the PDX line; and (iii) normal tissue (saliva or blood buffy coat) from the patient. Genomic DNA was extracted using the QiaAmp or Allprep kits (Qiagen). Library construction and sequencing was carried out using Illumina HiSeq2500 whole genome shotgun v4 chemistry with paired end 125 bp reads, targeting an average genome-wide coverage of 80 for PDX and patient samples and 40 for normal samples.

RNA was extracted from a tissue aliquot from a sample PDX tumour (from the same PDX line but not necessarily the same tumour as used for the PDX DNA aliquot), using the Allprep kit (Qiagen). ssRNAseq libraries were prepared and sequenced on a HiSeq2500 sequencer, pooling 3 samples per sequencing lane.

Alignment

Reads were aligned to the hg19 reference genome using bwa mem v0.7.6a. Duplicates were marked with picard MarkDuplicates v2.18.14 (http://broadinstitute.github.io/picard/). RNAseq libraries were aligned using the STAR pipeline⁴⁸ version 2.4.2a (https://github.com/alexdobin/STAR).

Bioinformatic removal of mouse reads

Whole genome and whole transcriptome sequence libraries from PDXs were depleted for putative contaminating mouse reads using species filter Biobloom⁴⁹, retaining reads that were unambiguously human-classified. To avoid false-positive downstream mutation calls when comparing with other library types, tumour and normal libraries WGS were also processed with this filter. Human-classified reads were realigned to the hg19 reference genome using bwa mem⁵⁰ for downstream variant calling.

Single nucleotide variant and indel calling

Somatic indels were identified using Strelka v1.0.14⁵¹, and both germline and somatic SNVs were called using MutationSeq v4.2.0⁵². Both algorithms were applied using default parameters. GENCODE release 19 was used to annotate variants with gene name, while the predicted effects of SNVs were annotated using SnpEff 4.0e. Additionally variants were annotated with ClinVar (https://www.ncbi.nlm.nih.gov/clinvar)⁵³ and COSMIC (https://cancer.sanger.ac.uk/cosmic)⁵⁴ variant database entries having exact sequence matches. A subset of predicted high impact somatic variants was selected as those with (i) mutationSeq probability > 0.85, and (ii) variant allele frequency > 0.3, and (iii) predicted as high impact by at least one of SnpEff (any of the terms: stop_gained, splice_donor_variant, splice_acceptor_variant, missense_variant, stop_lost) or ClinVar (any of the terms: pathogenic, likely pathogenic). For BRCA1 and BRCA2 genes, predicted high impact variants in the germline were also included.

Breakpoint calling

Breakpoint prediction was conducted using deStruct⁵⁵ version 0.4.3 (https://github.com/amcpherson/destruct/) to generate per cell breakpoint counts. Breakpoints were filtered for a minimum of 5 split reads, and with predicted sequences anchored by at least 250 nucleotides on either side of the predicted breakpoint.

Copy number calling

TITAN version 1.0.10⁵⁶ was run on WGS data to infer logR copy number values. 10 separate TITAN runs were made with parameters 1, 2, 3, 4 or 5 subclonal clusters and initializing ploidy of 2 or 4. The best fitting run was selected for each dataset, based on lowest S_Dbw Validity Index score.

Single cell sequencing with DLP+

The sample genome-wide single cell copy number profile displayed for PDX series C2553 was prepared with the Direct Library Preparation (DLP+) protocol as described in³¹.

Mutation signature analysis

We derived mutation signature probabilities using the MMCTM method, given signatures previously inferred from a set of high grade serous ovarian carcinomas and triple negative breast cancers^28,57. The PDX sample datasets were clustered together with the other datasets to form clusters defined solely by signature activity using hierarchical clustering with Euclidean distance and Ward’s linkage method.

Inference of mutational clonal clusters

We inferred the number of genomic clonal clusters in PDXs by applying population structure model PyClone-VI (https://github.com/Roth-Lab/pyclone-vi)²⁶ to mutation data derived from bulk WGS. Model inputs for single nucleotide variants were from MutationSeq v4.2.0⁵². Model inputs for major and minor allele copy number were from TITAN version 1.0.10⁵⁶, using the best fitting run based on S_Dbw Validity Index score.

Differential pathway analysis

To identify pathways differentially active in high vs low UMI diversity PDX lines, we first ordered PDX lines based on the average SDI per million reads of tumours generated from those lines. We selected a group of 6 lines with lowest diversity (C3466, C2438, C1557, C1375, C1373, C0468) and a group of 6 lines with highest diversity (C3278, C1368, C0331, C2271, C2191 and C3037). We next generated a set of genes differentially expressed between the two groups. For this, raw RNAseq read counts were TMM-normalized using edgeR (https://bioconductor.org/packages/release/bioc/html/edgeR.html). Limma-voom (https://bioconductor.org/packages/release/bioc/html/limma.html) was used to adjust for heteroskedasticity in count data, and generate log2-fold gene expression differences and p-values by linear modelling. A subset of genes with p-value no greater than 0.1 was passed to GSEA²⁷ for differential pathway analysis, using the Reactome 2016 Hallmark set of curated pathways (https://reactome.org/), and a false discovery rate of 0.05.

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Data availability

Raw data from targeted sequencing, whole genome sequencing and RNAseq have been deposited in the NCBI Sequence Read Archive (SRA) under BioProject ID PRJNA842677. Derived UMI-sgRNA count files have been deposited at Zenodo https://doi.org/10.5281/zenodo.6584802. Mutation annotation was carried out using ClinVar (https://www.ncbi.nlm.nih.gov/clinvar) and COSMIC (https://cancer.sanger.ac.uk/cosmic). Source data are provided with this paper.

Code availability

Fitness modeling used the Blang probabilistic programming language. The model code and sample datasets are available on Github (https://github.com/aroth85/humi_pipeline) (10.5281/zenodo.6596290)⁵⁸. Blang code is available at https://github.com/UBC-Stat-ML/nowellpack (https://doi.org/10.5281/zenodo.6595131)⁵⁹. Pre- and post-processing of results used custom scripts in Python and R version 3.6.3. Mutation-calling algorithms mutationSeq, TITAN, deStruct and signature inference MMCTM are published, and cited in the text. The CHOPCHOP webtool (version 3, https://chopchop.cbu.uib.no) was used for CRISPR guide design.

References

Xu, H., Eirew, P., Mullaly, S. C. & Aparicio, S. The Omics of Triple-Negative Breast Cancers. Clin. Chem. 60, 122–133 (2014).
Article CAS PubMed Google Scholar
Perou, C. M. et al. Molecular Portraits of Human Breast Tumours. Nature 406, 747–752 (2000).
ADS CAS PubMed Google Scholar
van’t Veer, L. J. et al. Gene Expression Profiling Predicts Clinical Outcome of Breast Cancer. Nature 415, 530–536 (2002).
Article Google Scholar
Cancer Genome Atlas Network. Comprehensive Molecular Portraits of Human Breast Tumours. Nature 490, 61–70 (2012).
Article ADS CAS Google Scholar
Curtis, C. et al. The Genomic and Transcriptomic Architecture of 2,000 Breast Tumours Reveals Novel Subgroups. Nature 486, 346–352 (2012).
Article CAS PubMed PubMed Central Google Scholar
Pereira, B. et al. The Somatic Mutation Profiles of 2,433 Breast Cancers Refines Their Genomic and Transcriptomic Landscapes. Nat. Commun. 7, 11479 (2016).
Article ADS CAS PubMed PubMed Central Google Scholar
Shah, S. P. et al. The Clonal and Mutational Evolution Spectrum of Primary Triple-Negative Breast Cancers. Nature 486, 395–399 (2012).
Article ADS CAS PubMed Google Scholar
Teschendorff, A. E., Miremadi, A., Pinder, S. E., Ellis, I. O. & Caldas, C. An Immune Response Gene Expression Module Identifies a Good Prognosis Subtype in Estrogen Receptor Negative Breast Cancer. Genome Biol. 8, R157 (2007).
Article PubMed PubMed Central CAS Google Scholar
Lord, C. J. & Ashworth, A. BRCAness Revisited. Nat. Rev. Cancer 16, 110–120 (2016).
Article CAS PubMed Google Scholar
Burleigh, A. et al. A Co-Culture Genome-Wide RNAi Screen with Mammary Epithelial Cells Reveals Transmembrane Signals Required for Growth and Differentiation. Breast Cancer Res. 17, 4 (2015).
Article PubMed PubMed Central CAS Google Scholar
Shalem, O. et al. Genome-Scale CRISPR-Cas9 Knockout Screening in Human Cells. Science 343, 84–87 (2014).
Article ADS CAS PubMed Google Scholar
Meyers, R. M. et al. Computational Correction of Copy-Number Effect Improves Specificity of CRISPR Cas9 Essentiality Screens in Cancer Cells. Nat. Genet. 49, 1779–1784 (2017).
Article CAS PubMed PubMed Central Google Scholar
Hart, T. et al. High-Resolution CRISPR Screens Reveal Fitness Genes and Genotype-Specific Cancer Liabilities. Cell 163, 1515–1526 (2015).
Article CAS PubMed Google Scholar
Al-Hajj, M., Wicha, M. S., Benito-Hernandez, A., Morrison, S. J. & Clarke, M. F. Prospective Identification of Tumorigenic Breast Cancer Cells. Proc. Natl. Acad. Sci. USA 100, 3983–3988 (2003).
Article ADS CAS PubMed PubMed Central Google Scholar
Eirew, P. et al. A Method for Quantifying Normal Human Mammary Epithelial Stem Cells with in Vivo Regenerative Ability. Nat. Med. 14, 1384–1389 (2008).
Article CAS PubMed Google Scholar
Nguyen, L. V. et al. DNA Barcoding Reveals Diverse Growth Kinetics of Human Breast Tumour Subclones in Serially Passaged Xenografts. Nat. Commun. 5, 5871 (2014).
Article ADS CAS PubMed Google Scholar
Merino, D. et al. Barcoding Reveals Complex Clonal Behavior in Patient-Derived Xenografts of Metastatic Triple Negative Breast Cancer. Nat. Commun. 10, 766 (2019).
Article ADS CAS PubMed PubMed Central Google Scholar
Michlits, G. et al. CRISPR-UMI: Single-Cell Lineage Tracing of Pooled CRISPR–Cas9 Screens. Nat. Methods 14, 1191–1197 (2017).
Article CAS PubMed Google Scholar
Schmierer, B. et al. CRISPR/Cas9 Screening Using Unique Molecular Identifiers. Mol. Syst. Biol. 13, 945 (2017).
Article PubMed PubMed Central Google Scholar
Esk, C. et al. A Human Tissue Screen Identifies a Regulator of ER Secretion as a Brain-Size Determinant. Science 370, 935–941 (2020).
Article ADS CAS PubMed Google Scholar
Rogers, Z. N. et al. A Quantitative and Multiplexed Approach to Uncover the Fitness Landscape of Tumor Suppression in Vivo. Nat. Methods 14, 737–742 (2017).
Article CAS PubMed PubMed Central Google Scholar
Rogers, Z. N. et al. Mapping the in Vivo Fitness Landscape of Lung Adenocarcinoma Tumor Suppression in Mice. Nat. Genet. 50, 483–486 (2018).
Article CAS PubMed PubMed Central Google Scholar
Michels, B. E. et al. Pooled In Vitro and In Vivo CRISPR-Cas9 Screening Identifies Tumor Suppressors in Human Colon Organoids. Cell Stem Cell 26, 782–792.e7 (2020).
Article CAS PubMed Google Scholar
Gerrits, A. et al. Cellular Barcoding Tool for Clonal Analysis in the Hematopoietic System. Blood 115, 2610–2618 (2010).
Article CAS PubMed Google Scholar
Smith, T., Heger, A. & Sudbery, I. UMI-Tools: Modeling Sequencing Errors in Unique Molecular Identifiers to Improve Quantification Accuracy. Genome Res. 27, 491–499 (2017).
Article CAS PubMed PubMed Central Google Scholar
Gillis, S. & Roth, A. PyClone-VI: Scalable Inference of Clonal Population Structures Using Whole Genome Data. BMC Bioinforma. 21, 571 (2020).
Article Google Scholar
Subramanian, A. et al. Gene Set Enrichment Analysis: A Knowledge-Based Approach for Interpreting Genome-Wide Expression Profiles. Proc. Natl. Acad. Sci. USA 102, 15545–15550 (2005).
Article ADS CAS PubMed PubMed Central Google Scholar
Funnell, T. et al. Integrated Structural Variation and Point Mutation Signatures in Cancer Genomes Using Correlated Topic Models. PLoS Computational Biol. 15, e1006799 (2019).
Article CAS Google Scholar
Alexandrov, L. B. et al. Signatures of Mutational Processes in Human Cancer. Nature 500, 415–421 (2013).
Article CAS PubMed PubMed Central Google Scholar
Eirew, P. et al. Dynamics of Genomic Clones in Breast Cancer Patient Xenografts at Single-Cell Resolution. Nature 518, 422–426 (2015).
Article ADS CAS PubMed Google Scholar
Laks, E. et al. Clonal Decomposition and DNA Replication States Defined by Scaled Single-Cell Genome Sequencing. Cell 179, 1207–1221.e22 (2019).
Article CAS PubMed PubMed Central Google Scholar
Dempster, J. M. et al. Agreement between Two Large Pan-Cancer CRISPR-Cas9 Gene Dependency Data Sets. Nat. Commun. 10, 5817 (2019).
Article ADS CAS PubMed PubMed Central Google Scholar
Wang, K. et al. PEST Domain Mutations in Notch Receptors Comprise an Oncogenic Driver Segment in Triple-Negative Breast Cancer Sensitive to a -Secretase Inhibitor. Clin. Cancer Res. 21, 1487–1496 (2015).
Article PubMed CAS Google Scholar
Borthakur, G. et al. MDM2 Inhibitor, Nutlin 3a, Induces P53 Dependent Autophagy in Acute Leukemia by AMP Kinase Activation. PloS one 10, e0139254 (2015).
Article PubMed PubMed Central CAS Google Scholar
Abe, Y. et al. Hedgehog Signaling Overrides P53-Mediated Tumor Suppression by Activating Mdm2. Proc. Natl. Acad. Sci. USA 105, 4838–4843 (2008).
Article ADS CAS PubMed PubMed Central Google Scholar
Ringel, T. et al. Genome-Scale CRISPR Screening in Human Intestinal Organoids Identifies Drivers of TGF- Resistance. Cell Stem Cell 26, 431–440.e8 (2020).
Article CAS PubMed Google Scholar
Planas-Paz, L. et al. YAP, but Not RSPO-LGR4/5, Signaling in Biliary Epithelial Cells Promotes a Ductular Reaction in Response to Liver Injury. Cell Stem Cell 25, 39–53.e10 (2019).
Article CAS PubMed Google Scholar
Han, K. et al. CRISPR Screens in Cancer Spheroids Identify 3D Growth Specific Vulnerabilities. Nature 580, 136–141 (2020).
Article ADS CAS PubMed PubMed Central Google Scholar
Chow, R. D. et al. AAV-Mediated Direct in Vivo CRISPR Screen Identifies Functional Suppressors in Glioblastoma. Nat. Neurosci. 20, 1329–1341 (2017).
Article CAS PubMed PubMed Central Google Scholar
Subramanian, A. et al. A Next Generation Connectivity Map: L1000 Platform and the First 1,000,000 Profiles. Cell 171, 1437–1452.e17 (2017).
Article CAS PubMed PubMed Central Google Scholar
Kato, S. et al. Analysis of MDM2 Amplification: Next-Generation Sequencing of Patients With Diverse Malignancies. JCO Precision Oncol. 2018, PO.17.00235 https://doi.org/10.1200/PO.17.00235 (2018).
Acar, A. et al. Exploiting Evolutionary Steering to Induce Collateral Drug Sensitivity in Cancer. Nat. Commun. 11, 1923 (2020).
Article ADS CAS PubMed PubMed Central Google Scholar
Basanta, D., Gatenby, R. A. & Anderson, A. R. A. Exploiting Evolution to Treat Drug Resistance: Combination Therapy and the Double Bind. Mol. Pharmaceutics 9, 914–921 (2012).
Article CAS Google Scholar
Savage, P. et al. Chemogenomic profiling of breast cancer patient-derived xenografts reveals targetable vulnerabilities for difficult-to-treat tumors. Commun Biol. 3, 310 (2020).
Labun, K., Montague, T. G., Gagnon, J. A., Thyme, S. B. & Valen, E. CHOPCHOP v2: A Web Tool for the next Generation of CRISPR Genome Engineering. Nucleic Acids Res. 44, W272–W276 (2016).
Article CAS PubMed PubMed Central Google Scholar
Doench, J. G. et al. Optimized sgRNA Design to Maximize Activity and Minimize Off-Target Effects of CRISPR-Cas9. Nat. Biotechnol. 34, 184–191 (2016).
Article CAS PubMed PubMed Central Google Scholar
Sachs, N. et al. A Living Biobank of Breast Cancer Organoids Captures Disease Heterogeneity. Cell 172, 373–386.e10 (2018).
Article CAS PubMed Google Scholar
Dobin, A. et al. STAR: Ultrafast Universal RNA-Seq Aligner. Bioinformatics 29, 15–21 (2013).
Article CAS PubMed Google Scholar
Chu, J. et al. BioBloom Tools: Fast, Accurate and Memory-Efficient Host Species Sequence Screening Using Bloom Filters. Bioinformatics 30, 3402–3404 (2014).
Article CAS PubMed PubMed Central Google Scholar
Li, H. & Durbin, R. Fast and Accurate Short Read Alignment with Burrows–Wheeler Transform. Bioinformatics 25, 1754–1760 (2009).
Article CAS PubMed PubMed Central Google Scholar
Saunders, C. T. et al. Strelka: Accurate Somatic Small-Variant Calling from Sequenced Tumor-Normal Sample Pairs. Bioinformatics 28, 1811–1817 (2012).
Article CAS PubMed Google Scholar
Ding, J. et al. Feature-Based Classifiers for Somatic Mutation Detection in Tumour-Normal Paired Sequencing Data. Bioinformatics 28, 167–175 (2012).
Article CAS PubMed Google Scholar
Landrum, M. J. et al. ClinVar: Public Archive of Interpretations of Clinically Relevant Variants. Nucleic Acids Res. 44, D862–D868 (2016).
Article CAS PubMed Google Scholar
Tate, J. G. et al. COSMIC: The Catalogue Of Somatic Mutations In Cancer. Nucleic Acids Res. 47, D941–D947 (2019).
Article CAS PubMed Google Scholar
McPherson, A., Shah, S. P. & Sahinalp, S. C. deStruct: Accurate Rearrangement Detection Using Breakpoint Specific Realignment (Cold Spring Harbor Laboratory, 2017).
Ha, G. et al. TITAN: Inference of Copy Number Architectures in Clonal Cell Populations from Tumor Whole-Genome Sequence Data. Genome Res. 24, 1881–1893 (2014).
Article CAS PubMed PubMed Central Google Scholar
Funnell, T. et al. The impact of mutational processes on structural genomic plasticity in cancer cells. bioRxiv. https://doi.org/10.1101/2021.06.03.446999 (2021).
Roth, A. Roth-Lab/humi_pipeline: version 0.1.0. https://doi.org/10.5281/zenodo.6596290 (2022).
Bouchard-Côté, A., Salehi, S. & Roth, A. UBC-Stat-ML/nowellpack: version humi-v1.2. https://doi.org/10.5281/zenodo.6595131 (2022).

Download references

Acknowledgements

This project was generously supported by the BC Cancer Foundation at BC Cancer. SA holds the Nan and Lorraine Robertson Chair in Breast Cancer and is a Canada Research Chair in Molecular Oncology (950- 230610). Additional funding provided by the Terry Fox Research Institute Grant 1082, Canadian Cancer Society Impact Program Grants 705617 and 705636, CIHR Grant FDN-148429, Breast Cancer Research Foundation award (BCRF-18-180, BCRF-19-180, BCRF-20-180 and BCRF-21-180), the Cancer Research UK Grand Challenge Program 25050, Canada Foundation for Innovation (40044). We thank Morag Park, McGill University for the generous gift of PDX material. We thank members of our labs for critical reading of the manuscript.

Author information

Jazmine Brimhall & Justina Biele
Present address: AbCellera Biologics Inc., Vancouver, BC, Canada
Corey Hoang
Present address: British Columbia Institute of Technology, Vancouver, BC, Canada

Authors and Affiliations

Department of Molecular Oncology, BC Cancer, Vancouver, BC, Canada
Peter Eirew, Ciara O’Flanagan, Jerome Ting, Sohrab Salehi, Jazmine Brimhall, Beixi Wang, Justina Biele, Teresa Algara, So Ra Lee, Corey Hoang, Damian Yap, Steven McKinney, Cherie Bates, Esther Kong, Daniel Lai, Sean Beatty, Mirela Andronescu, Elena Zaikova, Andrew Roth & Samuel Aparicio
Computational Oncology, Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York, NY, 10065, USA
Tyler Funnell, Nicholas Ceglia & Sohrab Shah
Department of Medical Oncology, BC Cancer, Vancouver, BC, Canada
Stephen Chia & Karen Gelmon
Department of Diagnostic Radiology, BC Cancer, Vancouver, BC, Canada
Colin Mar
Department of Pathology and Laboratory Medicine, University of British Columbia, Vancouver, BC, Canada
Andrew Roth & Samuel Aparicio
Department of Computer Science, University of British Columbia, Vancouver, BC, Canada
Andrew Roth
Department of Statistics, University of British Columbia, Vancouver, BC, Canada
Alexandre Bouchard-Côté

Authors

Peter Eirew
View author publications
You can also search for this author in PubMed Google Scholar
Ciara O’Flanagan
View author publications
You can also search for this author in PubMed Google Scholar
Jerome Ting
View author publications
You can also search for this author in PubMed Google Scholar
Sohrab Salehi
View author publications
You can also search for this author in PubMed Google Scholar
Jazmine Brimhall
View author publications
You can also search for this author in PubMed Google Scholar
Beixi Wang
View author publications
You can also search for this author in PubMed Google Scholar
Justina Biele
View author publications
You can also search for this author in PubMed Google Scholar
Teresa Algara
View author publications
You can also search for this author in PubMed Google Scholar
So Ra Lee
View author publications
You can also search for this author in PubMed Google Scholar
Corey Hoang
View author publications
You can also search for this author in PubMed Google Scholar
Damian Yap
View author publications
You can also search for this author in PubMed Google Scholar
Steven McKinney
View author publications
You can also search for this author in PubMed Google Scholar
Cherie Bates
View author publications
You can also search for this author in PubMed Google Scholar
Esther Kong
View author publications
You can also search for this author in PubMed Google Scholar
Daniel Lai
View author publications
You can also search for this author in PubMed Google Scholar
Sean Beatty
View author publications
You can also search for this author in PubMed Google Scholar
Mirela Andronescu
View author publications
You can also search for this author in PubMed Google Scholar
Elena Zaikova
View author publications
You can also search for this author in PubMed Google Scholar
Tyler Funnell
View author publications
You can also search for this author in PubMed Google Scholar
Nicholas Ceglia
View author publications
You can also search for this author in PubMed Google Scholar
Stephen Chia
View author publications
You can also search for this author in PubMed Google Scholar
Karen Gelmon
View author publications
You can also search for this author in PubMed Google Scholar
Colin Mar
View author publications
You can also search for this author in PubMed Google Scholar
Sohrab Shah
View author publications
You can also search for this author in PubMed Google Scholar
Andrew Roth
View author publications
You can also search for this author in PubMed Google Scholar
Alexandre Bouchard-Côté
View author publications
You can also search for this author in PubMed Google Scholar
Samuel Aparicio
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

P.E., S.Sa., S.Sh., A.R., A.B.C., S.A. conceptualized the experiments and quantitative intuition and wrote the manuscript. AR, ABC defined and implemented statistical models. P.E., J.T., J.B., B.W., J.Bi., D.Y., C.O., C.H., T.A., S.R.L. performed experiments. P.E., D.L., S.B., S.M., M.A., E.Z., T.F., N.C., performed data analysis. S.C., K.G., C.M., C.B., E.K. consented patients and obtained biopsy materials. All authors reviewed the manuscript.

Corresponding authors

Correspondence to Alexandre Bouchard-Côté or Samuel Aparicio.

Ethics declarations

Competing interests

S.A .and S.S. declare founding shares of Canexia Healthcare. S.A. declares founding shares of Inflex Ltd. The remaining authors declare no competing interests.

Peer review

Peer review information

Nature Communications thanks Ramon Diaz-Uriarte, Pablo Herrera Nieto and the other anonymous reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Description to Additional Supplementary Information

Supplementary Data

Reporting Summary

Source data

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Eirew, P., O’Flanagan, C., Ting, J. et al. Accurate determination of CRISPR-mediated gene fitness in transplantable tumours. Nat Commun 13, 4534 (2022). https://doi.org/10.1038/s41467-022-31830-2

Download citation

Received: 06 April 2021
Accepted: 01 July 2022
Published: 04 August 2022
DOI: https://doi.org/10.1038/s41467-022-31830-2

Comments

By submitting a comment you agree to abide by our Terms and Community Guidelines. If you find something abusive or that does not comply with our terms or guidelines please flag it as inappropriate.