Functional genomics approaches can overcome limitations—such as the lack of identification of robust targets and poor clinical efficacy—that hamper cancer drug development. Here we performed genome-scale CRISPR–Cas9 screens in 324 human cancer cell lines from 30 cancer types and developed a data-driven framework to prioritize candidates for cancer therapeutics. We integrated cell fitness effects with genomic biomarkers and target tractability for drug development to systematically prioritize new targets in defined tissues and genotypes. We verified one of our most promising dependencies, the Werner syndrome ATP-dependent helicase, as a synthetic lethal target in tumours from multiple cancer types with microsatellite instability. Our analysis provides a resource of cancer dependencies, generates a framework to prioritize cancer drug targets and suggests specific new targets. The principles described in this study can inform the initial stages of drug development by contributing to a new, diverse and more effective portfolio of cancer drug targets.
Data and analyses are included in the published article and supplementary data 1, 2 and 3 are available from FigShare (https://figshare.com/projects/CRISPRtargetID/60146). The gene fitness scores of the cell lines, raw counts of the sgRNA data, and processed data and results are available from the project Score web portal: https://score.depmap.sanger.ac.uk.
Garraway, L. A. Genomics-driven oncology: framework for an emerging paradigm. J. Clin. Oncol. 31, 1806–1814 (2013).
Zehir, A. et al. Mutational landscape of metastatic cancer revealed from prospective clinical sequencing of 10,000 patients. Nat. Med. 23, 703–713 (2017).
Hay, M., Thomas, D. W., Craighead, J. L., Economides, C. & Rosenthal, J. Clinical development success rates for investigational drugs. Nat. Biotechnol. 32, 40–51 (2014).
Koike-Yusa, H., Li, Y., Tan, E.-P., Del Castillo Velasco-Herrera, M. & Yusa, K. Genome-wide recessive genetic screening in mammalian cells with a lentiviral CRISPR–guide RNA library. Nat. Biotechnol. 32, 267–273 (2014).
Meyers, R. M. et al. Computational correction of copy number effect improves specificity of CRISPR–Cas9 essentiality screens in cancer cells. Nat. Genet. 49, 1779–1784 (2017).
van der Meer, D. et al. Cell Model Passports—a hub for clinical, genetic and functional datasets of preclinical cancer models. Nucleic Acids Res. 47, D923–D929 (2019).
Iorio, F. et al. A landscape of pharmacogenomic interactions in cancer. Cell 166, 740–754 (2016).
Hart, T. et al. High-resolution CRISPR screens reveal fitness genes and genotype-specific cancer liabilities. Cell 163, 1515–1526 (2015).
Hart, T. et al. Evaluation and design of genome-wide CRISPR/SpCas9 knockout screens. G3 (Bethesda) 7, 2719–2727 (2017).
Tzelepis, K. et al. A CRISPR dropout screen identifies genetic vulnerabilities and therapeutic targets in acute myeloid leukemia. Cell Rep. 17, 1193–1205 (2016).
Wang, T., Wei, J. J., Sabatini, D. M. & Lander, E. S. Genetic screens in human cells using the CRISPR–Cas9 system. Science 343, 80–84 (2014).
McDonald, E. R. III et al. Project DRIVE: a compendium of cancer dependencies and synthetic lethal relationships uncovered by large-scale, deep RNAi screening. Cell 170, 577–592 (2017).
Massacesi, C. et al. PI3K inhibitors as new cancer therapeutics: implications for clinical trial design. OncoTargets Ther. 9, 203–210 (2016).
Brown, K. K. et al. Approaches to target tractability assessment — a practical perspective. MedChemComm 9, 606–613 (2018).
Viswanathan, V. S. et al. Dependency of a therapy-resistant state of cancer cells on a lipid peroxidase pathway. Nature 547, 453–457 (2017).
Chu, W. K. & Hickson, I. D. RecQ helicases: multifunctional genome caretakers. Nat. Rev. Cancer 9, 644–654 (2009).
Cortes-Ciriano, I., Lee, S., Park, W.-Y., Kim, T.-M. & Park, P. J. A molecular portrait of microsatellite instability across multiple cancers. Nat. Commun. 8, 15180 (2017).
Haugen, A. C. et al. Genetic instability caused by loss of MutS homologue 3 in human colorectal cancer. Cancer Res. 68, 8465–8472 (2008).
Perry, J. J. P. et al. WRN exonuclease structure and molecular mechanism imply an editing role in DNA end processing. Nat. Struct. Mol. Biol. 13, 414–422 (2006).
Kamath-Loeb, A. S., Welcsh, P., Waite, M., Adman, E. T. & Loeb, L. A. The enzymatic activities of the Werner syndrome protein are disabled by the amino acid polymorphism R834C. J. Biol. Chem. 279, 55499–55505 (2004).
Ketkar, A., Voehler, M., Mukiza, T. & Eoff, R. L. Residues in the RecQ C-terminal domain of the human Werner Syndrome helicase are involved in unwinding G-quadruplex DNA. J. Biol. Chem. 292, 3154–3163 (2017).
Chan, E. M. et al. WRN helicase is a synthetic lethal target in microsatellite unstable cancers. Nature https://doi.org/10.1038/s41586-019-1102-x (2019).
Saydam, N. et al. Physical and functional interactions between Werner syndrome helicase and mismatch-repair initiation factors. Nucleic Acids Res. 35, 5706–5716 (2007).
Opresko, P. L., Sowd, G. & Wang, H. The Werner syndrome helicase/exonuclease processes mobile D-loops through branch migration and degradation. PLoS ONE 4, e4825 (2009).
Myung, K., Datta, A., Chen, C. & Kolodner, R. D. SGS1, the Saccharomyces cerevisiae homologue of BLM and WRN, suppresses genome instability and homeologous recombination. Nat. Genet. 27, 113–116 (2001).
Le, D. T. et al. PD-1 blockade in tumors with mismatch-repair deficiency. N. Engl. J. Med. 372, 2509–2520 (2015).
Tsherniak, A. et al. Defining a cancer dependency map. Cell 170, 564–576 (2017).
Wang, T. et al. Gene essentiality profiling reveals gene networks and synthetic lethal interactions with oncogenic Ras. Cell 168, 890–903 (2017).
Ballouz, S. & Gillis, J. AuPairWise: a method to estimate RNA-seq replicability through co-expression. PLOS Comput. Biol. 12, e1004868 (2016).
Hart, T. & Moffat, J. BAGEL: a computational framework for identifying essential genes from pooled library screens. BMC Bioinformatics 17, 164 (2016).
Yoshihama, M. et al. The human ribosomal protein genes: sequencing and comparative analysis of 73 genes. Genome Res. 12, 379–390 (2002).
Iorio, F. et al. Unsupervised correction of gene-independent cell responses to CRISPR–Cas9 targeting. BMC Genomics 19, 604 (2018).
Anders, S. & Huber, W. Differential expression analysis for sequence count data. Genome Biol. 11, R106 (2010).
Aguirre, A. J. et al. Genomic copy number dictates a gene-independent cell response to CRISPR/Cas9 targeting. Cancer Discov. 6, 914–929 (2016).
Li, W. et al. MAGeCK enables robust identification of essential genes from genome-scale CRISPR/Cas9 knockout screens. Genome Biol. 15, 554 (2014).
Subramanian, A. et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl Acad. Sci. USA 102, 15545–15550 (2005).
Durinck, S. et al. BioMart and Bioconductor: a powerful link between biological databases and microarray data analysis. Bioinformatics 21, 3439–3440 (2005).
Cerami, E. G. et al. Pathway Commons, a web resource for biological pathway data. Nucleic Acids Res. 39, D685–D690 (2011).
Iorio, F. et al. Pathway-based dissection of the genomic heterogeneity of cancer hallmarks’ acquisition with SLAPenrich. Sci. Rep. 8, 6713 (2018).
GTEx Consortium. The Genotype-Tissue Expression (GTEx) project. Nat. Genet. 45, 580–585 (2013).
Cokelaer, T. et al. GDSCTools for mining pharmacogenomic interactions in cancer. Bioinformatics 34, 1226–1228 (2018).
Storey, J. D. & Tibshirani, R. Statistical significance for genomewide studies. Proc. Natl Acad. Sci. USA 100, 9440–9445 (2003).
Mi, H., Muruganujan, A. & Thomas, P. D. PANTHER in 2013: modeling the evolution of gene function, and other gene attributes, in the context of phylogenetic trees. Nucleic Acids Res. 41, D377–D386 (2013).
Law, C. W., Chen, Y., Shi, W. & Smyth, G. K. voom: precision weights unlock linear model analysis tools for RNA-seq read counts. Genome Biol. 15, R29 (2014).
Garcia-Alonso, L. et al. Transcription factor activities enhance markers of drug sensitivity in cancer. Cancer Res. 78, 769–780 (2018).
Ritchie, M. E. et al. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. 43, e47 (2015).
Baralis, E., Bertotti, A., Fiori, A. & Grand, A. LAS: a software platform to support oncological data management. J. Med. Syst. 36, 81–90 (2012).
We thank D. Adams, G. Vassiliou and L. Parts for comments on the manuscript, members of the M.J.G. laboratory and Sanger Institute facilities (Wellcome Trust grant 206194). Work was funded by Open Targets (OTAR015) to M.J.G., K.Y. and J.S.-R. The K.Y. laboratory is supported by Wellcome Trust (206194). The M.J.G. laboratory is supported by SU2C (SU2C-AACR-DT1213) and Wellcome Trust (102696 and 206194). Support was also received from AIRC 20697 (A.B.) and 18532 (L.T.); 5x1000 grant 21091 (A.B. and L.T.); ERC Consolidator Grant 724748 – BEAT (A.B.); FPRC-ONLUS, 5x1000 Ministero della Salute 2011 and 2014 (L.T.); and Transcan, TACTIC (L.T.).
Extended data figures and tables
Extended Data Fig. 1 Project Score CRISPR–Cas9 screening pipeline, data quality control and analysis set.
a, CRISPR–Cas9 screening pipeline workflow, including quality control steps and go/no-go decisions. b, Genomic characterization of the CRISPR–Cas9-screened cell lines. c, Average Pearson’s correlation of replicate sgRNA counts (n = 86,875) for individual cell lines. d, Data quality control threshold based on the distributions of Pearson’s correlation values of sgRNA fold change values between replicates of the same cell line (in green) and all possible pairwise comparisons (in grey), considering the 838 highly informative sgRNAs (described in the Methods). e, Percentage of experiments passing the quality control filter defined in d. f, Pearson’s correlation values as described in d for the cell lines in the final analysis set. g, ROC and precision/recall curves were obtained after classifying predefined essential (n = 354) and non-essential (n = 747) genes based on gene-level rank positions calculated using depletion fold changes. The median areas under the curve across all cell lines are reported. h, Glass’s Δ scores quantifying the depletion effect size for genes that encode ribosomal proteins (n = 61) and a priori known essential (n = 354) genes for all cell lines. i, Cell lines in the final analysis set grouped by tissue (inner ring) and cancer type (outer ring). j, Median gene-level depletion fold change (FC) values and interquartile ranges for reference gene sets defined in g and h for the 324 cell lines included in the analysis set. GEX, gene expression; METH, methylation; CNA, copy number alteration; WES, whole-exome DNA sequencing; AUROC, area under receiver operating characteristic; AUPR, area under precision/recall curve.
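The two summary statistics named in panels g and h can be illustrated with a minimal Python sketch. It assumes the textbook definition of Glass's Δ (difference in means divided by the standard deviation of a reference group) and computes the AUROC as the probability that an essential gene has a lower fold change than a non-essential gene. The choice of reference group, the function names and the fold-change values are hypothetical and are not taken from the study's pipeline.

```python
from statistics import mean, stdev

def glass_delta(group, reference):
    """Glass's delta: mean difference scaled by the reference group's sample s.d."""
    return (mean(group) - mean(reference)) / stdev(reference)

def auroc_depletion(essential_fc, nonessential_fc):
    """Probability that a random essential gene is more depleted (lower fold
    change) than a random non-essential gene; ties count as one half."""
    wins = 0.0
    for e in essential_fc:
        for n in nonessential_fc:
            if e < n:
                wins += 1.0
            elif e == n:
                wins += 0.5
    return wins / (len(essential_fc) * len(nonessential_fc))

# Hypothetical gene-level log fold changes for one cell line
essential = [-2.1, -1.8, -2.5, -0.4]
nonessential = [0.1, -0.2, 0.3, -0.5, 0.0]

print("AUROC:", auroc_depletion(essential, nonessential))
print("Glass's delta:", glass_delta(essential, nonessential))
```

A strongly negative Δ together with an AUROC close to 1 would indicate that reference essential genes are clearly separated from non-essential genes in that cell line.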
Extended Data Fig. 2 Assessment of technical confounders in CRISPR–Cas9 screening data and summary of fitness genes.
a, Absence of association between screening data quality and the number of replicates (as quantified by a Pearson’s correlation with respect to the number of replicates, n = 5 distinct values). Data quality was assessed using the fitness effect (the median fold change) of genes that encode ribosomal proteins (n = 61) in each cell line as a reference. b, Absence of an association between data quality (quantified as in a) and average Pearson’s correlation between replicates of individual screened cell lines (n = 324). The P value refers to a two-sample Student’s t-test; the score on the right plot is a Pearson’s correlation. c, Weak correlation and significant association between sgRNA library transduction efficiency in cell lines (averaged for replicates) and data quality. d, Weak correlation and significant association between the Cas9 activity of a cell line (averaged for replicates) and data quality. e, Absence of an association between library coverage and data quality. In c–e, P values, R and sample sizes (n) are defined as for b. f, Number of fitness genes in each cell line (BAGEL FDR < 5%; median = 1,459). g, Number of cell lines with fixed intervals of numbers of fitness genes. h, Absence of correlation between number of significant fitness genes per cell line and number of replicates, R defined as for a. i, The effect of the version of the sgRNA screening library on the number of fitness genes identified. A new version of the library (v.1.1) with additional guides for a subset of genes yields moderately larger numbers of fitness genes; however, this is equally variable in both groups and confounded by the tissue of origin of the cell lines. P value is from a two-sample Student’s t-test. j, Reproducible calling of fitness genes in HT-29 across sgRNA libraries. Left, the number of fitness genes detected in each library. Right, scatter plots of depletion scores at the genome-wide level or considering only highly informative sgRNAs for each library. In both cases, P values from a Fisher’s exact test are below machine precision (<10⁻¹⁶). R indicates Pearson’s correlation; C indicates the percentage of genes called as significantly depleted with both libraries over those detected as significantly depleted with one library only. k, Pearson’s correlation between the number of fitness genes per cell line and Cas9 activity level and library transduction efficiency. l, Pearson’s correlation between the number of fitness genes per cell line and the average Pearson’s correlation of cell line replicates. m, n, Pearson’s correlation between the number of fitness genes per cell line and the ability to detect a priori defined essential genes. For all panels, each data point is a cell line coloured by cancer type (except g and j). Box-and-whisker plots show the median, interquartile range and 95th percentiles.
Extended Data Fig. 3 Computation of ovary-specific and pan-cancer core fitness genes with ADaM, and a summary of context-specific and core fitness genes.
a, Number of fitness genes in each cell line. b, Number of fitness genes in a fixed number (m) of cell lines. c, Distributions and cumulative distributions of number of fitness genes observed in m cell lines across 1,000 randomized versions of the depletion scores for ovary cell lines. d, True-positive rates (for which a priori known essential genes are counted as positive) when considering the genes that are depleted (fitness genes) in at least m cell lines (blue curve) as predictions and the deviance of the number of these genes from expectations (computed using the randomized data shown in c) for all possible values of m (red curve). The x coordinate (rounded up) of the intersection of these two curves estimates the minimal number of cell lines m∗ in which a gene should be significantly depleted in order to be predicted as a core fitness gene for a cancer type. e, Number of genes predicted to be cancer-type-specific core fitness genes for a fixed number (k) of cancer types. f, Distributions (top) and cumulative distributions (bottom) of the number of core fitness genes predicted for a fixed number of tissue types for 1,000 randomized versions of the cancer-type-specific core fitness profiles. g, True-positive rates (for which a priori known essential genes are counted as positive) when considering the genes that are core fitness genes for at least k cancer types (blue curve) as predictions and the deviance of the number of these genes from expectation (computed using the randomized data shown in f; red curve). The x coordinate estimates the minimal number of cancer types k∗ for which a gene should have been predicted as a cancer-type-specific core fitness gene in order to be classified as a pan-cancer core fitness gene. All box-and-whisker plots show the interquartile ranges and 95th percentiles, with centres indicating medians.
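The threshold search described in panels c and d can be sketched in Python as follows. This is a simplified illustration under stated assumptions and not the ADaM implementation: the null model (shuffling each cell line's calls independently), the deviance formula (a log ratio with a pseudocount of 1), the min-max rescaling of the deviance curve and all function and variable names are assumptions made here for illustration, and the toy matrix is hypothetical.

```python
import math
import random

def counts_at_least(matrix, n_lines):
    """Entry m-1 holds the number of genes depleted in at least m cell lines."""
    per_gene = [sum(row) for row in matrix]
    return [sum(1 for c in per_gene if c >= m) for m in range(1, n_lines + 1)]

def rescale(values):
    """Min-max rescale to [0, 1] (assumption: puts deviance on the TPR axis)."""
    lo, hi = min(values), max(values)
    return [0.0 if hi == lo else (v - lo) / (hi - lo) for v in values]

def minimal_cell_lines(matrix, known_essential, n_random=1000, seed=0):
    """matrix[g][c] is 1 if gene g is a fitness gene in cell line c, else 0."""
    rng = random.Random(seed)
    n_genes, n_lines = len(matrix), len(matrix[0])
    observed = counts_at_least(matrix, n_lines)

    # Assumed null model: shuffle each cell line's column independently,
    # which keeps the number of fitness genes per cell line fixed.
    columns = [[matrix[g][c] for g in range(n_genes)] for c in range(n_lines)]
    expected = [0.0] * n_lines
    for _ in range(n_random):
        shuffled = [rng.sample(col, n_genes) for col in columns]
        rows = [[shuffled[c][g] for c in range(n_lines)] for g in range(n_genes)]
        for i, count in enumerate(counts_at_least(rows, n_lines)):
            expected[i] += count / n_random

    # Assumed deviance: log ratio of observed to expected, pseudocount of 1.
    deviance = [math.log10((o + 1) / (e + 1)) for o, e in zip(observed, expected)]

    # True-positive rate: share of known essential genes depleted in >= m lines.
    per_gene = [sum(row) for row in matrix]
    tpr = [sum(1 for g in known_essential if per_gene[g] >= m) / len(known_essential)
           for m in range(1, n_lines + 1)]

    # First integer m at which the deviance curve reaches the TPR curve.
    for m, (d, t) in enumerate(zip(rescale(deviance), tpr), start=1):
        if d >= t:
            return m
    return n_lines

# Hypothetical toy data: 30 genes x 5 cell lines
toy = [[1] * 5 for _ in range(6)]                  # depleted in every line
toy += [[1, 1, 1, 0, 0] for _ in range(3)]         # depleted in three lines
toy += [[1 if c == g % 5 else 0 for c in range(5)] for g in range(21)]  # sporadic
toy_known = {0, 1, 2, 3, 4, 5, 6, 7}               # indices of known essentials

print(minimal_cell_lines(toy, toy_known, n_random=200, seed=1))
```

The same search could in principle be repeated over cancer types instead of cell lines to illustrate the k∗ estimate in panels e to g.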
a, The 553 pan-cancer core fitness genes in reference essential gene sets are shown (refs. 9, 10). Respective recall and enrichment significance P values from a hypergeometric test when considering the whole set of genes targeted in the CRISPR–Cas9 screen as the background population (n = 17,995). The 132 newly identified core fitness genes fall outside of these reference gene sets. b, c, Pathways (b) and gene families (c) enriched in the 132 newly identified pan-cancer core fitness genes (Benjamini–Hochberg-adjusted hypergeometric test P < 0.05). d, Comparison of the ADaM core fitness genes with two previously reported reference sets (refs. 9, 10) of essential genes in terms of number of genes, estimated precision and recall (the genes included in reference gene sets corresponding to essential cellular processes were considered to be true-positive genes). e, FDRs of putative context-specific fitness genes at different thresholds of reliability (n = 7,393, 2,233, 426 and 82 putative context-specific fitness genes, respectively, for thresholds equal to 20, 50, 100 and 200 of log-likelihood of skewed t-distributions). f, Clustering of cancer types based on core fitness gene similarity (left) and numbers of cancer-type-specific core fitness genes exclusive to each cancer type (right). g, Basal expression of cancer-type-specific core fitness genes (n, across tissues indicated in Fig. 1c) in matched normal tissues compared with all the other genes in the genome, across cancer types (as indicated by the different colours). Five genes were identified as core fitness genes in a single cancer type and are not expressed at the basal level (<5% quantile) in matched normal tissue (red points). Cancer types are coloured as shown in f. Box-and-whisker plots show interquartile ranges and 95th percentiles, with sample sizes indicated in f (right); centres indicate median values.
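The enrichment testing mentioned in panels a to c can be illustrated with a short Python sketch of a one-sided hypergeometric test followed by Benjamini–Hochberg adjustment. Only the background size (17,995 targeted genes) and the list size (132 genes) come from the legend; the gene-set sizes, overlaps and function names are hypothetical, and the sketch is not the authors' analysis code.

```python
from math import comb

def hypergeom_sf(k, N, K, n):
    """P(X >= k) when drawing n genes from N, of which K belong to the gene set."""
    upper = min(K, n)
    total = comb(N, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total

def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest P value to the smallest
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

BACKGROUND = 17995  # genes targeted in the screen (from the legend)
LIST_SIZE = 132     # newly identified core fitness genes (from the legend)

# Hypothetical gene sets: name -> (set size, overlap with the 132-gene list)
gene_sets = {"set_A": (120, 9), "set_B": (300, 4), "set_C": (50, 1)}

pvals = [hypergeom_sf(k, BACKGROUND, K, LIST_SIZE) for K, k in gene_sets.values()]
for name, adj in zip(gene_sets, benjamini_hochberg(pvals)):
    print(name, adj, "enriched" if adj < 0.05 else "not enriched")
```

Exact integer binomial coefficients are used so that the tail probability stays accurate even for a background of this size.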
a, Criteria for the target prioritization scoring system. b, ANOVA results from differential dependency biomarker analyses with all 1,001 significant associations classified as pan-cancer or cancer-type-specific associations (inner circle), loss- or gain-of-fitness marker (middle circle) and whether the marker is a mutation, copy number gain or loss (outer circle). c, Distributions of pan-cancer (left) and cancer-type-specific (right) non-null target priority scores based on the therapeutic indication of approved or preclinical compounds. The significance threshold was based on the distribution of scores for targets with approved anticancer compounds (specific anticancer compounds for the cancer-type-specific priority score) versus scores for targets with no available anticancer compounds. d, Overlap between cancer-type-specific priority targets (for at least one cancer type) and pan-cancer priority targets. e, Example priority targets identified only in the pan-cancer context. Each symbol is an individual cell line coloured by cancer type and symbol shapes indicate a significant dependency (n = 324 cell lines).
Each data point is a target with a priority score classified into tractability buckets and groups. The shapes represent the indication of the approved and/or preclinical compound to the corresponding target (other disease (square), anticancer (triangle) or specific to the cancer type considered (rhombus)); circles indicate the absence of a compound. Symbols within each data point indicate the strength of the genomic marker associated with differential dependency on the target (class A to C indicate strong to weak associations).
Extended Data Fig. 7 GPX4 fitness selectivity for cells undergoing epithelial–mesenchymal transitions, functional classification of priority targets and WRN differential fitness in other cancer types.
a, Differentially expressed genes in cell lines that are dependent on GPX4 (left) (n = 113, non-dependent versus dependent, moderated t-statistic FDR estimates). Epithelial–mesenchymal transition is the top differentially enriched cancer hallmark gene signature in GPX4-dependent cell lines (right). P values from single-sample gene set enrichment analyses were obtained by randomly permuting gene signatures 10,000 times and adjusted for multiple testing using the Benjamini–Hochberg FDR correction. b, Functional classification of priority targets in each tractability group using the PANTHER database. For clarity, kinases (a subset of transferases) and transcription factors are shown separately. Protein classes are indicated by colour. Statistical enrichment was calculated using a systematic hypergeometric test across protein families, following correction for multiple testing with the Benjamini–Hochberg method. Pie charts indicate the percentage of targets in each group classified according to protein families. c, WRN dependency in multiple cancer types. Each data point is a cell line showing the quantile-normalized WRN sgRNA fold change value stratified by MSI status. Box-and-whisker plots show interquartile ranges and 95th percentiles and centres indicate median values. Individual values are shown as dots. Statistical significance was calculated from the systematic ANOVA analysis for each cancer type for which the number of cell lines was greater than 10 (n = 14 for gastric carcinoma).
a, WRN dependency using a co-competition assay in MSI (top row, n = 7) and MSS (bottom row, n = 7) cell lines from four cancer types. sgRNAs targeting essential (sgEss) and non-essential (sgNon) genes were used as controls. Bars represent mean co-competition score; lines represent maximum and minimum values; individual data points overlaid. b, Selective WRN dependency in MSI versus MSS cell lines was confirmed using clonogenic assays in four cancer types (images are representative of two independent experiments). c, A reduction in WRN protein levels with all WRN sgRNAs was confirmed by western blot (images are representative of two independent experiments). d, An association between WRN dependency and MSI status was confirmed by mining data from an independent study that used RNA interference, Project DRIVE (ref. 12) (Student’s t-test, P = 0.004; n = 214). Each circle represents the WRN RNA-interference dependency score in a cancer cell line. Box-and-whisker plots represent median and 1.5× interquartile range. e, siRNA depletion of WRN inhibited proliferation of HCT116 cells. Data are mean ± s.d. of three independent experiments. The P value was determined using a non-parametric Student’s t-test. f, siRNA-mediated depletion of WRN was verified by western blot (images are representative of two independent experiments). For western blot source data, see Supplementary Fig. 1.
a, A WRN co-competition assay in MSS SW620 cells with stable MLH1 knockout. Cells were cultured for 3 months before assessing WRN dependency. Data are mean ± s.e.m. of three independent experiments. b, Western blotting confirmed MLH1 and WRN knockout (images are representative of two independent experiments). c, MLH1 and MSH3 expression by western blot in HCT116 parental and isogenic cell lines complemented with chromosome 2 (Ch.2; negative control), Ch.3 (which contains MLH1), Ch.5 (which contains MSH3) and Ch.3 + Ch.5 (which contains both MLH1 and MSH3). Data are representative of two independent experiments. d, Effect of WRN knockout (WRN sgRNAs 1 and 4 (sgWRN1 and sgWRN4, respectively)) on viability after 7 days in HCT116 parental and isogenic cell lines. Data are mean ± s.d. of three independent experiments. e, Clonogenic assays (14 days) after WRN knockout in HCT116 parental and isogenic cell lines. Data are representative of three independent experiments. f, Reduction in WRN levels was confirmed by western blot. Data are representative of two independent experiments. Source data for all western blots are shown in Supplementary Fig. 1.
Extended Data Fig. 10 Functional rescue experiments and in vivo validation of WRN dependency in a MSI colorectal cancer cell line.
a, Expression of wild-type mouse Wrn rescued the viability effect of WRN knockout in MSI cell line SW48. MSS cell line SW620 was used as a negative control. Box-and-whisker plots represent the median and 1.5× interquartile range. Data represent two independent biological replicates completed in technical triplicate. b, Western blots confirmed expression of Flag-tagged protein using all variants of the Wrn vector. Images are representative of experiments performed in triplicate. c, WRN knockout induced by doxycycline treatment in WRN sgRNA-expressing HCT116 (HCT116-WRN) cells measured by western blot for two separate clonal lines. Data are representative of two independent experiments. d, Growth curves of HCT116 parental, HCT116 sgNon (non-essential sgRNA) and WRN sgRNA-expressing HCT116 cells grown in the absence (black line) or presence of doxycycline (2 μg ml⁻¹; yellow line). Data are mean ± s.d. of 10 technical replicate wells for each condition (1 image per well) and representative of two independent experiments. e, Growth curves of WRN sgRNA-expressing HCT116 (clone b) subcutaneous tumours from mice treated with doxycycline (50 mg kg⁻¹; yellow line) or vehicle (grey line). Tumour growth suppression was observed (P = 0.03, two-way ANOVA comparing doxycycline versus vehicle). The number of mice in each cohort is indicated. Data are mean ± s.e.m. f, Representative KI-67 immunohistochemistry assessment of WRN sgRNA-expressing HCT116 (clone b) tumours explanted after one week of doxycycline treatment (left). Scale bar, 50 μm; 40× magnification. Quantification of KI-67 staining (right). Data are mean ± s.d. of 10 fields from three different samples (n = 30) and means were compared using a two-sided Welch’s t-test. Source data for all western blots are shown in Supplementary Fig. 1.
This file contains Supplementary Text and Data (see contents page for details) and a guide to the Supplementary Data available on Figshare.
This file contains the uncropped blots from Extended Data Figures 8-10.
This file contains Supplementary Tables 1-10 and a Supplementary Table Guide.