Computational correction of copy number effect improves specificity of CRISPR–Cas9 essentiality screens in cancer cells

Abstract

The CRISPR–Cas9 system has revolutionized gene editing both at single genes and in multiplexed loss-of-function screens, thus enabling precise genome-scale identification of genes essential for proliferation and survival of cancer cells [1,2]. However, previous studies have reported that a gene-independent antiproliferative effect of Cas9-mediated DNA cleavage confounds such measurement of genetic dependency, thereby leading to false-positive results in copy number–amplified regions [3,4]. We developed CERES, a computational method to estimate gene-dependency levels from CRISPR–Cas9 essentiality screens while accounting for the copy number–specific effect. In our efforts to define a cancer dependency map, we performed genome-scale CRISPR–Cas9 essentiality screens across 342 cancer cell lines and applied CERES to this data set. We found that CERES decreased false-positive results and estimated sgRNA activity for both this data set and previously published screens performed with different sgRNA libraries. We further demonstrate the utility of this collection of screens, after CERES correction, for identifying cancer-type-specific vulnerabilities.

Figure 1: Genomic copy number confounds the interpretation of CRISPR–Cas9 loss-of-function proliferation screens of cancer cell lines.
Figure 2: Schematic of the CERES computational model.
Figure 3: CERES corrects the copy number effect and improves the specificity of CRISPR–Cas9 essentiality screens while preserving true gene dependencies.
Figure 4: CERES estimates guide activity scores for each sgRNA.
Figure 5: CERES decreases false-positive differential dependencies.
Figure 6: CERES decreases false positives among lineage-specific differential dependencies due to recurrently amplified chromosome arms.

References

  1. Wang, T. et al. Identification and characterization of essential genes in the human genome. Science 350, 1096–1101 (2015).

  2. Hart, T. et al. High-resolution CRISPR screens reveal fitness genes and genotype-specific cancer liabilities. Cell 163, 1515–1526 (2015).

  3. Aguirre, A.J. et al. Genomic copy number dictates a gene-independent cell response to CRISPR/Cas9 targeting. Cancer Discov. 6, 914–929 (2016).

  4. Munoz, D.M. et al. CRISPR screens provide a comprehensive assessment of cancer vulnerabilities but generate false-positive hits for highly amplified genomic regions. Cancer Discov. 6, 900–913 (2016).

  5. Cheung, H.W. et al. Systematic investigation of genetic vulnerabilities across cancer cell lines reveals lineage-specific dependencies in ovarian cancer. Proc. Natl. Acad. Sci. USA 108, 12372–12377 (2011).

  6. Marcotte, R. et al. Essential gene profiles in breast, pancreatic, and ovarian cancer cells. Cancer Discov. 2, 172–189 (2012).

  7. Cowley, G.S. et al. Parallel genome-scale loss of function screens in 216 cancer cell lines for the identification of context-specific genetic dependencies. Sci. Data 1, 140035 (2014).

  8. Tzelepis, K. et al. A CRISPR dropout screen identifies genetic vulnerabilities and therapeutic targets in acute myeloid leukemia. Cell Rep. 17, 1193–1205 (2016).

  9. Wang, T. et al. Gene essentiality profiling reveals gene networks and synthetic lethal interactions with oncogenic Ras. Cell 168, 890–903.e15 (2017).

  10. Tsherniak, A. et al. Defining a cancer dependency map. Cell 170, 564–576.e16 (2017).

  11. Fellmann, C., Gowen, B.G., Lin, P.-C., Doudna, J.A. & Corn, J.E. Cornerstones of CRISPR–Cas in drug discovery and therapy. Nat. Rev. Drug Discov. 16, 89–100 (2017).

  12. Corsello, S.M. et al. The Drug Repurposing Hub: a next-generation drug library and information resource. Nat. Med. 23, 405–408 (2017).

  13. Doench, J.G. et al. Optimized sgRNA design to maximize activity and minimize off-target effects of CRISPR-Cas9. Nat. Biotechnol. 34, 184–191 (2016).

  14. Hart, T., Brown, K.R., Sircoulomb, F., Rottapel, R. & Moffat, J. Measuring error rates in genomic perturbation screens: gold standards for human functional genomics. Mol. Syst. Biol. 10, 733 (2014).

  15. Barretina, J. et al. The Cancer Cell Line Encyclopedia enables predictive modelling of anticancer drug sensitivity. Nature 483, 603–607 (2012).

  16. Li, W. et al. MAGeCK enables robust identification of essential genes from genome-scale CRISPR/Cas9 knockout screens. Genome Biol. 15, 554 (2014).

  17. Hart, T. & Moffat, J. BAGEL: a computational framework for identifying essential genes from pooled library screens. BMC Bioinformatics 17, 164 (2016).

  18. Doench, J.G. et al. Rational design of highly active sgRNAs for CRISPR-Cas9–mediated gene inactivation. Nat. Biotechnol. 32, 1262–1267 (2014).

  19. Xu, H. et al. Sequence determinants of improved CRISPR sgRNA design. Genome Res. 25, 1147–1157 (2015).

  20. Xiang, X. et al. Grhl2 determines the epithelial phenotype of breast cancers and promotes tumor progression. PLoS One 7, e50781 (2012).

  21. Werner, S. et al. Dual roles of the transcription factor grainyhead-like 2 (GRHL2) in breast cancer. J. Biol. Chem. 288, 22993–23008 (2013).

  22. Zhang, X.D. A pair of new statistical parameters for quality control in RNA interference high-throughput screening assays. Genomics 89, 552–561 (2007).

  23. Kanehisa, M. & Goto, S. KEGG: Kyoto encyclopedia of genes and genomes. Nucleic Acids Res. 28, 27–30 (2000).

  24. Kanehisa, M., Sato, Y., Kawashima, M., Furumichi, M. & Tanabe, M. KEGG as a reference resource for gene and protein annotation. Nucleic Acids Res. 44, D457–D462 (2016).

  25. Kanehisa, M., Furumichi, M., Tanabe, M., Sato, Y. & Morishima, K. KEGG: new perspectives on genomes, pathways, diseases and drugs. Nucleic Acids Res. 45, D353–D361 (2017).

  26. Langmead, B., Trapnell, C., Pop, M. & Salzberg, S.L. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 10, R25 (2009).

  27. Boyd, S. & Vandenberghe, L. Convex Optimization 1–730 (Cambridge Univ. Press, 2004).

Acknowledgements

This work was supported by grants U01 CA176058, U01 CA199253, and P01 CA154303 (W.C.H.) and by the Slim Initiative for Genomic Medicine, a project funded by the Carlos Slim Foundation and the H.L. Snyder Foundation.

Author information

Contributions

R.M.M., J.G.B., and A.T. conceived and designed the study. R.M.M., J.G.B., and J.M.M. performed computational analysis and interpretation of results. J.G.B. wrote and implemented the modeling software. R.M.M., B.A.W., and A.E.S. processed and managed data. H.X. and N.V.D. assisted with computational analysis. P.G.M. provided computational tools. G.S.C., S.P., and F.V. provided project management. A.G., Y.L., L.D.A., G.J., R.L., W.F.H., M.S., T.W., D.C.H., V.A.Z., M.R.W., Z.K., J.J.C., and M.O. assisted with data generation. R.M.M., J.G.B., J.M.M., W.C.H., and A.T. wrote and/or revised the manuscript with assistance from other authors. K.S., T.R.G., J.S.B., F.V., D.E.R., W.C.H., and A.T. supervised the study and performed an advisory role.

Corresponding authors

Correspondence to William C Hahn or Aviad Tsherniak.

Ethics declarations

Competing interests

W.C.H. reports receiving a commercial research grant from Novartis and serving as a consultant/advisory-board member for Novartis as well as for KSQ Therapeutics. No potential conflicts of interest are disclosed by the other authors.

Integrated supplementary information

Supplementary Figure 1 Screen quality and the copy number effect in previously published CRISPR–Cas9 essentiality screens.

(a) Screen quality as measured by area under the receiver operating characteristic curve (AUC) in discriminating between sets of common core essential and nonessential genes in two previously published datasets. (b) sgRNA depletion is regressed against genomic cut sites using a saturating linear model. The fit is plotted in red, and the median depletion at various levels of cuts is shown with horizontal bars. The r-squared of the fit is shown as a function of breakpoint in the inset plot. To the right, the distributions of sgRNAs targeting the ribosome, proteasome, and spliceosome are shown in red, non-targeting sgRNAs in blue, and all other sgRNAs in grey. (c) The fit for all cell lines is shown, as in Figure 1b, for the previously published cell lines. (d) For each dataset, the fits in Figure 1b and in panel c are averaged across cell lines, grouped by p53 mutation status. Shaded regions indicate mean +/− 1 s.d.
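
The saturating linear fit described in panel b can be illustrated with a minimal sketch. It assumes the model has the form depletion = a + b · min(cuts, c), with the breakpoint c chosen by scanning candidate values and keeping the one with the highest r-squared; the function name, the scanning strategy, and the toy data are illustrative assumptions, not the authors' implementation.

```python
def fit_saturating_linear(cuts, depletion, breakpoints):
    """Fit depletion ~ a + b * min(cuts, c) for each candidate breakpoint c.

    Returns (best_fit, r2_by_breakpoint). best_fit holds the intercept, slope,
    breakpoint and r-squared of the best-scoring candidate breakpoint.
    """
    n = len(cuts)
    mean_y = sum(depletion) / n
    ss_tot = sum((y - mean_y) ** 2 for y in depletion)
    r2_by_breakpoint = {}
    best = None
    for c in breakpoints:
        x = [min(v, c) for v in cuts]  # the effect saturates beyond c cuts
        mean_x = sum(x) / n
        sxx = sum((xi - mean_x) ** 2 for xi in x)
        if sxx == 0:
            continue  # all x identical after clipping, slope undefined
        sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, depletion))
        slope = sxy / sxx
        intercept = mean_y - slope * mean_x
        ss_res = sum(
            (yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, depletion)
        )
        r2 = 1 - ss_res / ss_tot
        r2_by_breakpoint[c] = r2
        if best is None or r2 > best["r2"]:
            best = {"intercept": intercept, "slope": slope, "breakpoint": c, "r2": r2}
    return best, r2_by_breakpoint


# Toy data (hypothetical): depletion grows with the number of cuts up to 4 cuts.
cuts = [1, 2, 3, 4, 5, 6, 8, 10]
depletion = [-0.1 * min(c, 4) for c in cuts]
best, r2 = fit_saturating_linear(cuts, depletion, breakpoints=range(2, 11))
```

The dictionary `r2` corresponds to the kind of r-squared-versus-breakpoint curve shown in the inset plot.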

Supplementary Figure 2 Copy number–amplified genes are enriched for depletion ranks in CRISPR–Cas9 essentiality screens.

(a) A one-sample K-S test is performed on the depletion ranks of the 100 most amplified genes per cell line in each dataset. The K-S enrichment statistic for each line is plotted against the mean copy number of these 100 genes. Red points indicate cell lines for which these 100 genes are significantly enriched at P < 0.05. (b) As in Figure 1d, genes are ranked by average guide score for cell lines screened in two previously published datasets, and the ranks (median and IQR) of the 100 genes with the highest copy number measurements are plotted and colored by their mean copy number.
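
As a rough illustration of the test in panel a, the sketch below computes a one-sample K-S statistic for the depletion ranks of a gene set. It assumes the null distribution is uniform over all ranks and uses the asymptotic Kolmogorov series for an approximate two-sided P value; the caption does not specify these details, so both are assumptions, as are the function name and the toy ranks.

```python
import math


def ks_uniform(ranks, n_genes):
    """One-sample K-S statistic of gene ranks against a uniform distribution.

    ranks: 1-based depletion ranks of the gene set (1 = most depleted).
    n_genes: total number of ranked genes.
    Returns (D, approximate two-sided P value).
    """
    u = sorted(r / n_genes for r in ranks)  # map ranks onto (0, 1]
    n = len(u)
    d = 0.0
    for i, x in enumerate(u):
        # Empirical CDF just after and just before x, versus uniform CDF F(x) = x.
        d = max(d, (i + 1) / n - x, x - i / n)
    # Asymptotic Kolmogorov distribution with a small-sample adjustment;
    # this is an approximation, not an exact P value.
    lam = (math.sqrt(n) + 0.12 + 0.11 / math.sqrt(n)) * d
    p = 2 * sum(
        (-1) ** (k - 1) * math.exp(-2 * k * k * lam * lam) for k in range(1, 101)
    )
    return d, min(max(p, 0.0), 1.0)


# Toy example (hypothetical): 100 genes among 10,000 ranked genes.
top_d, top_p = ks_uniform(range(1, 101), 10000)            # all at the top of the ranking
even_d, even_p = ks_uniform(range(50, 10000, 100), 10000)  # spread evenly across ranks
```

A gene set concentrated among the most depleted ranks yields a large statistic and a small P value, whereas an evenly spread set does not.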

Supplementary Figure 3 CERES corrects the copy number effect in CRISPR–Cas9 essentiality screens.

(a) Boxplots of gene dependency scores across copy number levels before and after CERES correction of two previously published datasets. (b) Boxplots as in Figure 3a and panel a filtering for only genes called unexpressed by RNA-seq. (c) The dependency of each gene across cell lines is correlated with its copy number measurements, before and after CERES correction. The distribution of Pearson correlation coefficients is shown for each dataset analyzed for all genes on the left and for unexpressed genes on the right. The mean of each distribution is shown with a dotted line.
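
The per-gene correlation in panel c can be sketched as follows, assuming dependency scores and copy number measurements are stored per gene as equal-length lists in the same cell line order. The data layout, function names, and toy values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def copy_number_correlations(dependency, copy_number):
    """Per-gene Pearson r between dependency and copy number across cell lines.

    Both arguments map gene -> list of values, in the same cell line order.
    """
    return {g: pearson(dependency[g], copy_number[g]) for g in dependency}


# Toy values (hypothetical) for one gene across five cell lines.
cn = {"GENE_A": [2, 3, 4, 6, 8]}
before = {"GENE_A": [-0.2 * c for c in cn["GENE_A"]]}  # depletion tracks copy number
after = {"GENE_A": [0.1, -0.1, 0.0, -0.1, 0.1]}        # no clear trend
r_before = copy_number_correlations(before, cn)["GENE_A"]
r_after = copy_number_correlations(after, cn)["GENE_A"]
```

Collecting these coefficients over all genes gives the kind of distribution plotted in the panel, with a mean near zero expected when the copy number effect has been removed.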

Supplementary Figure 4 CERES improves the specificity of CRISPR–Cas9 essentiality screens while preserving data in copy number–amplified regions.

(a) The recall of common core essential genes at a 5% FDR of nonessential genes is plotted for each cell line before (red) and after (blue) CERES correction for two previously published datasets. (b) The recall at 5% FDR is plotted after correction using a linear model of copy number correction for the Avana dataset. (c) Using a simple filtering scheme removing all genes with a copy number > 4, the total number of genes filtered per cell line is plotted on the left, and the number of genes per cell line with a CERES gene effect < -0.6 on the right. The means are shown with dotted lines.
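
One way to compute recall at a fixed FDR, as used in panels a and b, is sketched below. It assumes FDR is defined as FP / (FP + TP) among the labeled genes called at a given score cutoff and that more negative scores indicate stronger depletion; the exact definition used by the authors is not given in the caption, and the gene names and scores are toy values.

```python
def recall_at_fdr(scores, essential, nonessential, fdr=0.05):
    """Recall of essential genes at the most permissive cutoff with FDR <= fdr.

    scores: dict gene -> dependency score (more negative = stronger depletion).
    essential, nonessential: sets of reference gene names.
    FDR is taken here as FP / (FP + TP) among labeled genes called so far
    (an assumption). Tied scores are not handled specially.
    """
    labeled = sorted(
        (g for g in scores if g in essential or g in nonessential),
        key=lambda g: scores[g],
    )
    n_essential = sum(1 for g in labeled if g in essential)
    if n_essential == 0:
        return 0.0
    tp = fp = best_tp = 0
    for g in labeled:  # walk from most to least depleted
        if g in essential:
            tp += 1
        else:
            fp += 1
        if fp / (tp + fp) <= fdr:
            best_tp = tp
    return best_tp / n_essential


# Toy data (hypothetical): 20 essential and 10 nonessential reference genes.
essential = {f"E{i}" for i in range(20)}
nonessential = {f"N{j}" for j in range(10)}
scores = {f"E{i}": -2.0 + 0.05 * i for i in range(19)}
scores["E19"] = 0.5    # an essential gene that is not depleted
scores["N0"] = -1.52   # a nonessential gene that scores as depleted
scores.update({f"N{j}": 0.01 * j for j in range(1, 10)})
recall = recall_at_fdr(scores, essential, nonessential)
```

In this toy case 19 of the 20 essential genes are recovered before the FDR exceeds 5%.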

Supplementary Figure 5 Example correction of JAK2 amplification in the HEL cell line from the Wang 2017 dataset.

Copy number and gene dependency scores, before and after CERES correction, are plotted as in Figure 3c. JAK2 is highlighted in orange and labeled, as well as RCL1, which is involved in the biogenesis of the 40S ribosomal subunit.

Supplementary Figure 6 CERES preserves known cancer-specific genetic dependencies.

(a-d) Known cancer-specific genetic dependencies are plotted against copy number before (left) and after (right) CERES correction for 342 cell lines. The dependencies plotted are: KRAS dependency colored by KRAS mutation status (a), BRAF dependency colored by BRAF mutation status (b), PIK3CA dependency colored by PIK3CA mutation status (c), and MYCN dependency with neuroblastoma cell lines in purple (d).

Supplementary Figure 7 CERES infers guide activity scores in a previously published dataset.

(a) The composition of guide activity scores inferred by CERES from the Wang 2017 dataset. (b) For the set of 18,501 sgRNAs shared between the Avana and Wang 2017 libraries, sgRNAs are ranked by guide activity scores in each dataset and are plotted against each other, with darker purple representing greater density of sgRNAs.

Supplementary Figure 8 CERES reduces false-positive differential dependencies in two previously published datasets.

(a) The percentage of amplified genes above given thresholds of differential dependencies is plotted as in Figure 5a. (b) The percentage of unexpressed genes above given thresholds of differential dependencies is plotted as in Figure 5b.

Supplementary Figure 9 Differentially dependent genes in breast cancer cell lines.

(a,b) Taking genes that were called differential dependencies in breast lines in the Avana dataset, a differential dependency analysis using a dataset of RNAi screens is plotted for genes on chromosome 8q (a) and other regions (b). (c) Example relationships between highly expressed transcription factors in breast cancer and differential dependency scores after CERES correction. Breast cell lines are highlighted in pink. TRPS1 and GRHL2 are labeled in red, indicating that they are on chromosome 8q.

Supplementary Figure 10 Precision recall of random subsamplings of cell lines.

The F1-measure (harmonic mean of precision and recall) is calculated for each random sub-sampling of cell lines and compared to the improvement in F1-measure from the full run of CERES on 342 cell lines.
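
For reference, the F1-measure can be computed as in the sketch below. How genes are called as dependencies in each sub-sampling is not stated in the caption, so the `called` set and the reference gene sets here are hypothetical inputs.

```python
def f1_measure(called, essential, nonessential):
    """Harmonic mean of precision and recall for a set of called genes.

    called: genes called as dependencies (how they are called is an assumption).
    essential, nonessential: reference gene sets used as positives and negatives.
    """
    tp = len(called & essential)
    fp = len(called & nonessential)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / len(essential)
    return 2 * precision * recall / (precision + recall)


# Toy example (hypothetical): three true positives and one false positive.
f1 = f1_measure(
    called={"a", "b", "c", "x"},
    essential={"a", "b", "c", "d"},
    nonessential={"x", "y"},
)
```

Here precision and recall are both 0.75, so the F1-measure is also 0.75.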

Supplementary Figure 11 Hyperparameter optimization of the CERES model.

Error of the CERES algorithm evaluated on training and test data at 25 values of the regularization parameter for each of the three datasets analyzed. As the regularization strength is eased (moving from left to right), error on the training set decreases monotonically, while error on the test set decreases then increases.

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–11

Life Sciences Reporting Summary

Supplementary Table 1

Cancer cell line information

Sample information for the 342 cancer cell lines used in this study

Supplementary Table 2

sgRNA sequences and targets

sgRNA barcode sequences with genome alignments and coding sequence mappings for the Avana library

Supplementary Table 3

Avana gene-knockout effects

CERES-estimated gene-knockout effects for 342 cancer cell lines screened with the Avana sgRNA library

Supplementary Table 4

GeCKOv2 gene-knockout effects

CERES-estimated gene-knockout effects for 33 cancer cell lines screened with the GeCKOv2 sgRNA library published in Aguirre et al. (2016)

Supplementary Table 5

Wang2017 gene-knockout effects

CERES-estimated gene-knockout effects for 14 AML cell lines screened with the Wang2017 sgRNA library published in Wang et al. (2017)

Supplementary Table 6

Avana guide activity scores

CERES-estimated guide activity scores for sgRNAs in the Avana dataset

Supplementary Table 7

GeCKOv2 guide activity scores

CERES-estimated guide activity scores for sgRNAs in the GeCKOv2 dataset

Supplementary Table 8

Wang guide activity scores

CERES-estimated guide activity scores for sgRNAs in the Wang2017 dataset

About this article

Cite this article

Meyers, R., Bryan, J., McFarland, J. et al. Computational correction of copy number effect improves specificity of CRISPR–Cas9 essentiality screens in cancer cells. Nat Genet 49, 1779–1784 (2017). https://doi.org/10.1038/ng.3984

  • DOI: https://doi.org/10.1038/ng.3984
