Genome-wide screening using CRISPR coupled with nuclease Cas9 (CRISPR–Cas9) is a powerful technology for the systematic evaluation of gene function. Statistically principled analysis is needed for the accurate identification of gene hits and associated pathways. Here, we describe how to perform computational analysis of CRISPR screens using the MAGeCKFlute pipeline. MAGeCKFlute combines the MAGeCK and MAGeCK-VISPR algorithms and incorporates additional downstream analysis functionalities. MAGeCKFlute is distinguished from other currently available tools by its comprehensive pipeline, which contains a series of functions for analyzing CRISPR screen data. This protocol explains how to use MAGeCKFlute to perform quality control (QC), normalization, batch effect removal, copy-number bias correction, gene hit identification and downstream functional enrichment analysis for CRISPR screens. We also describe gene identification and data analysis in CRISPR screens involving drug treatment. Completing the entire MAGeCKFlute pipeline requires ~3 h on a desktop computer running Linux or Mac OS with R support.
The source code of MAGeCKFlute (version 0.99.18) is freely available at https://bitbucket.org/liulab/mageckflute/ under the three-clause Berkeley Software Distribution (BSD) open-source license. Questions or comments can be submitted through the MAGeCK Google group: https://groups.google.com/d/forum/mageck. The datasets used in this paper are presented in http://cistrome.org/MAGeCKFlute/.
Cong, L. et al. Multiplex genome engineering using CRISPR/Cas systems. Science 339, 819–823 (2013).
Gilbert, L. A. et al. Genome-scale CRISPR-mediated control of gene repression and activation. Cell 159, 647–661 (2014).
Konermann, S. et al. Genome-scale transcriptional activation by an engineered CRISPR-Cas9 complex. Nature 517, 583–588 (2015).
Mali, P. et al. RNA-guided human genome engineering via Cas9. Science 339, 823–826 (2013).
Wang, T., Wei, J. J., Sabatini, D. M. & Lander, E. S. Genetic screens in human cells using the CRISPR-Cas9 system. Science 343, 80–84 (2014).
Koike-Yusa, H., Li, Y., Tan, E. P., Velasco-Herrera, M. D. C. & Yusa, K. Genome-wide recessive genetic screening in mammalian cells with a lentiviral CRISPR-guide RNA library. Nat. Biotechnol. 32, 267–273 (2014).
Shalem, O. et al. Genome-scale CRISPR-Cas9 knockout screening in human cells. Science 343, 84–87 (2014).
Hart, T. et al. High-resolution CRISPR screens reveal fitness genes and genotype-specific cancer liabilities. Cell 163, 1515–1526 (2015).
Wang, T. et al. Identification and characterization of essential genes in the human genome. Science 350, 1096–1101 (2015).
Chen, S. et al. Genome-wide CRISPR screen in a mouse model of tumor growth and metastasis. Cell 160, 1246–1260 (2015).
Manguso, R. T. et al. In vivo CRISPR screening identifies Ptpn2 as a cancer immunotherapy target. Nature 547, 413–418 (2017).
Burr, M. L. et al. CMTM6 maintains the expression of PD-L1 and regulates anti-tumour immunity. Nature 549, 101–105 (2017).
Kurata, M. et al. Using genome-wide CRISPR library screening with library resistant DCK to find new sources of Ara-C drug resistance in AML. Sci. Rep. 6, 36199 (2016).
Han, K. et al. Synergistic drug combinations for cancer identified in a CRISPR screen for pairwise genetic interactions. Nat. Biotechnol. 35, 463–474 (2017).
Shi, J. et al. Discovery of cancer drug targets by CRISPR-Cas9 screening of protein domains. Nat. Biotechnol. 33, 661–667 (2015).
Mootha, V. K. et al. PGC-1α-responsive genes involved in oxidative phosphorylation are coordinately downregulated in human diabetes. Nat. Genet. 34, 267–273 (2003).
Subramanian, A. et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl. Acad. Sci. USA 102, 15545–15550 (2005).
Li, W. et al. MAGeCK enables robust identification of essential genes from genome-scale CRISPR/Cas9 knockout screens. Genome Biol. 15, 554 (2014).
Li, W. et al. Quality control, modeling, and visualization of CRISPR screens with MAGeCK-VISPR. Genome Biol. 16, 281 (2015).
Toledo, C. M. et al. Genome-wide CRISPR-Cas9 screens reveal loss of redundancy between PKMYT1 and WEE1 in glioblastoma stem-like cells. Cell Rep. 13, 2425–2439 (2015).
Kanehisa, M. & Goto, S. KEGG: Kyoto Encyclopedia of genes and genomes. Nucleic Acids Res. 28, 27–30 (2000).
Luo, B. et al. Highly parallel identification of essential genes in cancer cells. Proc. Natl. Acad. Sci. USA 105, 20380–20385 (2008).
Konig, R. et al. A probability-based approach for the analysis of large-scale RNAi screens. Nat. Methods 4, 847–849 (2007).
Hart, T. & Moffat, J. BAGEL: a computational framework for identifying essential genes from pooled library screens. BMC Bioinformatics 17, 164 (2016).
Yu, J., Silva, J. & Califano, A. ScreenBEAM: a novel meta-analysis algorithm for functional genomics screens via Bayesian hierarchical modeling. Bioinformatics 32, 260–267 (2016).
Morgens, D. W., Deans, R. M., Li, A. & Bassik, M. C. Systematic comparison of CRISPR-Cas9 and RNAi screens for essential genes. Nat. Biotechnol. 34, 634–636 (2016).
Falcon, S. & Gentleman, R. Using GOstats to test gene lists for GO term association. Bioinformatics 23, 257–258 (2007).
Yu, G., Wang, L. G., Han, Y. & He, Q. Y. clusterProfiler: an R package for comparing biological themes among gene clusters. OMICS 16, 284–287 (2012).
Shalem, O., Sanjana, N. E. & Zhang, F. High-throughput functional genomics using CRISPR-Cas9. Nat. Rev. Genet. 16, 299–311 (2015).
Gini, C. “Concentration and dependency ratios” (in Italian). Rev. Pol. Econ. 87, 769–789 (1997).
Johnson, W. E., Li, C. & Rabinovic, A. Adjusting batch effects in microarray expression data using empirical Bayes methods. Biostatistics 8, 118–127 (2007).
Chen, C. H. et al. Improved design and analysis of CRISPR knockout screens. Bioinformatics 34, 4095–4101 (2018).
Jiang, P. et al. Network analysis of gene essentiality in functional genomics experiments. Genome Biol. 16, 239 (2015).
DeKelver, R. C. et al. Functional genomics, proteomics, and regulatory DNA analysis in isogenic settings using zinc finger nuclease-driven transgenesis into a safe harbor locus in the human genome. Genome Res. 20, 1133–1142 (2010).
Hockemeyer, D. et al. Efficient targeting of expressed and silent genes in human ESCs and iPSCs using zinc-finger nucleases. Nat. Biotechnol. 27, 851–857 (2009).
Barretina, J. et al. The Cancer Cell Line Encyclopedia enables predictive modelling of anticancer drug sensitivity. Nature 483, 603–607 (2012).
Aguirre, A. J. et al. Genomic copy number dictates a gene-independent cell response to CRISPR/Cas9 targeting. Cancer Discov. 6, 914–929 (2016).
Sherr, C. J. & Roberts, J. M. CDK inhibitors: positive and negative regulators of G1-phase progression. Genes Dev. 13, 1501–1512 (1999).
Huang, D. W., Sherman, B. T. & Lempicki, R. A. Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat. Protoc. 4, 44–57 (2009).
Wang, T. et al. Gene essentiality profiling reveals gene networks and synthetic lethal interactions with oncogenic Ras. Cell 168, 890–903 (2017).
Tzelepis, K. et al. A CRISPR dropout screen identifies genetic vulnerabilities and therapeutic targets in acute myeloid leukemia. Cell Rep. 17, 1192–1205 (2016).
Wang, T., Wei, J. J., Sabatini, D. M. & Lander, E. S. Genetic screens in human cells using the CRISPR-Cas9 system. Science 343, 80–84 (2014).
Chen, C. H. et al. Improved design and analysis of CRISPR knockout screens. Bioinformatics 34, 4095–4101 (2018).
Leek, J. T., Johnson, W. E., Parker, H. S., Jaffe, A. E. & Storey, J. D. The sva package for removing batch effects and other unwanted variation in high-throughput experiments. Bioinformatics 28, 882–883 (2012).
Langmead, B., Trapnell, C., Pop, M. & Salzberg, S. L. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 10, R25 (2009).
Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25, 1754–1760 (2009).
Luo, W. & Brouwer, C. Pathview: an R/Bioconductor package for pathway-based data integration and visualization. Bioinformatics 29, 1830–1831 (2013).
This project was supported by the National Institutes of Health (R01 HG008927), the National Key Research and Development Program of China (2017YFC0908500 to X.S.L.), the Breast Cancer Research Foundation, the Department of Defense (PC140817P1 to M.B. and X.S.L.), and the start-up fund of the Center for Genetic Medicine Research and the Gilbert Family Neurofibromatosis Institute (to W.L.).
T.X. and X.S.L. are co-founders and M.B. and X.S.L. are on the Scientific Advisory Board of GV20 Oncotherapy. The authors declare no competing interests.
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Key references using this protocol
Li, W. et al. Genome Biol. 15, 554 (2014): https://doi.org/10.1186/s13059-014-0554-4
Li, W. et al. Genome Biol. 16, 281 (2015): https://doi.org/10.1186/s13059-015-0843-6
Jeselsohn, R. et al. Cancer Cell 33, 173–186 (2018): https://doi.org/10.1016/j.ccell.2018.01.004
Xiao, T. et al. Proc. Natl Acad. Sci. USA 115, 7869–7878 (2018): https://doi.org/10.1073/pnas.1722617115
Key data used in this protocol
Toledo, C. M. et al. Cell Rep. 13, 2425–2439 (2015): https://doi.org/10.1016/j.celrep.2015.11.021
Hart, T. et al. Cell 163, 1515–1526 (2015): https://doi.org/10.1016/j.cell.2015.11.015
Shalem, O. et al. Science 343, 84–87 (2014): https://doi.org/10.1126/science.1247005
Wang, T., Wei, J. J., Sabatini, D. M. & Lander, E. S. Science 343, 80–84 (2014): https://doi.org/10.1126/science.1246981
Chen, C.-H. et al. Bioinformatics 34, 4095–4101 (2018): https://doi.org/10.1093/bioinformatics/bty450
Integrated supplementary information
(a) Distribution of expression of all non-essential genes in CCLE cell lines. The x-axis is the relative expression of all non-essential genes measured by microarray. The y-axis is the density of expression of all non-essential genes. Genes with expression levels below the cutoff (red dashed line) were excluded from the non-essential gene list. (b) Each dot gives the number of genes (y-axis) whose expression ranked between the 5th and 100th percentile in the corresponding number of cell lines (x-axis). The dashed lines indicate that 350 of the 937 non-essential genes had expression that ranked between the 5th and 100th percentile in 98.3% (1,019 of 1,036) of the Cancer Cell Line Encyclopedia (CCLE; ref. 36) cell lines.
Model of the relationship between beta scores and gene copy numbers before (a) and after (b) copy-number correction. The red line in each panel is the regression line, and the inflection point is chosen to minimize the squared error. Without copy-number bias correction, the beta score shows a positive correlation with copy number. This bias can be corrected using MAGeCKFlute.
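The regression with an inflection point described in this caption can be illustrated with a small sketch. It assumes a continuous two-segment linear model and a grid search over candidate breakpoints; the function name `fit_two_segment` and the model form are illustrative assumptions, not the MAGeCKFlute implementation, and the correction step itself is not shown.

```python
import numpy as np

def fit_two_segment(copy_number, beta, candidates=None):
    """Fit a continuous two-segment line to beta scores versus copy number.

    Hypothetical helper: the breakpoint (inflection point) is the candidate
    that gives the smallest sum of squared errors. Returns a tuple
    (breakpoint, coefficients, sse), or None if there is no candidate.
    """
    x = np.asarray(copy_number, dtype=float)
    y = np.asarray(beta, dtype=float)
    if candidates is None:
        # Assumption: try every interior unique copy-number value.
        candidates = np.unique(x)[1:-1]
    best = None
    for bp in candidates:
        # Columns: intercept, slope, change in slope past the breakpoint.
        design = np.column_stack([np.ones_like(x), x, np.maximum(x - bp, 0.0)])
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        sse = float(np.sum((y - design @ coef) ** 2))
        if best is None or sse < best[2]:
            best = (float(bp), coef, sse)
    return best
```

Calling `fit_two_segment(copy_number, beta)` on per-gene copy numbers and beta scores returns the selected inflection point together with the fitted coefficients, which can then be plotted as the regression line.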
Beta scores of core essential genes (blue dots) and all genes except essential genes (red dots) before and after normalization with essential genes. The histograms (blue bars) show the beta scores of treatment (top) and control (right) conditions. Before normalization (a), the beta score distributions of the treatment and control conditions are not comparable. After normalization (b), these two distributions are more comparable. (c) The formula for normalization of the beta score using essential genes, where c is an empirical value used to scale the normalized beta score. The value of c is 0.6 and was obtained from public screen data (ref. 8).
All the data are from a genome-wide CRISPR screen on the A375 cell line (see EQUIPMENT), and downstream analysis was performed with FluteMLE. (a) Beta score distribution of treatment samples (PLX7_R1, PLX7_R2) and control samples (D7_R1, D7_R2). (b) Scatterplot of beta scores of the treatment (PLX7_R1) and control (D7_R1) samples. The regression line (dashed line) indicates the consistency of beta scores between the two conditions. (c) The MA plot can be used to visualize the differences between beta scores in two samples by transforming the data onto M (log ratio) and A (mean average) scales, in which M = βT − βC and A = βT + βC, where βT is the beta score of treatment samples and βC is the beta score of control samples. The blue line is M = 0 and the red line is the loess regression line. (d) Identification of treatment-related genes. The horizontal and vertical dashed lines indicate the mean plus or minus one standard deviation of the treatment and control beta scores, respectively. The diagonal dashed lines indicate the mean plus or minus one standard deviation of the differential beta score, which is calculated by subtracting the control beta score from the treatment beta score. The number in red is the number of genes classified in each group. The top five genes in each group, selected by the largest absolute value of the differential beta score, are labelled. Genes in the green group are strongly negatively selected in the control samples and are weakly positively or negatively selected in the treatment samples. These genes are potentially located in the pathways targeted by the treatment. The orange group contains genes that are weakly selected in the control and strongly positively selected in treatment. These are genes whose loss confers treatment resistance. Genes in the blue group are strongly positively selected in the control and weakly selected in the treatment. These genes may be either potential regulators of cell proliferation in general, or regulators of the treatment target. 
Genes in the purple group are weakly selected in the control and strongly negatively selected in the treatment. These genes are potentially synthetically lethal in combination with the drug treatment. The histograms (grey bars) show the beta scores of treatment (top) and control (right) conditions.
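The MA transform and the four-group classification described in this caption can be sketched as follows. This is a minimal illustration, not FluteMLE itself: the function names are hypothetical, "strongly selected" is assumed to mean beyond the mean plus or minus one sample standard deviation, "weakly selected" is assumed to mean within that band, and the diagonal cutoff is applied to the differential beta score in the same way.

```python
import numpy as np

def ma_transform(beta_t, beta_c):
    """Return (M, A) as written in the caption: M = bT - bC, A = bT + bC."""
    t = np.asarray(beta_t, dtype=float)
    c = np.asarray(beta_c, dtype=float)
    return t - c, t + c

def classify_genes(beta_t, beta_c):
    """Assign each gene to 'green', 'orange', 'blue', 'purple' or 'none'."""
    t = np.asarray(beta_t, dtype=float)
    c = np.asarray(beta_c, dtype=float)
    diff = t - c  # differential beta score (treatment minus control)

    def band(values):
        # Assumption: cutoffs are mean -/+ one sample SD (ddof=1).
        sd = values.std(ddof=1)
        return values.mean() - sd, values.mean() + sd

    t_lo, t_hi = band(t)
    c_lo, c_hi = band(c)
    d_lo, d_hi = band(diff)
    weak_t = (t >= t_lo) & (t <= t_hi)
    weak_c = (c >= c_lo) & (c <= c_hi)

    groups = np.full(t.shape, "none", dtype=object)
    # Strongly negative in control, weak in treatment.
    groups[(c < c_lo) & weak_t & (diff > d_hi)] = "green"
    # Weak in control, strongly positive in treatment.
    groups[weak_c & (t > t_hi) & (diff > d_hi)] = "orange"
    # Strongly positive in control, weak in treatment.
    groups[(c > c_hi) & weak_t & (diff < d_lo)] = "blue"
    # Weak in control, strongly negative in treatment.
    groups[weak_c & (t < t_lo) & (diff < d_lo)] = "purple"
    return groups
```

Ranking the genes within each group by the absolute differential beta score then gives the labelled top genes in panel d.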
Supplementary Figures 1–4 and Supplementary Methods
The nonessential gene list.
Copy-number file used to perform the copy-number correction.
The list of core essential genes.
The LNCaP data, which include AAVS1, CCR5 and ROSA26 as negative-control genes.
A video tutorial showing how to edit the ‘config.yaml’ file used by MAGeCK-VISPR.