Titrating gene expression using libraries of systematically attenuated CRISPR guide RNAs

Abstract

A lack of tools to precisely control gene expression has limited our ability to evaluate relationships between expression levels and phenotypes. Here, we describe an approach to titrate expression of human genes using CRISPR interference and series of single-guide RNAs (sgRNAs) with systematically modulated activities. We used large-scale measurements across multiple cell models to characterize activities of sgRNAs containing mismatches to their target sites and derived rules governing mismatched sgRNA activity using deep learning. These rules enabled us to synthesize a compact sgRNA library to titrate expression of ~2,400 genes essential for robust cell growth and to construct an in silico sgRNA library spanning the human genome. Staging cells along a continuum of gene expression levels combined with single-cell RNA-seq readout revealed sharp transitions in cellular behaviors at gene-specific expression thresholds. Our work provides a general tool to control gene expression, with applications ranging from tuning biochemical pathways to identifying suppressors for diseases of dysregulated gene expression.

Fig. 1: Mismatched sgRNAs titrate GFP expression at the single-cell level.
Fig. 2: A large-scale CRISPRi screen identifies factors governing mismatched sgRNA activity.
Fig. 3: Identification and characterization of intermediate-activity constant regions.
Fig. 4: Neural network predictions of sgRNA activity.
Fig. 5: Compact mismatched sgRNA library targeting essential genes.
Fig. 6: Rich phenotyping of cells with intermediate-activity sgRNAs by Perturb-seq.

Data availability

Raw and processed Perturb-seq data are available at GEO under accession code GSE132080. Raw and processed sgRNA read counts from pooled screens are provided as supplementary tables. All other data will be made available by the corresponding author upon reasonable request.

Code availability

Custom scripts in this manuscript largely build on scripts published previously (refs. 14, 34, 52). An IPython notebook detailing the initialization of the CNN model and its use to predict mismatched sgRNA activities is included as a supplementary file. All custom scripts will be made available upon request.

References

1. Huang, N., Lee, I., Marcotte, E. M. & Hurles, M. E. Characterising and predicting haploinsufficiency in the human genome. PLoS Genet. 6, e1001154 (2010).
2. Rest, J. S. et al. Nonlinear fitness consequences of variation in expression level of a eukaryotic gene. Mol. Biol. Evol. 30, 448–456 (2013).
3. Bauer, C. R., Li, S. & Siegal, M. L. Essential gene disruptions reveal complex relationships between phenotypic robustness, pleiotropy, and fitness. Mol. Syst. Biol. 11, 773–773 (2015).
4. Keren, L. et al. Massively parallel interrogation of the effects of gene expression levels on fitness. Cell 166, 1282–1294.e18 (2016).
5. Dykhuizen, D. E., Dean, A. M. & Hartl, D. L. Metabolic flux and fitness. Genetics 115, 25–31 (1987).
6. Dekel, E. & Alon, U. Optimality and evolutionary tuning of the expression level of a protein. Nature 436, 588–592 (2005).
7. Alper, H., Fischer, C., Nevoigt, E. & Stephanopoulos, G. Tuning genetic control through promoter engineering. Proc. Natl Acad. Sci. USA 102, 12678–12683 (2005).
8. Perfeito, L., Ghozzi, S., Berg, J., Schnetz, K. & Lässig, M. Nonlinear fitness landscape of a molecular pathway. PLoS Genet. 7, e1002160 (2011).
9. Michaels, Y. S. et al. Precise tuning of gene expression levels in mammalian cells. Nat. Commun. 10, 818 (2019).
10. Patwardhan, R. P. et al. High-resolution analysis of DNA regulatory elements by synthetic saturation mutagenesis. Nat. Biotechnol. 27, 1173–1175 (2009).
11. Moore, R., Chandrahas, A. & Bleris, L. Transcription activator-like effectors: a toolkit for synthetic biology. ACS Synth. Biol. 3, 708–716 (2014).
12. Dominguez, A. A., Lim, W. A. & Qi, L. S. Beyond editing: repurposing CRISPR-Cas9 for precision genome regulation and interrogation. Nat. Rev. Mol. Cell Biol. 17, 5–15 (2016).
13. Jinek, M. et al. A programmable dual-RNA-guided DNA endonuclease in adaptive bacterial immunity. Science 337, 816–821 (2012).
14. Horlbeck, M. A. et al. Compact and highly active next-generation libraries for CRISPR-mediated gene repression and activation. eLife 5, e19760 (2016).
15. Sanson, K. R. et al. Optimized libraries for CRISPR-Cas9 genetic screens with multiple modalities. Nat. Commun. 9, 5416 (2018).
16. Sternberg, S. H., Redding, S., Jinek, M., Greene, E. C. & Doudna, J. A. DNA interrogation by the CRISPR RNA-guided endonuclease Cas9. Nature 507, 62–67 (2014).
17. Szczelkun, M. D. et al. Direct observation of R-loop formation by single RNA-guided Cas9 and Cascade effector complexes. Proc. Natl Acad. Sci. USA 111, 9798–9803 (2014).
18. Gilbert, L. A. et al. Genome-scale CRISPR-mediated control of gene repression and activation. Cell 159, 647–661 (2014).
19. Nishimasu, H. et al. Crystal structure of Cas9 in complex with guide RNA and target DNA. Cell 156, 935–949 (2014).
20. Kocak, D. D. et al. Increasing the specificity of CRISPR systems with engineered RNA secondary structures. Nat. Biotechnol. 37, 657–666 (2019).
21. Maji, B. et al. A high-throughput platform to identify small-molecule inhibitors of CRISPR-Cas9. Cell 177, 1067–1079 (2019).
22. Chiarella, A. M. et al. Dose-dependent activation of gene expression is achieved using CRISPR and small molecules that recruit endogenous chromatin machinery. Nat. Biotechnol. https://doi.org/10.1038/s41587-019-0296-7 (2019).
23. Tian, R. et al. CRISPR interference-based platform for multimodal genetic screens in human iPSC-derived neurons. Neuron 104, 239–255 (2019).
24. Nakamura, M. et al. Anti-CRISPR-mediated control of gene editing and synthetic circuits in eukaryotic cells. Nat. Commun. 10, 194 (2019).
25. Gilbert, L. A. et al. CRISPR-mediated modular RNA-guided regulation of transcription in eukaryotes. Cell 154, 442–451 (2013).
26. Kampmann, M., Bassik, M. C. & Weissman, J. S. Integrated platform for genome-wide screening and construction of high-density genetic interaction maps in mammalian cells. Proc. Natl Acad. Sci. USA 110, E2317–E2326 (2013).
27. Doench, J. G. et al. Optimized sgRNA design to maximize activity and minimize off-target effects of CRISPR-Cas9. Nat. Biotechnol. 34, 184–191 (2016).
28. Hsu, P. D. et al. DNA-targeting specificity of RNA-guided Cas9 nucleases. Nat. Biotechnol. 31, 827–832 (2013).
29. Boyle, E. A. et al. High-throughput biochemical profiling reveals sequence determinants of dCas9 off-target binding and unbinding. Proc. Natl Acad. Sci. USA 114, 5461–5466 (2017).
30. Chen, B. et al. Dynamic imaging of genomic loci in living human cells by an optimized CRISPR/Cas system. Cell 155, 1479–1491 (2013).
31. Dang, Y. et al. Optimizing sgRNA structure to improve CRISPR-Cas9 knockout efficiency. Genome Biol. 16, 280 (2015).
32. Grevet, J. D. et al. Domain-focused CRISPR screen identifies HRI as a fetal hemoglobin regulator in human erythroid cells. Science 361, 285–290 (2018).
33. Briner, A. E. et al. Guide RNA functional modules direct Cas9 activity and orthogonality. Mol. Cell 56, 333–339 (2014).
34. Adamson, B. et al. A multiplexed single-cell CRISPR screening platform enables systematic dissection of the unfolded protein response. Cell 167, 1867–1882 (2016).
35. Eraslan, G., Avsec, Ž., Gagneur, J. & Theis, F. J. Deep learning: new computational modelling techniques for genomics. Nat. Rev. Genet. 20, 389–403 (2019).
36. Kim, H. K. et al. Deep learning improves prediction of CRISPR-Cpf1 guide RNA activity. Nat. Biotechnol. 36, 239–241 (2018).
37. Luo, J., Chen, W., Xue, L. & Tang, B. Prediction of activity and specificity of CRISPR-Cpf1 using convolutional deep learning neural networks. BMC Bioinformatics 20, 332 (2019).
38. Dixit, A. et al. Perturb-seq: dissecting molecular circuits with scalable single-cell RNA profiling of pooled genetic screens. Cell 167, 1853–1866 (2016).
39. Jaitin, D. A. et al. Dissecting immune circuits by linking CRISPR-pooled screens with single-cell RNA-seq. Cell 167, 1883–1896 (2016).
40. Datlinger, P. et al. Pooled CRISPR screening with single-cell transcriptome readout. Nat. Methods 14, 297–301 (2017).
41. Replogle, J. M. et al. Direct capture of CRISPR guides enables scalable, multiplexed, and multi-omic Perturb-seq. Preprint at bioRxiv https://doi.org/10.1101/503367 (2018).
42. Harding, H. P. et al. An integrated stress response regulates amino acid metabolism and resistance to oxidative stress. Mol. Cell 11, 619–633 (2003).
43. McInnes, L., Healy, J. & Melville, J. UMAP: uniform manifold approximation and projection for dimension reduction. Preprint at https://arxiv.org/abs/1802.03426 (2018).
44. Semenova, E. et al. Interference by clustered regularly interspaced short palindromic repeat (CRISPR) RNA is governed by a seed sequence. Proc. Natl Acad. Sci. USA 108, 10098–10103 (2011).
45. Wiedenheft, B. et al. RNA-guided complex from a bacterial immune system enhances target recognition through seed sequence interactions. Proc. Natl Acad. Sci. USA 108, 10092–10097 (2011).
46. Mandegar, M. A. et al. CRISPR interference efficiently induces specific and reversible gene silencing in human iPSCs. Cell Stem Cell 18, 541–553 (2016).
47. Genga, R. M. J. et al. Single-cell RNA-sequencing-based CRISPRi screening resolves molecular drivers of early human endoderm development. Cell Rep. 27, 708–718 (2019).
48. Langmead, B., Trapnell, C., Pop, M. & Salzberg, S. L. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 10, R25 (2009).
49. Perez, A. R. et al. GuideScan software for improved single and paired CRISPR guide RNA design. Nat. Biotechnol. 35, 347–349 (2017).
50. Bae, S., Park, J. & Kim, J.-S. Cas-OFFinder: a fast and versatile algorithm that searches for potential off-target sites of Cas9 RNA-guided endonucleases. Bioinformatics 30, 1473–1475 (2014).
51. Bassik, M. C. et al. Rapid creation and quantitative monitoring of high coverage shRNA libraries. Nat. Methods 6, 443–445 (2009).
52. Norman, T. M. et al. Exploring genetic interaction manifolds constructed from rich single-cell phenotypes. Science 365, 786–793 (2019).
53. Macosko, E. Z. et al. Highly parallel genome-wide expression profiling of individual cells using nanoliter droplets. Cell 161, 1202–1214 (2015).

Acknowledgements

We thank G. Ow and E. Collisson (University of California, San Francisco) for sharing the mCherry-marked sgRNA expression vector, R. Pak, J. Stern and A. Xu for help with library cloning and sequencing library preparation, B. Adamson for sharing the modified CROP-seq vector, M. Jones, J. Chen, L. Gilbert, J. Replogle and all members of the Weissman laboratory for helpful discussions and E. Chow, D. Bogdanoff and K. Chaung from the UCSF Center for Advanced Technology for help with sequencing. This work was funded by National Institutes of Health grants F32 GM116331 and K99 GM130964 (both to M.J.), U01 CA168370, U01 CA217882 and RM1 HG009490 (all to J.S.W.) and R35 GM118061 (C.A.G.) and the Innovative Genomics Institute, UC Berkeley (C.A.G.). J.S.W. is a Howard Hughes Medical Institute Investigator. D.A.S. is supported by NSF Graduate Research Fellowship 1650113 and a Moritz–Heyman Discovery Fellowship. R.A.S. is supported by a Fannie and John Hertz Foundation Fellowship and an NSF Graduate Research Fellowship. M.A.H. is a Byers Family Discovery Fellow and is supported by the UCSF Medical Scientist Training Program and the School of Medicine. T.M.N. is a fellow and J.A.H. is the Rebecca Ridley Kry Fellow of the Damon Runyon Cancer Research Foundation (T.M.N., DRG-2211–15; J.A.H., DRG-2262–16).

Author information

M.J. conducted the large-scale growth screen, supervised the constant region and Perturb-seq experiments, implemented the linear machine-learning model, analyzed the large-scale screen and Perturb-seq data, conceived experiments and wrote the manuscript. D.A.S. conducted the GFP and constant-region screens, implemented the deep-learning model, designed and conducted the compact library screens, analyzed data, conceived experiments and wrote the manuscript. R.A.S. designed the constant-region library and conducted a pilot screen, designed and conducted the Perturb-seq experiment, analyzed data, conceived experiments and edited the manuscript. M.A.H. assisted with the large-scale growth screen and, with J.S.H., designed the large-scale library. S.M.S. evaluated modified constant-region activities by RT-qPCR. J.A.H. and T.M.N. assisted with data analysis. C.R.L. assisted with library cloning and screens. C.A.G. supervised the generation of the large-scale library and edited the manuscript. J.S.W. conceived and supervised experiments and wrote the manuscript. All authors provided feedback on the manuscript.

Correspondence to Jonathan S. Weissman.

Ethics declarations

Competing interests

J.S.W., M.J., D.A.S., R.A.S., M.A.H. and T.M.N. have filed patent applications related to CRISPRi/a screening, Perturb-seq and mismatched sgRNAs. J.S.W. consults for and holds equity in KSQ Therapeutics, Maze Therapeutics and Tenaya Therapeutics. J.S.W. is a venture partner at 5AM Ventures and a member of the Amgen Scientific Advisory Board. M.J., M.A.H. and T.M.N. consult for Maze Therapeutics.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Integrated supplementary information

Supplementary Figure 1 Details of the GFP mismatch experiment.

(a) Representative plots illustrating gating strategy to select cells for analysis. (b) Comparison of relative activities obtained from two replicate transductions. Relative activity was defined as the fold-knockdown of each mismatched variant (GFPsgRNA[non-targeting]/GFPsgRNA[variant]) divided by the fold-knockdown of the perfectly-matched sgRNA. The background fluorescence of a GFP-negative strain was subtracted from all GFP values prior to other calculations. n = 57 sgRNAs; r2 = squared Pearson correlation coefficient. (c) KDE plots of GFP distributions 10 days after transducing K562 GFP+ cells with the perfectly-matched sgRNA, a non-targeting sgRNA, and each of the 57 singly-mismatched variants. Fluorescence of GFP-negative K562 cells is shown in gray. Although most GFP distributions are unimodal, some are broadened compared to those with the perfectly matched sgRNA or the negative control sgRNA. This heterogeneity could be a consequence of the random integration of the GFP locus, cell-to-cell differences in expression of the dCas9-KRAB effector in our polyclonal cell line, the amplification of gene expression bursts by long GFP half-lives, or a combination of these factors. Two replicate transductions were evaluated for each sgRNA (see panel b); data from one replicate are shown here.
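The relative-activity definition in panel b can be written out as a short calculation. The following is a minimal sketch, assuming a single summary fluorescence value (for example a median) per population; the function names and all numeric values are hypothetical and are not taken from the study.

```python
def fold_knockdown(gfp_nontargeting, gfp_sgrna, background):
    """Fold-knockdown: background-subtracted GFP with a non-targeting sgRNA
    divided by background-subtracted GFP with the sgRNA of interest."""
    return (gfp_nontargeting - background) / (gfp_sgrna - background)


def relative_activity(gfp_nontargeting, gfp_variant, gfp_perfect, background):
    """Fold-knockdown of a mismatched variant divided by the fold-knockdown
    of the perfectly matched sgRNA."""
    variant_kd = fold_knockdown(gfp_nontargeting, gfp_variant, background)
    perfect_kd = fold_knockdown(gfp_nontargeting, gfp_perfect, background)
    return variant_kd / perfect_kd


# Hypothetical fluorescence values (arbitrary units), for illustration only.
background = 10.0      # GFP-negative cells
nontargeting = 1010.0  # non-targeting control sgRNA
perfect = 60.0         # perfectly matched sgRNA
variant = 210.0        # singly mismatched variant

print(relative_activity(nontargeting, variant, perfect, background))
```

With these made-up numbers the variant achieves 5-fold knockdown against 20-fold for the perfectly matched sgRNA, giving a relative activity of 0.25; by construction the perfectly matched sgRNA has a relative activity of 1.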

Supplementary Figure 2 Additional analysis of large-scale mismatched sgRNA screen.

(a, b) Comparison of growth phenotypes (γ) of all sgRNAs derived from replicates of the (a) K562 (n = 119,201 sgRNAs) and (b) Jurkat screens (n = 119,229 sgRNAs). Marginal distributions are normalized to element count in each category. r2 = squared Pearson correlation coefficient for targeting sgRNAs (mismatched and original). (c) Comparison of γ of perfectly matched sgRNAs from the K562 screen in this work and a previously published K562 screen (ref. 14) (average of two replicate screens). n = 4,830 sgRNAs; r2 = squared Pearson correlation coefficient. (d) Comparison of γ of perfectly matched sgRNAs in K562 and Jurkat cells reveals substantial differences, likely reflecting cell-type specific gene essentiality (average of two replicate screens). n = 4,892 sgRNAs; r2 = squared Pearson correlation coefficient. (e) Comparison of mismatched sgRNA relative activities in K562 and Jurkat cells, classified by the difference in γ of the corresponding original guide. n = 15,103 (left) and 26,409 (right) sgRNAs; r2 = squared Pearson correlation coefficient. (f) Distribution of mismatched sgRNA relative activities for sgRNAs with 1 mismatch (left) or 2 mismatches (right). (g) Distribution of mismatched sgRNA relative activities stratified by sgRNA GC content, grouped by mismatches located in positions –19 to –13 (PAM-distal region), positions –12 to –9 (intermediate region), and positions –8 to –1 (PAM-proximal/seed region). n = 282-7,592 sgRNAs. (h) Distribution of mismatched sgRNA relative activities stratified by the identity of the 2 bases flanking the mismatch, grouped by mismatches located in the three regions as in g. n = 155-2,031 sgRNAs. (i) Distribution of mismatched sgRNA relative activities stratified based on whether or not the invariant first G of the sgRNA (position –20) matches the genome, grouped by mismatches located in the three regions as in g. n = 4,267-11,524 sgRNAs. (j) Comparison of mean CRISPRi relative activities from large-scale screen and cutting frequency determination (CFD) scores (ref. 27). Values are compared for identical combinations of mismatch type and mismatch position; mean relative activities were calculated by averaging relative activities for all mismatched sgRNAs with a given combination. n = 228 mismatch type/position combinations; r2 = squared Pearson correlation coefficient. (k) Distribution of sgRNA series by number of sgRNAs with intermediate activity (0.1 < relative activity < 0.9), using only sgRNAs with a single mismatch (top) or all mismatched sgRNAs (bottom). Lines in violin plots in panels g, h, i denote distribution quartiles.
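The per-series tally in panel k amounts to counting, for each sgRNA series, how many relative activities fall strictly between 0.1 and 0.9. A minimal sketch of that count follows; the dictionary layout, series names and activity values are hypothetical and chosen only for illustration.

```python
def count_intermediate(series_activities, low=0.1, high=0.9):
    """Return, per series, the number of sgRNAs with low < relative activity < high."""
    return {
        series: sum(1 for activity in activities if low < activity < high)
        for series, activities in series_activities.items()
    }


# Hypothetical relative activities for two made-up sgRNA series.
example = {
    "series_A": [0.05, 0.35, 0.62, 0.95],
    "series_B": [0.02, 0.08, 0.97],
}

counts = count_intermediate(example)
print(counts)
```

Here series_A contributes two intermediate-activity sgRNAs and series_B none; a histogram of such counts across all series would correspond to the kind of distribution described in panel k.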

Supplementary Figure 3 Additional analysis of modified constant regions.

(a) Comparison of growth phenotypes measured in replicate screens after 4, 6, or 8 days of growth from t0. Data from Day 4 were used for all subsequent analyses. n = 35,830 sgRNAs; r2 = squared Pearson correlation coefficient. (b) Comparison of relative % knockdown (quantified via RT-qPCR) and mean relative growth phenotype for 10 intermediate-activity constant region variants paired with two targeting sequences against DPH2. Data represent the mean of technical triplicates. (c) Relative activities of constant regions paired with all 30 targeting sequences, ranked by the average strength of each constant region and displayed as rolling means with a window size of 50. (d) Distribution of all pairwise correlations of constant region relative activities within and between gene targets. n = 30 and 1,350 for intra-gene and inter-gene comparisons, respectively; indicated p-values are derived from a two-tailed Student’s t-test; dashed lines in violin plots indicate the distribution quartiles. (e) Relative activity of each indicated target sequence:constant region pair vs. the mean relative activity of the respective constant region for all targets. Growth phenotypes (γ) with the unmodified constant region are indicated in the figure legends. Lines represent rolling means of individual data points.

Supplementary Figure 4 Additional details for the neural network.

(a) Graph of the CNN model architecture. (b) Example of 5-fold cross-validation using only the training dataset, further analyzed in the subsequent two panels of this figure. A similar scheme was used to optimize hyperparameters for the CNN model, albeit with 3-fold cross-validation to allow for larger training sets in each split. (c) Model loss, measured as root mean squared error, for training and test data over 30 training epochs. Each line represents one of 5 splits diagrammed in panel b. The final models used for our predictions were trained for 8 epochs, as additional cycles only reduced training loss without significant improvement in validation loss (i.e., the model becomes overfit). (d) Stability of the model with different input data. For each split in panel b, 20 independent CNN models were trained for 8 epochs on the same data. The root mean squared error on the test set for each model is plotted as a blue dot. Box plots indicate the interquartile range of each distribution. (e) Model loss for the final CNN ensemble. Each line represents one of 20 models trained for 8 epochs on the entire training set. (f) Explained variance of validation sgRNA relative activities for each individual model (black), and for the mean prediction of all 20 models (red). n = 5,241 sgRNAs evaluated for each model; r2 = squared Pearson correlation coefficient. (g) Validation error stratified by mismatch position. (h) Validation error stratified by mismatch type. (i) Comparison of CNN prediction error (difference between measured and predicted activity) and off-target specificity score for all sgRNAs in the validation set. Off-target specificity scores were calculated using CRISPRi relative activities as described in the Methods. n = 5,241 sgRNAs; r = Pearson correlation coefficient. (j) Partitioning of sgRNAs into bins based on relative activity in the large-scale K562 screen. (k) Confusion matrix showing the fraction of sgRNAs in each actual (measured) activity bin that were assigned to each predicted bin by the CNN model. Each row sums to 1. (l) Statistics indicating the requisite number of randomly sampled sgRNAs from each activity bin to have a given probability of selecting at least one sgRNA with true activity in that bin. Simulations are based on the probabilities outlined in the confusion matrix (panel k). (m) Similar to panel l, with random sampling from bin 2 (relative activity 0.37-0.63) to yield at least one sgRNA with intermediate activity (0.1-0.9). We tested several sampling schemes (e.g. drawing from bin 1, 2, 3, or combinations of these), and found this method to empirically give the highest success rate for selecting sgRNAs with intermediate activities.
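The question addressed in panels l and m (how many sgRNAs must be drawn to obtain at least one with the desired true activity) has a simple closed form if each draw is treated as an independent trial with a fixed success probability p: the chance of at least one success in n draws is 1 − (1 − p)^n. The caption describes simulations, so the sketch below is an illustrative approximation under that independence assumption and is not the authors' procedure; the value of p is hypothetical.

```python
import math


def prob_at_least_one(p, n):
    """Probability of at least one success in n independent draws,
    each succeeding with probability p."""
    return 1.0 - (1.0 - p) ** n


def draws_needed(p, target):
    """Smallest n such that prob_at_least_one(p, n) >= target."""
    if not 0.0 < p <= 1.0 or not 0.0 < target < 1.0:
        raise ValueError("p must be in (0, 1] and target in (0, 1)")
    if p == 1.0:
        return 1
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - p))


# Hypothetical per-draw success probability (not a value from the study).
p = 0.5
n = draws_needed(p, target=0.95)
print(n, prob_at_least_one(p, n))
```

With p = 0.5, five draws are needed to exceed a 95% chance of at least one success, since four draws give 0.9375 and five give 0.96875.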

Supplementary Figure 5 Additional details for the linear model.

(a) Comparison of measured relative growth phenotypes from the large-scale screen and predicted activities assigned by the elastic net linear model. Marginal histograms show distributions of relative activities along the corresponding axes. n = 5,241 sgRNAs; r2 = squared Pearson correlation coefficient. (b) Comparison of measured relative activity (relative knockdown) in the GFP experiment and predicted relative sgRNA activity. n = 57 sgRNAs; r2 = squared Pearson correlation coefficient. (c) Comparison of predicted relative activities from the linear model and the neural network, based on the validation set of singly-mismatched sgRNAs. n = 5,241 sgRNAs; r2 = squared Pearson correlation coefficient. (d) Regression coefficients assigned to each feature in the linear model. 228 features (gray, blue) describe the position and type of mismatch; 42 features (gold) carry other information about the sgRNA and genomic context surrounding the protospacer. These features are detailed in subsequent panels. (e) Linear coefficients for features of the sgRNA and targeted locus. TSS; transcription start site. (f) Linear coefficients for features covering positions in the distal, intermediate, and seed regions of the targeting sequence (highlighted blue in panel d).
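The 228 position/type features in panel d are consistent with 19 mismatch positions (–19 to –1, given that position –20 is the invariant G) multiplied by 12 possible base substitutions. The sketch below shows one way such a one-hot encoding could be built. The feature ordering and the representation of mismatch type as an (original base, substituted base) pair are assumptions made for illustration, not the encoding documented by the authors.

```python
from itertools import product

BASES = "ACGT"
POSITIONS = range(-19, 0)  # positions -19 to -1 relative to the PAM

# 12 substitution types: every ordered pair of distinct bases (assumed encoding).
MISMATCH_TYPES = [(a, b) for a, b in product(BASES, repeat=2) if a != b]

# One feature per (position, original base, substituted base) combination.
FEATURES = [(pos, orig, new) for pos in POSITIONS for orig, new in MISMATCH_TYPES]
FEATURE_INDEX = {feature: i for i, feature in enumerate(FEATURES)}


def encode_single_mismatch(position, original_base, new_base):
    """Return a one-hot vector marking the position and type of one mismatch."""
    vector = [0] * len(FEATURES)
    vector[FEATURE_INDEX[(position, original_base, new_base)]] = 1
    return vector


vec = encode_single_mismatch(-5, "G", "T")
print(len(FEATURES), sum(vec))
```

A linear model fitted on vectors like these assigns one coefficient per mismatch position/type combination, which is the kind of per-feature readout shown in panels d and f.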

Supplementary Figure 6 Additional analysis of the compact allelic series screen.

(a) Composition of the compact library, in terms of previously measured relative activities in the large-scale screen (dark purple), or predicted relative activities assigned by the CNN model ensemble (light purple). Perfectly matched sgRNAs, which by definition have relative activities of 1.0, comprise 20% of the library but were not included in the histogram. (b) Distribution of mismatch positions and types for singly-mismatched sgRNAs in the compact library, for previously measured (dark purple) and CNN-imputed (light purple) sgRNAs. (c) Heatmap showing the distribution of mutated positions for doubly-mismatched sgRNAs in the compact library. (d) Comparison of growth phenotypes measured in each K562 replicate screen 4- and 7-days post-transduction. Data from Day 7 were used for all subsequent analyses. n = 25,518 sgRNAs; r2 = squared Pearson correlation coefficient. (e) Comparison of growth phenotypes measured in each HeLa replicate screen 6- and 8-days post-transduction. Data from Day 8 were used for all subsequent analyses. n = 25,518 sgRNAs; r2 = squared Pearson correlation coefficient. (f) Comparison of growth phenotypes of original (perfectly matched) sgRNAs in HeLa and K562 cells (γ, expressed as the average of two replicate screens). n = 4,810 sgRNAs; r2 = squared Pearson correlation coefficient. (g) Measured vs. predicted relative activities of CNN-imputed sgRNAs in K562 cells (left) and HeLa cells (right). A small number of points beyond the y-axis limits were excluded to more clearly display the bulk of the distribution. n = 6,147 sgRNAs; r2 = squared Pearson correlation coefficient. (h) Comparison of sgRNA composition and model error for the large-scale and compact libraries. The CNN-imputed guides had substantially higher predicted activities than those for the large-scale validation set; higher predicted activity was generally associated with higher model error for the validation (red) and imputed (blue) sgRNA sets, consistent with the discrepancy in model performance on each set. (i) Distribution of the number of intermediate-activity mismatched sgRNAs targeting each gene in the compact library. The number of genes with at least 2 intermediate activity sgRNAs is indicated above each histogram; sgRNA activities were quantified for 1,907 and 1,442 genes in K562 and HeLa cells, respectively. Note that here activities are aggregated by gene as opposed to by series, as was done in Supplementary Fig. 2k. (j) Comparison of phenotypes measured in replicate screens after 12 days of growth in the drug screen. n = 25,518 sgRNAs; r2 = squared Pearson correlation coefficient. (k) Comparison of vehicle- (γ) and lovastatin-treatment (τ) growth phenotypes for all sgRNAs in the compact library. Knockdown of HMG-CoA reductase (HMGCR) greatly sensitizes cells to lovastatin, compared to knockdown of other genes such as tubulin (TUBB). n = 25,518 sgRNAs.

Supplementary Figure 7 Summary of the Perturb-seq experiment.

(a) Schematic of Perturb-seq strategy to capture single-cell transcriptomes with matched sgRNA identities. (b) Summary of sequencing and perturbation assignment statistics. (c) Distribution of number of cells captured per perturbation. Median: 122 cells per perturbation; 5th to 95th percentile: 66–277 cells per perturbation. n = 19,587 cells. (d, e) Comparison of (d) growth phenotypes (γ) and (e) relative activities measured in the large-scale mismatched sgRNA screen and in the Perturb-seq experiment. Differences are likely due to the different timescales and the different vectors used. n = 128 sgRNAs; r2 = squared Pearson correlation coefficient.

Supplementary Figure 8 Target gene expression in cells with indicated perturbations.

(a) Distribution of target gene expression levels, quantified as target gene UMI count normalized to total UMI count per cell. Cell numbers for each perturbation are listed in Supplementary Table 14. Box plots inside violin plots denote quartile ranges (box), median (center mark), and 1.5 × interquartile range (whiskers). (b) Mean target gene expression levels for target genes with low basal expression levels.

Supplementary Figure 9 Target gene expression in cells with indicated perturbations (different quantification).

Expression is quantified as raw target gene UMI count. Cell numbers for each perturbation are listed in Supplementary Table 14. Box plots inside violin plots denote quartile ranges (box), median (center mark), and 1.5 × interquartile range (whiskers).

Supplementary Figure 10 Phenotypes resulting from gene titration.

(a) Distributions of total UMI counts in cells with the perfectly matched sgRNA against the indicated genes. Cell numbers for each perturbation are listed in Supplementary Table 14. Box plots inside violin plots denote quartile ranges (box), median (center mark), and 1.5 × interquartile range (whiskers). (b) Left: Comparison of median UMI count per cell and relative growth phenotype in cells with sgRNAs targeting BCR, GATA1, or POLR2H or control cells. Right: Comparison of median UMI count per cell and target gene expression. (c) Cell cycle scores (Methods) for populations of cells with individual sgRNAs. (d) Fraction of cells in indicated cell cycle phase for populations with a negative control sgRNA or sgRNAs targeting CAD. (e) Magnitudes of gene expression change of populations with perfectly matched sgRNAs targeting indicated genes. Magnitude of gene expression change is calculated as sum of z-scores of genes differentially expressed in the series (FDR-corrected p < 0.05 with any sgRNA in the series, two-sided Kolmogorov-Smirnov test, Methods), with z-scores of each gene in individual cells signed by the average direction of change in the population. Cell numbers and violin plots are as in a. (f) Comparison of magnitude of gene expression change to growth phenotype (γ) for all perfectly matched sgRNAs in the experiment. (g) Comparison of relative growth phenotype and magnitude of gene expression change for all individual sgRNAs, as in Fig. 6f but without increased transparency for individual series. (h) Comparison of magnitude of gene expression and target gene knockdown, as in Fig. 6g but without increased transparency for individual series. (i) Comparison of relative growth phenotype and target gene expression, as in Fig. 6f. (j) Comparison of measured growth phenotype (γ, not normalized to strongest sgRNA) and target gene expression, as in Fig. 6f.
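The magnitude of gene expression change described in panel e can be sketched as a per-cell sum of z-scores, with each gene's z-score signed by the average direction of change of that gene across the population. The following is a minimal illustration on made-up numbers. It assumes the z-scores have already been computed for the differentially expressed genes and that the "population" is the set of cells passed in; the function name is hypothetical.

```python
def expression_change_magnitudes(z_scores):
    """z_scores: list of per-cell lists, one z-score per differentially expressed gene.

    Returns one magnitude per cell: the sum of that cell's z-scores, each
    multiplied by the sign of the gene's mean z-score across all cells."""
    n_cells = len(z_scores)
    n_genes = len(z_scores[0])
    gene_means = [sum(cell[g] for cell in z_scores) / n_cells for g in range(n_genes)]
    signs = [1.0 if m >= 0 else -1.0 for m in gene_means]
    return [sum(s * z for s, z in zip(signs, cell)) for cell in z_scores]


# Two hypothetical cells, two hypothetical differentially expressed genes.
cells = [
    [2.0, -1.0],
    [1.0, -3.0],
]
magnitudes = expression_change_magnitudes(cells)
print(magnitudes)
```

Gene 1 is up on average and gene 2 is down on average, so the second gene's z-scores are sign-flipped before summing; the two cells receive magnitudes of 3.0 and 4.0. Signing by the population-level direction means that a cell moving against the population trend for a gene lowers its own score.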

Supplementary Figure 11 Diverse phenotypes resulting from essential gene depletion.

(a) Clustered correlation heatmap of perturbations. Gene expression profiles for genes with mean UMI count > 0.25 in the entire population were z-normalized to expression values in cells with negative control sgRNAs and then averaged for populations with the same sgRNA. Crosswise Pearson correlations of all averaged transcriptomes were clustered by the Ward variance minimization algorithm implemented in scipy. Cell numbers for each perturbation are listed in Supplementary Table 14. (b) UMAP projection, distribution of cells with indicated sgRNAs, target gene expression (rolling mean over 50 cells), and magnitudes of transcriptional changes for all differentially expressed genes and selected ISR regulon genes (rolling mean over 50 cells) for cells with knockdown of ATP5E or control cells. n = 2,781 cells total (negative control: 2,084 cells; ATP5E (0.070): 101 cells; ATP5E (0.554): 136 cells; ATP5E (0.914): 137 cells; ATP5E (1.000): 175 cells; ATP5E (1.185): 148 cells). See Methods for details.
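Two of the operations named in this caption, z-normalization against negative-control cells and a rolling mean over ordered cells, can be sketched for a single gene using only the standard library. The use of the population standard deviation and of full windows only are assumptions made for this illustration, as are the function names and the toy values; the Methods remain the reference for the actual procedure.

```python
from statistics import mean, pstdev


def z_normalize_to_control(values, control_values):
    """Express values as z-scores relative to the mean and standard deviation
    of the same gene in negative-control cells."""
    mu = mean(control_values)
    sd = pstdev(control_values)  # population standard deviation (assumed)
    return [(v - mu) / sd for v in values]


def rolling_mean(values, window):
    """Mean over each full window of consecutive values (no edge padding)."""
    return [mean(values[i:i + window]) for i in range(len(values) - window + 1)]


# Hypothetical expression values for one gene.
control = [1.0, 3.0]
perturbed = [4.0, 2.0]
print(z_normalize_to_control(perturbed, control))

# The caption uses a window of 50 cells; a window of 2 keeps the toy example short.
print(rolling_mean([1, 2, 3, 4], window=2))
```

In the toy data the control cells have mean 2 and standard deviation 1, so the perturbed values map to z-scores of 2.0 and 0.0, and the rolling mean of 1, 2, 3, 4 with a window of 2 is 1.5, 2.5, 3.5.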

Supplementary information

Supplementary Figures

Supplementary Figures 1–11

Reporting Summary

Supplementary Tables

Supplementary Tables 1–17

Supplementary File

IPython notebook detailing the initialization of a convolutional neural network to predict mismatched sgRNA activities.

About this article

Cite this article

Jost, M., Santos, D.A., Saunders, R.A. et al. Titrating gene expression using libraries of systematically attenuated CRISPR guide RNAs. Nat Biotechnol (2020). https://doi.org/10.1038/s41587-019-0387-5
