High-resolution characterization of gene function using single-cell CRISPR tiling screen

Identification of novel functional domains and characterization of detailed regulatory mechanisms in cancer-driving genes is critical for advanced cancer therapy. To date, CRISPR gene editing has primarily been applied to defining the role of individual genes. Recently, high-density mutagenesis via CRISPR tiling of gene-coding exons has been demonstrated to identify functional regions in genes. Furthermore, breakthroughs in combining CRISPR library screens with single-cell droplet RNA sequencing (sc-RNAseq) platforms have revealed the capacity to monitor gene expression changes upon genetic perturbations at single-cell resolution. Here, we present “sc-Tiling,” which integrates a CRISPR gene-tiling screen with single-cell transcriptomic and protein structural analyses. Distinct from other reported single-cell CRISPR screens focused on observing gene function and gene-to-gene/enhancer-to-gene regulation, sc-Tiling enables the capacity to identify regulatory mechanisms within a gene-coding region that dictate gene activity and therapeutic response.

functional R domain in DOT1L that coordinated with KMT domain to regulate histone modification. It is interesting that the R1 and R2 elements in the R domain exhibit opposite responses during DOT1L-inhibitory treatment, indicating that this self-regulatory R domain dictates the response to DOT1L-targeted therapy.
Overall, this is a well-organized study with attractive novel points. This is the first report of an application of tiling screen at a single cell resolution. The discovery of the opposite response of the R1 and R2 is interesting too. Meanwhile, I have the following questions and suggestions that might be helpful for the refinement of the manuscript.
1. What are the pros and cons of sc-tiling compared to the survival-based pooled tiling screens? This is a critical question that needs to be addressed. A comparison of the sc-tiling result in Fig. 2A and the survival screen in Fig. 2B can be useful. I'd suggest performing a PCA analysis on the correlation matrix of Fig. 2A and comparing the principal component to the viability profile in Fig. 2B. This will tell us which amino acids are functionally associated with the KMT core at the transcriptome level, in comparison to those at the cell phenotypical level. More discussion on the pros and cons of the two alternative methods is preferred too.
2. In the sc-tiling screen, samples were collected at day 3 after transduction. Why was this timepoint chosen? Is this because DOT1L is very essential in MLL-AF9 cells? I wonder if day 3 is sufficient for some relatively inefficient sgRNAs to achieve genome editing. The authors should provide an explanation and address the potential limitation.
3. As shown in Fig. 2D, the H3K79me2 level is decreased in sg-KMT and sg-R1 cells without DOT1L-inhibitory treatment. Does the H3K79me2 level change upon sg-R2 or Q584P (Fig. 3I) without DOT1L-inhibitory treatment?
4. In Figures 2C and 2D, sg-Luc was used as the control. This doesn't introduce cleavage to the DNA. sgRNAs targeting non-essential genomic regions (e.g., the AAVS safe-harbor region) would be better controls when testing relative proliferation and H3K79me2.
5. To explore the function of the R2 element, several R2 variants of DOT1L were over-expressed in wild-type MLL-AF9 cells in which endogenous DOT1L is expressed. For a fair comparison, it is preferred to express exogenous wt and mutant DOT1L in cells without endogenous DOT1L. This can be achieved by designing exogenous wt and mutant DOT1L harboring synonymous mutations at the KMT core, followed by KO with a gRNA that perfectly matches the endogenous sequence but has mismatches to the exogenous sequences (ref. PMID 31586052).
Minor comments:
1. In Britt Adamson's direct-capture Perturb-seq paper (PMID: 32231336), they mentioned that integration of CS1 at the 3' end of the sgRNA could compromise CRISPR activity, and therefore they didn't recommend this design. In this manuscript, CS1 was inserted at the 3' end of the sgRNA and is shown to work well according to the RFP inactivation assay and sc-tiling data. Is there any explanation?
2. There are some typos in the manuscript; for example, in Figure S1, the primer should be CS1_R01.
We thank the reviewers for their thoughtful and constructive critiques of the manuscript. We have added new data and analyses to address the comments, and we think the manuscript is significantly improved. We also converted the original "Introductory Paragraph" into an "Abstract" (revised page 2) and added an "Introduction" (revised pages 2-3) to adhere to the Nature Communications format. Below is our point-by-point response (blue) to the reviewers' comments.

REVIEWER COMMENTS
Reviewer #1 (Remarks to the Author): General comment-summary: Yang and colleagues combine CRISPR/Cas9-mediated editing with single-cell RNA sequencing (which they call "sc-tiling") to dissect the role of the H3K79 methyltransferase DOT1L in leukemic cells. They established a library of >600 sgRNAs (3' fused to a capture sequence) targeting the Dot1L ORF of 4.6 kb, delivered by viral transduction into MLL-AF9 fusion gene immortalized mouse hematopoietic cells (Fig. 1a-b). Sc-RNA sequencing revealed particular regions within the Dot1L ORF which resulted in differentiation or maintained an immature phenotype (Fig. 1c-f). They found several regions in and outside the KMT domain that seemed essential to maintain the undifferentiated state (Fig. 1g-h). They identified 3 critical regions in the N-terminal Dot1L ORF, including "R" between KMT and the AF9 binding domain (Fig. 2). They also used the sc-tiling approach to identify regions that would modify cellular response to a well-characterized small molecule DOT1L-KMT inhibitor (EPZ5676).
They observed that disrupting the N-terminal part of R ("R1") enhanced, whereas disrupting the C-terminal part of R ("R2") impaired, inhibitor activity. Mutational analysis revealed several critical residues in R2, and structural modeling suggested a regulatory mechanism of KMT activity involving R1 and R2 (Fig. 3a-f). Interestingly, they identified multiple single-nucleotide variants in patient samples in the R2 region of DOT1L which, when expressed in MLL-AF9-expressing cells, increased their resistance towards EPZ5676 (Fig. 3h-j). Overall, this is a well-written and interesting paper showing how the combination of CRISPR/Cas9 editing with scRNA sequencing can help to better understand the structure/function relations of a particular pharmaceutical target and the strategy of inhibition. Many findings that emerge from this technically sophisticated approach seem, however, confirmatory of current knowledge, except the identification of a particular stretch of the DOT1L ORF that, when mutated, provides resistance against EPZ5676. This observation raises interesting questions about the biological relevance of such variants/alterations for cancer/leukemia patients, particularly those that showed limited response to small molecule DOT1L inhibitors in clinical trials. A better characterization of the variants in this particular region would further increase the impact of this paper beyond the high technical achievements.
Specific comments: 1. The identification of a region "R2" that when altered leads to resistance to small molecule DOT1L inhibitors seems novel and interesting. It would be important to provide more information about these rare variants observed in 104/54510 "patient samples". (1) What kind of samples were those? (2) From cancer patients or other diseases? (3) What kind of cancers? (4) Were the variants identified upon de novo diagnosis or after chemotherapy? (5) Can such variants also be acquired and/or selected during chemotherapy of human cancers? (6) Are these variants present in human cancer cell lines? (7) What about normal human beings: do we find DOT1L germline variants in this region?
(1-2) The 54,510 "patient samples" included in this study were from the cBioPortal
Therefore, our current analysis from the cBioPortal database does not provide sufficient information to examine whether these variants in the R2 region were associated (acquired/selected) with chemotherapy.
(6) We examined the sequencing information of 1,457 human cell lines from the CCLE database (Cancer Cell Line Encyclopedia; Broad Institute). A total of 6 variants were observed in the DOT1L R2 region (summarized in the table below; also see revised Supplementary Fig. 13a), including one missense variant, G594S, detected in RCCFG2 cells (clear cell renal carcinoma). The G594S variant was also observed in the cBioPortal dataset (adrenocortical carcinoma; kidney).
(7) We examined the SNPs in the DOT1L R2 region from the Single Nucleotide Polymorphism Database (dbSNP; NCBI), which includes genomic information from 1000 Genomes, ExAC, TOPMed, gnomAD, and GO-ESP. Eight out of the 19 missense DOT1L variants discussed in our study were observed in dbSNP as potential germline variants (summarized in the table below; also see revised Supplementary Fig. 13b).
We thank the reviewer for the suggestion to investigate these clinically relevant genomics databases. The presence of these variants in normal and diseased human populations supports the potential impact of this study on future personalized precision medicine. We have included these findings in the revised manuscript (page 9 line 9; see also revised Supplementary Fig. 13 and Supplementary Table 1).

2. Fig. 1g needs to be better explained for non-experts. From afar, it appears to have more critical regions, particularly in the KMT region, that reach a similar score than the ones that are described in more detail in the structure in Fig. 1h?
We appreciate the reviewer's suggestion to provide a more detailed labeling of the critical regions in Fig. 1g. In the revised Fig. 1g (comparison attached below), we highlighted four functionally essential regions in the KMT core that were discussed in Fig. 1h. These include (1) P133-T139 D1 loop, (2) SAM pocket (key residues shown in Supplementary Fig 6a), (3) R282 loop, and (4) T320-K330 helix.

Fig. 2a compares Dot1L ORF regions with impact on gene expression with the "AF9-binding motif", showing "a moderate correlation". However, in the experiments shown in Fig. 3a, this region seemed very sensitive in control as well as EPZ5676-treated cells, in fact reaching the lowest NCS? The authors may clarify that in the text.
DOT1L has two reported functions in MLL-r leukemia: (1) the KMT core's H3K79 methyltransferase activity, which maintains open-accessible chromatin (Ref #18), and (2) the AF9-binding motif, which recruits the AF9-containing "Super Elongation Complex" (SEC; consisting of ENL, ELL, AF9, and AF4) to support gene transcription (Ref #33,34). While the sgRNAs targeting the AF9-binding motif exerted a similar cell killing effect as those targeting the KMT core, sc-Tiling (a transcriptomic-based single-cell CRISPR profiling) detected differences between these two functional elements, with a moderate correlation score (Pearson correlation of ~0.75). Conversely, sgRNAs targeting the R1-element exerted similar epigenetic (loss of H3K79me2) and transcriptomic (Pearson correlation > 0.8) profiles as those targeting the KMT core, suggesting a collaborative relationship between the R1-element and the KMT core. The fact that cell killing through targeting the AF9-binding motif did not impair the H3K79me2 level (revised Fig. 2e) supports the utility of sc-Tiling in providing a more detailed characterization of functional elements than the traditional survival-based CRISPR gene scan. We appreciate the reviewer's suggestion and have included these points in the revised manuscript (page 8 lines 10-20).
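The correlation scores quoted above compare the transcriptomic profiles obtained when different elements are targeted. As a minimal sketch of how such a Pearson score could be computed, assuming each element is summarized by one expression-change value per gene, the following uses made-up profiles; the variable names and numbers are hypothetical and are not data from the study.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have equal length >= 2")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("a constant profile has undefined correlation")
    return cov / sqrt(vx * vy)

# Hypothetical expression-change profiles over five genes (illustrative only).
kmt_core = [-2.0, -1.5, 0.1, 1.2, 0.8]
r1_element = [-1.8, -1.6, 0.0, 1.0, 0.9]
af9_motif = [-1.0, 0.2, 0.5, 1.1, -0.4]

r_kmt_r1 = pearson(kmt_core, r1_element)   # closely matched profiles
r_kmt_af9 = pearson(kmt_core, af9_motif)   # partially matched profiles
print(f"KMT vs R1: {r_kmt_r1:.2f}; KMT vs AF9-binding motif: {r_kmt_af9:.2f}")
```

In this toy input the R1 profile tracks the KMT core more closely than the AF9-binding motif profile does, which is the kind of contrast the correlation scores are meant to capture.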

Reviewer #2 (Remarks to the Author):
In this manuscript, Lu et al. integrated a CRISPR-tiling screen with direct-capture Perturb-seq, named 'sc-tiling', to identify gene functions at a sub-gene resolution and investigate the regulatory mechanisms within gene-coding regions. Using this new approach, they discovered a novel functional R domain in DOT1L that coordinates with the KMT domain to regulate histone modification. It is interesting that the R1 and R2 elements in the R domain exhibit opposite responses during DOT1L-inhibitory treatment, indicating that this self-regulatory R domain dictates the response to DOT1L-targeted therapy.
Overall, this is a well-organized study with attractive novel points. This is the first report of an application of tiling screen at a single cell resolution. The discovery of the opposite response of the R1 and R2 is interesting too. Meanwhile, I have the following questions and suggestions that might be helpful for the refinement of the manuscript.

What are the pros and cons of sc-tiling compared to the survival-based pooled tiling screens? This is a critical question that needs to be addressed. A comparison of the sc-tiling result in Fig. 2A and the survival screen in Fig. 2B can be useful. I'd suggest performing a PCA analysis on the correlation matrix of Fig. 2A and comparing the principal component to the viability profile in Fig. 2B. This will tell us which amino acids are functionally associated with the KMT core at the transcriptome level, in comparison to those at the cell phenotypical level. More discussion on the pros and cons of the two alternative methods is preferred too.
We thank the reviewer for the excellent suggestions. To our knowledge, DOT1L has two reported functions in MLL-r leukemia: (1) the KMT core's H3K79 methyltransferase activity, which maintains open-accessible chromatin (Ref #18), and (2) the AF9-binding motif, which recruits the AF9-containing "Super Elongation Complex" (SEC; consisting of ENL, ELL, AF9, and AF4) to support gene transcription (Ref #33,34). The fact that cell killing through targeting the AF9-binding motif does not impair the H3K79me2 level (revised Fig. 2e) argues for a catalysis-independent role of the AF9-binding motif in DOT1L.
Following the reviewer's suggestion, we performed a PCA analysis of the sc-Tiling data (from Fig. 2a) and cross-compared the PC1 score with the survival-based CRISPR scan score (day 12, shown in Fig. 2b) of individual amino acid residues. As shown below (see also revised Fig. 2c), the distribution of KMT core residues (black) overlaps with that of the AF9-binding motif residues (green) when evaluated by the survival CRISPR scan score (x-axis). In contrast, the sc-Tiling PC1 score (y-axis) can distinguish the AF9-binding motif from the KMT core. This combined analysis suggested by the reviewer also revealed critical amino acid residues located in the center of R1 (E489-L515; red) that are functionally associated with the KMT core at both the transcriptomic level and the cellular survival phenotype, further improving the resolution of the sc-Tiling analysis. Of note, all three sgRNAs for the R1 element used in this study (targeting amino acids 495, 505, and 510; also see Supplementary Fig. 7b) target within the "R1 center". We have included these points in the revised manuscript (revised Fig. 2c; page 5 line 26 to page 6 line 5; page 8 lines 10-16).
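The PCA comparison described above can be illustrated with a short sketch. It assumes the input is a residue-by-residue correlation matrix (as in Fig. 2a) and extracts PC1 by power iteration on the covariance of the centered columns; the toy matrix, the survival scores, and the function name `pc1_scores` are hypothetical, and the actual analysis may have used a standard PCA implementation instead. Note that the sign of PC1 is arbitrary.

```python
import random
from math import sqrt

def pc1_scores(rows, n_iter=500, seed=0):
    """Project each row onto the first principal component (power iteration)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix of the centered columns (d x d).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    rng = random.Random(seed)
    v = [rng.uniform(-1.0, 1.0) for _ in range(d)]  # random start vector
    for _ in range(n_iter):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sqrt(sum(x * x for x in w))
        if norm == 0:
            raise ValueError("data have no variance")
        v = [x / norm for x in w]
    return [sum(c[j] * v[j] for j in range(d)) for c in centered]

# Hypothetical 4-residue correlation matrix: residues 0-1 and 2-3 form two groups.
corr = [
    [1.0, 0.9, 0.2, 0.1],
    [0.9, 1.0, 0.1, 0.2],
    [0.2, 0.1, 1.0, 0.8],
    [0.1, 0.2, 0.8, 1.0],
]
scores = pc1_scores(corr)

# Hypothetical survival scan scores, paired per residue for a scatter plot.
survival = [-3.0, -2.8, -2.9, -0.2]
pairs = list(zip(survival, scores))
for x, y in pairs:
    print(f"survival={x:+.2f}  PC1={y:+.3f}")
```

In this toy input, residues 0 and 2 have similar survival scores but fall on opposite sides of PC1, mirroring how the PC1 axis can separate elements that a survival score alone cannot.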
Based on these new analyses, we envision that one major advantage of sc-Tiling is its ability to distinguish the underlying mechanisms of different functional elements within a protein. This could be significant because the traditional survival CRISPR scan can only address overall cellular fitness, with limited power to infer the underlying mechanism of a particular domain. Furthermore, the transcriptomic profiling in sc-Tiling suggests that this approach could dissect functional elements that participate in diverse cellular processes (e.g., metabolism, cell fate decision, tissue homeostasis) in which the end phenotype might not be cellular survival or proliferation. We have included these points in the revised manuscript (page 8 lines 16-20).

In the sc-tiling screen, samples were collected at day 3 after transduction. Why was this timepoint chosen? Is this because DOT1L is very essential in MLL-AF9 cells? I wonder if day 3 is sufficient for some relatively inefficient sgRNAs to achieve genome editing. The authors should provide an explanation and address the potential limitation.
We agree with the reviewer's comment that the time point of the sc-Tiling is critical. In our preliminary studies, we monitored the cell viability and the expression level of Hoxa9 (a DOT1L-driven gene) in MLL-AF9 cells transduced with a sg-KMT (below; see also revised Supplementary Fig. 2d,e). In these assays, we consistently observed a significant reduction of cell viability after 6 days of culture (this phenomenon was also observed in the survival CRISPR scan shown in Fig. 2b, i.e., the first sgRNA dropout phenotype was observed on day 6). We also noted that the expression of Hoxa9 (which serves as a DOT1L functional reporter) was significantly attenuated by day 3, whereas the cells remained largely viable. To prevent a "cell death signature" from dominating the transcriptomic analysis, we selected day 3 to perform the sc-Tiling screen. We have included these data in the revised Supplementary Fig. 2d,e.
We echo the reviewer's concern regarding the genome editing efficiency in CRISPR screens. As it remains technically challenging to investigate the cutting efficiency of all ~600 sgRNAs in the library, we focused on examining the editing efficiency of the 14 DOT1L sgRNAs validated in this study. We collected genomic DNA from cells on day 3 after transduction, PCR-amplified the CRISPR-targeted genomic regions, and submitted the amplicons for Illumina PE150 sequencing to determine the mutation % of each CRISPR-targeted locus (below; see also revised Supplementary Fig. 7c). Our results showed an average editing efficiency of 94.8 ± 3.4%, with 13 out of the 14 tested DOT1L sgRNAs achieving higher than 90% editing efficiency by day 3. However, one sgRNA targeting the AF9-binding motif (sg_mDot1l_2645_s) achieved only ~85% editing.
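The summary statistics above follow from per-locus read counts. A minimal sketch, assuming the mutation % of a locus is the fraction of sequenced reads carrying an edit and that the ± value is a sample standard deviation (the text does not state which spread measure was used); the sgRNA names and read counts below are hypothetical and are not the measured values.

```python
from statistics import mean, stdev

def editing_efficiency(edited_reads, total_reads):
    """Percentage of reads at a targeted locus that carry a mutation."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return 100.0 * edited_reads / total_reads

# Hypothetical (edited, total) amplicon read counts per sgRNA.
counts = {
    "sg_A": (9600, 10000),
    "sg_B": (9200, 10000),
    "sg_C": (8500, 10000),
}

efficiency = {name: editing_efficiency(e, t) for name, (e, t) in counts.items()}
values = list(efficiency.values())
avg = mean(values)        # mean editing efficiency across sgRNAs
sd = stdev(values)        # sample standard deviation
above_90 = [name for name, v in efficiency.items() if v > 90.0]

print(f"{avg:.1f} +/- {sd:.1f}% editing; {len(above_90)}/{len(values)} sgRNAs above 90%")
```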
We acknowledge that the variable cutting efficiency, potential off-targeting, and mosaic effect (i.e., generation of random mutations) of individual sgRNAs remain concerns in CRISPR sc-Tiling. We propose that considering multiple sgRNAs via a local-smoothing strategy could increase the statistical confidence and minimize the noise associated with individual sgRNAs. We have included these points in the revised manuscript (page 8 lines 21-25; revised Supplementary Fig. 7c).
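One simple form of local smoothing is a sliding-window mean over neighboring sgRNAs along the protein sequence. The sketch below assumes each sgRNA is indexed by its targeted amino acid position and uses a fixed window of ±5 residues; the window size, the function name `local_smooth`, and the scores are hypothetical, and the smoothing actually applied in the study may differ (for example, a weighted kernel).

```python
def local_smooth(positions, scores, half_window=5):
    """Mean score of all sgRNAs within +/- half_window residues of each sgRNA."""
    if len(positions) != len(scores):
        raise ValueError("positions and scores must have equal length")
    smoothed = []
    for p in positions:
        # Each sgRNA is always within its own window, so `nearby` is never empty.
        nearby = [s for q, s in zip(positions, scores) if abs(q - p) <= half_window]
        smoothed.append(sum(nearby) / len(nearby))
    return smoothed

# Hypothetical sgRNA target positions (amino acid) and per-sgRNA scores.
positions = [495, 500, 505, 510, 600]
scores = [-2.0, 0.5, -2.2, -1.8, 0.1]

smoothed = local_smooth(positions, scores)
for p, raw, sm in zip(positions, scores, smoothed):
    print(f"aa {p}: raw={raw:+.2f}  smoothed={sm:+.2f}")
```

In this toy input the outlying sgRNA at position 500 is pulled toward its neighbors, while the isolated sgRNA at position 600 is unchanged because no other sgRNA falls within its window.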