Date: 2022-09-26

The current genome editing tools, such as CRISPR–Cas9, have proven to be robust and efficient tools for many sequence manipulations. They have been extensively used for mutating specific genomic loci in single-gene studies1 as well as genome-wide screens2,3,4. However, the resolution of CRISPR–Cas9 editing is limited by the availability of suitable protospacer adjacent motif (PAM) sequences in close proximity to the region of interest. Homology-directed recombination (HDR)-mediated precision editing can be used to introduce genetic alterations exactly at the intended loci, but this method suffers from a strong DNA damage response, low efficiency and incompatibility with pooled CRISPR screening approaches. Because of the low efficiency of precision genome editing, pooled screens commonly use lentiviral introduction of libraries of guide RNAs to cell lines that express either Cas9 nuclease alone, which generates a series of insertion and deletion alleles, or nuclease-dead Cas9 fused to transcriptional repressor (CRISPRi) or activator (CRISPRa) domains5,6,7. These methods do not have single-base or single-allele resolution, and their precision is limited because they use an indirect measure, inferring the perturbation from the presence of a guide sequence integrated into the cells at a (pseudo)random genomic position.

Furthermore, interpreting the functional consequence of targeted Cas9-induced mutations is confounded by the DNA damage introduced by Cas9 and the off-target effects of the Cas9 nuclease8. In particular, double-strand breaks (DSBs) at on-target or off-target loci cause DNA damage and genomic instability, resulting in paused cell cycle or apoptosis9,10,11. These problems are particularly acute in analysis of small intergenic features, such as transcription factor (TF) binding sites. This is because non-coding sequence is commonly repetitive, and single guide RNAs (sgRNAs) targeting small binding motifs cannot be selected from a large number of possible sequences predicted to have the same effect. Here we describe a competitive precision genome editing (CGE) approach using CRISPR–Cas9 genome editing at precise loci to accurately analyze the effect of mutations on cellular properties and molecular functions, such as fitness, TF binding and mRNA expression. The experimental design in the CGE approach mitigates the confounding factors associated with CRISPR experiments, such as the hampering effect of double-strand DNA break itself on cell proliferation, enabling dissection of the effect of individual sequence features on cellular fitness. Here, we use the CGE method for dissecting the transcriptional network downstream of the master regulatory oncogene MYC.

MYC is a basic helix-loop-helix (bHLH) TF that forms a heterodimer with another bHLH protein, MAX, and regulates a large set of target genes by binding to regulatory elements containing E-box (CACGTG) motifs12,13,14. MYC is indispensable for embryonic development15, but, in normal cells, its expression is tightly controlled. The importance of tight regulation of MYC activity is highlighted by the fact that it is one of the most frequently deregulated oncogenes across multiple human cancer types16. MYC regulates major pathways promoting cell growth and proliferation, such as ribosome biogenesis and nucleotide biosynthesis17. However, owing to the large number of MYC targets, identifying direct transcriptional targets of MYC has been challenging. It has been proposed that MYC, instead of being a regulator of particular transcriptional programs, is a universal amplifier of gene expression that increases transcriptional output at all active promoters18,19. Conversely, it has been shown that MYC can selectively regulate specific sets of genes, including those involved in metabolism and assembly of the ribosome20,21,22. Nevertheless, despite its well-known phenotypic effects on cellular growth and proliferation, the precise MYC target genes accounting for its oncogenic activity are still elusive. We reasoned that the most effective way to dissect the gene regulatory network downstream of MYC would be to individually assess the role of each target gene by mutating the MYC binding sites in its regulatory regions, which we have done here using the CGE method.

The CGE method uses CRISPR–Cas9 technology combined with a library of HDR templates with sequence tags enabling lineage tracing of the targeted cell populations. The HDR templates harbor two types of mutations: experimental variants targeting a genomic feature of interest and silent or near-silent mutations that introduce variable sequence tags (Fig. 1a). One of the key design features of the CGE method is the use of at least two experimental variants. One of them (control) reconstitutes the wild-type sequence of the region of interest by harboring the original genomic sequence, whereas the other replaces it with the desired mutated sequence, such as a non-functional TF binding site (Fig. 1a). In addition to the experimental variants, each individual HDR template molecule has variable sequence tag(s) flanking the sequence of interest that serve as a genetic barcode detectable in the Illumina sequencing reads of the targeted locus (Fig. 1a). Inclusion of a large set of different sequence tags generates a large number of internal replicate lineages and lineage pools within each assay. As most cells remain unedited, the lineages are expected to grow largely independently of each other, increasing the statistical power of the method. Inclusion of the tags also allows precise counting of the editing events and enables exclusion of the possibility that the tags themselves, and not the intended mutations, cause the observed phenotype. Pairwise analysis of the cell lineages harboring the same sequence tags, in turn, enables direct measurement of the effect of the targeted mutation.

Fig. 1: Strategy of the CGE method to lineage-trace cells with distinct genome editing events using sequence tags with silent or near-silent mutations. a, The CGE method uses a library of HDR templates with two experimental variants: original genomic sequence (blue) and desired mutation (orange). In addition, the HDR templates harbor sequence tags that can be identified by Illumina sequencing of the targeted locus, enabling lineage tracing of the edited clones and creating a large number of internal replicates in each experiment. The sequence tags are generated by mutating nucleotides flanking the region of interest with the probability of 24%, a strategy that typically introduces 2–3 mutated nucleotides (indicated with red diamonds; Extended Data Fig. 1), leaving most of the flanking sequence intact, as demonstrated by the position weight matrices. b, Experimental strategy using a mixture of HDR template libraries harboring the original and mutated sequences for the same target. The abundance of each HDR template in the cell population is analyzed from the sequence tags after different assays and compared to respective baseline: cellular fitness (gDNA at day 8/day 2), TF binding (chromatin-immunoprecipitated DNA/input DNA) and mRNA expression (mRNA abundance/respective gDNA). c, The number of possible sequence variations with zero (n = 1), one (n = 30), two (n = 405) and three (n = 3,240) flanking mutations when the sequence tags are created by mutating ten nucleotides with the probability of 24% and their abundance in the HDR template library analyzed from read counts in ChIP input sample of the edited SHMT2 E-box locus. The box plots indicate the median read count with upper and lower quartiles, and the whiskers extend to 1.5 times the interquartile range. The number of sequence tags recovered in each experiment is shown in Supplementary Table 3. 
d, The effect of E-box mutation at the RPL23 gene promoter on fitness of HAP1 cells shown by read count ratios for mutated/original sequences for each cell lineage pair harboring identical sequence tags with one flanking mutation (see also Extended Data Figs. 1b and 2b). Of note, the sequence tags with two flanking mutations are used in Fig. 2 for more robust analysis (Methods).

In the CGE experiment, DNA samples from cells edited with either mutant or control sequence are collected at two or more timepoints (early and late), and the cell lineages with a particular editing event can be followed before and after subjecting the cells to selection pressure, such as competitive growth in culture, after which cellular fitness can be analyzed (Fig. 1b). In addition, the CGE method can be used for measuring the effect of defined mutations on TF binding to the target locus and on the expression levels of mRNA by comparing the chromatin-immunoprecipitated DNA to input DNA or mRNA levels against respective genomic DNA (gDNA). Because the sequence tags are present in both repair templates, this experimental design allows precise comparison of the mutated versus control sequence by excluding the non-edited wild-type sequences from the analysis. Sequencing reads are then assigned to the distinct editing events based on their sequence tags, and the ratio between mutated and control sequences for each tag is determined at each experimental condition (such as both timepoints), resulting in dozens of internal replicate measurements for each editing event within a single assay (Fig. 1a). Thus, statistical power to detect differences between the conditions is very high. The experiment is a single-well assay in which the repair templates harboring both experimental variants (mutant and control) are transfected into cells within one culture well, and the genomic perturbation is compared directly to control in the same cell population. This eliminates the experimental bias and variation originating from transfection/transduction and Cas9-introduced DSBs and variation caused by differences in culture and experimental conditions between wells. Thus, the CGE method is a sensitive assay with lower risk for systematic errors and fewer confounding variables compared to replicate experiments performed in separate wells.
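The read-assignment step described above can be illustrated with a minimal sketch. It assumes that each amplicon read has already been trimmed to a 16-nt window (5-nt flank, 6-nt site, 5-nt flank); the native flank sequences, the function names and the window layout are hypothetical and are not taken from the Methods.

```python
from collections import Counter

# Hypothetical native flanks, for illustration only (not a real locus).
NATIVE_LEFT = "GGCTA"
NATIVE_RIGHT = "TCAGC"
# The two experimental variants used for the E-box targets in this study.
VARIANTS = {"CACGTG": "original", "TATTTA": "mutated"}


def classify_window(window):
    """Return (variant, tag, n_flank_mutations), or None if the read is not
    an expected, identifiable editing event."""
    if len(window) != 16:
        return None
    left, core, right = window[:5], window[5:11], window[11:]
    variant = VARIANTS.get(core)
    if variant is None:
        return None  # indel, sequencing error or unexpected core sequence
    flank = left + right
    if set(flank) - set("ACGT"):
        return None  # ambiguous base calls
    n_mut = sum(a != b for a, b in zip(flank, NATIVE_LEFT + NATIVE_RIGHT))
    if variant == "original" and n_mut == 0:
        return None  # indistinguishable from unedited wild-type sequence
    return variant, flank, n_mut


def count_lineages(windows):
    """Count reads per (variant, tag) lineage."""
    counts = Counter()
    for window in windows:
        hit = classify_window(window)
        if hit is not None:
            variant, tag, _ = hit
            counts[(variant, tag)] += 1
    return counts
```

Reads carrying the native site with no flanking mutation are dropped because they cannot be told apart from unedited cells, which is how the design excludes non-edited wild-type sequences from the comparison.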

To preserve potentially functional flanks of the sequence of interest, it is important that the sequence tags are introduced using silent or near-silent mutations. For coding regions, this can be accomplished by introducing synonymous mutations of codons and avoiding splice junctions. Because less is known about functional elements within non-coding regions, we decided to use a diverse library that largely conserves the wild-type sequence, introducing only one or a few point mutations per cell within the five nucleotides flanking the sequence of interest on both sides, a region wider than a typical TF binding site (~10 base pairs (bp)). In our case, each of the ten positions within the flanking sequence was mutated with a probability of 24%, thus keeping most of the positions intact (Fig. 1a) but typically (in ~53% of the sequences) introducing 2–3 mutations per repair oligo (Extended Data Fig. 1a). This mutation strategy generates 30 distinct sequence tags whose sequence differs from the native sequence by exactly one nucleotide (Extended Data Fig. 1b), 405 distinct sequences with a 2-nucleotide (nt) difference to the native sequence and 3,240 distinct sequences with three mutations (Fig. 1c and Extended Data Fig. 1c). In the oligo synthesis for HDR templates, any individual sequence tag with one mutation is more probable than an individual tag with two or three mutations, so single-mutation tags are present in the synthesized oligo mixture in more copies. This is reflected in the data, with single-mutation tags having higher read counts than double and triple mutants (Fig. 1c). Control experiments also indicated that the overall base distribution of the flanking mutations at day 2 was fairly uniform (Extended Data Fig. 2a).
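The tag counts and the ~53% figure quoted above follow from simple binomial arithmetic, which the following sketch reproduces. The function names are illustrative, and the per-tag probability assumes that the three alternative bases are equally likely at a mutated position, which is an assumption rather than a statement from the text.

```python
from math import comb

N_POSITIONS = 10  # five flanking nucleotides on each side of the site
P_MUT = 0.24      # per-position mutation probability
N_ALT = 3         # alternative bases available at a mutated position


def n_distinct_tags(k):
    """Number of distinct tags with exactly k flanking mutations."""
    return comb(N_POSITIONS, k) * N_ALT ** k


def p_k_mutations(k):
    """Probability that a repair oligo carries exactly k flanking mutations."""
    return comb(N_POSITIONS, k) * P_MUT ** k * (1 - P_MUT) ** (N_POSITIONS - k)


def p_specific_tag(k):
    """Probability of one particular tag with k mutations.
    Assumes the three alternative bases are equally likely."""
    return p_k_mutations(k) / n_distinct_tags(k)


if __name__ == "__main__":
    for k in range(4):
        print(k, n_distinct_tags(k), round(p_k_mutations(k), 3))
    print("P(2 or 3 mutations):", round(p_k_mutations(2) + p_k_mutations(3), 3))
```

Under these assumptions each individual single-mutation tag is roughly ten times more likely than each individual double-mutation tag, in line with the higher read counts seen for single-mutation tags.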
After assigning the read counts for each cell lineage with a unique sequence tag and distinct experimental mutation (mutated or native sequence of interest) at the two experimental timepoints, a pairwise analysis for the cell lineages harboring identical sequence tags can be performed by calculating the ratio of mutated-to-native sequences for each sequence tag pair. This mitigates the potential effect of the flanking mutation on the measured phenotype and enables robust and accurate measurement for the effect of the mutation on cellular fitness for each cell lineage separately (Fig. 1d and Extended Data Fig. 2b).
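As a rough illustration of this pairwise analysis, the sketch below computes, for each sequence tag, the change in the mutated-to-native read count ratio between an early and a late timepoint on a log2 scale. The dictionary layout, the function name and the handling of zero counts are assumptions made for the example; the read count threshold mirrors the ">5 on day 2" filter mentioned in the Fig. 2a legend.

```python
from math import log2


def paired_log2_change(early, late, min_early_reads=5):
    """Per-tag log2 change of the mutated/original ratio between timepoints.

    early, late: dicts mapping (variant, tag) -> read count, where variant
    is "mutated" or "original". Only tags observed with both variants are used.
    """
    result = {}
    tags = {tag for (_, tag) in early}
    for tag in sorted(tags):
        mut_e = early.get(("mutated", tag), 0)
        orig_e = early.get(("original", tag), 0)
        mut_l = late.get(("mutated", tag), 0)
        orig_l = late.get(("original", tag), 0)
        # Keep only lineage pairs with more than min_early_reads at baseline.
        if min(mut_e, orig_e) <= min_early_reads:
            continue
        # Simplification: pairs with a zero late count are skipped here;
        # adding a pseudocount would be an alternative.
        if min(mut_l, orig_l) == 0:
            continue
        result[tag] = log2((mut_l / orig_l) / (mut_e / orig_e))
    return result
```

A negative value indicates that the lineage carrying the mutated sequence was depleted relative to the lineage carrying the native sequence with the same tag. The same calculation applies to other assay/baseline pairs, such as immunoprecipitated versus input DNA.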

To validate our CGE approach in functional studies, we first introduced mutations to the coding regions of genes. To this end, we mutated previously described phosphorylation sites of the CDK1 (cyclin-dependent kinase 1) and the GRB2 (growth factor receptor-binding protein 2) genes. In coding regions, sequence tags were generated by randomizing the degenerate positions of the adjacent codons in the repair template. Phosphorylation sites were abolished by alanine (A) or phenylalanine (F) substitutions of the phosphorylated serine (S), threonine (T) or tyrosine (Y) residues. To mimic phosphorylation, the same amino acids were also mutated to the acidic residues glutamate (E) or aspartate (D), which, in many proteins, can lead to the same effect as phosphorylation of the serine, threonine or tyrosine residues23. In the CGE method, the cell lineages that carry mutations that impair cell proliferation should be underrepresented in the cell population after 1 week of culture compared to cells edited with the original sequence with the same sequence tags. This can be analyzed from gDNA collected at the beginning and at the end of the experiment (Fig. 1b).
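For coding regions, one way to picture tag generation is to enumerate the silent variants of the codons adjacent to the edited residue. The sketch below uses the standard genetic code and, as a simplifying assumption, varies only third codon positions; it ignores first-position degeneracy (Leu, Arg, Ser) and does not model splice junctions. The function names are hypothetical and this is not the authors' library design procedure.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered with
# bases T, C, A, G and the third position varying fastest.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: aa
    for (a, b, c), aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def synonymous_third_base_variants(codon):
    """Codons sharing the first two bases and encoding the same amino acid."""
    aa = CODON_TABLE[codon]
    return [codon[:2] + b for b in "ACGT" if CODON_TABLE[codon[:2] + b] == aa]


def translate(seq):
    """Translate an in-frame DNA sequence into a one-letter protein string."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq), 3))


def synonymous_tags(flank_codons):
    """All silent variants of a run of codons (third positions only)."""
    choices = [synonymous_third_base_variants(c) for c in flank_codons]
    return ["".join(combo) for combo in product(*choices)]
```

For example, `synonymous_tags(["GCT", "GAA", "ATG"])` yields eight sequences that all translate to the same three residues, each of which could serve as a distinct lineage tag.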

The experiments for measuring the effect of phosphorylation sites in the GRB2 protein were carried out in haploid HAP1 and near-haploid chronic myelogenous leukemia KBM-7 cell lines. HAP1 cells are a derivative of KBM-7 that grow adherently, no longer express hematopoietic markers and, in early passage cultures, are haploid for all chromosomes. Haploid and near-haploid cells are particularly useful for mutational screens because only one editing event is sufficient for a full knockout. A previous mutagenesis screen by Blomen et al.24 suggested that the adaptor protein GRB2, which links tyrosine kinase signaling to the RAS-mitogen-activated protein kinase (MAPK) pathway, is essential for both KBM-7 and HAP1 cells, but all other components of the BCR/ABL-RAS/MAPK pathway are essential only for KBM-7 and not for HAP1 cells. GRB2 is phosphorylated at Y160 and Y209, with phosphorylated Y160 activating and Y209 inhibiting downstream MAPK signaling25,26. Mutation Y160F to prevent activation of MAPK had no effect in either cell type, whereas the mutations Y160D and Y209F that are predicted to increase MAPK activity decreased proliferation of KBM-7 but not HAP1 cells (Extended Data Fig. 3), consistent with the more important role of RAS/MAPK signaling in KBM-7 compared to HAP1 cells. The decreased fitness observed for KBM-7 cells upon MAPK activation might result, for example, from MAPK-induced senescence27,28. These results indicate that the CGE method can be used to separate essentiality of a gene from essentiality of individual amino acid residues and to identify functionally important phosphorylation events in cells.

To further validate our CGE method, we evaluated the fitness effect of CDK1 regulatory phosphorylation site mutations in human HAP1 cells. CDK1 activation and onset of mitosis requires phosphorylation of T161 in the activation segment and dephosphorylation of T14 and Y15 (ref. 29). The non-phosphorylatable double-mutant T14A/Y15F cells were almost completely lost after 1 week of precision editing (Fig. 2a). These findings are consistent with earlier work reporting that the T14A/Y15F double mutant can be activated prematurely during the cell cycle30, and overexpression of this mutant in cells results in cell death due to mitotic catastrophe31. The effect of the phosphorylation site mutation in the CDK1-activating segment, T161A, was less prominent. Loss of phosphorylation resulted in markedly decreased cell proliferation, whereas T161E phosphomimetic mutation allowed cells to proliferate normally (Fig. 2a). This is consistent with the lack of requirement of regulation of the CDK-activating kinase in human cells32. We also tested the recently reported prime editing method33 for mutating a phosphorylation site and for introducing the sequence tag within the CDK1 coding region. Using this approach, we observed reduced fitness of HAP1 cells as a result of Y15F mutation (Fig. 2a), demonstrating that prime editing can also be used for generating the targeted mutations and sequence tags for our precision genome editing method.

Fig. 2: The effect of mutating TF binding sites and protein phosphorylation sites on cellular fitness determined by lineage tracing of editing events. a, The effect of mutating protein phosphorylation sites of CDK1 on fitness of HAP1 cells. log2(day 8/day 2) is shown for each sequence tag pair with read count >5 on day 2 after calculating the ratio of read counts for mutated/original sequences at both timepoints. The CGE method was used for measuring the effect of Y15F mutation on the fitness also after introducing this mutation to HAP1 cells using prime editing33. In a–d, dots represent individual cell lineages harboring a unique barcode (that is, internal replicates), for which median (red line) and P value are calculated (two-sided Wilcoxon signed-rank test separately for each experiment, no multiple comparison adjustments; see Supplementary Table 3 for statistical details and Supplementary Table 4 for sequencing depth and editing efficiency). b, The effect of mutating MYC binding motifs (E-box) at promoters of MYC target genes on fitness of HAP1 cells (see also Extended Data Fig. 4); a synonymous mutation in the MYC coding region serves as a negative control. log2(day 8/day 2) is shown for each sequence tag pair with two flanking mutations and read count >50 on day 2 after calculating the ratio of read counts for mutated/original sequences at both timepoints (see also Supplementary Table 5). c, The effect of E-box mutation on MYC occupancy and H3K27ac at promoters of MYC target genes. log2(IP sample/input) is shown for each sequence tag pair with two flanking mutations and read count >100 in the input after calculating the ratio of read counts for mutated/original sequences. Genome browser snapshots with ChIP-seq and ATAC-seq tracks demonstrate robust MYC binding to the targeted sites in wild-type HAP1 cells.
d, Reproducibility of the CGE method shown for the E-box at the PPAT promoter from two independent experiments (Exp 1 and Exp 2) and from two internal replicate groups (IR1 and IR2) (Methods). e, The key advantages of the CGE method are high statistical power due to internal replicates and mitigation of the confounding effects characteristic of CRISPR–Cas9-based methods by excluding the unedited cells.
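The legend above reports a median and a two-sided Wilcoxon signed-rank P value over the per-lineage log2 values. The sketch below shows one way such a summary could be computed with only the standard library. It uses the normal approximation without continuity or tie-variance correction, so it is an approximation for illustration and not necessarily the exact procedure or software used by the authors.

```python
from math import erfc, sqrt
from statistics import median


def wilcoxon_signed_rank(values):
    """Two-sided Wilcoxon signed-rank test against zero.

    Returns (W_plus, p_value). Normal approximation, no continuity or
    tie-variance correction; zeros are discarded before ranking.
    """
    nonzero = [v for v in values if v != 0]
    n = len(nonzero)
    if n == 0:
        raise ValueError("no non-zero values to test")
    order = sorted(range(n), key=lambda i: abs(nonzero[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over a run of tied absolute values.
        while j + 1 < n and abs(nonzero[order[j + 1]]) == abs(nonzero[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    w_plus = sum(r for r, v in zip(ranks, nonzero) if v > 0)
    mean = n * (n + 1) / 4
    sd = sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean) / sd
    return w_plus, erfc(abs(z) / sqrt(2))


def summarize(log2_changes):
    """Median and P value for one experiment's internal replicates."""
    _, p_value = wilcoxon_signed_rank(log2_changes)
    return {"n": len(log2_changes), "median": median(log2_changes), "p_value": p_value}
```

For small numbers of lineages an exact test would be preferable to the normal approximation used here.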

After demonstrating the power of the precision editing approach in studying the functional consequence of individual protein phosphorylation sites, we used it for studying the gene regulatory elements within the non-coding genome. Specifically, a 6-nt MYC binding motif (E-box) was mutated at the promoters of MYC target genes to study their effect on cell proliferation and fitness. If a particular E-box is essential for cell growth, the alleles containing tags and the wild-type sequence should be enriched in the cell population compared to the E-box deleted alleles after 1 week of culture (Fig. 1b,d). Although a large number of genes have been reported as MYC target genes17, the functional consequence for cell proliferation resulting from MYC binding to a promoter of a particular gene has not been previously shown. For the purpose of this study, putative MYC target genes were selected for editing on the basis of functional genomics studies in human colon cancer cell lines and previously published datasets in the HAP1 haploid cell line using the following criteria: (1) the gene should preferably contain only one E-box within the chromatin immunoprecipitation (ChIP)-nexus peak34 (Extended Data Fig. 4); (2) the gene should display robust MYC binding at its promoter within open chromatin on the basis of signal from assay for transposase-accessible chromatin with sequencing (ATAC-seq) and clear change in expression upon MYC silencing in colon cancer cells34,35 (Extended Data Fig. 4); and (3) the gene must be essential in HAP1 cells, reported by both publications24,36. Gene essentiality was used as a selection criterion because it is likely that fitness effects can be found for regulatory or epigenetic elements associated with essential genes. 
It should be noted, however, that individual binding motif mutations are likely to cause less severe phenotypes than loss of entire genes, as single binding motifs may contribute only partially to gene expression or not be required for expression at all. Thus, CGE targeting of binding motifs does not address the essentiality of the target genes per se but can be used for identifying critical regulatory or epigenetic features controlling the function of these genes.

The CGE experiments for testing the effect of E-box mutations were carried out in HAP1 cells using the original E-box sequence and a non-functional TATTTA sequence as the experimental variants and the flanking near-silent mutations as the sequence tags (Extended Data Fig. 1). For the different E-box targets, 7–42% of the sequencing reads matched the mutation patterns expected from the HDR-mediated editing (Supplementary Table 4). The cell lineages harboring either the original or mutated sequence with exactly two flanking mutations were analyzed at day 2 and day 8 (Methods). Targeted mutation of the E-box sequence to a non-functional TATTTA at the promoters of four MYC target genes, RPL23 (ribosomal protein L23), HK2 (hexokinase 2), PPAT (phosphoribosyl pyrophosphate amidotransferase) and MDN1 (midasin AAA ATPase 1), resulted in reduced cell growth as measured from the read counts for the sequence tags with two mutations at day 8 as compared to day 2 (Fig. 2b). However, some E-boxes at promoters of MYC target genes could be mutated to a non-functional sequence without affecting cell proliferation, such as those of SHMT2 (serine hydroxymethyltransferase 2) and PAICS (phosphoribosylaminoimidazole carboxylase and phosphoribosylaminoimidazolesuccinocarboxamide synthase) (Fig. 2b), demonstrating the strength of this approach in dissecting the contribution of each individual TF binding site to cell proliferation. Furthermore, the CGE method can robustly measure the effect of each E-box on cellular fitness also for genes that harbor several of them within their regulatory region, as demonstrated for the MDN1 gene. Out of the two E-boxes within the MDN1 promoter, mutation of the E-box closer to the transcription start site (TSS) (TSS +32) had an effect on cell proliferation (Fig. 2b), whereas the mutation of the E-box farther away (TSS −151) had no effect (Extended Data Fig. 5), despite MYC binding detected at both of these sites in HAP1 cells as well as using ChIP-nexus in colon cancer cells34 (Extended Data Fig. 4).

Because the competitive precision genome editing method showed clear effects on cell proliferation resulting from a mutation of a single MYC binding motif, we set out to analyze the direct effects of E-box mutation on MYC binding to the promoter and activation of the promoter as measured by an increase in the active chromatin mark histone 3 lysine 27 acetylation (H3K27ac). For this, we performed ChIP using anti-MYC and anti-H3K27ac antibodies from the HAP1 cells after precision editing. To quantify the editing events, each targeted locus was amplified using polymerase chain reaction (PCR), and the amplicons were Illumina sequenced. We detected fewer antibody-enriched sequences with TATTTA-mutated sequence compared to CACGTG original sequence, demonstrating less MYC binding to the mutated sequences at RPL23, MDN1 and SHMT2 E-boxes, as opposed to the input sample with equal ratios of TATTTA and CACGTG (Fig. 2c). We also observed a decrease in H3K27ac at TATTTA-mutated RPL23 and MDN1 E-boxes (Fig. 2c). The markedly lower MYC binding and lower level of activating chromatin mark at these loci indicate that these E-box motifs are biologically active and may contribute to the MYC-dependent expression of the respective genes. However, there were no changes in the level of H3K27ac at the SHMT2 locus, consistent with the observation that mutation of this E-box had no effect on cell proliferation (Fig. 2b,c). To further test the applicability of the CGE method for studying precise mutations in diploid cells, we performed ChIP using anti-MYC and anti-H3K27ac antibodies after precision editing of the MDN1 locus in HCT116 colon cancer cells. In agreement with the results from HAP1 cells, we observed less MYC binding and a decrease in H3K27ac at alleles harboring TATTTA instead of the native E-box sequence (Extended Data Fig. 6).
In conclusion, we identified here several genes that are directly regulated by MYC and demonstrate that mutation of a single MYC binding motif is sufficient for reducing cellular fitness.

The large number of individual cell lineages analyzed within one experiment gives the CGE method a high statistical power for measuring phenotypic effects of specific mutations, as shown here for protein phosphorylation sites and MYC binding sites. The sequence tags allow following the growth of cell lineages independently, because the measurement of abundance of each lineage is not dependent on the others within the same culture. The internal replicates also allow splitting the data to internal replicate groups for further statistical analyses (see also ref. 37). To demonstrate the robustness of the internal replicate analysis, we grouped the internal replicates into two or four separate groups by binning them based on the mutations within their sequence tags (Methods and ref. 37). The internal replicate analysis showed that the medians of the groups are highly similar to each other both for the targeted E-boxes at the PPAT and MDN1 promoters (Fig. 2d and Extended Data Fig. 7a; see also Supplementary Table 3) and for the phosphorylation sites of CDK1 (Extended Data Fig. 7b). To further demonstrate the reproducibility of the results obtained using the CGE method, we performed independent experiments targeting the same E-boxes at the MYC target gene promoters. The results were highly reproducible both for the targets that showed a fitness effect, such as RPL23, HK2 and PPAT, and for the targets that did not, such as PAICS and SHMT2 (Fig. 2d and Extended Data Fig. 7c), indicating the robustness and high statistical power of the CGE method. The replicate experiments also enable studying whether the mutations that generate the sequence tags are silent or near-silent as intended. To this end, the read count ratios between day 8 and day 2 were plotted for the sequence tags that were present in both replicate experiments both for cell lineages that were edited with the original E-box sequence only (Extended Data Fig. 8a) and for the pairs of cell lineages edited with mutant and original sequences harboring identical sequence tags (Extended Data Fig. 8b). Overall, there was no correlation in the read count ratios measured from cell lineages with identical sequence tags between the two replicates, and only one of the targets (HK2) showed statistically significant correlation between the replicates (Extended Data Fig. 8a). These results demonstrate that the CGE method enables measuring the effect elicited by each mutation, but that, overall, the flanking mutations did not contribute to the observed fitness effects or the variation between cell lineages in the assay. The variation between the internal replicates is, thus, likely to reflect different growth rates between lineages as well as different numbers of cells that were transfected with each individual tag. Such variation is inherent to cell-based assays, but our method is robust to the variation and able to precisely measure the biological effect of each mutated target, whereas, if the assay were performed without the sequence tags, the true biological effect could be masked by the variation. It should be noted, however, that internal replicates do not capture day-to-day variation of the experiments, which can, for example, arise from small changes in culture conditions or transfection that affect the growth rate of the cell population. To control for such day-to-day variation, separate independent experiments should be performed (Extended Data Fig. 7c).
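The two checks described in this paragraph, comparing medians of internal replicate groups and testing whether tag-specific effects correlate between independent experiments, can be sketched as follows. The binning rule used here (a CRC32 checksum of the tag sequence) is a deterministic stand-in chosen for the example; the actual binning by tag mutations is described in the Methods and may differ. Function names and input layout are likewise assumptions.

```python
from math import sqrt
from statistics import median
from zlib import crc32


def internal_replicate_medians(log2_by_tag, n_groups=2):
    """Split lineages into groups by tag and return the median of each group.

    Hypothetical binning: CRC32 of the tag sequence modulo n_groups.
    Empty groups are reported as None.
    """
    groups = [[] for _ in range(n_groups)]
    for tag, value in log2_by_tag.items():
        groups[crc32(tag.encode()) % n_groups].append(value)
    return [median(g) if g else None for g in groups]


def pearson_shared_tags(rep1, rep2):
    """Pearson correlation of per-tag values over tags present in both
    replicate experiments. Requires at least two shared tags with
    non-constant values."""
    shared = sorted(set(rep1) & set(rep2))
    x = [rep1[t] for t in shared]
    y = [rep2[t] for t in shared]
    n = len(shared)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)
```

Similar group medians suggest that the measured effect does not depend on which tags are considered, whereas a correlation near zero across replicates for identical tags is what would be expected if the flanking mutations themselves have no reproducible effect on fitness.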

Here we show a method for precise analysis of the effect of mutations on cellular phenotype by using CRISPR–Cas9 precision editing combined with lineage-tracing sequence tags and employ it for studying the precise effects of individual TF binding sites and post-translational modifications. Previously, next-generation-sequencing-based methods, such as GUIDE-seq38 and Repair-seq39, were developed for assessing the off-target DNA cleavage sites and the repair mechanisms of Cas9-induced DNA breaks, respectively. Moreover, random sequence labels have been used for increasing precision and accuracy of CRISPR screens37 and DSB-independent base editors for improving the predictability of the Cas9-induced genetic variation in the pooled screens40,41. The advantage of our CGE method over these approaches is that both pooled CRISPR screens and high-throughput base editing approaches rely on inferring mutations from the presence of an sgRNA and, thus, require additional validation, whereas the CGE method enables analyzing the mutated loci directly. In a recent saturation mutagenesis screen, a repair template library with single-nucleotide variants (SNVs) targeting the BRCA1 gene was transfected to target cells along with Cas9 and sgRNA, and targeted gDNA and RNA sequencing was performed to quantify SNV abundances42. This method enables distinguishing the edited cells from non-edited ones, providing a powerful method for analyzing the SNVs within coding regions of the target gene studied. Compared to saturation mutagenesis, which is highly effective in analyzing individual genes, CGE is more suitable for dissecting genetic networks, as it can be used to target a large number of genomic loci. Furthermore, in CGE, the genetic barcode is generated by silent or near-silent mutations within the coding and non-coding genomic regions. 
Thus, CGE is more precise and yields more statistical power to test the effect of particular targeted mutations, enabling a precise assessment of the effect of mutations with subtle phenotypic effects, such as critical targets of protein kinases or critical binding sites of TFs. Our approach of using parallel editing of the target loci with two HDR templates in a single cell culture has two key advantages over previously described genome editing assays (Fig. 2e). First, silent or near-silent mutations that generate sequence tags to HDR templates provide means to discard all confounding information from the next-generation sequencing output of the method. Second, direct comparison of the mutated sequence to the reconstituted native sequence mitigates all the detrimental off-target effects as well as enables lineage tracing of edited clones, thus providing statistical power to the analysis. When measuring allele-specific phenotypes, the method also allows the use of diploid cells for analysis of phenotypes, such as TF binding or RNA expression. We have demonstrated here that the CGE method combined with ChIP can be successfully used for measuring the effect of E-box mutation on MYC binding and H3K27ac also in diploid colon cancer cells. Measuring RNA expression requires that the coding region of a gene of interest harbors a genetic barcode that enables linking the expression measurement to the experimental mutation of the TF binding site. The long-range genome editing for concurrent mutation of the coding region and the TF binding site could be achieved, for example, using recently reported dual prime editing strategies (such as refs. 43,44,45). 
Measuring more complex phenotypes in diploid cells is also possible, but it requires either prior deletion of one allele from the targeted locus or dilution of the two repair templates by a template that inactivates the wild-type allele in such a way that most cells carry either two inactive alleles or one inactive allele and one targeted allele. This will be easier when targeting coding regions, as failure of targeted repair commonly leads to inactivation of the target gene due to generation of frameshift or deletion alleles by non-homologous end-joining (NHEJ).

The CGE method is particularly useful for studying the effect of small sequence features, such as individual TF binding sites and post-translational modifications, as shown here for MYC binding motifs and phosphorylation sites in the CDK1 and GRB2 proteins, because precision editing is not dependent on finding a highly specific guide sequence precisely overlapping the feature of interest. In addition, the phenotypic impact of such mutations is often milder than that of complete loss of function of the upstream TF, kinase or phosphorylated target. Because the experimental design of the CGE method mitigates the phenotypic effects associated with the genome editing process itself, the method is sensitive enough to detect the subtle effects resulting from mutating TF binding sites and post-translational modifications. Here we identify several MYC binding motifs at the promoters of its target genes that are critical for cellular fitness. The critical target genes represent the major pathways previously associated with MYC function17: (1) ribosome biogenesis, including RPL23, a component of 60S large ribosomal subunit, and MDN1, a nuclear chaperone required for maturation and nuclear export of pre-60S ribosome subunit46; (2) cellular metabolism, as shown for glycolytic enzyme HK2; as well as (3) nucleotide synthesis, as shown for PPAT involved in de novo purine biosynthesis. However, mutation of the E-box at the SHMT2 promoter had no effect on cellular fitness in HAP1 cells, although SHMT2 has been previously shown to partially rescue the growth defects of Myc-null fibroblast cells47. These results highlight the importance of precise quantitative studies for determining the functional consequence of transcriptional regulatory events on cellular phenotype.

In summary, we report here an advanced method for measuring the phenotypic effects of precise targeted mutations. The method allows controlling for the effect of DNA damage, which is the major confounder in CRISPR-based methods. We also demonstrate the power of the technology by robustly detecting small fitness effects of individual TF binding motifs and single amino acid substitutions. The method is widely applicable and extends the utility of CRISPR–Cas9-mediated genome editing to address important biological questions that have been difficult to address using existing technologies. Using this technology, we identified several target genes whose regulation via canonical E-boxes is responsible for the growth-promoting activity of the universal oncogene MYC.