Abstract
Tumor genomes often harbor a complex spectrum of single nucleotide alterations and chromosomal rearrangements that can perturb protein function. Prime editing has been applied to install and evaluate genetic variants, but previous approaches have been limited by the variable efficiency of prime editing guide RNAs. Here we present a high-throughput prime editing sensor strategy that couples prime editing guide RNAs with synthetic versions of their cognate target sites to quantitatively assess the functional impact of endogenous genetic variants. We screen over 1,000 endogenous cancer-associated variants of TP53—the most frequently mutated gene in cancer—to identify alleles that impact p53 function in mechanistically diverse ways. We find that certain endogenous TP53 variants, particularly those in the p53 oligomerization domain, display opposite phenotypes in exogenous overexpression systems. Our results emphasize the physiological importance of gene dosage in shaping native protein stoichiometry and protein–protein interactions, and establish a framework for studying genetic variants in their endogenous sequence context at scale.
Main
A wide range of human diseases are associated with diverse genetic alterations that may be responsible for initiating, promoting or otherwise modifying the course of a given disease. These alterations can be quite complex; for instance, cancer genomes typically contain a repertoire of single nucleotide variants (SNVs) and large-scale copy number alterations that can impact many genes in different ways depending on the type of alteration, gene function and biological context. While tumor genotype is a well-established determinant of disease initiation, progression and therapy responses, the functional impact conferred by the thousands of unique mutations observed in human tumors remains poorly understood. This presents a major challenge to precision medicine efforts that aim to tailor cancer therapies to patients suffering from cancers harboring specific genetic lesions. Beyond the clinic, understanding the impact that diverse types of mutations have on different residues and protein domains would improve our fundamental understanding of gene and protein function (Fig. 1a).
Until recently, approaches for studying genetic variants have been limited to low-throughput, homology-directed repair (HDR)-based methods or high-throughput, nonphysiological gene overexpression systems1,2,3,4,5,6,7. While powerful, the former approach lacks scalability and generality due to the requirements of HDR and its limitation primarily to actively dividing cells. Gene overexpression systems have fewer requirements and are scalable, but fail to physiologically recapitulate the biology driven by these variants due to the absence of endogenous gene regulation mechanisms, many of which are not known for genes of interest. The recent development of precision genome editing tools, including base editing and prime editing, allows variants to be modeled in their native, endogenous genomic context with increased editing efficiency and theoretically higher throughput8,9,10.
Prime editing10 can be used to generate effectively any type of small mutation, including all SNVs and small insertions and deletions (indels). Prime editors are directed to engineer a mutation of interest by the instructions encoded in a prime editing guide RNA (pegRNA), which contains both a protospacer (the ‘search’ sequence) and a 3′ extension sequence (the ‘replace’ sequence that dictates the mutation installed at the site). The modular search-and-replace ability of prime editing has been leveraged to interrogate endogenous variants in high-throughput methods11,12,13. In these approaches, libraries of pegRNAs are delivered transiently or stably to cells expressing prime editors, and the fitness of variants is assessed by determining the relative distribution of endogenous alleles and/or pegRNAs. While powerful, these approaches have important limitations for screening applications, including reliance on a small number of variant-specific pegRNAs with unknown editing performance, inability to quantitatively assess endogenous genome editing at scale, and potential overrepresentation of undesired indels due to using PE3, a prime editor system that uses an additional guide RNA that nicks the nonedited strand to increase editing efficiency10.
With these challenges in mind, we sought to develop an integrative computational and experimental framework for high-throughput design, screening and deconvolution of pegRNA libraries to interrogate a diverse spectrum of genetic variants. This includes pairing each pegRNA with a variant-specific synthetic ‘sensor’ site14 that recapitulates the native architecture of the endogenous target locus. This sensor-based approach links pegRNA identity to editing outcomes for simultaneous high-throughput quantification of pegRNA editing activity and empirical calibration of screening data.
We chose the p53 transcription factor as a prototype to test this approach for investigating the biological impact of specific genetic variants. Notably, TP53 is the most frequently mutated gene in cancer and exhibits extensive allelic variation, leading to the generation of altered proteins that can produce functionally distinct phenotypes. Whether distinct variants of TP53 (and other genes) encode proteins with differing functional activities that influence cancer phenotypes remains controversial and technically challenging to investigate, particularly at scale. Several studies have used orthogonal cDNA-based exogenous overexpression systems to probe the fitness of p53 variants in human, mouse and yeast systems6,7,15,16. However, given the artificial nature of these screens, which rely on expression of variants at supraphysiological levels, we hypothesized that these strategies could misrepresent one or more phenotypes associated with p53 variants. Artifacts that stem from exogenous overexpression systems could be particularly relevant when studying proteins like p53 because p53 functions as a tetramer whose expression and degradation is tightly controlled by the cell17,18,19. Thus, we reasoned that alterations to the stoichiometric balance of p53 via overexpression could lead to erroneous conclusions about the effects of particular p53 variants, including misclassifying certain variants as noncausal or otherwise benign.
To tackle this question, we generated and screened a library of >28,000 pegRNAs targeting >1,000 TP53 variants observed across >40,000 cancer patients20—the largest set of endogenous TP53 variants studied so far. We included SNVs, insertions and deletions observed in patients, putative neutral silent substitutions as controls and a panel of random indels to increase the functional search space. These experiments identified alleles that impact p53 function in mechanistically diverse ways. We discovered that certain types of endogenous variants, particularly those found in the p53 oligomerization domain (OD), display opposite phenotypes when tested with exogenous overexpression systems. Collectively, these results highlight the physiological importance of gene dosage in shaping native protein stoichiometry and protein–protein interactions, and establish a powerful computational and experimental framework for studying diverse types of genetic variants at scale. To ensure widespread accessibility of this resource for the scientific community, we provide a publicly available Python package, Prime Editing Guide Generator (PEGG) (https://pegg.readthedocs.io/en/latest/), as a tool to generate prime editing sensor libraries.
Results
High-throughput design of prime editing sensor libraries
A principal limitation of using prime editing to systematically investigate genetic variants is the inherent variability in editing efficiency among different pegRNAs10,21,22,23. A number of computational tools for pegRNA design have been developed24,25,26,27,28,29,30,31,32,33, including machine-learning algorithms that can nominate sets of pegRNAs predicted to produce high efficiency edits. However, even pegRNAs generated by these predictive algorithms require extensive experimental validation, and their editing activity is not guaranteed to correlate strongly across different cell types. We hypothesized that coupling pegRNAs with ‘sensors’—artificial copies of their endogenous target sites—would allow us to systematically identify high efficiency pegRNAs while controlling for the confounding effects of variable editing efficiency in a screening context (Fig. 1b).
Synthetic sensor-like target sites have been used previously by our group and others to control for base editing gRNA editing efficiencies while defining the relative fitness of variants in genetic screens14,34. Several studies have applied a similar strategy to both base and prime editing technologies to identify features of efficient gRNAs or pegRNAs and train predictive algorithms21,32,33,35,36,37. However, this approach has yet to be applied for high-throughput phenotypic screening of endogenous genetic variants with prime editing, probably due in part to the lower editing efficiency of prime editing relative to base editing. We reasoned that a sensor-based prime editing screening approach could be powerful to discriminate bona fide endogenous variants from undesired editing outcomes that enrich or deplete in a screen. Moreover, the sensor approach would theoretically overcome the limitations of assessing variants at different genetic sites in parallel by eliminating the need to sequence several endogenous loci.
To test this approach, we first needed to build a computational tool capable of designing and ranking pegRNAs for thousands of genetic variants, while automatically generating a paired sensor site. To address this unmet need, we built and publicly released PEGG (Extended Data Fig. 1a)—a Python package that enables high-throughput design of prime editing sensor libraries38 (available at https://pegg.readthedocs.io/en/latest/). PEGG is compatible with a range of mutation input formats, including all of the datasets on the cBioPortal, ClinVar identifiers and custom mutation inputs39,40.
We chose the TP53 tumor suppressor gene as a prototype to establish and credential a scalable prime editing sensor-based screening approach for a number of reasons. First, TP53 is the most frequently mutated gene in human cancer, with ~50% of patients suffering from tumors harboring a mutation within the TP53 gene while the rest often inactivate the p53 pathway through other mechanisms. Second, thousands of unique TP53 mutations have been identified in cancer patients, including eight or so ‘hotspot’ alleles in specific residues that exhibit the highest mutational frequencies19. Although p53 has been studied for decades, there have been few systematic studies, and those have been hampered by reliance on artificial overexpression of mutant p53 proteins, unrepresentative cell lines and/or a limited spectrum of mutations evaluated6,7,15,16. These and other studies have sparked controversy in the field over whether any mutant p53 proteins are endowed with activities that go beyond LOF or dominant negative activity to achieve GOF or neomorphic status. These are important questions that extend beyond TP53 because mutant GOF proteins generated by cancer-associated variants, and the phenotypes they produce, could represent attractive therapeutic targets. Finally, prime editing sensor-based screening could be scaled up and broadly deployed to identify causal genetic variants implicated in cancer and other diseases with a strong genetic association.
With the above goals in mind, we first sought to generate a library of pegRNAs targeting TP53 variants. To generate this library, we selected variants from the MSK-IMPACT database, which uses deep exon sequencing of patient tumor samples to identify cancer-associated variants20. From this database of over 40,000 patients, we chose all observed SNVs in p53, as well as frequently observed insertions and deletions, along with a collection of random indels (Extended Data Fig. 1b). We reasoned that including several pegRNAs with different protospacers and combinations of pegRNA properties for each variant would allow us to scan the pegRNA design space more thoroughly to identify highly efficient guides for robust statistical analysis of variant phenotypes. To accomplish this, we used PEGG to produce 30 pegRNA designs per variant (for pegRNAs with a sufficient number of accessible PAM sequences) with varying reverse transcription template (RTT) (10–30 nucleotides) and primer binding site (PBS) lengths (10–15 nucleotides) coupled to canonical ‘NGG’ protospacers. The generated pegRNA designs were ranked based on a composite ‘PEGG score’ that integrates literature best practices for pegRNA design (Extended Data Fig. 1a and Supplementary Table 1).
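The design space described above (RTT lengths of 10–30 nucleotides, PBS lengths of 10–15 nucleotides, 30 designs kept per variant) can be sketched in a few lines. This is only an illustration under stated assumptions: `enumerate_designs` and `toy_score` are hypothetical names, not part of the PEGG API, and the placeholder score is not the PEGG score.

```python
from itertools import product

RTT_LENGTHS = range(10, 31)  # 10-30 nt inclusive, as stated in the text
PBS_LENGTHS = range(10, 16)  # 10-15 nt inclusive, as stated in the text


def enumerate_designs(protospacers, score, n_designs=30):
    """Return the n_designs best (protospacer, rtt_len, pbs_len) tuples.

    `score` is any callable mapping a design tuple to a number;
    higher scores are ranked first.
    """
    candidates = product(protospacers, RTT_LENGTHS, PBS_LENGTHS)
    return sorted(candidates, key=score, reverse=True)[:n_designs]


def toy_score(design):
    """Placeholder ranking (an assumption, NOT the PEGG score):
    prefer shorter RTTs and PBS lengths close to 13 nt."""
    _, rtt_len, pbs_len = design
    return -rtt_len - abs(pbs_len - 13)


# Hypothetical protospacer labels for a single variant.
designs = enumerate_designs(["protospacer_A", "protospacer_B"], toy_score)
```

With two accessible protospacers there are 2 × 21 × 6 = 252 candidate designs, of which the 30 highest-ranked are retained.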
PEGG also generated silent substitution variants as neutral internal controls for the screen, and we filtered pegRNAs to exclude protospacers with an MIT specificity score below 50 to reduce the probability of off-target editing41 (Extended Data Fig. 1e). In addition, these pegRNA designs included an epegRNA motif—tevopreQ1—an RNA pseudoknot located at the 3′ end of the pegRNA that improves editing by preventing degradation of the guide22. Even after these relatively stringent filtration steps, we were able to generate pegRNA designs for more than 95% of the input variants, resulting in a library of >28,000 pegRNAs (Fig. 1c,d and Extended Data Fig. 1c,d). Each pegRNA in the library is also paired with a 60-nucleotide long variant-specific synthetic ‘sensor’ that is generated by PEGG and included in the final oligonucleotide design. Every sensor is designed to recapitulate the native endogenous target locus, thereby linking pegRNA identity to editing outcomes (Fig. 1b).
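A minimal sketch of the filtration step described above follows. It assumes each candidate design already carries a precomputed MIT specificity score and a sensor sequence; the `PegRNADesign` class, the field names and the example values are hypothetical and do not reflect how PEGG stores its designs.

```python
from dataclasses import dataclass

MIN_MIT_SPECIFICITY = 50  # protospacers scoring below this are excluded (from the text)
SENSOR_LENGTH = 60        # sensor length in nucleotides (from the text)


@dataclass
class PegRNADesign:
    variant: str            # hypothetical variant label
    mit_specificity: float  # assumed to be precomputed elsewhere
    sensor: str             # synthetic target-site sequence


def passes_filters(design: PegRNADesign) -> bool:
    """Keep designs with adequate specificity and a full-length sensor."""
    return (design.mit_specificity >= MIN_MIT_SPECIFICITY
            and len(design.sensor) == SENSOR_LENGTH)


# Made-up candidates for illustration only.
candidates = [
    PegRNADesign("variant_1", 82.0, "ACGT" * 15),
    PegRNADesign("variant_1", 41.5, "ACGT" * 15),  # fails the specificity filter
    PegRNADesign("variant_2", 50.0, "ACGT" * 15),
]
library = [d for d in candidates if passes_filters(d)]
```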
To test the efficacy of using the sensor as a readout of editing at the endogenous locus, we randomly selected eight TP53 variant-specific pegRNA sensors generated during the process of library preparation. We generated lentivirus for each of these prime editing sensor constructs and performed separate transductions into cells expressing PEmax. At 3 days (D3) and 7 days (D7) posttransduction, we harvested genomic DNA and amplified both the pegRNA–sensor cassette and the endogenous locus targeted by each pegRNA. Analysis of editing at the sensor and endogenous locus revealed a very high correlation between the sensors and endogenous sites (Spearman correlation ≥0.9; Fig. 1e). In general, the prime editing sensor seems to slightly overestimate the editing activity at the endogenous locus, probably in part due to differences in locus chromatin accessibility42, but the ranking of pegRNA editing efficiencies is largely preserved, validating our sensor-based approach.
High-throughput interrogation of TP53 variants
Next, we screened our library of variants in TP53 wild-type (WT) A549 lung adenocarcinoma cells stably expressing PEmax21. To measure the prime editing activity of this cell line, we generated and transduced these cells with a modified all-in-one lentiviral version of the fluorescence-based PEAR reporter43, validating that the cells displayed strong editing activity (Extended Data Fig. 2a). We then introduced the lentiviral TP53 sensor library into these cells at a low multiplicity of infection and in triplicate while ensuring a library representation of >1,000× at every step of the screen. At 4 days posttransduction (D4), we split the populations into untreated or Nutlin-3-treatment arms (Fig. 1f). Nutlin-3 is a small molecule that inhibits MDM2 to activate the p53 pathway, which can be used to select for TP53 mutations that promote bypass of p53-dependent cell cycle arrest and apoptosis44. We hypothesized that this treatment group may increase the signal-to-noise ratio between TP53 variants with putative loss-of-function (LOF) or gain-of-function (GOF) activities and benign variants. We allowed the screen to progress for 34 days (D34), harvesting cell pellets from each replicate and treatment arm at several timepoints (Extended Data Fig. 2b). Genomic DNA extracted from each sample was used to amplify the pegRNA–sensor cassettes, which were subjected to next-generation sequencing (NGS) to simultaneously assess enrichment/depletion of pegRNAs and their editing activity and outcomes at the sensor target site (Extended Data Fig. 2c,d).
The average editing efficiency among all pegRNAs in the library increased in a time-dependent manner, peaking at ~8% in the final timepoint. In general, we observed low indel rates and strong correlation in sensor editing among replicates (Fig. 1g and Extended Data Fig. 3a–d). Strikingly, selecting only the most efficient pegRNA design for each variant led to a twofold increase in the average editing efficiency, highlighting the utility of the sensor for systematic empirical identification of high efficiency pegRNAs (Fig. 1g and Extended Data Fig. 3e–g). Cells with higher editing efficiency also exhibited stronger Nutlin-3 bypass in the Nutlin-3-treatment arm (Fig. 1g). Based on the assessment of editing at the sensor locus, we were able to identify active pegRNAs (≥2% editing efficiency) for more than half of the TP53 variants included in the library. This includes highly efficient pegRNAs that install the desired edit with over 20% efficiency for more than 20% of the variants (Fig. 1h). These validated pegRNAs could be further engineered with silent mutations that evade mismatch repair to boost overall editing efficiency21.
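The selection of the most efficient pegRNA per variant, and the ≥2% cutoff used above to call a pegRNA active, can be illustrated as follows. The record layout, the helper name `best_per_variant` and all values are made up for illustration; they are not data from the screen.

```python
ACTIVE_THRESHOLD_PCT = 2.0  # ">=2% editing efficiency" cutoff from the text

# (variant, pegRNA id, correct sensor editing %) -- made-up values.
records = [
    ("variant_1", "peg_a", 0.5),
    ("variant_1", "peg_b", 24.0),
    ("variant_2", "peg_c", 1.0),
    ("variant_2", "peg_d", 1.9),
    ("variant_3", "peg_e", 6.0),
]


def best_per_variant(rows):
    """Map each variant to its most efficient (pegRNA id, editing %) pair."""
    best = {}
    for variant, peg_id, pct in rows:
        if variant not in best or pct > best[variant][1]:
            best[variant] = (peg_id, pct)
    return best


best = best_per_variant(records)

# Variants for which at least one pegRNA clears the activity cutoff.
active_variants = sorted(v for v, (_, pct) in best.items()
                         if pct >= ACTIVE_THRESHOLD_PCT)

# Average editing across all pegRNAs versus only the best design per variant.
mean_all = sum(pct for _, _, pct in records) / len(records)
mean_best = sum(pct for _, pct in best.values()) / len(best)
```

In this toy example, restricting to the best design per variant raises the average editing efficiency, mirroring the qualitative effect described in the text.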
The size and diversity of this library also allowed us to examine features of highly efficient pegRNAs that broadly recapitulated previous observations32,33,37,45. Correlation analysis between various pegRNA features and editing efficiency across all timepoints identified the estimated on-target activity of the protospacer (as predicted by Rule Set 2)46 as the single largest determinant of prime editing efficiency (Fig. 2a). In addition, the distance between the edit and the nick introduced by nCas9 was correlated negatively with editing efficiency, while the length of the postedit homology was correlated positively with editing efficiency (Fig. 2a). Thus, edits closer to the nick and with larger postedit homology were more efficient, consistent with previous findings32,33,37,45.
Notably, the PEGG Score, which is a weighted linear combination of pegRNA features based on literature best practices, correlated more strongly with prime editing efficiency than any other single feature, achieving a Spearman correlation of up to 0.4 (Fig. 2a–c). Although this correlation is modest relative to published predictive models32,33,37, the PEGG score is a simple, unbiased and cell type/organism-agnostic prediction of pegRNA activity that could complement machine-learning-based predictions of prime editing activity, which may vary due to training on particular cell types.
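To make the idea of a weighted linear combination of pegRNA features concrete, the sketch below scores two hypothetical designs. The feature names and weights are assumptions chosen only so that their signs follow the trends reported above (higher predicted on-target activity and longer postedit homology are favorable, greater nick-to-edit distance is unfavorable); they are not the actual PEGG score definition.

```python
# Hypothetical weights: the signs follow the trends in the text,
# the magnitudes are invented for illustration.
WEIGHTS = {
    "on_target_score": 1.0,          # predicted protospacer activity (0-1)
    "nick_to_edit_distance": -0.05,  # nt between the nick and the edit
    "post_edit_homology": 0.05,      # nt of homology after the edit
}


def composite_score(features, weights=WEIGHTS):
    """Weighted linear combination of pegRNA features."""
    return sum(w * features[name] for name, w in weights.items())


# Two made-up designs sharing a protospacer but differing in edit placement.
near_nick = {"on_target_score": 0.6,
             "nick_to_edit_distance": 2,
             "post_edit_homology": 15}
far_from_nick = {"on_target_score": 0.6,
                 "nick_to_edit_distance": 20,
                 "post_edit_homology": 5}

score_near = composite_score(near_nick)
score_far = composite_score(far_from_nick)
```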
To further analyze the differences in prime editing activity among the 173 protospacers spanning the TP53 locus, we visualized the number of pegRNAs that utilized each protospacer and the average editing efficiency at each protospacer (Fig. 2d). This analysis suggests that only a subset of protospacers can be used to generate high efficiency pegRNAs, while other protospacers retain little-to-no editing activity. We also found that pegRNAs that introduce edits that disrupt the protospacer or PAM sequence tend to be more efficient (Fig. 2e). Relative to the nick created by nCas9, SNVs introduced at the +1–3 position, which mutate the protospacer, and at the +5–6 position, which mutate the guanine bases in the NGG PAM, display increased editing activity. In contrast, edits introduced at the +4 position, corresponding to the ‘N’ in the ‘NGG’ PAM sequence, display reduced editing, probably due to their failure to disrupt the PAM sequence (Fig. 2e).
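The positional logic above can be written as a small lookup. This sketch assumes positions are counted from +1 immediately 3′ of the nCas9 nick on the edited strand with a canonical NGG PAM, as in the text; the function names and category labels are hypothetical.

```python
def classify_edit_position(pos: int) -> str:
    """Classify an SNV by its position relative to the nCas9 nick (+1 based)."""
    if pos < 1:
        raise ValueError("position must be >= 1 (downstream of the nick)")
    if pos <= 3:
        return "protospacer"  # +1 to +3: last three protospacer bases
    if pos == 4:
        return "PAM_N"        # +4: the 'N' of the NGG PAM
    if pos <= 6:
        return "PAM_GG"       # +5 and +6: the two guanines of the PAM
    return "downstream"       # beyond the PAM


def disrupts_target_site(pos: int) -> bool:
    """True if the edit changes the protospacer or the PAM guanines."""
    return classify_edit_position(pos) in {"protospacer", "PAM_GG"}
```

An edit at +4 leaves the PAM functionally intact because any base satisfies the 'N' position, which is consistent with the reduced editing the text reports at that position.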
Finally, we trained a random forest regressor to predict pegRNA efficiency (Extended Data Fig. 4a). Even with a restricted set of features, this algorithm was able to predict pegRNA activity with a Spearman correlation of ~0.6, comparable with other, more complex algorithms used to predict PE activity32,33,37 (Extended Data Fig. 4b). Analysis of the relative feature importance of this random forest model was again consistent with previous findings, and highlighted the GC content of the PBS as another important determinant of editing not identified with simple correlation analysis (Fig. 2f). These results demonstrate that large-scale, gene-specific prime editing sensor screening datasets can also provide insight into the determinants of high efficiency prime editing, even though these libraries were not designed with that objective in mind.
Sensor-based calibration identifies pathogenic TP53 variants
To assess the relative fitness conferred by engineered TP53 variants, we used the MAGeCK pipeline to normalize read counts among replicates and quantify the log2 fold change (LFC) and false discovery rate (FDR) of pegRNAs in the library47. While the LFC in pegRNA counts was highly correlated in replicates from the untreated and Nutlin-3-treated arms of the screen, respectively, the correlation among replicates between the two conditions was modest, suggesting that treatment-dependent biological effects were occurring (Extended Data Figs. 3a and 5). We then used the sensor target site as a quantitative proxy for editing efficiency at the endogenous locus to systematically filter pegRNAs based on their empirical editing efficiency and precision (Fig. 3a and Extended Data Fig. 6a–d). As expected, the number of significantly enriched or depleted pegRNAs in ‘sensor-calibrated’ datasets decreased as we increased the editing activity threshold (Fig. 3b). These results demonstrate that our sensor-based approach allows empirical removal of pegRNAs that exhibit potentially spurious enrichment or depletion, and low and/or undesired editing activity, retaining pegRNAs that are more likely to introduce the variants of interest with high efficiency and precision. Based on these results, we decided to focus our statistical analyses on a dataset composed of pegRNAs with ≥10% editing efficiency to minimize the confounding effects of imprecise editing (Fig. 3c–g).
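The sensor-calibration step can be sketched as a filter on empirical editing efficiency followed by a count of significant pegRNAs. The record fields, the FDR cutoff of 0.05 and all values below are assumptions for illustration only; in the study itself, enrichment statistics come from MAGeCK.

```python
FDR_CUTOFF = 0.05  # assumed significance cutoff, not stated in the text

# Made-up pegRNA records: correct sensor editing (%) and FDR.
pegRNAs = [
    {"id": "peg_a", "sensor_editing_pct": 0.3, "fdr": 0.01},
    {"id": "peg_b", "sensor_editing_pct": 4.0, "fdr": 0.02},
    {"id": "peg_c", "sensor_editing_pct": 12.0, "fdr": 0.03},
    {"id": "peg_d", "sensor_editing_pct": 35.0, "fdr": 0.40},
    {"id": "peg_e", "sensor_editing_pct": 48.0, "fdr": 0.001},
]


def sensor_calibrate(records, min_editing_pct):
    """Keep only pegRNAs whose sensor editing meets the threshold."""
    return [r for r in records if r["sensor_editing_pct"] >= min_editing_pct]


def count_hits(records, fdr_cutoff=FDR_CUTOFF):
    """Number of pegRNAs called significantly enriched or depleted."""
    return sum(1 for r in records if r["fdr"] < fdr_cutoff)


# Hit counts fall as the editing threshold rises; 10% is the threshold
# the text adopts for downstream analyses.
hits_by_threshold = {t: count_hits(sensor_calibrate(pegRNAs, t))
                     for t in (0, 2, 10, 40)}
```

Here `peg_a` is called significant despite almost no sensor editing, the kind of potentially spurious hit that calibration removes.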
As hypothesized, the dynamic range in the Nutlin-3-treated arm of the screen was considerably higher than in the untreated arm, with pegRNAs more strongly enriching and depleting in the presence of Nutlin-3 (Fig. 3c,d). Treatment with Nutlin-3 also selected for cells with higher-efficiency editing, improving the resolution of the screen (Fig. 3c,d). Editing of sensor loci continued throughout the screen due to the constitutive expression of PEmax, with sensor editing increasing fourfold on average from D4 to D16, and twofold from D16 to D34. However, sensor editing rates among negatively selected (LFC < −1), unselected (−1 ≤ LFC ≤ 1) and positively selected (LFC > 1) pegRNAs remained constant throughout the screen (Extended Data Fig. 7). These results indicate that any differences in editing rate among the pegRNAs or cells in the population were unlikely to contribute to the results of the screen and were instead controlled internally.
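The three selection classes defined above follow directly from the LFC thresholds. The sketch below uses a simplified fold-change calculation with a pseudocount on already-normalized counts, which is an assumption for illustration and not MAGeCK's normalization or statistical model.

```python
import math


def log2_fold_change(count_start, count_end, pseudocount=1.0):
    """Simplified LFC between two timepoints (assumes normalized counts)."""
    return math.log2((count_end + pseudocount) / (count_start + pseudocount))


def selection_class(lfc):
    """Bin a pegRNA by the LFC thresholds given in the text."""
    if lfc < -1:
        return "negatively selected"
    if lfc > 1:
        return "positively selected"
    return "unselected"  # -1 <= LFC <= 1


# Made-up counts: 99 reads at the early timepoint, 399 at the late one.
example_lfc = log2_fold_change(99, 399)
example_class = selection_class(example_lfc)
```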
Several putative pathogenic TP53 variants, including R196P and R267P, were strongly enriched in both treatment arms, with several pegRNAs per variant appearing as top hits (Fig. 3e,f). Given the higher dynamic range of the Nutlin-3 treatment arm, as well as the possibility that this treatment biases towards the discovery of dominant negative TP53 mutations2,6,16, we focused our analyses on this treatment group. Several TP53 variants showed significant enrichment in the Nutlin-3-treatment group, including SNVs and indels in the C-terminal half of the DNA-binding domain (DBD) and the OD (Fig. 3f). This includes several variants at residues 248 and 249, which are known mutational hotspots in p53 and commonly observed in cancer patients and individuals with the Li–Fraumeni cancer predisposition syndrome17. We also identified strongly depleting variants in the DBD that may retain WT p53 transcriptional activity or fail to exert a dominant negative effect on the p53 tetramer (Fig. 3f). Collectively, these results validate the utility of our approach and dataset to accurately identify functionally diverse pathogenic TP53 variants.
Interestingly, the most commonly observed TP53 mutation in human cancer, R175H, did not show strong enrichment despite the existence of several R175H pegRNAs exhibiting ≥10% editing efficiency. In fact, most of the top enriching variants we identified were not in known mutational hotspots19, suggesting that other types of variants can produce stronger phenotypes. These include the top hit—an insertion of a histidine between residues 254 and 255—as well as R196P and several variants in the OD (F328L, N345S, A347P) (Fig. 3f). These observations are consistent with the possibility that a subset of TP53 hotspot mutations are observed in part due to disproportionately high mutagenesis rates at the genomic level due to extrinsic and intrinsic factors, such as tobacco smoke and APOBEC (apolipoprotein B mRNA editing catalytic polypeptide-like) activity, rather than only to the fitness advantage conferred by these variants relative to other TP53 mutations6,19. An alternative explanation to these observations is that the hotspot variants were simply not efficiently engineered by the pegRNAs used in our screen. Although this was true for several hotspot variants, even hotspots with several efficient pegRNA designs (for example, R248Q, Y220C, G245S) were outcompeted by other, less frequently observed variants (Extended Data Fig. 8a–c). This includes rare-variant-encoding pegRNAs that outcompete hotspot variant-encoding pegRNAs with similar or identical empirical editing frequencies (Extended Data Fig. 8a–c). Even at the frequently mutated codon 248, the two most commonly observed substitutions, R248Q and R248W, were outcompeted in the screen by the rarer substitutions R248P and R248G (Extended Data Fig. 8d,e). 
These results suggest that the spectrum of cancer-associated TP53 mutations is mechanistically diverse and probably arises through the contextual combination of disproportionate mutagenesis rates and phenotypic selection of functionally important codons and their cognate residues.
Bulk quantification of pegRNAs grouped by variant class also revealed that nonsense variants were significantly more enriched compared with missense and silent variants. This is evident at several thresholds of pegRNA activity (Fig. 3g,h). As expected, silent variants tend to deplete, particularly when considering pegRNAs at a higher threshold for editing efficiency (≥20%), bolstering our confidence in the fidelity of the screen (Fig. 3h). Using available annotations of p53 residue function48, we also found that, as expected, variants in DNA-binding/contacting residues displayed strong enrichment (including residues 248 and 273) (Fig. 3i). Intriguingly, certain variants involved in tetramerization and transactivation (for example, L22V) were also strongly enriched, despite the low frequency of mutation in these residues (Fig. 3i). Other observations are more difficult to interpret, such as the large variance in the enrichment of mutations that affect residues involved in zinc binding, or the fact that variants in partially exposed residues tended to deplete while those in buried and exposed residues tended to enrich (Fig. 3i). Altogether, these observations suggest that there is a large, underappreciated phenotypic variance in the relative fitness conferred by distinct TP53 variants: not all p53 variants are one and the same, a concept that is likely relevant across many other genes and more broadly in biology.
We also sought to quantify the degree of concordance between our screening results and widely used metrics of variant deleteriousness. To do so, we used the combined annotation-dependent depletion (CADD) score, which integrates evolutionary conservation of residues with other metrics of pathogenicity into a single value, with higher-scoring variants predicted to be more deleterious49. We observed a low correlation between CADD score and enrichment of all SNV-specific pegRNAs. However, the correlation between CADD score and variant-specific fitness increased dramatically when we used sensor target sites to restrict our analysis to variants generated by high efficiency pegRNAs (Fig. 3j). We achieved a Spearman correlation of ~0.3 when considering Nutlin-3-treated pegRNAs with ≥15% editing activity, and >0.4 when considering pegRNAs with ≥50% editing (Fig. 3j). Across all minimum pegRNA editing activity thresholds, the CADD score correlated more strongly with the fitness of pegRNAs in the Nutlin-3-treated condition than in the untreated condition. These results emphasize the significant advantage of including sensor target sites in prime editing screening libraries to quantitatively assess pegRNA efficiency and reinforce the ability of Nutlin-3 to effectively pull out genuine LOF and putative neomorphic and separation-of-function TP53 variants (Fig. 3j,k).
Sequencing of endogenous TP53 validates sensor approach
The above results demonstrate that sensor-calibrated quantification of pegRNA enrichment and depletion can be used to identify bona fide pathogenic TP53 variants. However, these analyses do not rule out the possibility that these changes are independent of true editing at the endogenous TP53 locus. To formally test whether prime editing sensor screens can faithfully quantify the effects of variants engineered at endogenous loci, we performed targeted next-generation sequencing of specific regions in exons 6, 7 and 10 of TP53 using genomic DNA extracted from untreated and Nutlin-treated cells from D4 and D34 timepoints (Fig. 4a). We reasoned that sequencing the native TP53 locus would allow us to directly compare the fold change in pegRNA counts with the fold change of variants installed at defined targeted sites within endogenous TP53.
Unlike nonsensor-based prime editing screens, which are likely to suffer from extensive noise during enrichment due to the difficulty of designing high efficiency pegRNAs, our sensor-based prime editing screen could be denoised empirically by filtering for pegRNAs that edited their cognate sensor above a given editing threshold. Indeed, we observed no correlation between the LFC in pegRNA counts and the corresponding LFC of endogenous variants engineered in the native TP53 locus when considering all pegRNAs (Fig. 4b–d). However, taking advantage of the sensor site to filter out pegRNAs below a given correct editing threshold dramatically reduced the noise in these data and revealed a strong correlation between the fold change in pegRNA counts and endogenous variant counts (Fig. 4b,c). This correlation increased monotonically as the minimum correct editing threshold increased, reaching a Spearman correlation >0.4 in the untreated arm and >0.5 in the Nutlin-treated arm of the screen when using a minimum correct editing threshold of 50% (Fig. 4d). We note that this is probably an underestimate of correlation on a per variant basis, as several pegRNAs with variable editing efficiencies are being compared directly with a single endogenous genomic site. Importantly, edited exonic sites targeted by top enriching pegRNAs (for example, R196P, A347P, I254_I255insH) were also enriched significantly relative to their WT counterparts and correlated strongly with their respective pegRNA counts. We were also able to detect nearly all of the variants engineered by active pegRNAs (≥500 counts per variant and ≥1% correct sensor editing), and the detectable fraction reached saturation when considering pegRNAs producing ≥14% editing at the sensor site (Fig. 4e). Altogether, these results emphasize the need to integrate quantitative, sensor-like approaches to accurately extract true signal from the high levels of noise that are inherent in large-scale prime editing screens.
Indeed, our analysis demonstrates that screening pegRNAs without any empirical quantification of their editing activity can lead to spurious conclusions concerning the fitness of the variants that those pegRNAs are intended to engineer.
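The threshold-dependent comparison described above can be illustrated with a minimal sketch. It is not the analysis code used in the study: the record layout (keys `sensor_editing`, `peg_lfc`, `endo_lfc`), the function names and the toy values are all assumptions made for illustration, and Spearman correlation is implemented by hand as the Pearson correlation of average ranks.

```python
from statistics import mean

def average_ranks(values):
    """Rank values from 1..n; tied values receive their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return float("nan")  # correlation is undefined for constant input
    return cov / (vx * vy) ** 0.5

def spearman(x, y):
    return pearson(average_ranks(x), average_ranks(y))

def thresholded_spearman(records, min_editing):
    """Correlate pegRNA LFC with endogenous variant LFC, keeping only
    pegRNAs whose sensor editing (%) is at or above min_editing."""
    kept = [r for r in records if r["sensor_editing"] >= min_editing]
    if len(kept) < 3:
        return None  # too few points to report a correlation
    return spearman([r["peg_lfc"] for r in kept],
                    [r["endo_lfc"] for r in kept])

# Hypothetical toy records, not data from the screen.
records = [
    {"sensor_editing": 0,  "peg_lfc": 2.0,  "endo_lfc": -1.0},
    {"sensor_editing": 2,  "peg_lfc": -1.5, "endo_lfc": 0.5},
    {"sensor_editing": 30, "peg_lfc": 0.1,  "endo_lfc": 0.0},
    {"sensor_editing": 55, "peg_lfc": 1.0,  "endo_lfc": 0.8},
    {"sensor_editing": 80, "peg_lfc": 3.0,  "endo_lfc": 2.5},
]
for threshold in (0, 30):
    print(threshold, thresholded_spearman(records, threshold))
```

In this toy example, the two inactive pegRNAs carry LFC values unrelated to the endogenous variant, so removing them raises the correlation, which mirrors the qualitative behavior reported in Fig. 4d.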
Functional validation of pathogenic TP53 variants
The above data demonstrate that pegRNA-specific sensor modules can be used to rigorously calibrate screening results to limit the analysis of variant fitness effects only to highly efficient pegRNAs. Though suggestive, these results do not formally prove that top scoring pegRNAs are enriched due to the introduction of defined genetic variants at the endogenous target locus and that these drive the observed biological differences. To test this, we selected a cohort of 29 pegRNAs that significantly enriched or depleted in the screen, or that generated commonly observed ‘hotspot’ mutations (Fig. 5a and Extended Data Fig. 9a). This set of pegRNAs targeted residues that spanned the TP53 locus and also included two control pegRNAs that install silent edits (Fig. 5a). In all, this validation set included low, medium and high efficiency pegRNAs spanning a range of 0–86% correct editing percentages, as measured by their respective sensor sites (Extended Data Fig. 9a). We transduced A549-PEmax cells with lentiviruses encoding individual pegRNAs and allowed editing to occur for 7–10 days, based on the kinetics of editing we observed previously (Fig. 1f). We then mixed each individual population of sequence-verified isogenic A549-PEmax-pegRNA cells with parental TP53 WT A549-PEmax cells and performed longitudinal fluorescence competition assays in the presence or absence of Nutlin-3 (Fig. 5b). We used the red fluorescent protein (RFP) fluorophore linked to each pegRNA vector to track the relative fitness of pegRNA cells (RFP+) compared with parental cells (RFP−) (Fig. 5c). These competition assays proceeded for 2 weeks, with flow cytometry readings taken every 7 days for each replicate (Extended Data Fig. 9b). We then calculated the difference in the RFP+ cell fraction (∆RFP%) for each pegRNA between the profiled timepoints and the initial timepoint in both conditions (Fig. 5d). 
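As a small illustration of the ∆RFP% metric defined above, the following sketch subtracts the RFP+ percentage at the initial timepoint from the percentage at each later timepoint. The function name, the dictionary layout and the example numbers are assumptions for illustration only.

```python
def delta_rfp(rfp_percent_by_day):
    """Return {day: RFP+% at that day minus RFP+% at the earliest day}.

    rfp_percent_by_day maps a timepoint (for example 0, 7, 14) to the
    RFP+ cell percentage measured by flow cytometry at that timepoint.
    """
    first_day = min(rfp_percent_by_day)
    baseline = rfp_percent_by_day[first_day]
    return {
        day: pct - baseline
        for day, pct in sorted(rfp_percent_by_day.items())
        if day != first_day
    }

# Hypothetical readings for one pegRNA in one condition.
print(delta_rfp({0: 25.0, 7: 40.0, 14: 90.0}))
```

A positive value indicates that pegRNA-expressing cells outcompeted the parental population over that interval, and a negative value indicates depletion.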
Consistent with our screening results, a significant fraction of pegRNAs showed enrichment in the presence of Nutlin-3, often reaching complete saturation (Fig. 5d). Overall, there was strong concordance between the enrichment in cells observed in both treatment conditions in the screen and in these competition assays, supporting the reproducibility of the screening results (Fig. 5e,f). Importantly, we observed a significant enrichment of cells expressing pegRNAs designed to engineer variants in the OD, including A347P and N345S (Fig. 5e,f).
The above results indicate that cells harboring a number of pegRNAs designed to engineer diverse types of TP53 variants have an increased fitness. However, these results do not rule out the possibility that these pegRNAs confer increased fitness through indel generation, rather than through their encoded edits. To assess this, we performed targeted NGS of each endogenous target locus in pure A549-PEmax-pegRNA cell lines (that is, before mixing with the parental population). Comparison of the endogenous editing observed in these cell lines with the corresponding sensor editing observed in the untreated arm of the screen at a similar timepoint (D16) revealed a strong correlation, with on-target editing observed for almost all pegRNAs (Fig. 5g).
Another potential application of our approach is to combine high-throughput prime editing with drug treatments to identify variant–drug interactions that could be exploited to develop allele-specific therapies. This is particularly relevant today because recent advances in rational drug design have shown that small molecules targeting specific mutant proteins (including those produced by oncogenic point mutant KRAS alleles) can have therapeutic potential50. To test whether our approach could be used to identify variant-specific therapeutic sensitivities, we tested two small molecules that have been shown to exhibit mutant-p53-specific effects, COTI-2 and PK7088, in cells transduced with lentiviruses encoding R175H- and Y220C-targeting pegRNAs, respectively51,52,53. Both of these treated populations showed depletion in the RFP+ cell fraction (Extended Data Fig. 9c). In particular, the COTI-2 treatment arm showed significant depletion of R175H-pegRNA cells (Extended Data Fig. 9c). These results suggest that prime editing sensor screens could be used to systematically identify variant-specific vulnerabilities to diverse therapies, augmenting cDNA-based approaches for performing similar screens54.
Prime editing reveals new pathogenic variants
High-throughput functional genomics approaches have been used previously to investigate TP53 mutations. For instance, Giacomelli et al. performed deep mutational scanning of TP53 variants using exogenous overexpression of mutant TP53 cDNAs in A549 cells in the presence or absence of Nutlin-3, concluding that most TP53 mutations probably arise as a consequence of endogenous mutational processes that select for dominant negative and LOF activity6. A follow-up study integrated this method with HDR-based modeling of six TP53 hotspot mutations in human leukemia cells and concluded that missense mutations in the TP53 DBD act mainly through dominant negative activity2. More recently, Ursu et al. employed a modified version of Perturb-seq to interrogate the transcriptional effects of 200 mutant TP53 cDNAs in A549 cells by single-cell RNA-sequencing, also concluding that most of these disrupt p53 activity through LOF and dominant negative effects16. In contrast, another study used a similar approach in H1299 and HCT116 cell lines to interrogate variants in the TP53 DBD through parallel in vitro and in vivo experiments, concluding that certain hotspot mutations confer a higher proliferative advantage in vivo, probably through GOF mechanisms7. A number of studies in both mice and humans have also demonstrated that certain TP53 variants, including hotspot mutations at residues R175, R248 and R273, can produce phenotypes consistent with neomorphic/GOF activities55. These include promoting aberrant self-renewal of hematopoietic stem cells56, sustaining tumor growth57,58 and promoting metastatic dissemination59,60,61,62, among others55. As such, there is much controversy in the field regarding the precise cellular and molecular activities of cancer-associated TP53 mutations—the most common genetic lesions observed across all types of cancer.
We hypothesized that cDNA screening approaches are biased in favor of detecting dominant negative activities due to their reliance on supraphysiological overexpression of mutant proteins. This is particularly relevant for studying the active p53 transcription factor, which is a tetrameric protein composed of a dimer of dimers18. As such, we postulated that mutant overexpression studies in WT TP53 cells, including A549, may fail to detect mutant allele-specific activities and phenotypes that may be sensitive to gene dosage and protein stoichiometry. To test this hypothesis, we reanalyzed our data to perform comparative bioinformatic analyses with the dataset generated by Giacomelli et al.6, as their experiments were also carried out in WT TP53 A549 cells treated with Nutlin-3. First, we plotted the Z-scores of SNV-generating pegRNAs (≥10% editing) against the Z-scores of the corresponding variants expressed from cDNAs (Fig. 6a). Supporting our hypothesis, variants in the OD of p53 tended to deplete (that is, Z-score <0) when expressed from cDNAs, but often enriched significantly when expressed from the endogenous locus (Fig. 6a,b). To investigate this difference further, we calculated the difference in Z-scores (∆Z-score) between each pegRNA–cDNA pair by subtracting the cDNA Z-score from the prime editing Z-score. This analysis revealed a significantly higher ∆Z-score for endogenous variants in the OD relative to other domains of p53 (Fig. 6c)—a pattern that consistently held at several thresholds of pegRNA activity (Extended Data Fig. 10a) and even when we restricted our analysis solely to the most efficient pegRNA for each variant (Fig. 6d and Extended Data Fig. 10b).
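The ∆Z-score comparison can be sketched as follows. This is an illustrative outline rather than the analysis used in the study: it assumes one Z-score per variant on each side (for example, from the most efficient pegRNA), and the function names, domain labels and numeric values are invented for the example.

```python
from statistics import mean

def delta_z(pe_z, cdna_z):
    """Delta Z = prime editing Z-score minus cDNA Z-score.

    Only variants present in both datasets are compared.
    """
    return {v: pe_z[v] - cdna_z[v] for v in pe_z if v in cdna_z}

def mean_delta_by_domain(deltas, domain_of):
    """Average the delta Z-scores within each protein domain."""
    grouped = {}
    for variant, dz in deltas.items():
        grouped.setdefault(domain_of[variant], []).append(dz)
    return {domain: mean(values) for domain, values in grouped.items()}

# Hypothetical Z-scores and domain labels, not values from either screen.
pe_z = {"A347P": 2.0, "N345S": 1.0, "R175H": 1.5}
cdna_z = {"A347P": -1.0, "N345S": -0.5, "R175H": 1.0}
domain_of = {"A347P": "OD", "N345S": "OD", "R175H": "DBD"}

deltas = delta_z(pe_z, cdna_z)
print(deltas)
print(mean_delta_by_domain(deltas, domain_of))
```

A variant that enriches endogenously but depletes when overexpressed yields a large positive ∆Z-score, which is the pattern reported for the OD.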
Some of the most impactful variants in the p53 OD are observed frequently in individuals with Li–Fraumeni syndrome, who carry germline TP53 variants that predispose them to cancer. Two independent studies by the Prives and Lozano laboratories recently showed that p53 proteins harboring A347D mutations (in the OD) form stable dimers instead of tetramers, and that these dimeric p53 proteins exhibit neomorphic activities63,64. Visualizing the residue-averaged ∆Z-scores on the structure of the p53 OD65 further highlights the extensive differences in the behavior of endogenous variants as compared with exogenous (cDNA) variants in this domain (Fig. 6e).
To further investigate the phenotypic differences between endogenous and exogenous TP53 variants, we performed fluorescence competition assays with Nutlin-treated A549-PEmax cells transduced with pegRNAs or matched cDNAs representing specific variants spanning the TP53 DBD and OD regions. We also included a number of important controls for each approach, including silent-edit- or no-edit-inducing pegRNAs that targeted the same loci as their matched variant-inducing pegRNAs, as well as empty vector and WT cDNA constructs. We found a strong agreement between the behavior of mutant TP53 cDNAs tested in competition assays and their corresponding Z-scores in the Giacomelli et al. screen6, validating our assay (Fig. 6f). However, comparing the enrichment of cells harboring endogenous variants engineered with pegRNAs relative to those expressing exogenous variant cDNAs revealed large differences in the behavior conferred by endogenous and exogenous TP53 mutations (Fig. 6g and Extended Data Fig. 10c). Variants in the DBD behaved similarly across both systems, with the exception of R110P and L145P, which enriched only in the cDNA group. However, all OD variants failed to enrich when tested with cDNAs, but three-quarters of OD variants displayed strong enrichment when engineered endogenously using prime editing (Fig. 6g). Importantly, all controls behaved as expected, failing to enrich in the competition assays (Fig. 6g). These results support the hypothesis that certain variant-induced phenotypes can be observed accurately only when engineered and expressed in the endogenous genomic context.
Collectively, our observations highlight gene dosage, protein stoichiometry and protein–protein interaction domains as important variables that must be taken into account when studying the enormous diversity of mutant alleles observed in human cancer. Failing to take these considerations into account might lead to the misclassification of bona fide pathogenic variants, including those identified in patients with hereditary cancer predisposition syndromes like Li–Fraumeni.
Discussion
Most genetic variants associated with various human diseases, including cancer, remain uncharacterized40. The development of CRISPR-based precision genome editing tools, including base and prime editing9,10, has opened the door to rigorous experimental interrogation of disease-associated variants with single basepair resolution at unprecedented scale. However, these approaches, particularly prime editing, remain limited by the variance in editing efficiency among different pegRNAs.
Building on our previous work, we developed a prime editing sensor-based framework for engineering and screening variants that overcomes these limitations. Using this sensor-based approach, in which each pegRNA is coupled to an artificial version of its endogenous target site, we simultaneously identify enriched pegRNAs while empirically quantifying their editing efficiencies. This approach allows for the characterization of variants at several target sites while correcting for the potentially confounding differences in editing activity among distinct pegRNAs. To facilitate the creation of similar libraries by other researchers, we also built PEGG—a new computational tool for creating prime editing sensor libraries (https://pegg.readthedocs.io/en/latest/).
As a prototype for the prime editing sensor framework, we generated a library of pegRNAs targeting over 1,000 cancer-associated variants in TP53. We reasoned that p53 would be the most salient prototype for testing the efficacy of our prime editing sensor screening platform because of its central role in cancer development and progression, the various studies that have performed deep mutational scanning of p53 whose data can provide direct sources of comparison and the ongoing controversy in the field as to whether most (if not all) TP53 mutations observed in cancer patients are functionally redundant.
With our sensor-based approach, we were able to systematically search the pegRNA design space and identify high efficiency pegRNAs, allowing us to install over half of the targeted TP53 variants. Although not the focus of the present study, the breadth and depth of our dataset also allowed us to recapitulate many of the previous findings about the factors affecting pegRNA efficiency.
Analysis of the screening data revealed a wide range in the fitness of TP53 variants, challenging the idea that most p53 variants, particularly those in the DBD, are functionally redundant. While we identified strongly enriching pegRNAs, including many that generated commonly observed ‘hotspot’ mutations at DNA-contacting residues, most of the strongly enriching pegRNAs encoded variants that are not located at mutational hotspots. Instead, a number of these mutations are less frequently observed in patients and remain poorly understood, despite the fact that they collectively affect thousands of patients globally every year. These include variants located within the OD of p53, which were functionally validated with follow-up experiments, and some of which were recently shown to be bona fide pathogenic variants in humans with Li–Fraumeni syndrome63,64.
Comparison between our screening data and a previous study that also screened TP53 variants in A549 cells and in the presence of Nutlin-3 but instead used cDNA-based overexpression libraries revealed a statistically significant enrichment of endogenous—but not overexpressed—variants located in the p53 OD. We further validated this finding with functional assays comparing the behavior of cells harboring exogenous or endogenous TP53 variants. This comparison highlights the potential limitations of cDNA-based screening approaches, particularly when studying variants at sites of protein–protein interactions, underscoring the need to study variants in their native context to access their true biology. Together, our data suggest that stoichiometric imbalances produced by cDNA overexpression could lead to the misclassification of genetic variants as noncausal or otherwise benign. Our findings thus offer a cautionary note when using exogenous overexpression systems to interpret pathogenic alleles, and highlight the importance of using strategies like the one described in this work to investigate variants of interest in their native genomic contexts whenever possible.
More broadly, our study provides a conceptual blueprint and a modular set of experimental and computational tools that can be applied to evaluate diverse types of genetic variants in their native endogenous genomic contexts with high-throughput prime editing. For example, the prime editing sensor strategy described here could be used to investigate the influence of endogenous coding variants on drug resistance or other cancer-relevant cellular phenotypes, while maintaining native levels and regulation of the proteins of interest. Alternatively, our approach could be applied to interrogate the effects of noncoding variants at diverse loci, assessing gene regulatory biology not easily amenable to other screening approaches.
Future prime editing sensor screening efforts could incorporate improved prime editors and pegRNAs, as well as higher-efficiency prime editing systems and strategies like PE3 and PE4 (refs. 10,21). These studies could also be performed in vivo, for example, by delivering compact libraries of mismatch repair-evasive pegRNAs into mice expressing prime editors66 or by codelivery of smaller prime editors, such as PE6a23. Taken together, we envision that our approach will expand our understanding of pathogenic gene variants and help match patients suffering from genetic diseases with effective therapies.
Methods
Experimental materials and methods
Plasmids and pegRNA cloning
All plasmids were generated using Gibson Assembly strategies67 using NEBuilder HiFi DNA Assembly Master Mix (NEB cat. no. E2621) following the manufacturer’s protocol. All new plasmids, along with detailed maps and sequences, will be made available through Addgene. The PEmax coding sequence in Lenti-EFS-PEmax-P2A-Puro was obtained from pCMV-PEmax (Addgene, cat. no. 174820)21. The lentiviral plasmid used to clone and express prime editing sensor libraries was assembled by transferring the U6-sgRNA-EFS-Blast-P2A-TurboRFP cassette from pUSEBR (ref. 14) into the higher titer pLV backbone68. Lenti-PEAR-mCherry—a modified all-in-one lentiviral version of the PEAR reporter43—was also cloned using Gibson Assembly and used to test the editing activity of A549-PEmax cells. The Lenti-UPEmS-tevo plasmid is a modified version of the UPEmS vector66 that contains the tevopreQ1 motif22. This plasmid was used to assemble pegRNAs via Golden Gate Assembly66 for follow-up pegRNA validation experiments. Human WT or mutant TP53 cDNAs were cloned into pCDH-EF1-MCS-IRES-RFP (System Biosciences, cat. no. CD531A-2) using primers EcoRI-TP53-Fwd (5′-CAGTCAGAATTCGCCACCATGGAGGAGCCGCAGTCAG-3′) and BamHI-TP53-Rev (5′-CTGACTGGATCCTCAGTCTGAGTCAGGCCCTTCTGTCTTGAAC-3′). Fragments encoding each cDNA were obtained from Twist Biosciences (Supplementary Table 1).
Virus production
Lentiviruses were produced by cotransfection of HEK293T cells with the relevant lentiviral transfer vector and packaging vectors psPAX2 (Addgene, cat. no. 12260) and pMD2.G (Addgene, cat. no. 12259) using Lipofectamine 2000 (Invitrogen, cat. no. 11668030). Viral supernatants were collected at 48- and 72-h posttransfection and stored at −80 °C.
Drug treatments
Nutlin-3 (Selleck Chemicals, cat. no. S1061) was dissolved in dimethylsulfoxide at a stock concentration of 10 mM and used at a final concentration of 10 μM. PK7088 (Aobious cat. no. AOB4255) was diluted to a final concentration of 200 µM from a stock concentration of 10 mM. COTI-2 (MedChemExpress cat. no. HY-19896) was dissolved in dimethylsulfoxide to a stock concentration of 10 mM and added to a final concentration of 1 µM.
Flow cytometric analyses
Fluorescence-based measurements for the validation of prime editing activity with PEAR and for competition assays were performed using the BD FACSCelesta Cell Analyzer in tube or plate reader format, with BD FACSDiva v.9.0 software used for data collection. Downstream analysis was performed using FlowJo v.10.9.0 to identify single cells and quantify fluorescence.
Generation of A549-PEmax cell lines
To generate A549 cells stably expressing PEmax, we transduced a 15-cm plate with 2.5 million cells with freshly harvested EFS-PEmax-P2A-puromycin lentivirus, and selected cells with 10 µg ml−1 of puromycin at 72 h posttransduction. To assess the prime editing efficiency of these cells, we transduced 250K A549-PEmax cells in triplicate in six-well plates with Lenti-PEAR-mCherry—a modified, all-in-one lentiviral PEAR construct where green fluorescent protein is turned on in the event of successful prime editing. Based on the PEAR reporter activity, we noticed that prime editing activity was not sufficiently high in these cells. We then retransduced these cells with successive rounds of freshly harvested EFS-PEmax-P2A-Puromycin lentivirus. Repeating the PEAR reporter assay revealed a substantial increase in prime editing activity (Extended Data Fig. 2a). This A549-PEmax ‘v2’ cell line was used throughout the present study.
Cloning of p53-sensor library
The oligonucleotide library was ordered from Twist Biosciences. The lyophilized library was resuspended in 100 µl of TE buffer (pH 8.0) and diluted to create 1 ng µl−1 stocks. We performed n = 32 PCR reactions with NEBNext High-Fidelity 2× PCR Master Mix (cat. no. M0541S) to amplify the library with the following primers at a low cycle count: forward 5′-CATAGCGTACACGTCTCACACCG, reverse 5′-GTGCCGTTGACGACCGGATCTAGAATTC. These PCR reactions were pooled and purified using the Qiagen PCR purification kit following the manufacturer’s protocols, with 10 µl of 3 M Na acetate pH 5.2 added for every five volumes of PB used per one volume of PCR reaction. The library was digested with Esp3I (NEB) and EcoRI-HF (NEB), pooled and purified. Subsequently, n = 16 ligations were performed using 300 ng of digested and dephosphorylated Trono-BR backbone and 3 ng of digested insert with high concentration T4 DNA Ligase (NEB, cat. no. M0202M). The ligation reactions were precipitated using QuantaBio 5PRIME Phase Lock Gel tubes before being resuspended in 3 µl of EB Buffer per four precipitated reactions. These precipitated ligation reactions were electroporated into Lucigen Endura ElectroCompetent cells (cat. no. 60242-2) before being plated on LB-carbenicillin plates and incubated at 37 °C for 16 h. Library representation was assessed at this step via serial dilution plating, displaying a representation on the order of 400×. We also picked 30 random colonies from these serial dilution plates to assess the fidelity of library cloning and to test a random set of pegRNAs. We scraped the plates and collected the bacteria in 250 ml of LB-ampicillin per four plates, before incubating for 2 h at 37 °C, collecting the bacteria by centrifugation, and proceeding to perform a Qiagen Maxiprep, following the manufacturer’s protocol. 
Lentivirus was generated via the aforementioned protocol, and viral titer was determined through serial dilutions of virus, transductions in 12-well plates with one million A549-PEmax cells, and measurement of the RFP-positive cell fraction at 72-h posttransduction. For extended protocol information, see Supplementary Protocol 1.
Screening protocol
For each replicate, 110 million A549-PEmax cells were combined with an appropriate amount of p53-sensor virus to achieve a multiplicity of infection < 1. To this mix, puromycin was added to a final concentration of 10 µg ml−1 and polybrene transfection reagent (Sigma-Aldrich, cat. no. TR-1003) was added to a final concentration of 8 µg ml−1 in F-12K (Gibco, cat. no. 21127030) medium supplemented with 10% FBS and 1× Penicillin-Streptomycin (ThermoFisher). This mix was plated into nine 12-well plates per replicate. At 24-h posttransduction, each 12-well plate was expanded to a 15-cm plate, with medium supplemented with 10 µg ml−1 puromycin and 10 µg ml−1 blasticidin S. These puromycin and blasticidin S concentrations were maintained throughout the screen. At 72-h posttransduction, each 15-cm plate was expanded to two 15-cm plates. At 96-h posttransduction, each replicate was replated at ≥1,000× representation (≥29 million cells) for the untreated arm and the Nutlin-3 treatment arm. For the Nutlin-3 treatment arm, Nutlin-3 was added to a final concentration of 10 µM. At this timepoint, a cell pellet was taken for gDNA extraction. All cell pellets included ≥29 million cells (1,000× representation) and were stored at −80 °C. Subsequently, every 3 days, the cells were split, and replated at 1,000× representation. At each timepoint, a cell pellet was taken if there were a sufficient number of cells to allow for 1,000× representation. This process was repeated until the screen was terminated at D34 posttransduction.
Genomic DNA extraction
Genomic DNA from the D4, D16, D25 and D34 timepoints of the screen was extracted using the Qiagen Genomic Tip/500G following the manufacturer’s protocol. Genomic DNA was resuspended in 200 µl of TE Buffer, pH 8.0. Concentrations were measured using a NanoDrop 2000 (ThermoFisher) and were normalized to 1 µg µl−1. For the competition assays (Fig. 5), genomic DNA was extracted using the DNeasy Blood and Tissue Kit (Qiagen), following the manufacturer’s protocol.
NGS sample preparation
We performed n = 30 PCR1 reactions per sample using Q5 High-Fidelity 2× Master Mix (NEB, cat. no. M0429S) with 10 µg of genomic DNA to maintain ≥1,000× representation. Up to four PCR reactions were pooled and purified using the Qiagen PCR purification kit following the manufacturer’s protocols. These reactions were then gel purified (Qiagen Gel Extraction Kit), pooled and measured using the NanoDrop 2000 (ThermoFisher). We performed n = 4 PCR2 reactions per sample using 10 ng of PCR1 as a template in each reaction. We PCR-purified and then gel purified these samples and eluted in 30 µl of EB Buffer. These samples were then submitted for sequencing. The PCR1 and PCR2 strategies and the deconvolution protocol are described in Supplementary Protocol 2. All primers are listed in Supplementary Table 1. A similar PCR1/2 strategy was used for preparation of endogenous TP53 amplicons for NGS with the Singular G4 sequencing system (Figs. 4 and 5). These protocols are described thoroughly in Supplementary Protocol 2 and the associated primers can be found in Supplementary Table 1.
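The representation arithmetic behind the PCR1 input can be checked with a rough calculation. The sketch below rests on several assumptions that are not stated in the text: that the 10 µg refers to each of the 30 reactions, that one genome equivalent weighs about 6.6 pg (a common approximation for a diploid human genome; A549 cells are not strictly diploid), that each cell carries a single integrated construct, and that the library contains roughly 29,000 constructs (inferred from 29 million cells corresponding to 1,000× representation).

```python
PG_PER_GENOME = 6.6        # assumption: approximate mass of one diploid human genome, in pg
LIBRARY_SIZE = 29_000      # assumption: inferred from 29 million cells at 1,000x

def genome_equivalents(total_ug, pg_per_genome=PG_PER_GENOME):
    """Convert a gDNA mass in micrograms to an approximate number of genomes."""
    return total_ug * 1e6 / pg_per_genome  # 1 ug = 1e6 pg

def fold_representation(genomes, library_size=LIBRARY_SIZE):
    """Average number of template genomes per library member."""
    return genomes / library_size

total_ug = 30 * 10  # 30 PCR1 reactions, assumed 10 ug of gDNA each
genomes = genome_equivalents(total_ug)
print(round(genomes), round(fold_representation(genomes)))
```

Under these assumptions, 300 µg of template corresponds to roughly 45 million genome equivalents, which is above the 29 million needed for 1,000× representation.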
Next-generation sequencing
We performed Amplicon-EZ sequencing (Azenta) for analysis of the correlation between sensor and endogenous editing (Fig. 1d). For the NGS of the p53-sensor library, we used the NovaSeq S1 200 sequencing system (NovaSeq 6000) with a custom sequencing primer set to read the protospacer, 3′ extension, sensor sequence and sample barcode in separate reads. All other NGS data were generated using the Singular G4 Sequencer (2 × 150 paired-end) with stock primers. All sequencing primers are listed in Supplementary Table 1. The custom sequencing approach for the NovaSeq 6000 is diagrammed in Supplementary Protocol 2.
Golden Gate assembly of UPEmS pegRNAs for follow-up validation
For follow-up validation of pegRNAs, individual pegRNAs were cloned via Golden Gate assembly into the Lenti-UPEmS-tevo backbone (generated by the present study). Golden Gate assembly was performed with annealed spacer oligonucleotides, annealed and phosphorylated scaffold oligonucleotides, and annealed 3′ extension oligonucleotides using NEB BsmBI Golden Gate enzyme mix, before being transformed, mini-prepped (Qiagen) and validated via whole-plasmid sequencing (Primordium). For full protocol details, see Supplementary Protocol 3. The full list of oligonucleotides used for cloning can be found in Supplementary Table 1.
Competition assays
To generate variant p53 lines, we seeded 100,000 A549-PEmax cells in six-well plates and added UPEmS lentivirus corresponding to each variant. To achieve saturation editing, we waited 7–10 days, expanding the cells to a 10-cm plate when they reached confluence, and took a cell pellet for gDNA extraction to assess editing. At this point, we mixed 250,000 variant (RFP+) cells with 750,000 untransduced A549-PEmax cells, and plated 50,000 cells in triplicate in six-well plates. For drug-treated conditions (Nutlin-3, COTI-2, PK7088), the compound was added to the appropriate concentration. Remaining cells were used for flow analysis and to generate a t = 0 cell pellet. At D7 and D14, we assessed the RFP+ fraction of the cells via flow cytometry. The flow gating strategy is displayed in Supplementary Fig. 1. For these analyses, we applied a stringent threshold of ≥500 quantifiable events (that is, single cells) because we found that samples with fewer than 500 quantifiable events, which were typically observed in cells treated with Nutlin-3 that underwent cellular senescence and/or apoptosis, were insufficient to accurately calculate the RFP+ cell fraction. In these cases, we assumed that the RFP+ cell fraction was unchanged from the previous timepoint, akin to a standard 3T3/proliferation assay.
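The event threshold and carry-forward rule described above can be expressed as a short sketch. The input layout (tuples of day, single-cell events and RFP+ events), the function name and the example numbers are assumptions for illustration, and the treatment of a first timepoint that falls below the threshold is our own choice, since the text does not cover that case.

```python
MIN_EVENTS = 500  # threshold on quantifiable single-cell events, as stated in the text

def rfp_fractions(readings, min_events=MIN_EVENTS):
    """Compute the RFP+ fraction per timepoint.

    readings: list of (day, single_cell_events, rfp_positive_events),
    ordered by day. A timepoint with fewer than min_events events
    reuses the fraction from the previous timepoint.
    """
    fractions = []
    previous = None
    for day, events, rfp_positive in readings:
        if events >= min_events:
            previous = rfp_positive / events
        elif previous is None:
            # Assumption: without an earlier reading there is nothing to carry forward.
            raise ValueError("first timepoint has too few events")
        fractions.append((day, previous))
    return fractions

# Hypothetical readings: the D14 sample has too few events to quantify.
print(rfp_fractions([(0, 10000, 2500), (7, 2000, 1500), (14, 300, 299)]))
```

In the example, the D14 sample contains only 300 events, so its RFP+ fraction is reported as the D7 value rather than being computed from an unreliable sample.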
Analytic/computational methods
Selection of TP53 variants and prime editing sensor library generation with PEGG
To select a cohort of TP53 variants for generating a prime editing sensor library, we used the MSK-IMPACT database20. We chose all SNVs observed in patients, as well as a collection of observed and random indels to increase the diversity of edits (Extended Data Fig. 1b). In addition, PEGG automatically generated 95 neutral/silent variants that tiled the TP53 locus to act as internal controls in the screen.
PEGG generated a maximum of 30 ranked pegRNA designs per variant with RTT lengths of 10, 15, 20, 25 and 30 nucleotides, and PBS lengths of 10, 13 and 15 nucleotides coupled to ‘NGG’ protospacers. A ‘G’ was appended to the start of each 20-nucleotide protospacer to improve U6 promoter-mediated transcription. After PEGG generated these pegRNA designs, we further filtered the library to exclude pegRNAs containing polyT termination sequences (≥4 consecutive Ts), EcoRI and Esp3I sites, and protospacers with an MIT specificity score less than 50. In addition, each pegRNA oligo included a matched, 60 nt sensor locus that was generated automatically by PEGG and used to link each pegRNA to its editing outcome.
For full details of generating a prime editing sensor library using PEGG, visit https://pegg.readthedocs.io/en/latest/.
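The design grid and post-design filters described in this section can be sketched independently of PEGG. The code below does not use or reproduce the PEGG API; the function names are hypothetical. It assumes that the restriction site and polyT filters are applied to the variable pegRNA sequence, because the flanking cloning sites of each oligo necessarily contain Esp3I and EcoRI sites.

```python
import re
from itertools import product

RTT_LENGTHS = (10, 15, 20, 25, 30)
PBS_LENGTHS = (10, 13, 15)
DESIGN_GRID = list(product(RTT_LENGTHS, PBS_LENGTHS))  # RTT/PBS combinations per protospacer

ECORI_SITE = "GAATTC"               # palindromic recognition site
ESP3I_SITES = ("CGTCTC", "GAGACG")  # recognition site and its reverse complement
POLY_T = re.compile(r"T{4,}")       # four or more consecutive Ts

def add_leading_g(protospacer):
    """Prepend a G to a 20-nt protospacer, as described for U6 transcription."""
    return "G" + protospacer

def passes_filters(peg_seq, mit_specificity, min_specificity=50):
    """Return True if the pegRNA sequence survives all post-design filters."""
    seq = peg_seq.upper()
    if POLY_T.search(seq):
        return False  # potential Pol III termination signal
    if ECORI_SITE in seq or any(site in seq for site in ESP3I_SITES):
        return False  # would be cut during library cloning
    return mit_specificity >= min_specificity

print(len(DESIGN_GRID))
print(passes_filters("ACGTACGTAC", 80), passes_filters("ACGTTTTACG", 80))
```

Each protospacer yields 15 RTT/PBS combinations in this grid, so the stated maximum of 30 designs per variant is consistent with up to two protospacers per variant, although that reading is an inference rather than something the text states.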
Analysis of the p53-sensor screen
The p53-sensor sequencing results were demultiplexed into separate fastq files based on the sample barcode. Next, using a custom analysis script, we filtered reads with an average Phred quality score below 30, and identified pegRNAs based on the protospacer and 3′ extension sequences. Sequences with no matching protospacer or 3′ extension were discarded, and sequences with mismatched protospacer and 3′ extension sequences were discarded and classified as recombination events. Sequences with matching protospacer and 3′ extension sequences were used to generate pegRNA counts tables that were subsequently used for MAGeCK analysis (v.0.5.9) of pegRNA enrichment/depletion.
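The read-level logic in this paragraph can be outlined as below. This is a simplified sketch, not the custom script used in the study: it assumes Phred+33 quality encoding, exact sequence matching, and a library represented as a dictionary from a pegRNA identifier to its (protospacer, 3′ extension) pair. All names and sequences are hypothetical.

```python
from collections import Counter

def mean_phred(quality_string, offset=33):
    """Average Phred score of a read, assuming Phred+33 ASCII encoding."""
    return sum(ord(c) - offset for c in quality_string) / len(quality_string)

def build_index(library):
    """library: {peg_id: (protospacer, extension)} -> lookup structures."""
    pair_to_id = {pair: peg_id for peg_id, pair in library.items()}
    protospacers = {p for p, _ in library.values()}
    extensions = {e for _, e in library.values()}
    return pair_to_id, protospacers, extensions

def classify_read(protospacer, extension, quality, index, min_mean_q=30):
    """Return (status, peg_id) for one read."""
    pair_to_id, protospacers, extensions = index
    if mean_phred(quality) < min_mean_q:
        return ("low_quality", None)
    if protospacer not in protospacers or extension not in extensions:
        return ("unmatched", None)
    peg_id = pair_to_id.get((protospacer, extension))
    if peg_id is None:
        return ("recombined", None)  # both halves known, but from different pegRNAs
    return ("ok", peg_id)

def count_pegrnas(reads, index):
    """Tally reads per pegRNA for downstream enrichment analysis."""
    counts = Counter()
    for protospacer, extension, quality in reads:
        status, peg_id = classify_read(protospacer, extension, quality, index)
        if status == "ok":
            counts[peg_id] += 1
    return counts

# Hypothetical two-member library with shortened sequences.
library = {"peg1": ("AAAA", "CCCC"), "peg2": ("GGGG", "TTTT")}
index = build_index(library)
reads = [("AAAA", "CCCC", "IIII"), ("AAAA", "TTTT", "IIII"), ("GGGG", "TTTT", "5555")]
print(count_pegrnas(reads, index))
```

The resulting counts correspond to the per-pegRNA counts table that is passed to MAGeCK; recombined, unmatched and low-quality reads do not contribute.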
To classify editing outcomes at the sensor locus, we first determined whether recombination had occurred to decouple the pegRNA from its matched target sequence. To do so, we used the first and last five nucleotides of the sensor sequence as a barcode to detect recombination. Sensor reads with the first and last five nucleotides of the read matching the appropriate pegRNA were classified as correct sensor reads, while those with the first and last five nucleotides matching other pegRNAs were classified as recombination events and discarded. We noted that recombination between the pegRNA and sensor was observed at a higher rate when the protospacer was in the same orientation as the sensor, prompting us to update PEGG to automatically place the sensor sequence in the reverse orientation to reduce recombination in future PE sensor libraries (Extended Data Fig. 2e,f). Reads with the first and last five nucleotides with no match were classified as potential indels and retained. For each sample, the sensor reads that were not recombined were demultiplexed into separate fastq files for each pegRNA. We then used Crispresso2 (ref. 69) to classify editing outcomes, excluding the first and last five nucleotides of the sensor read from the quantification window. To determine the background subtracted correct editing percentage, we subtracted the correct editing percentage observed in the plasmid library from the correct editing percentage observed at a given timepoint, although we note that for plasmids with at least ten sensor reads, the median correct editing percentage was 0%, the average correct editing percentage was <0.1% and the maximum observed correct editing percentage was 8.7%.
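The sensor-read triage and background subtraction described above can be sketched as follows. The code is illustrative and makes assumptions: the sensor barcode is taken to be the concatenation of the first and last five nucleotides of each designed sensor, matching is exact, and the names and example sequences are hypothetical. Editing outcome classification itself is left to CRISPResso2 and is not reproduced here.

```python
def end_barcode(sequence, n=5):
    """Concatenate the first and last n nucleotides of a sequence."""
    return sequence[:n] + sequence[-n:]

def classify_sensor_read(read, peg_id, barcode_by_peg, n=5):
    """Classify a sensor read that was sequenced together with pegRNA peg_id.

    barcode_by_peg: {peg_id: end_barcode of that pegRNA's designed sensor}
    """
    observed = end_barcode(read, n)
    if observed == barcode_by_peg[peg_id]:
        return "correct"
    other_barcodes = {b for pid, b in barcode_by_peg.items() if pid != peg_id}
    if observed in other_barcodes:
        return "recombined"       # sensor belongs to a different pegRNA; discard
    return "potential_indel"      # no match to any pegRNA; retain for quantification

def background_subtracted(editing_pct, plasmid_pct):
    """Correct editing % at a timepoint minus the % seen in the plasmid library."""
    return editing_pct - plasmid_pct

# Hypothetical barcodes for two sensors.
barcode_by_peg = {"peg1": "AAAAACCCCC", "peg2": "GGGGGTTTTT"}
print(classify_sensor_read("AAAAAACGTCCCCC", "peg1", barcode_by_peg))
print(background_subtracted(42.5, 0.5))
```

The sketch does not clamp the subtracted value at zero because the text does not specify how negative values were handled.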
For analysis of enrichment/depletion of pegRNAs, we used the MAGeCK algorithm (ref. 47), with the D4 sample designated as the control timepoint. We then excluded pegRNAs with a mean control count of <10 reads to reduce spurious enrichment calls. For direct comparison with the cDNA libraries, the LFC values produced by MAGeCK were transformed into Z-scores using the standard Z-score formula, computed across all pegRNAs under consideration.
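As a minimal sketch of the filtering and Z-score transformation, assuming each pegRNA is represented as a dictionary with hypothetical keys `control_mean` and `lfc`, and assuming the population standard deviation is used (the text does not specify population versus sample):

```python
import statistics


def filter_by_control(rows, min_control_mean=10):
    """Keep pegRNAs whose mean control (D4) count is at least `min_control_mean`."""
    return [row for row in rows if row["control_mean"] >= min_control_mean]


def z_scores(values):
    """Standard Z-score, z = (x - mean) / sd, computed across all supplied values."""
    mu = statistics.fmean(values)
    sd = statistics.pstdev(values)  # population SD; an assumption
    if sd == 0:
        raise ValueError("Z-scores are undefined when all values are identical")
    return [(v - mu) / sd for v in values]


def lfc_to_z(rows):
    """Filter by control count, then attach a Z-score to each remaining pegRNA."""
    kept = filter_by_control(rows)
    zs = z_scores([row["lfc"] for row in kept])
    return [dict(row, z=z) for row, z in zip(kept, zs)]
```

Because the mean and standard deviation are taken over the filtered set, the resulting Z-scores depend on which pegRNAs are under consideration, so the same filter should be applied before comparing across datasets.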
Processing and analysis of TP53 endogenous amplicon NGS sequencing
The sequencing files were automatically demultiplexed into separate fastq files based on the sample barcode. Next, we trimmed the reads from 150 nucleotides to 100 nucleotides to allow the paired reads to be joined. The reads were joined using fastq-join (v.1.3.1) with default parameters (ref. 70).
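The trimming step can be sketched as follows. This is an illustration under assumptions: the function name is hypothetical, input is taken to be standard four-line FASTQ records, and trimming is assumed to remove bases from the 3′ end of each read.

```python
def trim_fastq_records(lines, length=100):
    """Yield (header, sequence, separator, quality) with reads cut to `length` nt.

    `lines` is any iterable of FASTQ lines, such as an open file handle.
    Incomplete trailing records are silently dropped.
    """
    it = iter(lines)
    for header, seq, plus, qual in zip(it, it, it, it):
        yield (
            header.rstrip("\n"),
            seq.rstrip("\n")[:length],   # keep the first `length` bases
            plus.rstrip("\n"),
            qual.rstrip("\n")[:length],  # keep the matching quality scores
        )
```

The sequence and quality strings are truncated together so that each record remains valid input for a read-joining tool.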
For analysis of genomic DNA amplified from the screen (Fig. 4), we then used custom analysis scripts to generate counts tables for all of the unique sequences, merge matching samples from different flow cells and determine the HGVSp and HGVSc of each sequence. To determine the LFC of each variant at these endogenous loci, we first excluded undesired variants (that is, those not targeted by pegRNAs in the library) and created counts tables for D4, D34 (untreated) and D34 (Nutlin-treated) for each of the three amplicons. For each amplicon, we used MAGeCK to normalize read counts between samples and determine the LFC of each variant. We then excluded endogenous variants with a control count of fewer than ten reads to reduce spurious enrichment calls. The MAGeCK tables from the different amplicons were concatenated for downstream analysis comparing the endogenous variants with the pegRNA–sensor sequencing results. In addition, we sequenced the WT A549-PEmax cell line, which confirmed the WT status of the amplified regions.
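The counting, merging and filtering steps can be sketched as below. The function names are hypothetical, and the fold-change calculation is a simplified stand-in (counts-per-million scaling with a pseudocount), not MAGeCK's own normalization, which was the method actually used.

```python
import math
from collections import Counter


def count_sequences(sequences):
    """Counts table of unique joined-read sequences for one sample."""
    return Counter(sequences)


def merge_counts(*tables):
    """Sum counts tables for the same sample sequenced on different flow cells."""
    merged = Counter()
    for table in tables:
        merged.update(table)
    return merged


def log2_fold_changes(control, treated, min_control=10, pseudocount=1.0):
    """Simplified LFC per variant (a stand-in for MAGeCK, not its algorithm).

    Counts are scaled to counts per million within each sample, then variants
    with fewer than `min_control` raw control reads are excluded.
    """
    control_total = sum(control.values())
    treated_total = sum(treated.values())
    lfc = {}
    for variant, raw in control.items():
        if raw < min_control:
            continue  # too few control reads for a reliable ratio
        c_norm = raw / control_total * 1e6
        t_norm = treated.get(variant, 0) / treated_total * 1e6
        lfc[variant] = math.log2((t_norm + pseudocount) / (c_norm + pseudocount))
    return lfc
```

Here `control` would correspond to the D4 counts table for one amplicon and `treated` to a D34 table. The per-amplicon results could then be concatenated for downstream comparison.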
For analysis of genomic DNA amplified from the individually transduced A549-PEmax-pegRNA cells generated for competition assay testing (Fig. 5), we used CRISPResso2 (ref. 69) to classify editing outcomes, excluding the first and last five nucleotides of the sensor read from the quantification window.
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Data availability
Raw sequencing data from the screen are deposited in the Sequence Read Archive under accession PRJNA1014453. All other processed datasets and source data are available at the following GitHub repository: https://github.com/samgould2/p53-prime-editing-sensor. MSK-IMPACT clinical sequencing data were accessed from the cBioPortal (https://cbioportal.org). Data for the Giacomelli et al. (ref. 6) cDNA comparison were accessed from Supplementary Table 3 in the corresponding manuscript (https://doi.org/10.1038/s41588-018-0204-y).
Code availability
All analysis scripts, as well as Jupyter notebooks for generating each figure that appears in the paper, are available at the following GitHub repository: https://github.com/samgould2/p53-prime-editing-sensor. Further documentation and installation instructions for PEGG are available at https://pegg.readthedocs.io/en/latest/.
References
Winters, I. P. et al. Multiplexed in vivo homology-directed repair and tumor barcoding enables parallel quantification of Kras variant oncogenicity. Nat. Commun. 8, 2053 (2017).
Boettcher, S. et al. A dominant-negative effect drives selection of TP53 missense mutations in myeloid malignancies. Science 365, 599–604 (2019).
Sharon, E. et al. Functional genetic variants revealed by massively parallel precise genome editing. Cell 175, 544–557.e16 (2018).
Findlay, G. M., Boyle, E. A., Hause, R. J., Klein, J. C. & Shendure, J. Saturation editing of genomic regions by multiplex homology-directed repair. Nature 513, 120–123 (2014).
Findlay, G. M. et al. Accurate classification of BRCA1 variants with saturation genome editing. Nature 562, 217–222 (2018).
Giacomelli, A. O. et al. Mutational processes shape the landscape of TP53 mutations in human cancer. Nat. Genet. 50, 1381–1387 (2018).
Kotler, E. et al. A systematic p53 mutation library links differential functional impact to cancer mutation pattern and evolutionary conservation. Mol. Cell 71, 178–190.e8 (2018).
Komor, A. C., Kim, Y. B., Packer, M. S., Zuris, J. A. & Liu, D. R. Programmable editing of a target base in genomic DNA without double-stranded DNA cleavage. Nature 533, 420–424 (2016).
Gaudelli, N. M. et al. Programmable base editing of A•T to G•C in genomic DNA without DNA cleavage. Nature 551, 464–471 (2017).
Anzalone, A. V. et al. Search-and-replace genome editing without double-strand breaks or donor DNA. Nature 576, 149–157 (2019).
Erwood, S. et al. Saturation variant interpretation using CRISPR prime editing. Nat. Biotechnol. 40, 885–895 (2022).
Ren, X. et al. High throughput PRIME editing screens identify functional DNA variants in the human genome. Mol. Cell 83, 4633–4645 (2023).
Chardon, F. M. et al. A multiplex, prime editing framework for identifying drug resistance variants at scale. Preprint at bioRxiv https://doi.org/10.1101/2023.07.27.550902 (2023).
Sánchez-Rivera, F. J. et al. Base editing sensor libraries for high-throughput engineering and functional analysis of cancer-associated single nucleotide variants. Nat. Biotechnol. 40, 862–873 (2022).
Kato, S. et al. Understanding the function-structure and function-mutation relationships of p53 tumor suppressor protein by high-resolution missense mutation analysis. Proc. Natl Acad. Sci. USA 100, 8424–8429 (2003).
Ursu, O. et al. Massively parallel phenotyping of coding variants in cancer with Perturb-seq. Nat. Biotechnol. 40, 896–905 (2022).
Kastenhuber, E. R. & Lowe, S. W. Putting p53 in Context. Cell 170, 1062–1078 (2017).
Gencel-Augusto, J. & Lozano, G. p53 tetramerization: at the center of the dominant-negative effect of mutant p53. Genes Dev. 34, 1128–1146 (2020).
Baugh, E. H., Ke, H., Levine, A. J., Bonneau, R. A. & Chan, C. S. Why are there hotspot mutations in the TP53 gene in human cancers? Cell Death Differ. 25, 154–160 (2018).
Zehir, A. et al. Mutational landscape of metastatic cancer revealed from prospective clinical sequencing of 10,000 patients. Nat. Med. 23, 703–713 (2017).
Chen, P. J. et al. Enhanced prime editing systems by manipulating cellular determinants of editing outcomes. Cell 184, 5635–5652.e29 (2021).
Nelson, J. W. et al. Engineered pegRNAs improve prime editing efficiency. Nat. Biotechnol. 40, 402–410 (2022).
Doman, J. L. et al. Phage-assisted evolution and protein engineering yield compact, efficient prime editors. Cell 186, 3983–4002.e26 (2023).
Morris, J. A., Rahman, J. A., Guo, X. & Sanjana, N. E. Automated design of CRISPR prime editors for 56,000 human pathogenic variants. iScience 24, 103380 (2021).
Li, Y., Chen, J., Tsai, S. Q. & Cheng, Y. Easy-Prime: a machine learning-based prime editor design tool. Genome Biol. 22, 235 (2021).
Chow, R. D., Chen, J. S., Shen, J. & Chen, S. A web tool for the design of prime-editing guide RNAs. Nat. Biomed. Eng. 5, 190–194 (2021).
Hsu, J. Y. et al. PrimeDesign software for rapid and simplified design of prime editing guide RNAs. Nat. Commun. 12, 1034 (2021).
Anderson, M. V., Haldrup, J., Thomsen, E. A., Wolff, J. H. & Mikkelsen, J. G. pegIT—a web-based design tool for prime editing. Nucleic Acids Res. 49, W505–W509 (2021).
Hwang, G.-H. et al. PE-Designer and PE-Analyzer: web-based design and analysis tools for CRISPR prime editing. Nucleic Acids Res. 49, W499–W504 (2021).
Standage-Beier, K., Tekel, S. J., Brafman, D. A. & Wang, X. Prime editing guide RNA design automation using PINE-CONE. ACS Synth. Biol. 10, 422–427 (2021).
Bhagwat, A. M. et al. multicrispr: gRNA design for prime editing and parallel targeting of thousands of targets. Life Sci Alliance 3, e202000757 (2020).
Mathis, N. et al. Predicting prime editing efficiency and product purity by deep learning. Nat. Biotechnol. 41, 1151–1159 (2023).
Yu, G. et al. Prediction of efficiencies for diverse prime editing systems in multiple cell types. Cell 186, 2256–2272.e23 (2023).
Kim, Y. et al. High-throughput functional evaluation of human cancer-associated mutations using base editors. Nat. Biotechnol. 40, 874–884 (2022).
Marquart, K. F. et al. Predicting base editing outcomes with an attention-based deep learning algorithm trained on high-throughput target library screens. Nat. Commun. 12, 5114 (2021).
Arbab, M. et al. Determinants of base editing outcomes from target library analysis and machine learning. Cell 182, 463–480.e30 (2020).
Kim, H. K. et al. Predicting the efficiency of prime editing guide RNAs in human cells. Nat. Biotechnol. 39, 198–206 (2021).
Gould, S. I. & Sánchez-Rivera, F. J. PEGG: a computational pipeline for rapid design of prime editing guide RNAs and sensor libraries. Preprint at bioRxiv https://doi.org/10.1101/2022.10.26.513842 (2022).
Cerami, E. et al. The cBio cancer genomics portal: an open platform for exploring multidimensional cancer genomics data. Cancer Discov. 2, 401–404 (2012).
Landrum, M. J. et al. ClinVar: improving access to variant interpretations and supporting evidence. Nucleic Acids Res. 46, D1062–D1067 (2018).
Hsu, P. D. et al. DNA targeting specificity of RNA-guided Cas9 nucleases. Nat. Biotechnol. 31, 827–832 (2013).
Li, X. et al. Chromatin context-dependent regulation and epigenetic manipulation of prime editing. Preprint at bioRxiv https://doi.org/10.1101/2023.04.12.536587 (2023).
Simon, D. A. et al. PEAR, a flexible fluorescent reporter for the identification and enrichment of successfully prime edited cells. eLife 11, e69504 (2022).
Vassilev, L. T. et al. In vivo activation of the p53 pathway by small-molecule antagonists of MDM2. Science 303, 844–848 (2004).
Koeppel, J. et al. Prediction of prime editing insertion efficiencies using sequence features and DNA repair determinants. Nat. Biotechnol. 41, 1446–1456 (2023).
Doench, J. G. et al. Optimized sgRNA design to maximize activity and minimize off-target effects of CRISPR-Cas9. Nat. Biotechnol. 34, 184–191 (2016).
Li, W. et al. MAGeCK enables robust identification of essential genes from genome-scale CRISPR/Cas9 knockout screens. Genome Biol. 15, 554 (2014).
de Andrade, K. C. et al. The TP53 database: transition from the International Agency for Research on Cancer to the US National Cancer Institute. Cell Death Differ. 29, 1071–1073 (2022).
Kircher, M. et al. A general framework for estimating the relative pathogenicity of human genetic variants. Nat. Genet. 46, 310–315 (2014).
Hong, D. S. et al. KRASG12C inhibition with Sotorasib in advanced solid tumors. N. Engl. J. Med. 383, 1207–1217 (2020).
Salim, K. Y., Maleki Vareki, S., Danter, W. R. & Koropatnick, J. COTI-2, a novel small molecule that is active against multiple human cancer cell lines in vitro and in vivo. Oncotarget 7, 41363–41379 (2016).
Lindemann, A. et al. COTI-2, a novel thiosemicarbazone derivative, exhibits antitumor activity in HNSCC through p53-dependent and -independent mechanisms. Clin. Cancer Res. 25, 5650–5662 (2019).
Liu, X. et al. Small molecule induced reactivation of mutant p53 in cancer cells. Nucleic Acids Res. 41, 6034–6044 (2013).
Song, H. et al. Diverse rescue potencies of p53 mutations to ATO are predetermined by intrinsic mutational properties. Sci. Transl. Med. 15, eabn9155 (2023).
Muller, P. A. J. & Vousden, K. H. Mutant p53 in cancer: new functions and therapeutic opportunities. Cancer Cell 25, 304–317 (2014).
Loizou, E. et al. A gain-of-function p53-mutant oncogene promotes cell fate plasticity and myeloid leukemia through the pluripotency factor FOXH1. Cancer Discov. 9, 962–979 (2019).
Alexandrova, E. M. et al. Improving survival by exploiting tumour dependence on stabilized mutant p53 for treatment. Nature 523, 352–356 (2015).
Schulz-Heddergott, R. et al. Therapeutic ablation of gain-of-function mutant p53 in colorectal cancer inhibits Stat3-mediated tumor growth and invasion. Cancer Cell 34, 298–314.e7 (2018).
Olive, K. P. et al. Mutant p53 gain of function in two mouse models of Li–Fraumeni syndrome. Cell 119, 847–860 (2004).
Lang, G. A. et al. Gain of function of a p53 hot spot mutation in a mouse model of Li–Fraumeni syndrome. Cell 119, 861–872 (2004).
Freed-Pastor, W. A. et al. Mutant p53 disrupts mammary tissue architecture via the mevalonate pathway. Cell 148, 244–258 (2012).
Weissmueller, S. et al. Mutant p53 drives pancreatic cancer metastasis through cell-autonomous PDGF receptor β signaling. Cell 157, 382–394 (2014).
Choe, J. H. et al. Li–Fraumeni syndrome-associated dimer-forming mutant p53 promotes transactivation-independent mitochondrial cell death. Cancer Discov. 13, 1250–1273 (2023).
Gencel-Augusto, J. et al. Dimeric p53 mutant elicits unique tumor-suppressive activities through an altered metabolic program. Cancer Discov. 13, 1230–1249 (2023).
Clore, G. M. et al. High-resolution structure of the oligomerization domain of p53 by multidimensional NMR. Science 265, 386–391 (1994).
Ely, Z. A. et al. A prime editor mouse to model a broad spectrum of somatic mutations in vivo. Nat. Biotechnol. https://doi.org/10.1038/s41587-023-01783-y (2023).
Akama-Garren, E. H. et al. A modular assembly platform for rapid generation of DNA constructs. Sci. Rep. 6, 16836 (2016).
Wiznerowicz, M. & Trono, D. Conditional suppression of cellular genes: lentivirus vector-mediated drug-inducible RNA interference. J. Virol. 77, 8957–8961 (2003).
Clement, K. et al. CRISPResso2 provides accurate and rapid genome editing sequence analysis. Nat. Biotechnol. 37, 224–226 (2019).
Aronesty, E. Comparison of sequencing utility programs. Open Bioinformatics J. 7, 1–8 (2013).
Acknowledgements
We dedicate this work to the memory of Jingzhi Zhu—a research computing specialist at the Koch Institute Integrated Genomics and Bioinformatics Core who worked tirelessly to ensure that our community had a robust computing infrastructure. We thank N. Mathey-Andrews, C. L. Prives, M. T. Hemann, J. S. Weissman, L. W. Koblan, M. Kennedy, K. Southard, T. Norman, P. Creixell, S. Lowe and L. Dow for excellent scientific discussions and overall support. We also thank Y. Soto-Feliciano for sharing pCDH-EF1-MCS-IRES-RFP. S.I.G. was supported by T32GM136540 from the National Institutes of Health (NIH)/National Institute of General Medical Sciences and the Massachusetts Institute of Technology (MIT) School of Science Fellowship in Cancer Research. F.J.S.R. is a Howard Hughes Medical Institute (HHMI) Hanna Gray Fellow and was supported by the V Foundation for Cancer Research (V2022-028), National Cancer Institute (NCI) Cancer Center Support Grant P30-CA1405, the Ludwig Center at MIT (2036636), Koch Institute Frontier Awards (2036648 and 2036642), the MIT Research Support Committee (3189800) and the Upstage Lung Cancer Foundation. This work was also supported in part by the Koch Institute Support (core) grant P30-CA14051 from the NCI. We also thank the Koch Institute Swanson Biotechnology Center for technical support, especially the Flow Cytometry Core, the Barbara K. Ostrom (1978) Bioinformatics Facility and the Genomics Facility. V.K.N. is supported by a K12 Paul Calabresi Career Development Award from the NIH/NCI and a Myelodysplastic Syndromes Young Investigator Award from the Edward P. Evans Foundation. A.H. is a National Science Foundation Graduate Research Fellow. A.H. and D.R.L. were supported by NIH U01AI142756, R35GM118062, RM1HG009490 and HHMI. This article is subject to HHMI’s Open Access to Publications policy. HHMI laboratory heads have previously granted a nonexclusive CC BY 4.0 license to the public and a sublicensable license to HHMI in their research articles. 
Pursuant to those licenses, the author-accepted manuscript of this article can be made freely available under a CC BY 4.0 license immediately upon publication.
Author information
Contributions
S.I.G. and F.J.S.R. conceived the project and wrote the manuscript. S.I.G. performed all computational analyses and analyzed experimental data. S.I.G., A.N.W., K.D., G.A.J. and O.A. performed experiments. G.A.J. generated Lenti-PEAR-mCherry. V.K.N. generated Lenti-EFS-PEmax-P2A-Puro. S.S.L. assisted in designing sequencing strategy and performing next-generation sequencing. A.H. and D.R.L. provided conceptual and technical advice and comments on the manuscript. F.J.S.R. supervised the work and secured funding.
Ethics declarations
Competing interests
The authors declare competing financial interests: D.R.L. is a consultant and/or equity owner for Prime Medicine, Beam Therapeutics, Pairwise Plants, Chroma Medicine and Nvelop Therapeutics, companies that use or deliver genome editing or epigenome engineering agents. The remaining authors declare no competing interests.
Peer review
Peer review information
Nature Biotechnology thanks the anonymous reviewers for their contribution to the peer review of this work.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 Generation of a TP53 prime editing sensor library with PEGG.
a, Schematic of the PEGG pipeline. PEGG takes as input a list of mutations from the cBioPortal, ClinVar identifiers, or custom mutation sets, and produces user-defined pegRNA designs ranked by PEGG score, custom sensor oligos, and visualization tools. PEGG filters out pegRNAs with polyT sequences as well as designs containing restriction sites used for cloning. With the library design feature, PEGG can also automate the aggregation of variants located in genes of interest, and automatically design a defined fraction of silent variant-generating pegRNAs. b, Breakdown of the TP53 variants input to PEGG for sensor library design. c, The fraction of input variants amenable to prime editing (that is, able to generate ≥ 1 pegRNA), separated by variant type. d, Histogram of the number of pegRNA designs per variant. e, Histogram of the MIT specificity score of the protospacers for the pegRNAs included in the library. The library was filtered to exclude pegRNAs containing a protospacer with an MIT specificity score less than 50.
Extended Data Fig. 2 Optimization, screening, and deconvolution of the TP53 prime editing sensor library.
a, Assessment of the prime editing activity of A549-PEmax cell lines using a modified all-in-one PEAR reporter, where GFP is turned on in the event of successful prime editing. Three replicates each of A549 WT (gray), A549-PEmax v1 (blue), which underwent a single transduction with EFS-PEmax-P2A-Puro, and A549-PEmax v2 (pink), which underwent multiple rounds of transduction with EFS-PEmax-P2A-Puro, are shown, along with the quantification of GFP-positive cells. b, Cumulative population doublings during the course of the screen of each of the replicates in the untreated and Nutlin-treated conditions. c, Identification and counting of pegRNAs in each replicate and time-point from high quality reads (Q > 30). Correct ID = reads with matching protospacer and 3′ extension. Recombined = reads with mismatched protospacer and 3′ extension. No match = reads with no matching sequence for protospacer or 3′ extension. Unaligned = reads with no identifiable tevopreQ1 sequence. Plasmid = Plasmid Library. d, Count of correctly identified pegRNAs in each replicate and time-point. e, Extraction of the sensor locus from reads with correctly identified pegRNAs. Extracted sensor = sensor read matches pegRNA identification and is thus extracted and saved. Recombined = sensor read does not match pegRNA (discarded). Unaligned = no polyT-tevopreQ1 sequence found. f, Sensor recombination rate as a function of protospacer orientation. When the protospacer is on the positive strand (+) of the sensor (blue), the recombination rate increases compared to when the protospacer is on the negative strand (−) of the sensor.
Extended Data Fig. 3 The TP53 prime editing sensor screen is highly reproducible with low indel rates.
a, Pearson correlation in raw pegRNA counts among each replicate and time-point. Plasmid = plasmid library. b, Spearman correlation in sensor correct editing percentage among each replicate and time-point for pegRNAs with at least 10 sensor reads. c, Median indel percentage among active pegRNAs (≥1% editing) for each time-point and condition. Data are presented as mean values with a 95% confidence interval. d, Boxplot of indel frequency among active pegRNAs (≥1% editing) for each replicate and time-point. Lower quartile = 0% for all replicates (not visible). Boxes indicate the median and interquartile range (IQR) for each sample with whiskers extending 1.5 × IQR past the upper and lower quartiles. N = 3 biologically independent replicates in each condition shown in (c-d). e, Histogram of pegRNA editing efficiency in the Nutlin-treated Day 34 samples. f, Histogram of pegRNA editing efficiency of the most efficient pegRNA for each variant in the Nutlin-treated Day 34 samples. g, Comparative PDF of all pegRNAs and the most efficient pegRNA for each variant.
Extended Data Fig. 4 Training a random forest regressor to predict pegRNA efficiency.
a, A random forest regressor was trained on a restricted set of pegRNA features using 70% of the variants in the untreated condition of Day 16 replicate 1. There was no overlap between the variants used for training and testing. The performance on the held-out test set is shown (Spearman correlation = 0.61). b, Assessment of the performance of the random forest regressor in predicting editing activity at each time-point. Again, only variants in the test set are considered. Each dot represents a separate replicate. Spearman correlation between predicted and actual editing is shown in blue, Pearson correlation in gray.
Extended Data Fig. 5 Correlation in pegRNA LFC among conditions and time-points.
Each panel is a density plot of the LFC in pegRNAs at each time-point/condition (that is, x-axis = LFC of the pegRNAs corresponding with that column’s sample, and y-axis = LFC of the pegRNAs corresponding with that row’s sample). Replicates were merged using MAGeCK to generate a single (median) LFC for each pegRNA at each time-point. Rs = Spearman correlation.
Extended Data Fig. 6 Filtration of screening data by sensor editing efficiency.
The LFC of a, pegRNAs ≥ 0% editing b, pegRNAs ≥ 20% editing c, pegRNAs ≥ 40% editing d, pegRNAs ≥ 60% editing, with at least 10 sensor reads at Day 34 relative to Day 4 in the Nutlin-treated condition, with pegRNAs colored by editing efficiency (left) and colored by variant type (right). Selected enriching and depleting pegRNAs with FDR < 0.05 are labeled. Blue = SNV, Green = INS, Purple = DEL, Gray = Silent.
Extended Data Fig. 7 Sensor editing continues over time, independent of selection.
a, Log2 fold change in pegRNA correct editing percentage relative to Day 4, with pegRNAs split into groups by LFC in pegRNA counts at the final time-point. Data shown for pegRNAs ≥ 1% editing and with ≥ 100 sensor reads at all time-points in the untreated condition. LFC calculated using MAGeCK from the median values across n = 3 biologically independent samples. In all boxplots, boxes indicate the median and interquartile range (IQR) for each sample with whiskers extending 1.5 × IQR past the upper and lower quartiles, with outliers indicated with circles. b, Same as (a), but with fold change in editing normalized to Day 16. c, Same as (a), but for the Nutlin-treated condition. d, Same as (b), but for the Nutlin-treated condition.
Extended Data Fig. 8 Analysis of TP53 hotspot variant editing & fitness.
a, Scatterplot of pegRNA sensor correct editing percentage and LFC at Day 34 in the untreated arm of the screen. Top 20 hotspot variant pegRNAs colored in red. Linear regression line displayed. b, Same as (a), but for the Nutlin-treated condition. c, Sensor correct editing percentage for the top 20 most frequently observed hotspot variants, with pegRNAs with an LFC ≥ 2 highlighted in orange, and observed occurrences in the MSK-IMPACT dataset shown in the right panel. d, Sensor correct editing percentage at Day 34 in the Nutlin-treated condition for the R248 mutants, with pegRNAs with an LFC ≥ 2 highlighted in orange. e, Scatterplot of sensor correct editing percentage at Day 34 in the Nutlin-treated condition for the R248 mutants, plotted against the LFC at the same time-point and condition. Hotspot variants (R248Q & R248W) highlighted in red.
Extended Data Fig. 9 Competition assays functionally validate the pathogenicity of TP53 variants identified with prime editing screens.
a, Detailed information on pegRNAs selected for follow-up evaluation with competition assays. Cells marked with ‘X’ indicate insufficient sensor reads (<10) to determine editing percentage, or insufficient Day 4 control pegRNA counts (<10) to determine LFC. b, Full competition assay results for assayed pegRNAs. Points marked with ‘X’ indicate a replicate with an insufficient viable cell count (<500) to determine the RFP+ %. In this case, the RFP+ % was quantified as unchanged from the previous time-point for the matched replicate. c, Competition assay results for variant-specific therapeutics. In (b-c), data are presented as mean values at each time-point, with a 95% confidence interval.
Extended Data Fig. 10 Comparative analysis of prime editing and cDNA screening datasets of TP53 variants reveals pathogenic variants in the oligomerization domain.
a, The difference in Z-scores between prime editing and cDNA screens (∆Z-score) for pegRNAs ≥ 30% editing, separated by p53 domain. Statistics shown for two-sided t-test with Bonferroni correction. OD vs. DBD (p = 3.255e-50), vs. PRR (p = 3.158e-17), vs. TAD (p = 2.085e-11). * = p-value ≤ 0.05, ** = p-value ≤ 0.01, *** = p-value ≤ 0.001, **** = p-value ≤ 0.0001, ns = not significant (p-value > 0.05). Boxes indicate the median and interquartile range (IQR) for each sample with whiskers extending 1.5 × IQR past the upper and lower quartiles. All z-scores were calculated from each pegRNA’s LFC, in turn calculated using MAGeCK from the median values across n = 3 biologically independent samples. b, Z-scores for variants in the cDNA (red) and prime editing (blue) screens, considering only the most efficient pegRNA for each variant with an editing efficiency ≥ 10%. c, Scatterplot of the cDNA ∆RFP % (day 10) and pegRNA ∆RFP % (day 14) for variants tested with competition assays. rp = Pearson correlation; rs = Spearman correlation.
Supplementary information
Supplementary Information
Supplementary Fig. 1 and Protocols 1–3.
Supplementary Table 1
All oligonucleotide sequences used in the present study, including the full pegRNA library.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Gould, S.I., Wuest, A.N., Dong, K. et al. High-throughput evaluation of genetic variants with prime editing sensor libraries. Nat Biotechnol (2024). https://doi.org/10.1038/s41587-024-02172-9
DOI: https://doi.org/10.1038/s41587-024-02172-9