Date: 2022-02-14
Abstract
Base editing can be applied to characterize single nucleotide variants of unknown function, yet defining effective combinations of single guide RNAs (sgRNAs) and base editors remains challenging. Here, we describe modular base-editing-activity ‘sensors’ that link sgRNAs and cognate target sites in cis and use them to systematically measure the editing efficiency and precision of thousands of sgRNAs paired with functionally distinct base editors. By quantifying sensor editing across >200,000 editor-sgRNA combinations, we provide a comprehensive resource of sgRNAs for introducing and interrogating cancer-associated single nucleotide variants in multiple model systems. We demonstrate that sensor-validated tools streamline production of in vivo cancer models and that integrating sensor modules in pooled sgRNA libraries can aid interpretation of high-throughput base editing screens. Using this approach, we identify several previously uncharacterized mutant TP53 alleles as drivers of cancer cell proliferation and in vivo tumor development. We anticipate that the framework described here will facilitate the functional interrogation of cancer variants in cell and animal models.
Data availability
All source data (including P values) are available in Supplementary Table 10. Processed screening data are available in Supplementary Tables 1, 4 and 5, and primary sequencing data are available at the Sequence Read Archive (SRA) under accession PRJNA746395.
Code availability
Code for analysis and data visualization is available at: https://github.com/schmidt73/base-editing-analysis, https://github.com/Kastenhuber/AMINEsearch and https://github.com/lukedow/BEsensor
Acknowledgements
We thank D. Solit, Ni. Schultz, M. Berger and B. Gross for access to MSK-IMPACT data, T. Jacks for sharing KP cells, D. Alonso-Curbelo and D. Bar-Sagi for sharing PDEC cells, M. Paz Zafra for sharing primers to assess tumor purity, T.M. Norman for conceptual advice and L. Cantley for support and mentorship. We gratefully acknowledge the members of the Molecular Diagnostics Service in the Department of Pathology, the Integrated Genomics Operation and Bioinformatics Core (P30 CA008748) and the Marie-Josée and Henry R. Kravis Center for Molecular Oncology. This work was supported by a project grant from the NIH/NCI (R01CA229773-01A1), P01 CA087497 (S.W.L.), an MSKCC Functional Genomics Initiative (FGI) grant (S.W.L.) and an Agilent Technologies Thought Leader Award (S.W.L.). F.J.S.-R. was supported by the MSKCC TROT program (5T32CA160001), a GMTEC Postdoctoral Researcher Innovation Grant, and is an HHMI Hanna Gray Fellow. B.J.D. was supported by an F31 Ruth L. Kirschstein Predoctoral Individual National Research Service Award (F31-CA261061-01). E.R.K. was supported by an F31 Ruth L. Kirschstein Predoctoral Individual National Research Service Award (F31-CA192835) and is currently supported by NCI R35CA197588, awarded to L. Cantley. A.K. was supported by an F31 Ruth L. Kirschstein Predoctoral Individual National Research Service Award (F31-CA247351-02). J.L. was supported by the German Research Foundation (DFG) and the Shulamit Katzman Endowed Postdoctoral Research Fellowship. S.V.P. was supported by the German Academic Scholarship Foundation. F.M.B. was supported by a GMTEC Postdoctoral Fellowship, an MSKCC Translational Research Oncology Training Fellowship (5T32CA160001-08), and a Young Investigator Award from the Edward P. Evans Foundation. K.M.T. is supported by the Jane Coffin Childs Memorial Fund for Medical Research. D.C. and H.Z. acknowledge funding from the MSKCC Marie-Josée and Henry R. Kravis Center for Molecular Oncology for supporting OncoKB. S.W.L. is the Geoffrey Beene Chair of Cancer Biology and an Investigator of the Howard Hughes Medical Institute. L.E.D. is the Burt Gwirtzman Research Scholar in Lung Cancer at Weill Cornell Medicine. The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH.
Ethics declarations
Competing interests
L.E.D. is a scientific advisor for and holds equity in Mirimus Inc., and is a consultant for Volastra Therapeutics and Frazier Healthcare. S.W.L. is an advisor for and has equity in the following biotechnology companies: ORIC Pharmaceuticals, Faeth Therapeutics, Blueprint Medicines, Geras Bio, Mirimus Inc. and PMV Pharmaceuticals. S.W.L. acknowledges receiving funding and research support from Agilent Technologies for the purposes of massively parallel oligo synthesis. The remaining authors declare no competing interests.
Additional information
Extended data
Extended Data Fig. 1 BE efficiency for mouse sgRNAs in the APS library.
C > T editing efficiency (%) at each APS library mouse target site across base editor enzymes, as indicated. Cas9 and Cas9-NG serve as nuclease controls. Rows denote sgRNAs; columns denote PAM subclass.
Extended Data Fig. 2 Cancer somatic mutation-derived base editing sensor libraries.
(a) Number of unique recurrent SNVs per gene, ordered by mutation frequency of gene. Bars are split to indicate proportion of SNVs targeted (red) or not (black) in the HBES library. (b) Focality of mutations by cancer gene classification. Number of cumulative mutations observed in recurrent sites with respect to the number of unique SNVs observed per gene. Oncogenes are indicated by red dots and tumor suppressor genes are indicated by blue dots. Mutations in oncogenes tend to be more focal on distinct hotspot sites, with a greater number of recurrent mutations per unique SNV allele (11.1 vs 6.2 mutations per unique recurrent SNV, p = 0.011, two-tailed t-test). (c) Venn diagram of sgRNAs in HBES library compatible with each base editor configuration. (d) Venn diagram of sgRNAs in MBES library compatible with each base editor configuration. (e) SNV-level annotation with each color bar sorted in order of observed mutation frequency (top). SNV characteristics are indicated, including oncogenic function (OncoKB assertion of Oncogenic/Likely oncogenic/VUS) and therapeutic implications (OncoKB highest level of evidence for drug sensitivity or resistance) (Chakravarty 2017; Chakravarty 2021).
Extended Data Fig. 3 Off-target editing predictions for base editing sensor libraries.
(a) For sgRNAs in the HBES library, distribution of potential off-target (OT) sites identified by PAM specificity and extent of mismatch. (b) Number of sgRNAs in the HBES library targeting the human genome with 0 (white) and 1 or more (black) predicted OT sites depending on SpCas9 or Cas9-NG PAM specificity. A greater number of sgRNAs have no predicted OT sites when used in conjunction with SpCas9 than with Cas9-NG. p < 2.2e-16, two-sided Fisher’s exact test. (c) For sgRNAs in the MBES library, distribution of potential OT sites identified by PAM specificity and extent of mismatch. (d) Number of sgRNAs in the MBES library targeting the mouse genome with 0 (white) and 1 or more (black) predicted OT sites depending on SpCas9 or Cas9-NG PAM specificity. A greater number of sgRNAs have no predicted OT sites when used in conjunction with SpCas9 than with Cas9-NG. p < 2.2e-16, two-sided Fisher’s exact test. (e) Distribution of non-target editable bases (C for CBE) within the editing window for the HBES library targeting the human genome. (f) Distribution of non-target editable bases (C for CBE) within the editing window for the MBES library targeting the mouse genome.
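The SpCas9 versus Cas9-NG comparisons in panels (b) and (d) rest on a two-sided Fisher's exact test applied to a 2x2 table (sgRNAs with 0 versus 1 or more predicted OT sites, for each PAM specificity). The following is a minimal pure-Python sketch of that test, not the authors' analysis code; the function name and the small example table are hypothetical, and the actual library counts are not reproduced here.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows could be PAM specificity (e.g. SpCas9, Cas9-NG) and columns the
    number of sgRNAs with 0 vs >=1 predicted off-target sites.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of observing x in the top-left cell
        # given fixed row and column totals.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Sum the probabilities of every table no more likely than the observed one.
    # The small relative tolerance guards against floating-point ties.
    total = sum(p for p in (prob(x) for x in range(lo, hi + 1))
                if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)


# Hypothetical toy counts, for illustration only.
p_value = fisher_exact_two_sided(3, 1, 1, 3)
print(f"two-sided p = {p_value:.4f}")
```

With library-scale counts the same function applies unchanged, since Python integers do not overflow; a statistics package would normally be used in practice.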
Extended Data Fig. 4 Comparison of editing range (editing window) across FNLS, F2X, and FNLS-NG base editors as a function of dinucleotide context.
Plots represent the mean normalized BE editing efficiency for each base editor (FNLS = yellow, F2X = blue, FNLS-NG = gray) across 5 cell lines (rows) and 4 dinucleotide contexts (columns). The area shaded in gray denotes the maximum editing range in each condition where normalized BE efficiency is above 30% (dotted line).
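The shaded editing range described above can be derived from a per-position efficiency profile by thresholding. The sketch below assumes that efficiencies are normalized to the peak position within each condition (the legend does not state the normalization) and uses a hypothetical function name and made-up profile values.

```python
def editing_window(mean_efficiency, threshold=0.30):
    """Return (first, last) protospacer positions above the threshold.

    mean_efficiency maps protospacer position -> mean editing efficiency.
    Assumption: values are normalized to the maximum across positions.
    """
    peak = max(mean_efficiency.values())
    if peak <= 0:
        return None  # no detectable editing in this condition
    above = [pos for pos, eff in mean_efficiency.items()
             if eff / peak > threshold]
    return (min(above), max(above))


# Hypothetical C>T efficiencies (%) by protospacer position.
profile = {2: 1.0, 3: 4.0, 4: 12.0, 5: 20.0, 6: 18.0, 7: 9.0, 8: 5.0, 9: 1.0}
print(editing_window(profile))
```

Running one profile per base editor, cell line and dinucleotide context would give the range for each panel of the grid.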
Extended Data Fig. 5 Correlation of sgRNA efficiency ranking.
Plots represent the correlation of individual sgRNA efficiency rankings between MDA-MB-231 and NIH3T3, KPT1, and PDEC cells, as indicated. To reduce noise created by low-efficiency sgRNAs, only HBES sgRNAs that had >1% activity in the sensor were included. Pearson correlation coefficients are shown; for all comparisons, p < 2.22e-16.
Extended Data Fig. 6 Indel and BE correlation across cell lines.
Correlation of indel and C > T editing frequencies for all sgRNAs in the HBES library across 5 screen cell lines. Pearson correlation coefficients were calculated using the ggpubr package (v0.4.0) in R; the p value represents the significance of a two-sided t-test.
Extended Data Fig. 7 Non-canonical cytosine editing identified by BE Sensor.
(a) Dotplots show percent C > T and C > G editing for individual target cytosines in the HBES library across three BE enzymes (FNLS, F2X and FNLS-NG) and two cell lines (MDA-MB-231 and PC9). Scales on x and y axes are not the same; dotted lines indicate a 1:1 ratio. (b) Ratio of C > G/C > T editing in FNLS-MDA-MB-231 cells transduced with the HBES library classified by dinucleotide context (fill) and trinucleotide context (column). Data include all base editors (FNLS, F2X and FNLS-NG) and are filtered for sgRNAs that show more than 5% C > T editing in the sensor assay. Boxplots show the median and interquartile range (IQR) and whiskers represent 1.5*IQR. Outliers are shown as individual points. ns indicates p > 0.05; p values were determined with a two-sided Wilcoxon signed rank test. A complete list of all comparisons is available in Supplementary Table 10g. (c) Schematic of the (C > G) reporter developed by modifying the GO (C > T) reporter.
Extended Data Fig. 8 In vivo validation of cancer-associated TP53 missense mutations using BE.
(a) Survival analysis of mice transplanted with F2X-expressing PDECs transduced with specific Trp53-targeting base editing sgRNAs. N = 5 mice per sgRNA per mutation. (b) Frequency of target C > T editing in tumors from transplanted mice. Each individual point represents a single isolated tumor (n = 3+ per sgRNA). Target C > T editing was measured by next-generation sequencing of amplified target loci and data were analyzed using CRISPResso2. Data are presented as ± SD. (c) In vivo validation of M237I and C135Y mutations via orthotopic transplantation of FNLS-expressing PDECs transduced with sgRNAs designed to introduce the corresponding mutations in the mouse Trp53 gene (M234I and C132Y, respectively). N = 5 mice per mutation. (d) Representative macroscopic (left) and microscopic (right; H&E) images of pancreatic tumors isolated from mice transplanted with FNLS-expressing PDEC cells transduced with specific Trp53-targeting base editing sgRNAs. (e) Representative Sanger sequencing traces from tumors in (d). Red arrows denote target cytosines that, when mutated to thymine, give rise to the corresponding amino acid changes in the p53 protein. Nucleotide triplets on the right denote the precise mutational events that give rise to mutant p53 proteins. * p ≤ 0.05, ** p ≤ 0.01. P-values were calculated using the log-rank test.
Extended Data Fig. 9 Classification of screen hits by OncoKB.
(a) sgRNAs from the MBES proliferation screen were binned by categories: i) all sgRNAs; ii) sgRNAs depleted by <1.5 LFC and exhibiting 20% editing at the sensor; iii) sgRNAs enriched by >1.5 LFC; or iv) sgRNAs enriched >1.5 LFC and exhibiting 20% editing at the sensor followed by calculation of the percentage of each OncoKB classification. P-values indicate two-sided Fisher’s exact test comparison of the frequency of known or likely oncogenic mutations in each subset. (b) Bubble plot comparing sgRNA log fold changes with mean frequency of C > T editing in the sensor target site between days 5 and 30 post-transduction. Bubbles were colored by their OncoKB classification. Size denotes MaGeCK score (see Supplementary Table 6d).
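The binning in panel (a) can be sketched as a filter on log fold change (LFC) and sensor editing, followed by a tally of OncoKB classes per bin. This is an illustrative sketch only: it assumes "depleted" means LFC below -1.5 and "20% editing" means at least 20% C > T editing at the sensor, and the function names, bin labels and example records are hypothetical.

```python
from collections import Counter, defaultdict


def assign_bins(lfc, sensor_edit_pct, lfc_cutoff=1.5, edit_cutoff=20.0):
    """Return every bin an sgRNA belongs to (bins overlap by design)."""
    bins = ["all"]
    edited = sensor_edit_pct >= edit_cutoff  # assumed reading of "20% editing"
    if lfc < -lfc_cutoff and edited:         # assumed reading of "depleted"
        bins.append("depleted_edited")
    if lfc > lfc_cutoff:
        bins.append("enriched")
        if edited:
            bins.append("enriched_edited")
    return bins


def classification_percentages(records):
    """records: iterable of (lfc, sensor_edit_pct, oncokb_class) tuples."""
    counts = defaultdict(Counter)
    for lfc, edit_pct, oncokb_class in records:
        for name in assign_bins(lfc, edit_pct):
            counts[name][oncokb_class] += 1
    # Convert raw counts to the percentage of each class within each bin.
    return {
        name: {cls: 100.0 * n / sum(c.values()) for cls, n in c.items()}
        for name, c in counts.items()
    }


# Hypothetical records, for illustration only.
example = [
    (2.0, 35.0, "Oncogenic"),
    (1.8, 5.0, "VUS"),
    (-2.1, 40.0, "VUS"),
    (0.2, 50.0, "Likely oncogenic"),
]
for name, pct in classification_percentages(example).items():
    print(name, pct)
```

The per-bin counts of known or likely oncogenic mutations are what a two-sided Fisher's exact test would then compare between subsets.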
Extended Data Fig. 10 Expanded base editing predictions.
(a) We used the MSK-IMPACT clinical tumor sequencing dataset and the characteristics of commonly used base editors to inform the design of base editing sensor libraries used in the experiments in Figs. 3–6. These results are available in the Shiny web portal (https://dowlab.shinyapps.io/BEscan/). Using updated and expanded versions of MSK-IMPACT sequencing data, base editing configurations, and AMINEsearch v2, we generated an exploratory set of sgRNA and sensor predictions, which are also available in the Shiny web portal. The more recent version of MSK-IMPACT contains increased numbers of (b) tumors sequenced, (c) total SNVs observed, and (d) candidate unique recurrent SNVs. The exploratory set (v2) also expands on the HBES and MBES libraries (v1) with respect to (e) Cas variants (determining PAM recognition) and base editor variants (determining editing window), which collectively make up base editor configurations with distinct properties (f). These expanded inputs led to an increase in the exploratory set (v2) compared to the HBES and MBES libraries (v1) with respect to (g) number of sgRNAs designed and (h) unique SNVs targeted by one or more sgRNAs in the database.
Rights and permissions
