Universal toxin-based selection for precise genome engineering in human cells

Prokaryotic restriction enzymes, recombinases and Cas proteins are powerful DNA engineering and genome editing tools. However, in many primary cell types, the efficiency of genome editing remains low, impeding the development of gene- and cell-based therapeutic applications. A safe strategy for robust and efficient enrichment of precisely genetically engineered cells is urgently required. Here, we screen for mutations in the receptor for Diphtheria Toxin (DT) which protect human cells from DT. Selection for cells with an edited DT receptor variant enriches for simultaneously introduced, precisely targeted gene modifications at a second independent locus, such as nucleotide substitutions and DNA insertions. Our method enables the rapid generation of a homogenous cell population with bi-allelic integration of a DNA cassette at the selection locus, without clonal isolation. Toxin-based selection works in both cancer-transformed and non-transformed cells, including human induced pluripotent stem cells and human primary T-lymphocytes, and is also applicable in vivo, in mice with humanized liver. This work represents a flexible, precise, and efficient selection strategy to engineer cells using CRISPR-Cas and base editing systems.

Gene and cell therapy offer new modalities in medicine for a wide range of diseases [1][2][3][4] . Despite the potential of such therapies, technical challenges, such as the limited precision and low efficiency of current genomic engineering tools 5,6 , restrict their development and clinical application. While the accuracy of genome editing and the ability to predict off-target effects have been considerably improved, the efficiency of genetic engineering in somatic cells, especially precise substitutions and gene insertions, remains generally low, limiting potential therapeutic applications [7][8][9][10][11] . Therefore, there is an urgent need for an approach that specifically selects only those cells with the desired genomic modification.
Cells proficient for genomic editing at one locus are more likely to be proficient for editing events at another locus 12,13 . This principle provides the basis for co-selection of genomic modifications at the desired locus, in combination with an external or endogenous selection marker. Editing of a selection marker can produce a detectable signal or a growth advantage for the edited cell. Cells that undergo simultaneous editing at the selection locus and a second-site targeted locus are typically enriched by fluorescent reporter-based sorting or resistance to specific cytotoxic reagents [14][15][16][17][18] . This co-selection strategy has been used in cells from organisms ranging from Caenorhabditis elegans to humans [19][20][21][22][23] . However, most previous methods involved introducing a random set of insertions or deletions (indels), and few studies involved co-selection of precise DNA substitutions. These, in turn, often introduced unsupervised and risky modifications to the engineered cells 21,24 . A selection method that (1) does not require an external selection marker, (2) specifically eliminates non-edited cells without side effects on edited cells, and (3) introduces precise (safe) modification at the selection locus is still lacking.
Bacterial toxins have high selectivity and potency in eliminating plant and animal cells 25 . Unlike small molecules that freely diffuse through membranes, penetrating all cells, most bacterial toxins are large molecules that enter cells via a specific receptor. Typically, such bacterial toxins consist of two domains: one domain recognizes a specific membrane receptor and mediates endocytosis and translocation, and the second domain executes cytotoxic functions inside the targeted cell [26][27][28] . This modular structure allows the uncoupling of specificity from toxicity 29 . The diphtheria toxin (DT) from Corynebacterium diphtheriae 30 is composed of domain B (DT-B), which binds to the membrane-embedded form of heparin-binding EGF-like growth factor (HBEGF) and mediates endocytosis and translocation of DT. Once DT enters the cytoplasm, domain A (DT-A) inactivates translation elongation factor 2, causing cell death 30 . DT exhibits toxicity in most mammalian species, with the exception of mice and rats 31 . The resistance of these organisms stems from impaired DT binding to their HBEGF homologs, due to differences in amino acid sequence 32 . Introduction of analogous amino acid substitutions from mouse to human HBEGF (hHBEGF) prevents its binding to DT and establishes resistance to the toxin 33 .
In this work, we exploit the interaction between DT and HBEGF to develop a universal selection strategy that depletes only those cells that have not introduced the desired genome modifications. Our approach specifically protects the engineered cells, blocking the entry of the lethal toxic molecule. We find that by introducing DT-resistant mutations into HBEGF and selecting for edited cells with DT, we observe a substantial increase in a simultaneous, second-site gene editing event in these cells. This principle holds true for a variety of genome engineering events mediated by DNA base editors and Cas9 nuclease, including HBEGF locus-specific biallelic insertion of a DNA cassette. Finally, we demonstrate that our DT-HBEGF selection system is applicable both in vitro, in therapeutically relevant cell types such as human induced pluripotent stem cells (hiPSCs) and primary human T cells, and in vivo, in mice with a humanized liver.

Results
HBEGF locus mutagenesis generating diphtheria toxin resistance. DT interacts with the EGF-like domain of HBEGF 2 . Replacement of the mouse EGF-like domain with the corresponding human domain causes mouse cells to become sensitive to DT 2 . To induce mutations in the human EGF-like domain that would render human cells insensitive to DT, we used the DNA base editors cytidine base editor 3 (CBE3) and adenosine base editor 7.10 (ABE7.10; Fig. 1a) 34,35 . We designed 14 sgRNAs spanning the amino acids that differ between mouse and human at this locus (Fig. 1b). Each sgRNA was transiently co-expressed in HEK293 cells with either CBE3 or ABE7.10 to introduce C-to-T or A-to-G mutations, respectively 34,35 . The transfected cells were treated with a DT dose that elicits cell death unless the interaction with HBEGF has been disrupted. We monitored cell proliferation and observed that cells transfected with CBE3 and sgRNA7 or 10, and ABE7.10 with sgRNA5 or 6, continued to proliferate despite the presence of the toxin in the cell culture medium (Fig. 1c). The cells transfected with other combinations of plasmids, as well as the control cells, did not survive the treatment (Fig. 1c).
We chose resistant cells transfected with CBE3/sgRNA10 or ABE7.10/sgRNA5, and sequenced the targeted HBEGF locus (Fig. 1d). Mutations introduced by CBE3/sgRNA10 resulted mainly in the substitution of glutamate 141 with lysine (Glu141Lys) in HBEGF, whereas ~90% of the mutations introduced by the ABE7.10/sgRNA5 combination elicited a substitution of tyrosine 123 with cysteine (Tyr123Cys; Fig. 1d and Supplementary Fig. 1a). HBEGF Glu141 plays a key role in the DT-HBEGF interaction and the Glu141His substitution abolishes DT sensitivity when expressed in mouse cells (Supplementary Fig. 1a) [36][37][38] . HEK293 cells edited to express HBEGF Glu141Lys or HBEGF Tyr123Cys show wild-type levels of proliferation (Supplementary Fig. 1c). We detected noticeable levels of indels induced by CBE3 and, to a lesser extent, by ABE7.10 (Supplementary Fig. 1b), as previously observed 34,39 . Thus, the substitution of a single amino acid in hHBEGF is sufficient to prevent DT toxicity, suggesting this method could be used to select for genome editing events at the HBEGF locus.
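The two resistance substitutions can be checked against the standard genetic code. The following is a minimal sketch, not the authors' analysis: it lists which single-base transitions of the kind made by a cytidine base editor (C-to-T, which reads as G-to-A on the coding strand when the edited C lies on the opposite strand) or an adenosine base editor (A-to-G, or T-to-C on the opposite strand) turn a codon for one amino acid into a codon for another. The function name `editor_routes` and the either-strand assumption are illustrative; the actual HBEGF codons and sgRNA strands are not given in the text.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G base order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Coding-strand changes each editor class can produce. Assumption for
# illustration: the editor may act on either strand, so the complementary
# change (G>A for CBE, T>C for ABE) is included as well.
EDITOR_CHANGES = {
    "CBE": {"C": "T", "G": "A"},
    "ABE": {"A": "G", "T": "C"},
}


def editor_routes(aa_from, aa_to, editor):
    """Return (codon, edited_codon, codon_position, change) for each
    single-base edit that converts aa_from into aa_to."""
    routes = []
    for codon, aa in CODON_TABLE.items():
        if aa != aa_from:
            continue
        for i, base in enumerate(codon):
            new_base = EDITOR_CHANGES[editor].get(base)
            if new_base is None:
                continue  # this editor class cannot change this base
            edited = codon[:i] + new_base + codon[i + 1:]
            if CODON_TABLE[edited] == aa_to:
                routes.append((codon, edited, i + 1, f"{base}>{new_base}"))
    return routes


print(editor_routes("E", "K", "CBE"))  # Glu -> Lys
print(editor_routes("Y", "C", "ABE"))  # Tyr -> Cys
```

Under these assumptions, both glutamate codons reach a lysine codon through a G>A change at the first codon position, and both tyrosine codons reach a cysteine codon through an A>G change at the second position. Glu-to-His, by contrast, has no single-transition route, so a base editor would not be expected to produce it.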
Enrichment of cytidine and adenosine base editing using DT selection. Our approach selects for the survival of cells that are proficient for base editing. We therefore asked if such selection favors simultaneous base editing at another, unrelated genomic locus (co-selection 40 ; Fig. 2a). Using CBE3/sgRNA10, we tested for co-selection with sgRNAs targeting five independent genomic loci: DPM2 (dolichyl-phosphate mannosyltransferase subunit 2), EGFR (epidermal growth factor receptor), EMX1 (empty spiracles homeobox 1), PCSK9 (proprotein convertase subtilisin/kexin type 9), and DNMT3B (DNA methyltransferase 3 beta). The sgRNAs targeting DPM2 and PCSK9 were designed to introduce a premature stop codon 8 , and the sgRNA targeting EGFR was designed to generate a drug-resistant mutation in EGFR 15 . After co-transfection, we extracted genomic DNA from cells either exposed or not exposed to DT and analyzed the sgRNA-targeted DNA sequence composition. We observed a substantial increase in the C-T conversion rate across all tested sites in DT-selected cells (~4-7-fold), compared to nonselected cells (Fig. 2b). DT co-selection with CBE3 in other cancer cell lines also yielded increased editing efficiency. Using our strategy, we obtained a ~13-fold increase in the C-T substitution rate at the PCSK9 locus in HCT116 cells and a ~5-fold increase at the integrated BFP transgene in PC9 cells (Supplementary Fig. 2). We subsequently tested if DT co-selection applies to the latest version of CBE, CBE4max 41 , and found a significant improvement in C-T conversion across three tested targets in DT-selected cells (~4-7-fold; Supplementary Fig. 3a).
To determine if ABE7.10/sgRNA5 editing also promotes co-selection, we tested it with five sgRNAs targeting: EMX1, CTLA4 (cytotoxic T-lymphocyte-associated protein 4), IL2RA (interleukin 2 receptor subunit alpha), and two different sites in the AAVS1 locus (adeno-associated virus integration site 1). Analyzing the DNA sequences targeted by the sgRNAs with or without the DT treatment, we observed a substantial increase in A-G conversion across all tested targets in DT-selected cells, ranging from ~6 to ~13-fold (Fig. 2c). DT co-selection with the latest ABE version ABE8e (ref. 42 ) further improved the editing efficiency (~2-fold; Supplementary Fig. 3b).
We asked if our approach was able to co-select for genome modifications, such as insertions and deletions (indels), generated by the Streptococcus pyogenes Cas9 nuclease (SpCas9). We tested whether SpCas9 guided by sgRNA10 to introduce indels in the HBEGF locus would promote co-selection, using the four sgRNAs targeting DPM2, EMX1, PCSK9, and DNMT3B (above). Transfected cells subjected to DT treatment showed increased indel rates (>90%) at all four targets (DPM2, EMX1, PCSK9, and DNMT3B; Fig. 2d). Thus, DT-HBEGF selection is able to enrich for a range of genome editing events without the need for an external selection marker.
Enrichment of DNA insertion at the HBEGF locus. A major limitation of Cas9-mediated genomic engineering is the low efficiency with which targeted DNA insertion (DNA knock-in) is generated through homology directed repair (HDR) [43][44][45][46][47] and non-homologous end-joining (NHEJ) 46,[48][49][50] . To test whether our DT-HBEGF system could select for genomic DNA insertions without the use of an external selection marker 40 , we modified our DT selection strategy. Our idea was to engineer a DNA template for the targeted insertion in the intron 3 of the HBEGF gene. SpCas9 nuclease programmed with sgRNAin3 to target the intron 3 would confer resistance to DT only if the DNA template was integrated at this site, as it contains a splicing acceptor sequence followed by the HBEGF cDNA with all exons downstream of exon 3, together with a mutation that confers insensitivity to DT (Fig. 3a). In this modified DT selection strategy, we used the Glu141Lys substitution in the HBEGF cDNA (Fig. 1d).
Because the soluble form of HBEGF acts as a ligand for EGFRs, activating downstream signal pathways 51 , we determined whether the HBEGF Glu141Lys protein might perturb this signaling. We purified soluble, recombinant wild-type HBEGF and HBEGF Glu141Lys from bacteria (Supplementary Fig. 4a), and assayed their ability to activate the mitogen-activated protein kinase (MAPK)/extracellular signal-regulated kinase signaling pathway downstream of EGFR in serum-starved HEK293 cells. The addition of either wild-type or mutant recombinant protein to the cell culture medium resulted in similar levels of signaling (Supplementary Fig. 4b) 51 , indicating that HBEGF Glu141Lys acts similarly to wild-type HBEGF.
Fig. 1 Base editing at the HBEGF locus induces resistance to diphtheria toxin. a Schematic of our toxin-based selection scheme. b sgRNA sites targeted by CBE3 or ABE7.10, and used to screen for mutations in HBEGF that elicit resistance to DT. cDNA is the DNA sequence of the EGF-like domain of human HBEGF; hHBEGF is the corresponding amino acid sequence; mHBEGF is the aligned amino acid sequence of the mouse HBEGF homolog. Matching amino acids in mHBEGF are shown by a dot; unmatched amino acids are annotated. sgRNAs highlighted in red and blue were chosen to introduce DT-resistant mutations with CBE3 and ABE7.10, respectively. c Heatmap presenting the viability of HEK293 cells after DT selection for the depicted combination of base editors and sgRNAs.
We next designed DNA fragments that would serve as DNA repair templates to (a) introduce a DT-resistant HBEGF and (b) couple its expression to the expression of a reporter gene encoding a red (mCherry) or green fluorescent protein (GFP) using self-cleaving peptides 52 . We assayed different DNA template donors for targeted insertion, including plasmid DNA, double-strand DNA (dsDNA), and single-strand DNA (ssDNA). We also examined the main modes of DNA repair that direct the DNA insertion by using DNA fragments with or without homology arms or flanking sgRNA cutting sites. This design allows us to promote integration into the HBEGF locus by homologous recombination (HR) 43,47,53 , NHEJ 50 , or homology-mediated end joining (HMEJ) 54 (Fig. 3b). Each template was co-transfected with plasmids encoding SpCas9 and sgRNAin3 into HEK293 cells to generate cells with the inserted DNA fragment in intron 3 of the HBEGF gene. Since the expression of mCherry or GFP is coupled to the HBEGF gene, we expected only cells with the correct insertions to express fluorescent proteins. We therefore quantified fluorescent cells by flow cytometry. The number of mCherry- or GFP-positive cells increased substantially after DT selection in all experimental variants tested, regardless of the DNA template used (Fig. 3b). In particular, the plasmid template containing homology arms and sgRNA cutting sites (pHMEJ) or the plasmid template containing only homology arms (pHR) achieved nearly 100% knock-in after DT selection (Fig. 3b). We compared the precision of the DNA knock-ins derived from pHR or pHMEJ. Genotyping by PCR confirmed a dominant band corresponding to HDR repair in both samples (Supplementary Fig. 5). For pHMEJ specifically, we detected an additional, faint band indicating NHEJ-mediated repair (Supplementary Fig. 5).
Cells resistant to DT after base editing showed biallelic mutation at the HBEGF locus (Fig. 1b). We therefore reasoned that cells surviving DT treatment carry a biallelic insertion of the DNA cassette, given that one intact HBEGF allele would sensitize cells to the toxin. To test this hypothesis, we designed two pairs of PCR primers, one pair amplifying the 5′ junction of the knock-in sequence (PCR1) and the other pair amplifying the wild-type sequence of HBEGF intron 3 (PCR2; Fig. 3c). We performed the PCR analysis on cells repaired with the pHMEJ template either with or without DT selection. Both samples showed a band for homologous knock-in (PCR1); however, we only detected the wild-type band in the nonselected samples, and not the DT-selected samples (Fig. 3d), indicating that cells showed biallelic knock-in after DT selection. We also analyzed the cells using flow cytometry and measured the fluorescence of mCherry. DT selection yielded a highly homogenous population of mCherry-positive cells, unlike the nonselected control (Supplementary Fig. 4c).
We compared our DT selection method to antibiotic-based selection methods used to enrich for cells with genomic integration of a transgene. We designed an alternative pHMEJ template that included DT-resistant HBEGF Glu141Lys, a commonly used puromycin-resistance gene, and mCherry (Fig. 3e). We tested this DNA template for genomic insertion by selecting cells with either DT or puromycin, followed by flow cytometry analysis. Both populations consisted of nearly 100% mCherry-positive cells; however, DT-selected cells showed a substantially higher mean fluorescence intensity compared to puromycin-enriched cells (Fig. 3e). Genotyping the HBEGF locus for the presence of the transgene insertion revealed that DT selection resulted in biallelic insertion (Fig. 3f), unlike the puromycin-selected cells, which contained monoallelic insertions.
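The contrast between puromycin and DT selection follows from what each agent requires of the genotype. As a toy model only, assume two HBEGF alleles that each acquire the cassette independently with probability `p` (an illustrative parameter, not a measured value). Puromycin resistance then needs at least one inserted allele, whereas DT resistance needs no remaining wild-type allele. The function names below are hypothetical.

```python
def genotype_frequencies(p):
    """Frequencies of cells with 0, 1, or 2 inserted alleles.

    Assumes two alleles that are edited independently with probability p.
    """
    return {0: (1 - p) ** 2, 1: 2 * p * (1 - p), 2: p ** 2}


def biallelic_fraction_after_selection(p, agent):
    """Fraction of surviving cells that carry the cassette on both alleles."""
    freq = genotype_frequencies(p)
    if agent == "puromycin":
        # One copy of the resistance gene is enough to survive.
        survivors = {n: f for n, f in freq.items() if n >= 1}
    elif agent == "DT":
        # Any remaining wild-type HBEGF allele keeps the cell DT-sensitive.
        survivors = {n: f for n, f in freq.items() if n == 2}
    else:
        raise ValueError(f"unknown agent: {agent}")
    total = sum(survivors.values())
    if total == 0:
        raise ValueError("no survivors under this model")
    return survivors.get(2, 0.0) / total


for p in (0.1, 0.25, 0.5):
    puro = biallelic_fraction_after_selection(p, "puromycin")
    dt = biallelic_fraction_after_selection(p, "DT")
    print(f"p={p}: biallelic among puromycin survivors {puro:.2f}, among DT survivors {dt:.2f}")
```

In this model the puromycin survivors are a mixture of monoallelic and biallelic cells for any `p` below 1, while DT survivors are biallelic by construction, which matches the direction of the genotyping result described above. The independence assumption is a simplification, and real cells may deviate from it.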
Collectively, these data demonstrate that the DT-HBEGF system enables the selection of precise genomic integration events and that the HBEGF locus constitutes a selectable locus for such biallelic genomic integration. With only minimal modifications introduced to the human genome, the DT-HBEGF selection system provides an efficient alternative for the generation of a genetically homogenous population of cells. We name the system "Xential" for "recombination (X) at a locus conditionally essential for cell survival" hereafter.
Enrichment of knock-out and knock-in by Xential. Several recent studies have demonstrated co-selection of CRISPR-Cas9 nuclease editing events using endogenous genes 14,21 . However, they rely on introducing random mutations in the gene used for enrichment, yielding a heterogeneous population of cells and raising safety concerns if applied in a therapeutic setting. Because Xential should generate a homogeneous population of cells with a precise insertion in HBEGF intron 3, we reasoned that it could also provide an alternative to co-select for both knock-out and knock-in events at another genomic locus. To test this idea, we assayed the enrichment of Xential-coupled knock-out events, using the four sgRNAs targeting DPM2, EMX1, PCSK9, and DNMT3B (Fig. 2d). Each sgRNA was co-transfected with SpCas9, sgRNAin3, and the pHMEJ template into HEK293 cells, followed by DT selection and subsequent genomic DNA analysis (Fig. 4a). We observed a substantial improvement in the editing efficiency for all targets in the DT-selected cells compared to the nonselected cells (~4-14-fold; Fig. 4b). All DT-resistant cells maintained the expression of mCherry (Supplementary Fig. 6a).
To test the use of Xential for co-selection of knock-in events at other genomic loci (Fig. 4c), we introduced a C-terminal GFP tag into a gene encoding histone protein H2B (HIST1H2BC). We designed pHR and pHMEJ DNA repair templates for both HBEGF and HIST1H2BC loci, and co-transfected each of them separately with SpCas9 and sgRNAs into HEK293 cells. The knock-in efficiency was analyzed using flow cytometry by calculating the percentage of GFP-positive (HIST1H2BC-GFP) or mCherry-positive (HBEGF-2a-mCherry) cells. Regardless of the DNA template donor, we obtained substantially improved knock-in efficiency after DT selection (Fig. 4d). For the pHR template, Xential improved co-selection efficiency sixfold. For the pHMEJ template, the efficiency increased up to ~5-fold, reaching ~50% overall (Fig. 4d). By increasing the ratio of the sgRNA and DNA template for tagging the HIST1H2BC locus to that for editing the HBEGF locus, we were able to further improve the efficiency of knock-in after DT selection (Fig. 4d and Supplementary Fig. 6b), suggesting that the knock-in efficiency at the targeted locus mediated by Xential is dose-dependent.
We next investigated if the co-edited cells are more prone to genomic translocation. To this end, we employed a droplet-digital PCR (ddPCR) assay designed to detect the monocentric translocation from the HIST1H2BC to the HBEGF locus. When only the two sgRNAs were used, we observed an increase in genomic translocation between the loci. Interestingly, combining the two sgRNAs with the corresponding DNA repair templates (Xential) vastly suppressed the translocations. These data suggest that Xential co-selection reduces genomic rearrangements and therefore provides a safety advantage over previous indel-based co-selection systems (Supplementary Fig. 6c).
To determine if Xential could be used to co-select for the insertion of small DNA fragments, such as oligonucleotides, we tested Xential co-selection for knock-in of a DNA oligo at the CD34 locus. We observed an increase in the percentage of knock-in cells after DT selection (>35-fold), suggesting Xential improves the integration of DNA cassettes regardless of their size and form (Supplementary Fig. 6d). Thus, Xential promotes precise DT-resistant modification of the HBEGF locus, allowing the introduction of two genes of interest simultaneously, or the introduction of one gene of interest together with a second gene knock-out event.
Fig. 3 Enrichment of DNA knock-in at the HBEGF locus. a Schematic of the knock-in enrichment strategy. b The knock-in of various templates (left) and their corresponding efficiencies (right). The mCherry/GFP percentage of each sample was analyzed by flow cytometry with (DT) or without (untreated) DT selection. Repair templates were designed to be incorporated into the targeted site through homology-mediated end joining (pHMEJ and dsHMEJ), homologous recombination (pHR, dsHR, ssHR, and dsHR2), or nonhomologous end joining (pNHEJ). These templates were provided as plasmids (pHMEJ, pHR, or pNHEJ), double-stranded DNA (dsHR, dsHMEJ, and dsHR2), or single-stranded DNA (ssHR). c Schematic of the genotyping strategy. The PCR1 primer pair detects the insertion; PCR2 detects wild-type cells in the population. d PCR analysis of cell populations obtained from experiment (b); representative results are shown from three independent biological replicates. e Comparison of puromycin and DT-enriched knock-in populations. Upper panel: the repair template consists of a puromycin-resistance gene and an mCherry gene linked to the mutated HBEGF gene. The lower left panel shows the representative mCherry histogram of edited HEK293 cell populations without or with different treatments. Neg control represents cells transfected with control sgRNA (no target loci), instead of sgRNAin3. Cells were analyzed by flow cytometry. The lower right panels show corresponding knock-in efficiencies and mean fluorescence intensities of each population. f PCR analysis of each population of cells obtained from experiment (e); representative results are shown from three independent biological replicates. The presented values and error bars reflect mean ± s.d. of n = 2 or 3 independent biological replicates. GOI gene of interest, T2A T2A self-cleaving peptide, SA splicing acceptor, HA homology arm, pA polyA sequence. Source data of Fig. 3b, d, e, f are provided as a Source data file.
Enrichment of base editing and DNA insertions in human iPSCs. Having demonstrated the effectiveness of Xential co-selection in HEK293 cells, we asked if it works equally well in non-cancer-transformed cells. We chose hiPSCs because of their relevance to disease modeling, their therapeutic potential, and the difficulty of genome manipulation [55][56][57] . We used two sgRNAs for co-selection with CBE and ABE, one targeting EMX1, a locus widely tested in genome editing research, and the other targeting CTLA4, a gene studied for its role in immune signaling 58 . We optimized the experimental timeline for hiPSCs, shortening it to 7 days from transfection to the derivation of DT-resistant cells (Fig. 5a). We transiently co-transfected the sgRNAs together with CBE3/sgRNA10 or ABE7.10/sgRNA5 into hiPSCs, and analyzed the targeted genomic DNA sequence. We observed a substantial increase in editing efficiency upon DT selection (greater than ~20-fold) at all tested sites for both CBE and ABE. For example, the hiPSCs resistant to DT showed ~90% CBE3 modified reads at the EMX1 locus (from 5% for control cells) and ~20% ABE7.10 modified reads at the CTLA4 locus (0.8% in control cells; Fig. 5b, c). Similar to our experiments in cancer cells, DT co-selection further improved the efficiency of the latest base editors CBE4max and ABE8e in hiPSCs as well (>74% modified reads for all tested sites, Supplementary Fig. 7a, b).
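The fold changes quoted for co-selection are ratios of edited-read percentages with and without DT selection. A minimal sketch of that arithmetic, using the approximate hiPSC percentages given above (the function name `fold_enrichment` and the dictionary layout are illustrative assumptions):

```python
def fold_enrichment(selected_pct, unselected_pct):
    """Ratio of edited-read percentages with vs without DT selection."""
    if unselected_pct <= 0:
        raise ValueError("unselected percentage must be positive")
    return selected_pct / unselected_pct


# Approximate values quoted in the text: (DT-selected %, control %).
examples = {
    "EMX1 (CBE3)": (90.0, 5.0),
    "CTLA4 (ABE7.10)": (20.0, 0.8),
}

for site, (selected, unselected) in examples.items():
    print(f"{site}: {fold_enrichment(selected, unselected):.1f}-fold")
```

Because the quoted percentages are rounded, the resulting ratios (about 18 and 25) only approximate fold changes computed from the underlying read counts.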
To test if Xential facilitates DNA insertion in the genome of hiPSCs, we co-transfected hiPSCs with SpCas9, sgRNAin3, and the pHMEJ template containing the DT-resistant HBEGF variant linked to mCherry. Flow cytometry revealed that 25% of cells were mCherry positive in the absence of selection. This number increased to nearly 100% after DT selection (Fig. 5d). We corroborated this result using genotyping PCR to detect the DNA insertion and the wild-type HBEGF intron 3 (Fig. 5e). We could not detect any residual wild-type band in the targeted HBEGF after DT selection, suggesting most of the selected pool of hiPSCs contain biallelic insertions of the DNA cassette (Fig. 5e). All sgRNAs used for DT selection (sgRNA5, sgRNA10, and sgRNAin3) were analyzed in silico and in vitro for off-target sites in hiPSCs (Supplementary Table 1). We detected less than 0.1% of modifications (0.1% being the NGS limit of detection) 59 at all selected sites, confirming the specificity of all sgRNAs used for co-selection (Supplementary Fig. 8).
To address the impact of HBEGF E141K on the differentiation of hiPSCs, we differentiated the Xential-modified hiPSCs into the three germ layers: mesoderm, endoderm, and ectoderm. We monitored the expression of key pluripotency- and lineage-associated genes 60 , and did not observe any statistically significant changes between the Xential hiPSCs and the wild-type hiPSCs (Supplementary Fig. 9).
Finally, we set out to demonstrate the translational potential of our Xential method. To this end, we chose to install a safety switch with Xential in hiPSCs 61 . We engineered the pHMEJ plasmid to encode thymidine kinase from Herpes Simplex Virus (HSV-TK), and inserted it into hiPSCs using the Xential method. After DT selection and cell expansion, we tested the sensitivity of the cell pool to ganciclovir, a synthetic substrate for the viral TK that ultimately inhibits DNA replication 61 . The cells expressing HSV-TK did not survive the ganciclovir treatment (>1 µM), unlike the control cells expressing mCherry (Fig. 5f, g).
Enrichment of base editing in primary T cells. Genome editing is being used to explore the therapeutic potential of primary T cells, a highly clinically relevant cell type 62 . However, genome editing in T cells suffers from low efficiencies 10,11 . To determine whether the DT-HBEGF selection system could be used to facilitate the engineering of primary human T cells, we tested for co-selection of cytidine base editing events. We designed three sgRNAs that introduce premature stop codons into PDCD1 (programmed cell death protein 1), CTLA4, and IL2RA, all of which are involved in immune regulation 11,58,63 . Each sgRNA was co-electroporated with purified CBE3 protein and synthetic sgRNA10 as an assembled ribonucleoprotein complex (RNP) into isolated CD4+ T cells. Analysis of the targeted genomic loci showed a ~1.7-fold increase in base editing efficiency after DT selection, compared to nonselected cells (Fig. 6). Thus, the DT-HBEGF selection system works effectively in non-cancer-transformed cells, such as human iPSCs and primary T cells, suggesting it may offer a method for enriching engineered cells for therapeutic purposes.

Enrichment of base editing events in vivo by co-selection.
Transgenic mice expressing human HBEGF under a tissue-specific promoter have been developed as a tool to study tissue/cell function in vivo 64 , given that DT ablates only the cells expressing the hHBEGF protein 64 . This transgenic mouse model enabled us to test DT-HBEGF selection in vivo. We used DT-based co-selection with CBE3 in a humanized mouse model expressing hHBEGF under the liver cell-specific albumin promoter. As a target for genome editing, we chose the mouse Pcsk9 gene, using a previously validated sgRNA that introduces a premature stop codon 8 . The Pcsk9 sgRNA was delivered together with CBE3 and the sgRNA targeting human HBEGF using adenovirus AdV8 (Fig. 7a). Two weeks after AdV8 injection, mice were treated with DT and divided into two groups. The control group, representing the non-enriched background, was sacrificed at 24 h, before DT toxicity is observed 64 . The enriched group was sacrificed 4-11 days after DT treatment (Fig. 7a and Supplementary Fig. 10). Mice terminated at 11 days presented mild-to-moderate liver damage, histologically. Genomic DNA extracted from liver tissues showed a 2.5-2.8-fold increase in base editing efficiency at the human HBEGF transgene and at the Pcsk9 gene, compared to control mice, indicating that co-selection of Cas9-driven genome editing events can be achieved using a bacterial toxin in vivo.

Discussion
In this study, we leveraged the specific interaction between DT and its receptor HBEGF, and the toxin's potency in inducing cell death 30 , to develop a powerful co-selection genome editing system. Our approach relies on the DNA editing of the HBEGF gene, by Cas9 or DNA base editors, to induce mutations that prevent the toxin-receptor interaction, such that the toxin can no longer be taken up by or kill the edited cell. HBEGF editing co-selects for second-site editing, presumably because edited cells are permissive to DNA manipulation at other loci.
In comparison with other selection methods, our toxin-based method offers five key advantages.
First, it provides a universal solution to enrich for a variety of editing events in cells, including single-nucleotide polymorphisms, small insertions and deletions, and precise large-fragment knock-ins. These edits can be achieved without the need for an exogenous selection marker, unlike other selection methods involving Cas9 nuclease or base editors 13,15,16 .
Second, while most selection methods require the introduction of loss-of-function indels at the selection locus 14,20,24 , we designed and engineered specific mutations in HBEGF that render it resistant to the toxin. These substitutions do not perturb HBEGF function and we could not find any detectable effect on cell fitness. Furthermore, we minimize the risk of unexpected DNA rearrangements by using either base editing or a precise DNA template-based knock-in strategy at the HBEGF locus [65][66][67] .
Third, a variant of our selection method, Xential, relies on the insertion of a DNA fragment into the HBEGF locus that (1) introduces a specific mutation into HBEGF and (2) uses its transcription to express a gene of interest. Given the expression of a single wild-type allele of the HBEGF receptor would render cells sensitive to DT, Xential selects only for cells with the biallelic insertion of the transgene at the HBEGF locus, thus producing a homogenous population of cells. In addition, our experiments in human cells, including cancer and pluripotent stem cells, suggest that gene insertions at the HBEGF locus are potentially suitable for applications in cell therapy. The Xential selection strategy can be used for the rapid validation of genetic variants. Because many genetic variants can be inserted into the HBEGF locus relatively quickly, the effect of a number of mutations can be easily studied without the need of generating clonal lines. Unlike transgene selection methods based on an antibiotic-resistant gene, Xential does not introduce bacterial protein into the cell and, therefore, has a low immunogenicity potential, and results would not be confounded by other proteins expressed simultaneously for selection.
Fourth, we demonstrate that toxin-based selection for Cas9-driven genome editing events occurs efficiently in vivo, in humanized mice. We hypothesize that similar strategies can be used in the future to select edited human cells in vivo, for the purpose of facilitating the generation of xenograft models 68 , or for producing large quantities of edited cells in animals 69,70 .
Fifth, toxins are often large, modular biomolecules where the toxin subunit can be uncoupled from the targeting module 25 . This modularity could allow facile modulation of the toxin target, through coupling to an antibody or a cytokine specific for the desired cell type 29,71 . We anticipate that such chimeric toxins, or similarly antibody-drug conjugates 72 , could be used for selection strategies such as that described here. Overall, such an extension of our method would not only expand the spectrum of targets engaged in selecting/co-selecting both in vitro and in vivo, but also provide translational applications. For example, simultaneous modification of membrane receptors, such as PD-1, CTLA4, or TCR 9,11,58 , while selecting for and introducing a therapeutically relevant transgene could improve the therapeutic efficacy of the engineered cells 2,3,11 . Customized toxins designed to target these receptors could provide a direct selection method for the desired engineered cells. Overall, our methodology should be of utility to a broad range of cell and gene therapy applications, and to the generation of disease models.

Methods
Plasmids. Plasmids expressing SpCas9 were constructed using a codon-optimized SpCas9 with a nuclear localization signal fused to a T2A peptide and puromycin acetyltransferase in the pVAX1 backbone. Two versions of SpCas9 plasmids were constructed to drive the expression of SpCas9 under the control of the CMV (CMV-SpCas9) or EF1α promoter (EF1α-SpCas9). Plasmids expressing the CBE3 were synthesized employing the previously published sequence 34 , and subcloned into the pcDNA3.1(+) vector backbone by GeneArt. Two versions of the plasmids were constructed to control CBE3 expression under the CMV (CMV-CBE3) or EF1α promoter (EF1α-CBE3). ABE7.10 sequences were obtained from the original publication 35 , and cloned into the pcDNA3.1(+) vector backbone. Individual sequence components were ordered from Integrated DNA Technologies and assembled using the Gibson Assembly Cloning Kit (New England Biolabs). ABE7.10 plasmids were cloned either with the CMV (CMV-ABE7.10) or EF1α promoter (EF1α-ABE7.10). Plasmids expressing CBE4max or ABE8e under the control of the EF1α promoter (EF1α-CBE4max or EF1α-ABE8e) were synthesized by GenScript.
Plasmids expressing sgRNAs were cloned by replacing the target sequence of the template plasmid 73 . Complementary primer pairs containing the target sequence (5′-AAAC-N20-3′ and 5′-ACCG-N20-3′) were annealed (95°C for 5 min, then ramped down to 25°C at 1°C/min) and assembled with AarI-digested template using T4 ligase (New England Biolabs). All primer pairs are listed in Supplementary Data 1. The plasmid expressing sgRNA targeting BFP and the plasmid expressing sgRNA targeting EGFR and CBE3 were described in our previous publication 15 .
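As an illustration of the oligo design described above, the following minimal sketch derives a complementary primer pair from a 20-nt target. It assumes, as is common for type IIS cloning but not stated explicitly here, that the ACCG-prefixed oligo carries the target in sense orientation and the AAAC-prefixed oligo carries its reverse complement. The function names and the example target are hypothetical; the actual primers are those in Supplementary Data 1.

```python
from typing import Tuple


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[base] for base in reversed(seq.upper()))


def design_sgrna_oligos(target: str) -> Tuple[str, str]:
    """Build the two oligos for a 20-nt target.

    Assumption (not confirmed by the text): ACCG + target on the top
    strand, AAAC + reverse complement of the target on the bottom strand.
    """
    target = target.upper()
    if len(target) != 20 or set(target) - set("ACGT"):
        raise ValueError("target must be exactly 20 nt of A/C/G/T")
    top = "ACCG" + target
    bottom = "AAAC" + reverse_complement(target)
    return top, bottom


def annealing_ramp_minutes(start_c: float, end_c: float, rate_c_per_min: float) -> float:
    """Duration of the cool-down ramp used for oligo annealing."""
    return (start_c - end_c) / rate_c_per_min


# Hypothetical example target, for illustration only.
top_oligo, bottom_oligo = design_sgrna_oligos("GACCTGATCGATCGGATCCA")
ramp = annealing_ramp_minutes(95, 25, 1.0)  # 95 to 25 degrees C at 1 degree C/min
```

With the stated ramp of 1°C/min from 95°C to 25°C, the cool-down alone takes 70 min, in addition to the initial 5 min hold.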
The plasmids used as DNA repair templates for the HBEGF or HIST1H2BC loci were synthesized by GenScript and modified using the Gibson Assembly Cloning Kit (New England Biolabs). Individual sequence components were ordered from Integrated DNA Technologies. Template plasmids for the HBEGF locus were designed to contain a splicing acceptor sequence 74 , followed by the mutated sequence of HBEGF exon 4 linked to the mCherry coding sequence with a self-cleaving peptide (T2A) 52 . The plasmids used for tagging the HIST1H2BC locus were designed to contain a GFP coding sequence followed by a self-cleaving peptide with the coding sequence of blasticidin deaminase. For both loci, pHMEJ and pHR were designed to contain left and right homology arms flanking the insertion sequence, while pNHEJ was designed to contain no homology arms (Supplementary Data 2). pHMEJ was designed to contain one sgRNA cutting site flanking each homology arm, while pHR was not (Supplementary Data 2). For comparing puromycin selection with DT selection, a self-cleaving puromycin resistance protein coding sequence was inserted between the HBEGF exon sequence and the self-cleaving mCherry coding sequence (pHMEJ_PuroR, Supplementary Data 2). For the safety switch gene delivery, the mCherry gene was replaced with the HSV-TK gene (synthesized by GenScript, Supplementary Data 2). dsDNA templates were prepared by PCR amplification of the plasmid pHMEJ with primers listed in Supplementary Data 1, followed by purification with MAGBIO magnetic SPRI beads. PCR amplification was performed using Phusion Flash High-Fidelity PCR Master Mix (Life Technologies). ssDNA templates were prepared using the Guide-it™ Long ssDNA Production System (Takara Bio) with primers listed in Supplementary Data 1. Final products were purified with MAGBIO magnetic SPRI beads and analyzed on a Fragment Analyzer (Agilent). The oligo template used for the CD34 locus was ordered from IDT as a PAGE-purified oligo (Supplementary Data 1).
Cell transfections. Twenty hours prior to transfection, 1.25 × 10⁵ or 6.75 × 10⁴ HEK293, HCT116, and PC9-BFP cells were seeded into 24-well or 48-well plates, respectively. Transfections were performed with FuGENE HD transfection reagent (Promega) using a 3:1 transfection reagent to plasmid DNA ratio. For 24-well plate formats, the amount and weight ratios of transfected DNA are listed in Supplementary Table 2 and Supplementary Table 3. For 48-well plate formats, the amount of DNA was reduced by half.
The iPSCs were transfected with FuGENE HD using a 2.5:1 transfection reagent to DNA ratio and a reverse transfection protocol. For transfections, 4.2 × 10⁴ cells were seeded per well in 48-well format directly onto prepared transfection complexes as described in Supplementary Table 4.
The CD4+ T cells were electroporated with RNPs using the 10 μL Neon transfection kit (MPK1096, Thermo Fisher). CBE3 proteins were produced using the method described before 76 . An extra purification step was performed on a HiLoad 26/600 Superdex 200 pg column (GE Healthcare) with a mobile phase consisting of 20 mM Tris-Cl pH 8.0, 200 mM NaCl, 10% glycerol, and 1 mM TCEP. Purified CBE3 protein was concentrated to 5 mg/mL in Vivaspin protein concentrator spin columns (28932363, GE Healthcare) at 4°C, before flash freezing in small aliquots in liquid nitrogen. RNPs were prepared as follows: 20 μg CBE3 protein, 2 μg of target sgRNA and 2 μg of selection sgRNA (TrueGuide Synthetic gRNA, Life Technologies), and 2.4 μg electroporation enhancer oligos (HPLC-purified, Sigma; Supplementary Data 1) were mixed and incubated for 15 min. Cells were washed with PBS and resuspended in buffer R at a concentration of 5 × 10⁷/mL. A total of 5 × 10⁵ cells were electroporated with RNPs using the following settings: voltage: 1600 V, width: 10 ms, and pulse number: 3. After electroporation, cells were incubated overnight in 1 mL of RPMI medium supplemented with 10% heat-inactivated FBS in a 24-well plate. The next day, cells were collected, centrifuged at 300 × g for 5 min, resuspended in 1 mL of complete growth medium containing 500 U/mL IL-2 (PHC0026, PeproTech) and split into five wells of a round-bottom 96-well plate.
Diphtheria toxin treatments in vitro. Transfected HEK293, HCT116, and PC9-BFP cells were selected with 20 ng/mL DT at days 3 and 5 after transfections. iPSCs were treated with 20 ng/mL DT from day 3 after transfections. DT-supplemented growth medium was exchanged daily until negative control cells died. Transfected CD4+ T cells were treated with 1000 ng/mL DT at days 1, 4, and 7 after electroporation.
Recombinant HBEGF purification from bacteria. The HBEGF gene fragment encoding a soluble human wild-type HBEGF (amino acids 63-119) or the E141K mutant was cloned into pET32a (Novagen) and expressed in the BL21(DE3) strain. Briefly, expression of the recombinant proteins was induced at OD600 0.6 by adding 0.4 mM IPTG for 24 h. The collected cells were lysed in lysis buffer (20 mM Tris-HCl, 500 mM NaCl, and 1% Triton X-100) and sonicated. Both proteins, containing Trx-6xHis tags, were purified over a Ni-NTA column and eluted with imidazole (PBS, pH 7.5, 500 mM imidazole) 77,78 . The precleared lysate and SDS-PAGE followed by Coomassie staining were used to assess the purity of the purified protein.
Recombinant HBEGF activity assay. HEK293 cells were plated in six-well plates (1.5 × 10⁶) and grown for 24 h. The attached cells were washed three times with PBS and cultured for 12 h in DMEM media without FBS (serum starvation). Subsequently, DMEM containing the recombinant HBEGF (10 ng/mL) or the elution buffer used for purification was added to cells, followed by 5 min incubation at 37°C. The cells were lysed (25 mM Hepes pH 7.4, 150 mM NaCl, 1% Triton, protease inhibitors (Roche), and phosphatase inhibitors (Roche)) on ice and protein was extracted for SDS-PAGE followed by western blot 79 .
Cell viability and proliferation assays. Cell viability was analyzed using the AlamarBlue cell viability reagent (Thermo Fisher) or CellTiter-Glo (Promega) according to the manual. Fluorescence emission was recorded with a SpectraMax iD3 Multi-Mode Microplate Reader (Molecular Devices). To determine whether mutations introduced into the HBEGF locus affect proliferative capacity, we evaluated cell growth of HEK293 wild-type cells and the produced HBEGF-mutant sublines. To monitor proliferation curves, 2000 cells were seeded per well of a 96-well plate and cell confluence was recorded every 24 h for 7 days, using the Incucyte S3 live-cell analysis system (Essen BioScience). For the experiments with the HSV-TK safety switch in hiPSCs, ganciclovir (Sigma, SML2346) was included in the cell culture media at 0.1, 1, and 10 µM concentration for 3 days, followed by a 3-day recovery.
PCR analysis. PCR analysis was performed to discriminate between successful knock-in into HBEGF intron 3 (PCR1) and the wild-type sequence (PCR2). PCR reactions were carried out in 20 μL volume using 1.5 μL of extracted genomic DNA as template. Phusion Flash High-Fidelity PCR Master Mix (Thermo Fisher) was used according to the recommended protocol, with a final primer concentration of 0.5 µM. Primer pair PCR1_fwd and PCR1_rev was used for PCR1 to detect knock-in junctions (annealing temp: 62°C, elongation time: 1 min), and primer pair PCR2_fwd and PCR2_rev was used for PCR2 to detect the wild-type HBEGF intron (annealing temp: 64.5°C, elongation time: 5 s). Sequences of primer pairs are provided in Supplementary Data 1. For PCR2, the elongation time was set to 5 s to favor amplification of the wild-type HBEGF intron 3 PCR product (280 bp) over the integrant PCR product (2229 bp). PCR products were analyzed by agarose gel electrophoresis. For further analysis of the junction between inserted DNA and genomic DNA in Xential engineered cells, PCR was performed using the Left_F/Left_R primer pair (forward insertion, PCR_L) or the Left_F/Right_F primer pair (reverse insertion, PCR_Lr). Conditions for the PCR reactions and sequences of primers are provided in Supplementary Data 1.
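The rationale for the short PCR2 elongation time can be sketched numerically. The example below assumes an extension rate of 15 s per kb, a figure commonly quoted for this class of polymerase but not stated in this section; the function names are hypothetical and the result is only a rough guide, since partial extension products can still accumulate over many cycles.

```python
def max_full_length_bp(elongation_s: float, s_per_kb: float = 15.0) -> float:
    """Approximate longest product fully extended in one elongation step.

    s_per_kb is an assumed extension rate, not a value from the text.
    """
    return elongation_s / s_per_kb * 1000.0


def is_fully_extended(amplicon_bp: int, elongation_s: float, s_per_kb: float = 15.0) -> bool:
    """True if the amplicon fits within the elongation window."""
    return amplicon_bp <= max_full_length_bp(elongation_s, s_per_kb)


# PCR2 uses a 5 s elongation step.
wild_type_ok = is_fully_extended(280, 5)    # wild-type intron 3 product
integrant_ok = is_fully_extended(2229, 5)   # product spanning the insertion
```

Under this assumed rate, a 5 s step extends roughly 330 bp, enough for the 280 bp wild-type product but far short of the 2229 bp integrant product.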
Flow cytometry analysis. The frequency of cells expressing mCherry and GFP was assessed with a BD Fortessa (BD Biosciences), and flow cytometry data were analyzed with the FlowJo software (Tree Star).
Genomic DNA extractions and next-generation amplicon sequencing. Genomic DNA was extracted from cells 3 days after transfections or after completed DT selection using QuickExtract DNA extraction solution (Lucigen), according to the manual. Amplicons of interest were analyzed from genomic DNA samples on a NextSeq platform (Illumina). In brief, genomic sites of interest were amplified in a first round of PCR using primers that contained NGS forward and reverse adapters (Supplementary Data 1). The first PCR was set up using NEBNext Q5 Hot Start HiFi PCR Master Mix (New England Biolabs) in 15 μL reactions, with 0.5 μM of primers and 1.5 μL of genomic DNA as template. PCR was carried out applying the following cycling conditions: 98°C for 2 min, 5 cycles of [98°C for 10 s, annealing temperature for each pair of primers for 20 s (calculated for genomic binding regions of primers by NEB Tm Calculator), and 65°C for 10 s], then 25 cycles of [98°C for 10 s, 98°C for 20 s, and 65°C for 10 s], followed by a final 65°C extension for 5 min. PCR products were purified using the HighPrep PCR Clean-up System (MagBio Genomics), and PCR product size and DNA concentration were analyzed on a Fragment Analyzer (Agilent). Unique Illumina indexes were added to PCR products in a second round of PCR using KAPA HiFi HotStart Ready Mix (Roche). Indexing primers were added in this second PCR step and 1 ng of purified PCR product from the first PCR was used as template in a 50 μL reaction volume. PCR was performed applying the following cycling conditions: 72°C for 3 min, 98°C for 30 s, then ten cycles of [98°C for 10 s, 63°C for 30 s, and 72°C for 3 min], followed by a final 72°C extension for 5 min. Final PCR products were purified using the HighPrep PCR Clean-up System (MagBio Genomics) and analyzed on a Fragment Analyzer (Agilent). Libraries were quantified using a Qubit 4 Fluorometer (Life Technologies), pooled and sequenced on a NextSeq instrument (Illumina).
Bioinformatics. NGS sequencing data were demultiplexed using bcl2fastq software, and individual FASTQ files were analyzed using a Perl implementation of the Matlab script described previously 34 . For the quantification of indels or base editing frequencies, sequencing reads were scanned for matches to two 10 bp sequences that flank both sides of an intervening window in which indels or base edits might occur. If no matches were located (allowing a maximum of 1 bp mismatch on each side), the read was excluded from the analysis. If the length of the intervening window was longer or shorter than the reference sequence, the sequencing read was classified as an insertion or deletion, respectively. The frequency of insertions or deletions was calculated as the percentage of reads classified as insertion or deletion within the total analyzed reads. If the length of this intervening window exactly matched the reference sequence, the read was classified as not containing an indel. For these reads, the frequency of each base at each position in the intervening window was calculated and used as the frequency of base edits. For off-target analysis, a list of in silico predicted candidate sites was generated for each of sgRNA5, sgRNA10, and sgRNAin3 using Cas-OFFinder 80 (Supplementary Data 3). The top three candidates were selected for each sgRNA for NGS analysis. Sequencing data were analyzed by CRISPResso2 (ref. 81 ).
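The read classification described above can be sketched as follows. This is a minimal Python illustration, not the Perl implementation used in the study; the function names, the toy flank and window sequences, and the choice to compute base frequencies over reads without indels are assumptions made for the example.

```python
from collections import Counter


def find_with_mismatch(read, flank, start=0, max_mismatches=1):
    """Index of the first position >= start where flank matches the read
    with at most max_mismatches substitutions, or -1 if there is none."""
    k = len(flank)
    for i in range(start, len(read) - k + 1):
        mismatches = sum(1 for a, b in zip(read[i:i + k], flank) if a != b)
        if mismatches <= max_mismatches:
            return i
    return -1


def classify_read(read, left_flank, right_flank, ref_window_len):
    """Classify one read as excluded, insertion, deletion, or no_indel."""
    left = find_with_mismatch(read, left_flank)
    if left < 0:
        return "excluded", None
    window_start = left + len(left_flank)
    right = find_with_mismatch(read, right_flank, start=window_start)
    if right < 0:
        return "excluded", None
    window = read[window_start:right]
    if len(window) > ref_window_len:
        return "insertion", window
    if len(window) < ref_window_len:
        return "deletion", window
    return "no_indel", window


def summarize(reads, left_flank, right_flank, ref_window):
    """Indel percentages over analyzed reads and per-position base
    frequencies over reads without indels (denominator is an assumption)."""
    counts = Counter()
    base_counts = [Counter() for _ in ref_window]
    for read in reads:
        label, window = classify_read(read, left_flank, right_flank, len(ref_window))
        if label == "excluded":
            continue
        counts[label] += 1
        if label == "no_indel":
            for pos, base in enumerate(window):
                base_counts[pos][base] += 1
    total = sum(counts.values())
    no_indel = counts["no_indel"]
    return {
        "analyzed_reads": total,
        "insertion_pct": 100.0 * counts["insertion"] / total if total else 0.0,
        "deletion_pct": 100.0 * counts["deletion"] / total if total else 0.0,
        "base_freqs": [
            {b: n / no_indel for b, n in c.items()} if no_indel else {}
            for c in base_counts
        ],
    }


# Toy data: hypothetical 10 bp flanks and a 7 bp reference window.
LEFT, RIGHT, REF = "ACGTACGTAC", "TTGGCCAATT", "GATTACA"
toy_reads = [
    "CC" + LEFT + "GATTACA" + RIGHT,   # unedited
    LEFT + "GATTATA" + RIGHT,          # C-to-T substitution in the window
    LEFT + "GATCA" + RIGHT,            # 2 bp deletion
    LEFT + "GATTTACA" + RIGHT,         # 1 bp insertion
    "G" * 28,                          # flanks not found, excluded
]
result = summarize(toy_reads, LEFT, RIGHT, REF)
```

In this sketch the right flank is searched only downstream of the left flank, which keeps the window well defined; how the original script resolves multiple candidate matches is not described here.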
Cytidine base editing and DT treatment of mice humanized for hHBEGF expression. All mouse experiments were approved by the AstraZeneca internal committee for animal studies and the Gothenburg Ethics Committee for Experimental Animals (license number: 162-2015+) compliant with EU directives on the protection of animals used for scientific purposes. Experimental mice were generated as double heterozygotes by breeding Alb-Cre mice (016833, The Jackson Laboratory) to iDTR mice (Expression of transgene, human HBEGF, is blocked by loxP-flanked STOP sequence) on the C57BL/6NCrl genetic background. Mice were housed in negative pressure IVC caging, in a temperature controlled room (21°C) with a 12:12 h light-dark cycle (dawn: 5:30 a.m., lights on: 6:00 a.m., dusk: 5:30 p.m., lights off: 6 p.m.) and with controlled humidity (45-55%). Mice had access to a normal chow diet (R36, Lactamin AB, Stockholm, Sweden) and water ad libitum.
For base editing, 6-month-old mice, six male and six female, were randomized into two groups with equal numbers of male and female mice in each group. Adenoviral vectors expressing CBE3, sgRNA10, and sgRNA targeting mouse Pcsk9 (1 × 10⁹ IFU particles per mouse) were intravenously injected. Two weeks after virus administration, all mice received DT (200 ng/kg) intraperitoneally. Control mice were terminated 24 h after DT injection. Experimental mice were terminated 11 days after DT injection. Four mice were terminated prior to the experimental endpoint as the humane endpoint of the ethics license was reached. At necropsy, liver tissues were collected for morphological and molecular analyses.
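A sex-stratified allocation of this kind can be sketched as below. The procedure actually used for randomization is not described; the animal identifiers, the seed, the function name, and the labeling of the two groups as control and experimental are assumptions made for illustration.

```python
import random


def randomize_by_sex(males, females, seed=0):
    """Shuffle each sex separately and split it evenly between two groups,
    so both groups receive equal numbers of males and females."""
    rng = random.Random(seed)  # fixed seed only to make the sketch reproducible
    groups = {"control": [], "experimental": []}
    for animals in (males, females):
        shuffled = list(animals)
        rng.shuffle(shuffled)
        half = len(shuffled) // 2
        groups["control"].extend(shuffled[:half])
        groups["experimental"].extend(shuffled[half:])
    return groups


# Hypothetical animal IDs: six males and six females.
male_ids = ["M%d" % i for i in range(1, 7)]
female_ids = ["F%d" % i for i in range(1, 7)]
allocation = randomize_by_sex(male_ids, female_ids, seed=42)
```

Stratifying by sex before shuffling guarantees three males and three females per group, which simple shuffling of all twelve animals would not.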
Histology analysis. At necropsy, mice were euthanized under isoflurane anesthesia and livers were collected in 4% neutral-buffered formalin for assessment. Tissue was embedded in paraffin and prepared as 5 μm thick sections. Sections were stained with hematoxylin and eosin for morphological characterization. All histological slides were blinded and examined using light microscopy (Carl Zeiss Microscopy GmbH, Jena, Germany) by an experienced board-certified pathologist. Severity grades (0-5) were assigned according to standard grading criteria as per Schafer et al. 82 with 0 = lesion not detected, 1 = minimal, 2 = mild, 3 = moderate, 4 = marked, and 5 = severe.
Translocation analysis. HEK293 cells were transfected with different combinations of sgRNAin3, sgRNA targeting HIST1H2BC and pHMEJ repair templates described in Supplementary Table 5. Genomic DNA was isolated from these cells with or without DT selection using the Gentra Puregene Cell Kit (Qiagen) and was diluted to 10 ng/μL for ddPCR analysis. A FAM-labeled ddPCR assay was designed for detecting the balanced translocation between HBEGF and HIST1H2BC, using Primer3Plus 83 , and was ordered as a custom assay from Bio-Rad (sequence information in Supplementary Table 6). A Mastermix was prepared using a final concentration of 1× ddPCR Supermix for Probes, no dUTP (186-3024, Bio-Rad), 1× FAM-labeled HBEGF-HIST1H2BC assay (custom assay 10031276, Bio-Rad), 1× AP3B1-HEX labeled human reference assay (dHsaCP1000001, Bio-Rad), and 1/40 HaeIII (15205016, Invitrogen). A total of 20 μL Mastermix per well to be analyzed was prepared in ultrapure RNase and DNase free water (10977-035, Invitrogen) with 5 μL of 10 ng/μL genomic DNA. An automated Droplet Generator (Bio-Rad) was used to generate droplets in a new semi-skirted 96-well PCR plate (30129504, Eppendorf). After droplet generation, the PCR plate was placed in a C1000 Touch™ Thermal Cycler (Bio-Rad, cat. no. 185-1197) for PCR amplification, as detailed in Supplementary Table 6. The droplet reading was performed with the QX 100 Droplet reader (Bio-Rad, cat. no. 186-3001), using ddPCR™ Droplet Reader Oil (Bio-Rad, cat. no. 186-3004). Data acquisition and analysis were performed using the software QuantaSoft (Bio-Rad) and the "RED" program. The fluorescence amplitude threshold was set manually as the midpoint between the average fluorescence amplitude of the four droplet clusters (translocation-positive, AP3B1-positive, positive for both targets, and empty droplets). The same threshold was applied to all the wells of the ddPCR plate.
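To illustrate how droplet counts relate to a translocation frequency, the sketch below places a threshold midway between two cluster means and applies the standard Poisson correction used in droplet digital PCR. The Poisson step is performed by the analysis software and is not detailed in this section; the function names and all numerical inputs are hypothetical.

```python
import math


def midpoint_threshold(negative_mean: float, positive_mean: float) -> float:
    """Amplitude threshold halfway between two cluster means in one channel."""
    return (negative_mean + positive_mean) / 2.0


def copies_per_droplet(n_positive: int, n_total: int) -> float:
    """Poisson-corrected mean number of target copies per droplet."""
    if n_total <= 0 or not 0 <= n_positive < n_total:
        raise ValueError("need 0 <= n_positive < n_total")
    return -math.log((n_total - n_positive) / n_total)


def translocation_ratio(fam_positive: int, hex_positive: int, n_total: int) -> float:
    """Translocation copies (FAM) per reference copy (HEX, AP3B1)."""
    return copies_per_droplet(fam_positive, n_total) / copies_per_droplet(hex_positive, n_total)


# Hypothetical cluster means and droplet counts, for illustration only.
fam_threshold = midpoint_threshold(1000.0, 5000.0)
ratio = translocation_ratio(fam_positive=20, hex_positive=5000, n_total=10000)
input_dna_ng = 5 * 10  # 5 uL of 10 ng/uL genomic DNA per well, as in the text
```

Normalizing to the reference assay in the same well corrects for differences in DNA input and droplet number between samples.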
Trilineage differentiation assay. The differentiation potential of hiPSCs and HBEGF-mutant pools into the three germ layers was assayed with the STEMdiff Trilineage Differentiation Kit (STEMCELL Technologies). In brief, cells were plated onto Cellartis DEF-CS 500 COAT-1 (Cellartis) coated six-well plates, and treated with endoderm or mesoderm differentiation media for 5 days or ectoderm differentiation media for 7 days.
Reporting summary. Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Data availability
Data supporting the findings of this study are presented within the article and supplementary figures. NGS data are available in the NCBI Sequence Read Archive database (BioProject accession code PRJNA684443). Additional details and data to support the findings of this study are available from the corresponding authors upon reasonable request. Source data are provided with this paper.