Split selectable markers

Selectable markers are widely used in transgenesis and genome editing for selecting engineered cells with a desired genotype but the variety of markers is limited. Here we present split selectable markers that each allow for selection of multiple “unlinked” transgenes in the context of lentivirus-mediated transgenesis as well as CRISPR-Cas-mediated knock-ins. Split marker gene segments fused to protein splicing elements called “inteins” can be separately co-segregated with different transgenic vectors, and rejoin via protein trans-splicing to reconstitute a full-length marker protein in host cells receiving all intended vectors. Using a lentiviral system, we create and validate 2-split Hygromycin, Puromycin, Neomycin and Blasticidin resistance genes as well as mScarlet fluorescent proteins. By combining split points, we create 3- and 6-split Hygromycin resistance genes, demonstrating that higher-degree split markers can be generated by a “chaining” design. We adapt the split marker system for selecting biallelically engineered cells after CRISPR gene editing. Future engineering of split markers may allow selection of a higher number of genetic modifications in target cells.

Selectable markers, such as antibiotic resistance or fluorescent protein genes, are often used in genetic engineering to isolate cells with desired genotypes 1 . However, only a limited number of well-characterized antibiotic resistance genes are available for use in eukaryotic cells, and the number of fluorescent proteins whose spectra can be unambiguously differentiated by commonly used equipment is similarly limited. Researchers often run out of selectable markers when they wish to incorporate multiple transgenes into a cell. Moreover, simultaneous selection with multiple antibiotics is often harsh on cells. "Selectable marker recycling" can provide a work-around but is unwieldy, requiring multiple rounds of transgenesis, selection and removal of markers 2 .
To allow multiple transgene selection with a single scheme, we create here split antibiotic resistance and fluorescent protein genes. In this system, a gene encoding an antibiotic resistance or fluorescent protein is split into two or more segments, each fused to inteins; the resulting intein-fused segments ("markertrons") can be rejoined by protein trans-splicing 3 (Fig. 1). Each markertron is inserted into a transgenic vector carrying a specific transgene. Delivery of transgenic vectors containing a set of markertrons yields cells that harbor either a subset or the complete set of markertrons. Only cells with a complete set of markertrons produce a fully reconstituted marker protein via protein splicing and thus pass through selection, while cells with partial sets of markertrons are eliminated, achieving co-selection of cells containing all intended transgenes.
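The selection logic described above can be illustrated with a minimal Python sketch. It is an idealized model, not the authors' analysis: it assumes each vector enters a cell independently with a hypothetical probability, that selection is perfect, and that trans-splicing always succeeds when all markertrons are present. The function names and the probabilities are illustrative assumptions.

```python
from itertools import product


def genotype_probabilities(p_each):
    """Probability of every presence/absence pattern of the vectors.

    p_each: hypothetical per-vector uptake probabilities (assumed independent).
    Returns a dict mapping a tuple of booleans to its probability.
    """
    probs = {}
    for pattern in product([False, True], repeat=len(p_each)):
        prob = 1.0
        for has_vector, p in zip(pattern, p_each):
            prob *= p if has_vector else (1.0 - p)
        probs[pattern] = prob
    return probs


def fraction_complete_after_selection(p_each, split_marker):
    """Fraction of surviving cells that carry every vector.

    split_marker=True: a cell survives only if it carries all markertrons.
    split_marker=False: every vector carries a full-length marker, so a cell
    survives if it carries at least one vector.
    """
    probs = genotype_probabilities(p_each)
    survives = all if split_marker else any
    surviving = sum(prob for pattern, prob in probs.items() if survives(pattern))
    complete = probs[tuple([True] * len(p_each))]
    return complete / surviving


if __name__ == "__main__":
    uptake = [0.5, 0.5]  # hypothetical uptake probabilities for two vectors
    print(fraction_complete_after_selection(uptake, split_marker=False))
    print(fraction_complete_after_selection(uptake, split_marker=True))
```

With two vectors that each enter half of the cells, this toy model gives one third double-transgenic cells among survivors of non-split selection and all survivors double-transgenic under split selection. Real cultures deviate from both idealized values.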

Results
Intein-split antibiotic resistance (Intres) genes. We began by engineering 2-markertron intein-split resistance (Intres) genes for double transgenesis. Since flanking residues and local protein folding can affect the efficiency of intein-mediated trans-splicing, we set out to identify split points in each of the four commonly used antibiotic resistance genes compatible with two well-characterized split inteins derived from NpuDnaE 4,5 and SspDnaB 6 . To facilitate assessment of the effectiveness of double transgenic selection, we cloned markertrons into lentiviral vectors expressing TagBFP or mCherry fluorescent proteins as test transgenes (Fig. 2a). Viral preparations were transduced into U2OS cells, which were then split into replicate plates with non-selective or selective media. Following appropriate passages for antibiotic selection, the two cell cultures were analyzed by flow cytometry. For the Hygromycin resistance (Hygro R ) gene, one "native" SspDnaB split point (SspDnaB-200 = G200:S201; Plasmid pair 5, 6) with flanking residues "GS" and one "native" NpuDnaE split point (NpuDnaE-89 = Y89:C90; Plasmid pair 3, 4) with "YC" residues were tested (Supplementary Fig. 1a). Both enabled successful selection when both N- and C-markertrons were transduced, yielding >95% BFP+ mCherry+ double transgenic cells in selected cultures compared to <40% double-positive cells in non-selected cultures (Fig. 2b; Plasmid pairs 3, 4 and 5, 6). Cells transduced with either of the single markertrons did not survive Hygromycin selection. In contrast, double transgenesis with conventional full-length non-split Hygro R vectors only allowed for ~20% enrichment of BFP+ mCherry+ cells (Plasmid pair 97, 98) at lower titers and up to ~50% at higher titers.
We screened three additional potential split points for NpuDnaE (NpuDnaE-52 = S52:C53, Plasmid pair 7, 8; NpuDnaE-240 = A240:C241, Plasmid pair 9, 10; and NpuDnaE-292 = R292:C293, Plasmid pair 11, 12), each with the obligatory cysteine residue on the C-extein junction and a residue on the N-extein junction reported to support substantial trans-splicing activity 7 . We also tested six additional NpuDnaE split points (NpuDnaE-69, 131, 171, 218, 259, and 277) by inserting an "artificial" cysteine on the C-extein junction to support splicing at these ectopic sites. In total, eight out of eleven split points tested supported Hygromycin selection (Fig. 2b). Two of the Hygro Intres designs (NpuDnaE-131 and 292) failed to provide resistance in two of the four replicate experiments at lower titers, while three designs (NpuDnaE-218, 259, and 277) failed to provide resistance in any experiment. These positions may reside within sequence and structural contexts that splice less efficiently, or may disrupt folding of the Hygro R protein upon reconstitution. Indeed, western blot analysis using terminally tagged markertron fragments revealed that among split points 52, 69, 89, 131, and 171, trans-splicing is least active at split point 131 (Supplementary Fig. 1b, lane 5). This is consistent with its failure to provide resistance consistently at a lower titer (Fig. 2b). In addition, insertion of the artificial cysteine in the NpuDnaE-69, 131, and 171 C-markertrons is required for protein splicing mediated by the NpuDnaE intein at these positions (compare lanes 2/3, 5/6, and 7/8), consistent with a well-established requirement 7 . Nonetheless, the six consistently successful designs validate our screening strategy and demonstrate that Hygro R is amenable to splitting at different positions spanning a large portion of the protein.
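The split-point screen above rests on a simple sequence rule: an NpuDnaE split needs a cysteine as the first residue of the C-extein, either native or inserted artificially, while the native SspDnaB site tested here has a serine in that position ("GS"). The following Python sketch illustrates only that rule. The toy sequence is hypothetical (it is not the Hygro R sequence), the function names are illustrative assumptions, and the sketch ignores the N-extein residue preference and protein folding, both of which the text notes can affect splicing.

```python
def native_split_points(protein, c_extein_residue="C"):
    """List candidate native split points in a protein sequence.

    A split point is reported as the 1-based number of the last residue of
    the N-terminal fragment, following the naming used in the text
    (e.g. NpuDnaE-89 = Y89:C90). A position qualifies when the next residue,
    which becomes the first C-extein residue, equals c_extein_residue.
    """
    # protein[i] (0-based) is residue number i + 1, i.e. the residue that
    # follows 1-based position i.
    return [i for i in range(1, len(protein)) if protein[i] == c_extein_residue]


def split_at(protein, position, insert_cysteine=False):
    """Split a sequence after 1-based `position` into N- and C-fragments.

    insert_cysteine=True prepends an "artificial" cysteine to the C-fragment
    to model an ectopic split site.
    """
    n_fragment = protein[:position]
    c_fragment = protein[position:]
    if insert_cysteine:
        c_fragment = "C" + c_fragment
    return n_fragment, c_fragment


if __name__ == "__main__":
    toy = "MKTAYCGSLRCAD"  # hypothetical 13-residue sequence for illustration
    print(native_split_points(toy))        # cysteine-preceding positions
    print(native_split_points(toy, "S"))   # serine-preceding positions
    print(split_at(toy, 3, insert_cysteine=True))
```

The intein halves themselves are left out; in a real design each fragment would be fused to the matching N- or C-intein before cloning.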
Three-split Hygromycin Intres for triple transgenesis. With the split points identified for 2-markertron Intres genes, we set out to engineer higher-degree split markers. We tested combinations of split points to partition a marker gene into three or more markertrons, allowing co-selection of more than two "unlinked" transgenes with one antibiotic (Fig. 4a, b). To identify pairs of split points that would allow such an "Intres chain", we cloned 3-split markertrons into three lentiviral vectors, each carrying one of three fluorescent transgenes (TagBFP, EGFP, or mCherry), which allowed us to assess the effectiveness of selection by flow cytometry (Fig. 4c). Since the Hygromycin resistance gene is the longest and provides the most split points for testing, we focused on engineering 3-split Hygromycin Intres. We tested two 3-split Hygromycin Intres using two intervening NpuDnaE inteins (i.e., homogeneous inteins), two using NpuDnaE for the first intein and SspDnaB for the second intein, and two using SspDnaB for the first intein and NpuDnaE for the second intein (i.e., heterogeneous "orthogonal" inteins) (Fig. 4d).

[Figure legend, continued] Horizontal asterisks indicate statistical significance by one-way ANOVA test on the percentages of double-positive cells in the selected cultures of the specific split marker vs those in the non-split marker (n.s., non-significant; *p < 0.05, **p < 0.01, ***p < 0.001). Vertical asterisks indicate statistical significance by paired two-sided t-test on the percentages of double-positive cells in the selected cultures vs non-selected cultures within each transfection group (n.s., non-significant; *p < 0.05, **p < 0.01, ***p < 0.001).

NATURE COMMUNICATIONS | https://doi.org/10.1038/s41467-019-12891-2 ARTICLE
The four heterogeneous-intein 3-split Hygromycin Intres enabled 95-100% triple transgenic selection and the two homogeneous-intein Hygro Intres enabled 74-99% triple transgenic selection in Hygromycin-selected cultures, compared to <20% in non-selected cultures. Samples with "leave-one-out" transduction did not yield any viable cells after Hygromycin selection, while cells transduced with non-split Hygromycin vectors yielded only 7-17% triple transgenic cells after selection. The observation that 3-split Intres designs using two orthogonal inteins yielded more consistent results than those using the same intein at both split points suggests that using the same intein to join multiple split points may result in artifacts caused by combinatorial splicing that generates "misjoined" fragments. To facilitate the use of 3-split Intres, we created Gateway-compatible lentiviral vectors with three of the 3-split Hygromycin Intres (Supplementary Fig. 8a). Three sets of these vectors were each tested by recombining TagBFP (as transgene 1), EGFP (as transgene 2) and mCherry (as transgene 3) into the N-, M-, and C-Intres Gateway destination vectors. Lentiviruses derived from the resultant vectors were used to transduce U2OS cells, which were then split into Hygromycin-selective or non-selective media (Supplementary Fig. 8b). Two weeks after selection, cells were analyzed by flow cytometry. All three sets of 3-split Hygromycin Intres plasmids supported triple transgenic cell selection of >97% compared to <40% in the non-selected cultures (Supplementary Fig. 8c).
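The "misjoining" hypothesis for homogeneous inteins can be made concrete with a small combinatorial sketch in Python. It is a deliberately simplified model under stated assumptions: each markertron is used at most once per product, any N-intein half splices with any C-intein half of the same name, and splicing efficiency is ignored. The `Markertron` class, the fragment strings "AAA", "BBB" and "CCC", and the function name are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Markertron:
    fragment: str                 # marker protein segment (toy string)
    left_intein: Optional[str]    # C-intein half at the N-terminus, if any
    right_intein: Optional[str]   # N-intein half at the C-terminus, if any


def spliced_products(markertrons):
    """Enumerate intein-free products reachable by joining matching halves.

    A chain starts at a markertron with no left intein and grows while the
    open right intein finds an unused markertron whose left intein has the
    same name. A chain is complete when no intein half remains open.
    """
    products = set()

    def extend(sequence, open_intein, used):
        if open_intein is None:
            products.add(sequence)
            return
        for idx, m in enumerate(markertrons):
            if idx not in used and m.left_intein == open_intein:
                extend(sequence + m.fragment, m.right_intein, used | {idx})

    for idx, m in enumerate(markertrons):
        if m.left_intein is None:
            extend(m.fragment, m.right_intein, {idx})
    return products


# Heterogeneous ("orthogonal") design: a different intein at each split point.
hetero = [
    Markertron("AAA", None, "NpuDnaE"),
    Markertron("BBB", "NpuDnaE", "SspDnaB"),
    Markertron("CCC", "SspDnaB", None),
]

# Homogeneous design: the same intein at both split points.
homo = [
    Markertron("AAA", None, "NpuDnaE"),
    Markertron("BBB", "NpuDnaE", "NpuDnaE"),
    Markertron("CCC", "NpuDnaE", None),
]

if __name__ == "__main__":
    print(spliced_products(hetero))                  # only the full-length product
    print(spliced_products(homo))                    # full-length plus a product lacking the middle fragment
    print(spliced_products([hetero[0], hetero[2]]))  # leave-one-out: no product
```

In this model the heterogeneous design can only yield the correctly ordered full-length product, whereas the homogeneous design also allows the N-fragment to join the C-fragment directly. Whether such misjoined products explain the variability observed in cells is the authors' suggestion and is not established by this sketch.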

Application of Intres in CRISPR-Cas-mediated knock-in.
Another potential application of split selectable markers is to facilitate genome engineering and editing via the CRISPR-Cas system 10 . Although gene knockout based on NHEJ-mediated insertions/deletions (indels) occurs at high frequency, precise editing and knock-in based on homology directed repair (HDR) using exogenous repair templates are inefficient 11 . We tested whether split selectable markers can be used to select for cells with CRISPR-mediated biallelic knock-in at the AAVS1 locus 12 .
We constructed targeting constructs with homology arms flanking the target site and a splice acceptor-2A peptide to trap the markertrons within intron one of the host gene PPP1R12C. However, we did not obtain any live cells after CRISPR-Cas knock-in experiments in HEK293T cells using these targeting constructs and two weeks of antibiotic selection. We suspected that the endogenous promoter of the host gene PPP1R12C might not drive sufficient expression of markertrons to reconstitute enough antibiotic resistance protein to counter the antibiotic. We thus tested an alternative strategy to express Intres markertrons using the TetO promoter, which allows activity to be tuned by doxycycline (dox) [...] containing Cas9 and sgRNA targeting AAVS1, and the different pairs of targeting constructs (TC) into HEK293T cells, split into triplicate doxycycline-containing media without antibiotics, with Blasticidin, or with Hygromycin at the subsequent passages. Two weeks after selection, we analyzed the cultures for biallelic targeting by flow cytometric measurement of GFP and RFP fluorescence (Supplementary Fig. 9e). As expected, non-selected cultures harbored a small fraction (<1%) of biallelic knock-in GFP+/RFP+ cells (Supplementary Fig. 9e; Selection = None). Selection with antibiotics for which the corresponding full-length (FL) antibiotic resistance genes were present on the targeting constructs yielded <30% biallelic knock-in cells (Supplementary Fig. 9e; Blast: TC a, c, d; Hygro: TC a, b, c). In contrast, selection with antibiotics for which the corresponding Intres were present on the targeting constructs yielded 75% (Supplementary Fig. 9e; Blast Intres: TC b) and 88% (Supplementary Fig. 9e; Hygro Intres: TC d) biallelic knock-in cells. Selection for an additional two weeks allowed split Blast and Hygro TCs to achieve 96.5% and 97.0% biallelic knock-in, respectively (Supplementary Fig. 9f, g).
We next tested biallelic engineering in KOLF2-C1 human induced pluripotent stem cells (hiPSCs), which are karyotypically normal with a stable diploid genome 13 (Fig. 5). The full-length non-split Blast targeting constructs (Fig. 5a) and 2-split Blast Intres targeting constructs (Fig. 5b) were tested for selection of biallelically modified clones. Purified Cas9 proteins were complexed with synthetic sgRNA to form Cas9 ribonucleoprotein (RNP) and co-nucleofected with the targeting constructs into KOLF2-C1 cells, followed by dox induction and antibiotic selection. Surviving colonies were picked into separate wells to establish single-cell clones. Genotyping PCR revealed that targeting using the non-split Blast resistance gene generated only 8% biallelic clones, while targeting using Blasticidin Intres yielded exclusively (100%) biallelically modified clones (Fig. 5c, d), showing both fluorescent signals (Fig. 5e), indicative of targeting by each targeting construct at the two alleles of AAVS1 in these hiPSCs.

Fig. 4 [...] The intervening split intein catalyzes the joining of the split fragments, reconstituting the full selectable marker. b A design of a k-split selectable marker via an "intein chain" mechanism. The selectable marker is partitioned into k fragments that are reconstituted through protein trans-splicing mediated by intervening split inteins. c Split points identified from 2-split selectable markers were used in combination to produce 3-split selectable markers that were cloned into lentiviral vectors with different fluorescent reporters. Cells were then transduced with viruses prepared from these vectors and split into selective or non-selective media. After selection, the cultures were analyzed by flow cytometry. d 3-split Hygromycin (Hygro) Intres. Top schematic shows the split points tested for Hygro R , with residue numbers of the last amino acid of the N-terminal fragments indicated above circle or square lollipops, representing NpuDnaE and SspDnaB inteins, respectively. Six designs of 3-split Hygromycin Intres were tested, each indicated with a numbered line with a circle or square indicating the two split points used for each design. Column plot below shows the percentages of triple transgenic (BFP+ GFP+ mCherry+) cells from the non-selective (white portion) and selective (total column height = white + blue) cultures for the 3-split Hygromycin Intres indicated by the numbers below. Horizontal asterisks indicate statistical significance by one-way ANOVA test on the percentages of triple-positive cells in the selected cultures of the specific split marker vs those in the non-split marker (n.s., non-significant; *p < 0.05, **p < 0.01, ***p < 0.001). Vertical asterisks indicate statistical significance by paired two-sided t-test on the percentages of triple-positive cells in the selected cultures vs non-selected cultures within each transfection group (n.s., non-significant; *p < 0.05, **p < 0.01, ***p < 0.001).
Selection of four or more transgenes with Intres. The utility of Intres may become more apparent in cases where more than three transgenes are to be selected. Since our 3-split Hygromycin Intres engineering showed that a set of orthogonal inteins makes for a more robust split marker, we tested four other inteins (gp411, gp418, NrdJ1, IMPDH1) 14 for splitting Hygro R or Puro R . We identified additional functional splits of Hygro R and Puro R at different positions (Supplementary Figs. 10 and 11). Some of these additional Intres were further adapted to the Gateway cloning system (Supplementary Figs. 12 and 13). To directly observe protein splicing and to confirm that these inteins are indeed orthogonal, we conducted western blot analysis of protein trans-splicing between N-markertrons N-terminally tagged with a 3xFLAG epitope and C-markertrons C-terminally tagged with an HA epitope (Supplementary Fig. 14). As expected, while cognate markertrons with matching N- and C-inteins supported reconstitution of the full-length Hygro R (lanes 3, 5, 6), markertrons with unmatched N- and C-inteins did not yield full-length Hygro R (lanes 7, 8). To introduce and select cells with four or more transgenes, one approach is sequential transduction/selection of two or more sets of 2-split Intres vectors. By subjecting cells to two rounds of 2-split Intres transduction/selection (Hygro → Puro or Puro → Hygro), with each round carrying two transgenes, we obtained quadruple transgenic cells (Supplementary Fig. 15). These results demonstrate that four transgenes can be sequentially introduced, and that the Intres system is compatible with sequential cell engineering. Another way to introduce four or more transgenes is with higher-degree split Intres markers. By combining the multiple inteins and positions tested for Hygro R , we designed and tested a 6-split Hygro Intres marker (Supplementary Fig. 16).
While cultures transduced with all markertrons yielded viable cells, leave-one-out cultures missing any one of the markertrons did not produce any viable cells after selection. This result demonstrates that up to at least 6 transgenic vectors can be selected simultaneously by one selection scheme using a split selectable marker.
Proviral copy number analysis. We validated Intres lentiviral vectors in additional cell lines (HEK293T and HeLa) (Supplementary Fig. 17). To ask whether split markers require a substantially higher copy number than non-split markers to support selection, we conducted proviral copy number analysis in non-selective and selective cultures of cells transduced with non-split Hygro R or split Hygro Intres markers (Supplementary Fig. 18) in U2OS, HEK293T and HeLa cells. In general, proviral copy numbers in the split marker cultures were 1.3- to 3.1-fold those in the non-split cultures. Since two-split markers require the presence of two different viral genomes hosting the two markertrons to reconstitute a full resistance protein, roughly 2-fold as many viral integrations are expected to be needed to support selection.
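The expectation of roughly 2-fold more integrations can be illustrated with a toy calculation in Python. This is an assumption-laden sketch and not the authors' analysis: it assumes that the copy number of each of two vectors in a cell follows an independent Poisson distribution with the same hypothetical mean `lam`, that non-split selection keeps cells with at least one copy of either vector, and that split selection keeps cells with at least one copy of each vector. Under these assumptions the ratio of mean copy numbers among survivors simplifies to 1 + exp(-lam).

```python
import math


def mean_copies_given_present(lam):
    """Mean of a Poisson(lam) count conditioned on the count being >= 1."""
    return lam / (1.0 - math.exp(-lam))


def expected_copy_ratio(lam):
    """Ratio of mean proviral copies per surviving cell, split vs non-split.

    lam: hypothetical mean copies per cell for each of the two vectors
    before selection (must be > 0).
    """
    # Split marker: each vector must be present at least once.
    split = 2.0 * mean_copies_given_present(lam)
    # Non-split marker: total copies ~ Poisson(2 * lam), at least one needed.
    nonsplit = mean_copies_given_present(2.0 * lam)
    return split / nonsplit


if __name__ == "__main__":
    for lam in (0.05, 0.5, 2.0):  # hypothetical low, medium, high titers
        print(lam, round(expected_copy_ratio(lam), 3))
```

In this model the ratio approaches 2 at low titer and falls toward 1 at high titer. The measured values of 1.3- to 3.1-fold come from qPCR on real cultures and need not follow this idealized curve.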

Discussion
In this study, we have engineered split antibiotic resistance and fluorescent protein genes that allow selection for two or more "unlinked" transgenes. By inserting unnatural residues into selectable markers, we showed that additional high-efficiency split points could be utilized, expanding the positions available for engineering. We demonstrated that split selectable markers could be incorporated into lentiviral vectors or into gene targeting constructs in CRISPR-Cas9 genome editing experiments for positive selection of cells with double transgenesis or biallelic knock-ins. By combining two split points, we showed that 3-split markers could be generated to allow higher-degree transgenic selection. By conducting sequential transduction/selection with two-split markers, or by combining even more split points, we showed the potential to use split selectable markers to select for 4 vectors with two antibiotics or up to 6 vectors with one antibiotic, respectively. It is intriguing to anticipate future work to design even higher-degree split selectable markers and to explore the limit of this system for "hyper-engineering" of cells.

Methods
Cloning. To generate a test plasmid for each markertron, we first generated a Gateway donor plasmid containing its ORF and then recombined it into a lentiviral destination vector with a TagBFP2 (Plasmid 94: pLX-DEST-IRES-TagBFP2), EGFP (Plasmid 95: pLX-DEST-IRES-EGFP), or mCherry (Plasmid 96: pLX-DEST-IRES-mCherry) reporter. These destination vectors were derived from pLX302 (gift from David Root; Addgene: #25896) by removing the Puromycin resistance gene and inserting IRES-fluorescent genes downstream of the Gateway cassette. The markertron-ORF Gateway donor plasmids were generated either by a nested fusion PCR procedure to combine the intein with the coding sequence of fragments of the selectable marker, followed by insertion into the pCR8-GW-TOPO plasmid by sequence- and ligation-independent cloning (SLIC), or by PCR-amplifying the relevant fragment of the selectable marker, followed by insertion into "scaffold" plasmids (Plasmids 27-32) containing the intein sequences by SLIC. DNA sequences encoding inteins were codon-optimized for Homo sapiens and synthesized as gBlocks (IDT). Selectable marker fragments were amplified from plasmids containing these markers. Plasmids created in this study are listed in Supplementary Table 1 with links to webpages for plasmid sharing and GenBank sequence files.
Virus production. A viral packaging mix of pLP1, pLP2, and VSV-G was co-transfected with each lentiviral vector into Lenti-X 293T cells (Clontech/Takara #632180), seeded the day before in 6-well plates at 1.2 × 10^6 cells per well, using Lipofectamine 3000. Media was changed 6 h after transfection and cells were incubated overnight. At 28 h post-transfection, the virus-containing supernatant was filtered through 0.45 μm PES filters and stored at −80 °C until use.
Transduction, transfection, flow cytometry, and microscopy. The day prior to transduction, U2OS, HEK293T, or HeLa cells were seeded into 12-well plates at a density of 1.5 × 10^5 cells per well. Prior to transduction, media was changed to media containing 10 μg/mL polybrene, 1 mL per well. Unless otherwise indicated, 25 μL of each respective virus (50 μL total for experimental samples with two viruses or 75 μL total for experimental samples with three viruses) was added to each well and incubated overnight. Media was changed 24 h post-transduction. Four days post-transduction, cells were split into duplicate plates. Five days post-transduction, media with antibiotics (130 μg/mL Hygromycin, 2 μg/mL Puromycin, 700 μg/mL G418, or 6 μg/mL Blasticidin) was added to each respective well of one replicate plate (the other remained under no selection). Antibiotic selection continued for 2 weeks before analysis by flow cytometry. For flow cytometry, cells were trypsinized, suspended in media, and analyzed on an LSRFortessa X-20 or FACSymphony flow cytometer (BD Biosciences). Fifty thousand events were collected in each run. An exemplary gating strategy is presented in Supplementary Fig. 19. Microscopy images were taken with the iRiS Digital Cell Imaging System (Logos Biosystems). For transfection in the CRISPR experiment in HEK293T cells, 600 ng of total plasmids, in equal ratios, were mixed with 100 μL of DMEM and 1.5 μL of Attractene (QIAGEN), incubated at room temperature for 10 min, then added to each well and incubated overnight. Media was changed 24 h post-transfection. Two days post-transfection, cells were split into duplicate plates with media containing doxycycline (2 μg/mL). Three days post-transfection, media with doxycycline and antibiotics was added to each respective well of one replicate plate (the other remained under no selection).
Human iPSC culture and nucleofection.
Quantitation of proviral copy number in genomic DNA. Proviral copy number was measured using the Lenti-X Provirus Quantitation Kit (Takara). To perform the analysis, genomic DNA (gDNA) was isolated from transduced cells with the NucleoSpin Tissue Genomic DNA Purification kit (Takara). Serial dilutions of each gDNA sample were subjected to qPCR amplification alongside dilutions of a provirus control template (provided in the kit), which was used to generate a standard curve. Since the viral fragments in gDNA and the control template are amplified with different PCR sensitivities, the provirus copy number was calculated from the standard curve and then adjusted with a correction factor (provided in the Takara manual).
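The standard-curve step can be sketched generically in Python. This is a minimal illustration of qPCR standard-curve quantitation in general, not the kit's protocol: the Ct values and copy numbers are invented for the example, the function names are hypothetical, and applying the correction factor as a simple multiplier is an assumption (the kit manual defines the actual factor and how to apply it).

```python
def fit_standard_curve(log10_copies, ct_values):
    """Least-squares line Ct = slope * log10(copies) + intercept."""
    n = len(log10_copies)
    mean_x = sum(log10_copies) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in log10_copies)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(log10_copies, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def copies_from_ct(ct, slope, intercept, correction_factor=1.0):
    """Interpolate copy number from a sample Ct on the standard curve.

    correction_factor is a placeholder; treating it as a multiplier is an
    assumption made for this sketch.
    """
    return correction_factor * 10 ** ((ct - intercept) / slope)


if __name__ == "__main__":
    # Hypothetical control-template dilutions and their Ct values.
    standards_log10 = [3.0, 4.0, 5.0, 6.0]
    standards_ct = [30.0, 27.0, 24.0, 21.0]
    slope, intercept = fit_standard_curve(standards_log10, standards_ct)
    print(slope, intercept)
    print(copies_from_ct(25.5, slope, intercept))
```

Converting the interpolated value into copies per cell additionally requires the amount of gDNA loaded and the genome mass per cell, which this sketch leaves out.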
Crystal violet assay. After virus infection, cells were seeded at 10-15% confluency into 12-well plates in parallel and cultured in Hygromycin selection media. Media was changed every 3 days during selection. Crystal violet staining was performed on days 0, 3, 5, 7 and 14. If cells were more than 80% confluent on day 7 (as in the case of sample 1 of Supplementary Fig. 16), they were split at a 1:20 ratio. For crystal violet staining, each well was stained with 500 μL of 0.1% crystal violet (Sigma) for 10 min at room temperature, then washed gently three times with 500 μL DPBS before photographs were taken with an iPhone.
Reporting summary. Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Data availability
The data that support the findings of this study are available from the corresponding author upon reasonable request. Plasmids created in this study are listed in Supplementary Table 1 with links to webpages for plasmid sharing and GenBank sequence files. The source data underlying Fig. 5 and Supplementary Figs. S1b, S14b as well as raw plot numbers are provided as a Source Data file.