We generated synthetic protein components that can detect specific DNA sequences and subsequently trigger a desired intracellular response. These modular sensors exploit the programmability of zinc-finger DNA recognition to drive the intein-mediated splicing of an artificial trans-activator that signals to a genetic circuit containing a given reporter or response gene. We used the sensors to mediate sequence recognition−induced apoptosis as well as to detect and report a viral infection. This work establishes a synthetic biology framework for endowing mammalian cells with sentinel capabilities, which provides a programmable means to cull infected cells. It may also be used to identify positively transduced or transfected cells, isolate recipients of intentional genomic edits and increase the repertoire of inducible parts in synthetic biology.
Cys2His2 zinc-finger proteins (ZFs) have long been recognized for their potential as artificial DNA-binding proteins. Various methods, such as partially randomized libraries or single-finger modular assembly1,2, may be used to design ZFs to bind any DNA sequence, enabling a range of applications. For example, when attached to the FokI endonucleolytic domain, ZF pairs can facilitate genomic editing3. Through sequence-enabled reassembly, ZFs joined to each domain of split β-lactamase can function as in vitro diagnostic tools4. Also, fusion to the viral trans-activating domain VP64 generates synthetic transcription factors (TFs) that drive engineered regulatory circuits in yeast or mammalian cells, furthering our understanding of transcriptional cooperativity in eukaryotes5,6. Additionally, ZFs fused to chromatin regulators have been used to investigate epigenetic control, revealing synergistic activation and spatial regulation7.
Whether ZFs or other genome-engineering tools such as transcription activator–like effectors (TALEs) or clustered, regularly interspaced, short palindromic repeats (CRISPR)-Cas are used, the tethered, delivered activity acts locally on the DNA target itself (e.g., endonucleolytic cleavage or transcriptional modulation)8. There are, however, instances in which it is desirable that the detection of a specific endogenous or synthetic DNA sequence trigger a tailored response. Responses could include an amplified report signal that indicates provirus presence, a chromosomal transposition or an intentional genomic alteration. Other desired responses might be immune system recruitment to the site of infection or tumor or kill-switch activation that induces apoptosis to prevent the spread of a pathogen. Accordingly, we sought to forward-engineer a system that could utilize the programmable nature of ZFs for target detection and deliver the recognition signal to an easily modifiable response circuit.
To bridge the gap between DNA recognition and the trans response, we used inteins, which are peptides that splice together their flanking regions, referred to as exteins9,10. In pioneering work on conditional protein splicing, the contiguous Saccharomyces cerevisiae SceVMA intein was artificially split, rendering the separate halves inactive such that they required a condition for colocalization and reassembly11. After one member of the rapamycin-binding heterodimeric pair, FRB-FKBP, was fused to each of the split SceVMA intein-extein halves, rapamycin binding served as a condition for their colocalization12. The repertoire of trans-splice−inducing conditions has expanded and now includes temperature, light and protein-scaffold activation13,14,15. DNA recognition can also induce splicing, as shown in a previous study wherein ZFs were fused to a split intein-luciferase construct and their binding signal was used to measure drug-induced DNA demethylation16.
Here we sought to use DNA-sensing modules as dimerizing domains (Fig. 1). Programming two ZF-based sensors to bind adjacently in the targeted DNA sequence would allow for conditional intein colocalization and trans-splicing of a response factor that activates any gene installed in a response circuit. Below we describe the engineering of a modular sequence-recognition system capable of producing a customizable response signal and demonstrate its ability to mediate sequence recognition–induced apoptosis and virus detection in infected human cells.
Modular ZF sensor design and screening
We used a library of ten tridactyl ZFs designed using oligomerized pool engineering (OPEN) and tested it for intracellular binding (Supplementary Fig. 1 and Supplementary Note 1)1,5. To generate two hexadactyl ZF sensors of high affinity and specificity, we first applied an extended linker that permits finger multimerization without excessive DNA strain and merged two pairs of the tridactyl ZFs17,18. We fused the hexadactyl ZFs to VP64, converting them into synthetic TFs. For each ZF-TF, we prepared a reporter encoding GFP under the control of a single copy of the two original 9–base pair (bp) sites with 0−2 unrelated base pairs between them (Supplementary Fig. 2a). We transiently cotransfected human embryonic kidney (HEK) 293FT or HeLa cells with each ZF-TF and its reporter, assessing intracellular binding by measuring GFP fluorescence via flow cytometry. The hexadactyl ZF-TFs were markedly more active than their tridactyl counterparts (Supplementary Fig. 2b,c). To assess whether the hexadactyl ZFs could overcome potential binding inhibition imposed by intein fusion in a relevant, intracellular context, we attached the C-terminal intein (IC) or N-terminal intein (IN) domains, C-terminally mounted VP64, and retested activation (Fig. 2 and Supplementary Fig. 3a,b). Although binding of the hexadactyl ZF-intein proteins was encumbered by intein fusion, the chimeras nevertheless achieved a much greater output than the tridactyl-intein fusions. We found that a 2×(GGGS)3 linker between the ZF and intein domains yielded higher GFP activation than the 1× or 3× forms, and we used this linker thereafter (Supplementary Fig. 3c−e).
We chose ZF9 (Supplementary Fig. 1b,c) as the split-response factor because it performed well in tested cell lines, and its operator yielded low basal expression. We selected GFP as a reporter for response-circuit characterization. To improve the dynamic range of GFP activation, we multimerized ZF9 operators (Supplementary Fig. 4a). The output signal strength increased with the number of sites while basal expression (in the absence of ZF9-TF) decreased, resulting in a change of ∼300-fold using the 6× operator circuit, and there was no cross-activation of the response circuit by direct binding of the sensor components (Supplementary Fig. 4b−d).
Split-extein design, construction and validation
We hypothesized that an effective way to split the ZF9-TF extein would be to separate its DNA-recognition and trans-activation properties, also allowing for the introduction (and retention) of a splice junction (SJ) that could promote splicing without impairing activity of ZF9-TF19,20. In the SJ design, we adopted several extein residues that natively flank both sides of the SceVMA intein or from two previously split proteins (luciferase and TEV protease), reasoning that the intein might successfully ligate the ZFs and VP64 in this context (Fig. 3a and Supplementary Fig. 5a)12,21.
First, we ensured that the retained SJs did not interfere with ZF9-TF's ability to activate GFP expression by expressing the ZF-TF variants in their post-splice forms (Supplementary Fig. 5b). Next, we devised a cis-splicing test platform to gauge whether the intein could splice the split extein and produce a functional TF. We created sequences encoding split-extein domains fused to the corresponding divided inteins, directly linked to one another via a flexible linker (Fig. 3b)22. All three variants underwent successful cis-splicing, resulting in GFP activation (Fig. 3c and Supplementary Fig. 5c). To confirm that cis-splicing yielded a product of the anticipated size, we isolated and immunoblotted protein. Flag-tag and HIS-tag probing of the N and C termini revealed a product that migrated with the ZF9-TF positive control for each of the variants but not for three splice-null mutants, which displayed greater accumulation of the unspliced precursor (Fig. 3d and Supplementary Fig. 5d−f)22. Semiquantitative analysis, by comparison of splice-product signal intensity to that of increasing amounts (volumes) of the ZF9-TF positive control, displayed >20% cis-splice efficiency (Supplementary Fig. 5g).
Intracellular DNA detection and trans-splicing
To form the complete sensors, we created constructs encoding fusions of the three SJ variants of the split extein with their hexadactyl sensing components. We transfected cells with a plasmid encoding the sensor pair, a plasmid carrying the target sequences, and a plasmid containing the response circuit (Fig. 4a). Flow cytometry analysis showed substantial fluorescence output for all three variants, indicating target-sequence detection (Fig. 4b and Supplementary Fig. 6a). Fluorescence measured in the absence of target sequence, in the absence of the N- or C-terminal sensor, or with a C-terminal deletion mutant (lacking the IC domain) was comparably low, suggesting extremely low levels of spontaneous trans-splicing, in agreement with earlier reports, and confirming that both intact intein domains are required to generate a response (Supplementary Fig. 6b)12. Sensor pair V2 (containing the luciferase SJ) displayed the greatest activity, and we chose it for further characterization. Targets containing 0-, 4-, 8- or 12-bp gaps (nonrelated base pairs) between the two 18-bp binding sites produced response signals, with the contiguous 36-bp site displaying the highest output (Supplementary Fig. 6c). Increasing amounts (nanograms) of target plasmid produced a dose-response relationship between plasmid amount and signal strength (Fig. 4c).
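The target layouts tested above (two 18-bp binding sites separated by 0, 4, 8 or 12 unrelated base pairs) can be sketched in a few lines of Python. This is only an illustration: the two site sequences and the filler bases are hypothetical placeholders and are not the sequences used in this study, and the helper names `build_target` and `measure_gap` are introduced here for convenience.

```python
# Hypothetical 18-bp placeholder sites; NOT the sequences used in the study.
SITE_N = "GAGGCCGATGTTGACGTC"
SITE_C = "TTCAGGCATACCGGAATG"


def build_target(site_a, site_b, gap_bp, filler="ACTG"):
    """Join two binding sites with gap_bp unrelated bases between them."""
    if gap_bp < 0:
        raise ValueError("gap_bp must be non-negative")
    # Repeat the filler motif and trim it to the requested gap length.
    gap = (filler * (gap_bp // len(filler) + 1))[:gap_bp]
    return site_a + gap + site_b


def measure_gap(sequence, site_a, site_b):
    """Return the number of bases between site_a and site_b, or None if absent."""
    start_a = sequence.find(site_a)
    if start_a < 0:
        return None
    end_a = start_a + len(site_a)
    start_b = sequence.find(site_b, end_a)
    if start_b < 0:
        return None
    return start_b - end_a


if __name__ == "__main__":
    for gap in (0, 4, 8, 12):
        target = build_target(SITE_N, SITE_C, gap)
        print(gap, len(target), target)
```

With a gap of 0 the two sites form the contiguous 36-bp target; each additional gap simply lengthens the construct by the number of filler bases.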
Sensor binding strength is essential to overcome off-target dilution but could possibly limit the interchange of the post-splice and pre-splice sensors on the DNA. We reasoned that with a moderate decrease in affinity but not specificity, interchange might be improved. We introduced eight R-to-A mutations expected to lower ZF-binding affinity into the ZF backbones and compared output signals of wild-type and mutant sensor pairs23. Equimolar target plasmids containing a gradient of target sequence instances (0−4 and 8) produced a correlative effect for wild-type and mutant sensors, but the output of the latter was higher, suggesting improved interchange (Fig. 4d). We also assessed the difference between a nonreplicative and replicative target and found the signal to be substantially higher in the latter case (Supplementary Fig. 7a,b and Supplementary Note 2). To demonstrate the platform's extensibility, we constructed a second sensor pair (ZF3/ZF2 and ZF4/ZF5) corresponding to a second target sequence, and tested for orthogonality between the two pairs. Each pair detected its specific sequence while ignoring the other's, highlighting the platform's robustness and orthogonality (Fig. 4e).
DNA sequence recognition−induced apoptosis
Gene-directed enzyme prodrug therapy involves the metabolism of a nontoxic prodrug into a cytotoxic form24. One example is the NTR–CB 1954 pair, wherein the toxicity of CB 1954 (5-(aziridin-1-yl)-2,4-dinitrobenzamide) is dependent upon its reduction by a bacterial nitroreductase (NTR), which transforms it into an agent of DNA interstrand cross-linking and apoptosis25. Diffusion into neighboring cells leads to increased potency through a 'bystander' effect. We used this dual-component method to test the system's ability to link sequence recognition with triggered apoptosis. With GFP replaced by NTR in the response circuit and samples grown in prodrug-supplemented medium, NTR expression and the ensuing CB 1954 reduction should, in principle, occur only in the presence of the target sequence26.
We plated HEK293FT cells in medium containing a range of prodrug concentrations and transfected them (at a 50% rate, relative to a control) with either the sensor system containing an 'empty' response circuit or the full system, in the presence or absence of target sequences (Fig. 5a and Supplementary Fig. 8a). We measured apoptosis using flow cytometry after 48 h and 72 h via annexin V–FITC and propidium iodide (PI) staining. Cells with the empty response circuit were not notably affected by the prodrug, and cells transfected with the full system but no target were only moderately affected. However, target detection led to markedly elevated levels of apoptotic cells, displaying a clear trend in line with the drug gradient. Microscopy showed that at 96 h, control populations were excessively confluent (Fig. 5b and Supplementary Fig. 8b). Conversely, target-positive samples at 16 μM or 32 μM CB 1954 displayed severe or complete cell death, providing evidence of sequence recognition and a bystander effect. To demonstrate a dual-output response, we generated a construct encoding GFP-IRES-NTR, which simultaneously showed target reporting and apoptosis (Supplementary Fig. 8c).
Virus detection in infected mammalian cells
To test the system's capability for detecting viral infection, we adopted adenovirus type 5 (Ad5) as a model27. We generated virus with constitutive expression of blue fluorescent protein (BFP) to track infected cells. We prepared two substrains with target sequences of sensor pair 1 or 2 (T1 or T2) embedded into the viral genome adjacent to the packaging signal (ψ) (Fig. 6a). We transfected replication-permissive HEK293FT cells with either of the sensor pairs and then infected them with virus containing T1, T2 or neither target. The results showed that both sensor pairs detected only virus carrying their specific targets (Fig. 6b,c). Finally, we condensed the expression, reporting and marker elements into a single construct. Only cells receiving both the detection system and virus displayed a substantial GFP response (Supplementary Fig. 9). Together, these studies demonstrate the feasibility of using this system for sequence-specific DNA detection in mammalian cells.
Designable DNA-binding proteins have been developed to regulate gene expression in endogenous pathways and insulated exogenous gene networks. Using such proteins, we developed a system that is sufficiently robust to facilitate sequence detection in mammalian cells but is also sufficiently modular to allow for its application in a range of scientific settings, including synthetic biology, diagnostics and basic research.
The recognition stage can be adjusted to detect any target sequence by installation of the appropriate ZF component. Alternate DNA-binding technologies could be tested, including TALEs and the CRISPR-Cas system28. Another option would be to use designer RNA-binding proteins to detect pathogens with RNA-based genomes29. Cells infected with a particular pathogen, retaining viral latency or undergoing carcinogenesis could be culled through the expression of proapoptotic proteins, cell death−activating surface molecules or cytokines for immune-system recruitment30,31,32,33.
This system has the potential to address many research challenges. For example, it could be adapted to identify or rescue only cells that have received a construct during transfection or transduction, using sensors programmed to recognize the vector backbone sequence. Sensors could be standardized, integrated into stable cell lines and paired with specific delivery vectors, streamlining the delivery process and preserving packaging space.
Another possible use of this platform regards the CRISPR-Cas system, which is commonly used to introduce mutations or replace genomic sequences via homology-directed repair8. An intrinsic part of such studies is the identification and isolation of cells containing the edit. PCR or DNA sequencing can be used to measure frequencies but cannot be applied to live cells. Furthermore, to establish a clonal cell line containing the edit, multiple lines must be screened. Instead, cells could be equipped with sensors designed to bind in the insertion or deletion site, allowing for easy identification of cells that have undergone the desired editing on a single-cell basis, showing gain or loss of the report signal. This approach could greatly benefit studies of genome editing, increasing throughput and efficiency.
Similarly, this concept could be used to produce clonal cell lines harboring HIV provirus, which are essential for studying viral latency. Such cell lines are screened through a tedious process that involves infection, serial dilution and viral activation to identify clones that contain the provirus34. The use of a sense-and-respond system programmed to recognize the known proviral sequence and produce a reporter signal, allowing for the rapid identification of positive cells and their subsequent characterization, could markedly shorten this process.
This system could also contribute to studies of chromosomal aberrations leading to cancer and other diseases. In some instances, fluorescence in situ hybridization may be applied, and in others genomic DNA may be subjected to PCR to indicate the presence of chromosomal inversions35. However, PCR-based methods cannot be used to study real-time biology or disclose the aberration frequency in cell populations. Instead, by designing sensors whose proximity would be dependent on the occurrence of a specific chromosomal aberration, researchers could study the biology of living cells and also measure the impact of genetic predisposition or carcinogens.
The study of whole-genome 3D architecture relies on high-throughput chromosome conformation capture techniques (3C, 4C, 5C or Hi-C) to map long-range interactions36. Stages include DNA cross-linking, digestion, deep-sequencing, microarrays and the computational reconstruction of the 3D structure. With appropriate sensor design, a proximity-based DNA sense-and-respond approach could confirm the validity of proposed long-range interactions in living cells, while also providing insight into whole-genome real-time dynamics.
The sensors presented here may also expand the limited range of orthogonal inducible components for synthetic biology researchers, especially in cell-free platforms. Although engineered TFs can bind any sequence, providing limitless operators, the lack of inducible parts constrains the level of complexity and tunability synthetic circuits can achieve. Here each sensor pair corresponds to its own binding site, and numerous pairs can be generated. In the form of short dsDNA duplexes, their binding sites could be used as inducers to control the reconstitution of orthogonal split ZF-TFs, yielding highly tunable and layered gene networks.
General cloning procedures.
Information on the cloning and deposition of plasmids constructed during this study can be found in Supplementary Note 3 and Supplementary Table 1. All PCR-amplification stages used Phusion polymerase (NEB), according to the manufacturer's protocol. All restriction enzymes used throughout this study were type II, NEB enzymes. Alkaline phosphatase (CIP) was used to prevent linear plasmid religation to reduce background (except during the ligation of annealed oligos, described below). Bacterial selection of NEB 10-beta electro-competent cells during cloning was performed on 1% LB-agar plates with 120 μg/ml ampicillin or 50 μg/ml kanamycin.
OPEN ZF library and split SceVMA intein.
The ten tridactyl ZFs used in this study were developed using OPEN, as previously described5 (Supplementary Note 1 and Supplementary Table 2). In all screening, VP64 was C-terminally fused. The SceVMA intein was simultaneously amplified and divided from gDNA purified from S. cerevisiae, as previously described11. IN and IC consist of residues 1−184 and 390−454, respectively, in the full SceVMA sequence. Glycine/serine insulating linkers were introduced by gBlock synthesis (Integrated DNA Technologies (IDT)) (Supplementary Table 3). For split extein and splice junction design, see Supplementary Table 4.
Bacterial strains, overexpression and electrophoretic mobility shift assay.
For cloning, 10-beta electrocompetent E. coli (NEB) cells were used, and for protein overexpression, MG1655 Pro was used. ZF-intein fusion proteins were expressed using the pZE21 plasmid, allowing for kanamycin selection and induction with 100 ng/ml anhydrotetracycline (aTc). ZF5-IC and IN-ZF5 fusion proteins were N-terminally Flag-tagged, extracted with B-Per II (Thermo Scientific), purified with α-Flag M2 affinity gel, and eluted with Flag peptide (Sigma-Aldrich). Purified protein was subjected to electrophoretic mobility shift assay using biotinylated ZF5 DNA duplex (IDT), a streptavidin-HRP conjugate (Thermo Scientific, 21130), chemiluminescent substrate (Bio-Rad) and read with a Gel Logic system (Carestream Mol. Imaging).
Plasmid construction, oligo annealing and ligation (oligo drop).
The reporter constructs and synthetic response circuit were based on pGL4.26 (Promega). The gene encoding luciferase was replaced with that encoding GFP. pVITRO1/MCS/Neo (InvivoGen) was used to express ZF-TFs and sensor chimeras in human cells. pBW121 served as a cotransfection marker with mCherry driven by pCAG. It also functioned as a target-carrier plasmid (with and without the SV40 ori). All instances in which ZF operator sequences and sensor target sequences were ligated into reporter plasmid or target-carrying plasmid, respectively, were performed via 'oligo drop': single-stranded DNA oligos were designed so that, once annealed to their complementary strand, they carried post-cut overhangs compatible with ligation into a linearized plasmid. Lyophilized oligos were resuspended in annealing buffer (10 mM Tris-HCl, pH 7.5, and 50 mM NaCl), and equal volumes of sense and antisense DNA were mixed at 100 °C, cooled to 24 °C over 60 min, diluted and ligated into linearized plasmid. Tandem repeats of operator or target sequences were directionally cloned in cycles, using oligos designed to contain an internal, full 5′ cut site that replaced the 5′ ligation site abolished (by design) at each cloning cycle.
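The oligo design behind 'oligo drop' can be illustrated with a minimal Python sketch. It assumes restriction enzymes that leave 5′ overhangs and uses the 4-nt overhangs CTAG and AGCT purely as examples; the enzymes and overhangs actually used are not stated here, the insert is a hypothetical placeholder, and the sketch does not model the abolished or regenerated cut sites used for tandem-repeat cloning.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def design_oligo_pair(insert, left_overhang, right_overhang):
    """Design sense and antisense oligos (both written 5'->3').

    After annealing, the duplex carries left_overhang as a single-stranded
    5' end on the sense strand and right_overhang as a single-stranded
    5' end on the antisense strand.
    """
    sense = left_overhang + insert
    antisense = right_overhang + reverse_complement(insert)
    return sense, antisense


def core_is_complementary(sense, antisense, left_len, right_len):
    """Check that the oligos pair perfectly outside their 5' overhangs."""
    return sense[left_len:] == reverse_complement(antisense[right_len:])


if __name__ == "__main__":
    # Hypothetical 18-bp insert and example overhangs (assumptions).
    sense, antisense = design_oligo_pair("GAGGCCGATGTTGACGTC", "CTAG", "AGCT")
    print("sense    5'-" + sense + "-3'")
    print("antisense 5'-" + antisense + "-3'")
    print("core pairs:", core_is_complementary(sense, antisense, 4, 4))
```

Checking complementarity of the paired core before ordering oligos is a cheap way to catch strand or orientation mistakes in the design.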
Cell culture, cell lines and transfection.
The three cell lines used in this study are HeLa (ATCC CCL2), HEK 293FT (donated by members of the Weiss laboratory, MIT) and 293AD (Cell BioLabs). HeLa and HEK 293FT were selected for their convenience as transformed cell lines; HEK 293FT also expresses the SV40 large T-antigen, allowing transient replication of plasmids containing the SV40 ori. 293AD has enhanced adhering qualities for Ad5 production. Cells were not tested for mycoplasma contamination or authenticated. Cells were cultured in 4.5 g/l glucose, L-glutamine DMEM (Corning) supplemented with 10% FBS, 100 I.U./ml penicillin and 100 μg/ml streptomycin. Briefly, the standard transfection protocol was as follows: 100,000−200,000 cells were plated in 12-well format and transfected 24 h later with 1−1.5 μg combined DNA using polyethylenimine (PEI) as a transfection reagent. At 48 h after transfection, cells were examined using a Nikon Eclipse Ti epifluorescence microscope, trypsinized and prepared for flow cytometry analysis.
Mammalian protein extraction and immunoblots.
At 48 h after transfection, total protein was extracted using RIPA (Thermo Scientific) from HEK 293FT cells expressing cis-splice test proteins or ZF9-TF, tagged with Flag or 3xHIS at the N or C terminus, respectively. Protein was loaded on a Mini-PROTEAN TGX gel, transferred to a 0.45 μm nitrocellulose membrane using a Trans-Blot SD cell and normalized via α-Tubulin (Abcam). Membranes were probed with either α-Flag-M2 (Sigma-Aldrich, F3165) or α-HIS (GenScript, A00174-40) primary antibodies and secondary HRP-conjugated antibodies (Bio-Rad, #1721011 and #1706515), and imaged with a Gel Logic system (Carestream Mol. Imaging).
Sequence recognition-induced apoptosis via CB 1954/NTR.
CB 1954 (Sigma-Aldrich) was resuspended in DMSO to 0.2 M and a working stock (500 μM) was prepared by dilution into DMEM before each experiment. Cells were transfected as described above and supplemented with 0−32 μM prodrug. After 48 h or 72 h, cells were collected, including growth media, and subjected to Annexin V-FITC and PI staining according to manufacturer's instructions (BD Biosciences), followed by flow cytometry. Cells were also analyzed via light microscopy at 96 h.
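The two dilution steps described above follow the usual C1·V1 = C2·V2 relation and can be worked through with a short Python sketch. The concentrations (0.2 M stock, 500 μM working stock, 16 μM and 32 μM final doses) are taken from the text; the 1,000-μl volumes are assumptions chosen for illustration, and `dilution_volumes` is a hypothetical helper name.

```python
def dilution_volumes(stock_conc, final_conc, final_volume):
    """Solve C1*V1 = C2*V2 for the stock volume V1.

    Concentrations must share one unit; the returned volumes
    (stock, diluent) are in the same unit as final_volume.
    """
    if stock_conc <= 0:
        raise ValueError("stock concentration must be positive")
    if final_conc < 0 or final_conc > stock_conc:
        raise ValueError("final concentration must be between 0 and the stock")
    stock_volume = final_conc * final_volume / stock_conc
    return stock_volume, final_volume - stock_volume


if __name__ == "__main__":
    # Step 1: 0.2 M (200,000 uM) DMSO stock -> 500 uM working stock in DMEM.
    # The 1,000-ul batch size is an assumption.
    print("working stock:", dilution_volumes(200_000, 500, 1000))

    # Step 2: 500 uM working stock -> final dose in an assumed 1,000 ul of medium.
    for dose_um in (0, 16, 32):
        print(dose_um, "uM:", dilution_volumes(500, dose_um, 1000))
```

Under these assumed volumes, the working stock is a 400-fold dilution of the DMSO stock, so the DMSO carried into the culture medium stays very low even at the highest dose.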
Adenovirus preparation, titration and infection.
Ad5 was prepared using the RAPAd CMV adenoviral expression system (Cell BioLabs). Briefly, TagBFP was cloned downstream of a CMV promoter as an infection marker. Target sequence (0−8 copies) for sensor pair 1 or 2 was ligated into the viral genome adjacent to the Ad5 ψ. Crude viral extract was prepared via cotransfection of the linearized viral plasmids into a 293AD line (Cell BioLabs), and after collection, was amplified in the same cells and titrated through serial dilution and plaque counting. For transfection/infection experiments, HEK 293FT cells were transfected with the sensor system as described above and after 24 h, infected with virus at a multiplicity of infection of 5. 48 h later, cells were trypsinized, fixed and analyzed by flow cytometry.
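As a worked example of the titration and infection arithmetic, the following Python sketch estimates a titer from a plaque count and the inoculum volume needed for a multiplicity of infection (MOI) of 5. Only the MOI comes from the text; the plaque count, dilution, plated volume, titer and cell number are hypothetical values, and both function names are introduced here for illustration.

```python
def titer_from_plaques(plaque_count, dilution_factor, plated_volume_ml):
    """Titer in PFU/ml from a plaque count at a given serial dilution.

    dilution_factor is the fraction of the original stock in the plated
    sample (for example 1e-7 for a 10^-7 dilution).
    """
    if dilution_factor <= 0 or plated_volume_ml <= 0:
        raise ValueError("dilution factor and volume must be positive")
    return plaque_count / (dilution_factor * plated_volume_ml)


def inoculum_volume_ul(moi, cell_count, titer_pfu_per_ml):
    """Volume of virus stock (in microliters) that delivers the requested MOI."""
    if titer_pfu_per_ml <= 0:
        raise ValueError("titer must be positive")
    pfu_needed = moi * cell_count          # MOI = infectious units per cell
    return pfu_needed / titer_pfu_per_ml * 1000.0


if __name__ == "__main__":
    # Hypothetical plaque assay: 50 plaques from 0.1 ml of a 10^-7 dilution.
    titer = titer_from_plaques(50, 1e-7, 0.1)
    print("titer (PFU/ml):", titer)

    # Hypothetical well of 200,000 cells infected at the MOI of 5 used here.
    print("inoculum (ul):", inoculum_volume_ul(5, 200_000, titer))
```

Because cells divide between plating and infection, the cell number at the time of infection, rather than the number plated, is the appropriate input.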
Flow cytometry and data analysis.
All transfections were measured with a BD FACSAria II flow cytometer (BD Biosciences). Transfection/infections were measured using a BD Fortessa High Throughput Sampler. Means of fluorescence distributions were calculated with FlowJo. In transfections, GFP arbitrary units were collected from cells gated to mCherry and in transfection/infection experiments, gated to mCherry and BFP fluorescence. All experiments were repeated the indicated number of times and data are displayed either as an average of biological triplicates or duplicates with s.d., or as fold change with error propagation calculations. Owing to experimental variation, although data trends may be compared between experiments, absolute values should be compared within single experiments.
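For readers reproducing the fold-change values, one common way to propagate replicate error through a ratio is sketched below in Python. The first-order formula for a quotient of independent quantities is an assumption about the calculation, since the exact propagation method is not specified here, and the replicate values in the example are invented for illustration.

```python
import math
import statistics


def fold_change_with_error(induced, basal):
    """Return (fold change, propagated s.d.) from replicate measurements.

    Assumes independent errors and uses first-order propagation for a ratio:
        sd_fold = fold * sqrt((sd_i / mean_i)**2 + (sd_b / mean_b)**2)
    """
    mean_i = statistics.mean(induced)
    mean_b = statistics.mean(basal)
    sd_i = statistics.stdev(induced)   # sample s.d. (n - 1 denominator)
    sd_b = statistics.stdev(basal)
    fold = mean_i / mean_b
    relative_error = math.sqrt((sd_i / mean_i) ** 2 + (sd_b / mean_b) ** 2)
    return fold, fold * relative_error


if __name__ == "__main__":
    # Invented GFP means (arbitrary units) for three biological replicates.
    fold, sd = fold_change_with_error([300, 310, 290], [1.0, 1.1, 0.9])
    print(f"fold change = {fold:.1f} +/- {sd:.1f}")
```

Note that when basal expression is low, its relative error tends to dominate the propagated uncertainty, which is one reason fold changes are best compared within a single experiment.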
Equipment and settings for blot imaging and microscopy are listed in Supplementary Note 4.
We thank S. Modi for helpful discussions and critical reviews of the manuscript. HEK293FT cells were donated by members of the R. Weiss laboratory (MIT, Cambridge, Massachusetts, USA). This work was supported by funding from the Defense Advanced Research Projects Agency (grant DARPA-BAA-11-23), the Defense Threat Reduction Agency (grant HDTRA1-14-1-0006) and the Air Force Office of Scientific Research (AFOSR-BRI grant FA9550-14-1-0060).
Integrated supplementary information
Supplementary Figures 1–9, Supplementary Tables 1–4 and Supplementary Notes 1–4