DiB-splits: nature-guided design of a novel fluorescent labeling split system

Fluorogen-activating proteins (FAPs) are innovative fluorescent probes combining advantages of genetically-encoded proteins such as green fluorescent protein and externally added fluorogens that allow for highly tunable and on demand fluorescent signaling. Previously, a panel of green- and red-emitting FAPs has been created from bacterial lipocalin Blc (named DiBs). Here we present a rational design as well as functional and structural characterization of the first self-assembling FAP split system, DiB-splits. This new system decreases the size of the FAP label to ~8–12 kDa while preserving DiBs’ unique properties: strong increase in fluorescence intensity of the chromophore upon binding, binding affinities to the chromophore in nanomolar to low micromolar range, and high photostability of the protein-ligand complex. These properties allow for use of DiB-splits for wide-field, confocal, and super-resolution fluorescence microscopy. DiB-splits also represent an attractive starting point for further design of a protein-protein interaction detection system as well as novel FAP-based sensors.

Split proteins are engineered proteins that can be reconstituted from two or more parts via non-covalent interactions. The central idea is that reconstitution restores a specific function that is abolished in the separated parts. Theoretically, many proteins can be divided into such fragments. In practice, identifying a functional split protein is still nontrivial, although some success in directed evolution-based 1 , as well as computational 2,3 design of split systems has recently been demonstrated.
Split proteins were first employed when ubiquitin was used for in vivo protein-protein interaction detection 4 . Cleavage of the reporter protein, dihydrofolate reductase, fused to the C-terminal fragment of ubiquitin occurred only when both the C-terminal and mutated N-terminal fragments of ubiquitin were expressed as fusions to a leucine zipper homodimerization domain, but not when they were expressed individually.
Later, this concept was applied to a number of other proteins. Many of them were enzymes like dihydrofolate reductase 5 , β-lactamase 6 , thymidine kinase 7 , or luciferase 8 . These systems allow for real-time and quantitative analysis of protein interactions in vitro as well as in model organisms. The desire for more user-friendly methods for detecting protein-protein interactions in complex environments and for identifying their precise cellular localization, combined with enormous progress in fluorescence microscopy techniques, prompted the creation of fluorescent split proteins. These included split versions of green fluorescent protein (GFP) 9,10 , its differently colored derivatives and homologs [11][12][13] , far-red emitting phytochrome-based fluorescent proteins 14 , and even dual split reporters 15 .
When used for protein-protein interaction detection, spontaneous self-association of split proteins is highly undesirable. Such self-association events contribute to the false positive signal and decrease the overall sensitivity of the method. However, spontaneously self-complementing fluorescent split pairs have been found useful for a different purpose. Their use substantially decreases the size of the tag that must be fused to the protein of interest and thereby diminishes the potential influence of the tag on the behavior of that protein 12 .
Fluorogen-activating proteins (FAPs) are a group of unrelated proteins capable of binding to non-protein ligands (fluorogens) and increasing the fluorescence quantum yield and/or changing the spectral properties of these ligands. Some of these FAPs, like miniSOG 16 , IFP1.4 17 , iRFP 18 , and UnaG 19 , find their ligands (flavin mononucleotide, biliverdin, and bilirubin) readily available in mammalian cells. Other FAPs, like various dye-binding antibodies 20,21 , FAST 22 , DiBs 23 , and de novo computationally designed mFAPs 24 , require an exogenous supply of the chromophore. The latter group of FAPs provides multiple benefits. First, available synthetic molecules show a wide range of

Results and discussion
DiB3 domain-swapped crystal structure. The lipocalin fold contains a single eight-stranded continuously hydrogen-bonded antiparallel β-barrel complemented by an α-helix. This common fold has been observed for other lipocalin protein family members 32 , previously characterized wild type Blc (wtBlc) protein [33][34][35] , as well as another Blc mutant, DiB1, that has been co-crystallized with the M739 (Supplementary Fig. S1) 36 ligand (Muslinkina et al., manuscript submitted). Our attempts to structurally characterize other DiB proteins 23 in apo and bound states resulted in obtaining protein crystals of DiB3 in the apo form at low pH conditions (pH 3.5) which diffracted to 1.6 Å. The asymmetric unit contains only one protein chain. However, it forms a biological assembly (dimer) with a crystallographic symmetry mate. The intertwined dimer is caused by domain swapping: each of the two Blc-like eight-stranded beta-barrel folds is created by the N-terminus of one of two polypeptide chains and the C-terminus of the other (Fig. 1A). As observed in other domain-swapped structures 37 , the overall lipocalin fold is preserved in the domain-swapped structure (Cα rmsd 1.1 Å), except for the region that connects the exchanging parts of the protein (the hinge region, residues 109-113).
In silico analysis. While DiB3 was previously successfully used for in cellulo protein labelling 23 , we proposed that the observed domain swapping was driven mainly by the very low pH conditions of the crystallization buffer rather than the private DiB3 mutations (V74F and L141Q). To further evaluate this assumption, we calculated the interaction energies between N- and C-termini fragments of the wtBlc, DiB1, and DiB3 proteins using Rosetta 38 . Although the wtBlc protein seems to be slightly more stable, DiB3 is not an outlier (Supplementary Fig. S2), suggesting that the introduced mutations are unlikely to cause the domain swapping.
Independent from the reason that caused domain swapping in the crystal, we reasoned that the DiB3 protein as well as other mutants might have two relatively autonomous and stable parts (residues 1 to 108 and 114 to 177). www.nature.com/scientificreports www.nature.com/scientificreports/ Such property is crucial for the successful creation of a split system. Usually designing a split protein involves laborious screening of multiple protein sites in order to select an appropriate cutting point 11,39,40 . In our case, however, the obtained DiB3 structure is pointing to a potential cleavage site.
In vitro evaluation of the proposed split system. First, we tested whether N- and C-termini fragments, created by separation of a protein chain in the hinge region, are indeed capable of forming the lipocalin-like structure when brought together. We fused each of the two fragments of DiBs1-3 with one of two leucine zipper peptides as shown in Fig. 1B. We assessed whether the pairs retained their ability to bind fluorogens and increase their fluorescence brightness when co-expressed in bacteria. For this we added the fluorogen M739 (Supplementary Fig. S1) directly to the bacterial suspension, spun it down, and examined the pellets under a fluorescence microscope. The pellets of all three tested pairs were visibly fluorescent, so we proceeded with qualitative assessment (Supplementary Fig. S3) and further with quantitative characterization of the complexes using purified proteins.
Split-Zip system properties. Split-Zip proteins showed properties similar to the "parental" full-length variants upon addition of the M739 fluorogen, including binding affinities, fluorescence spectra maxima, and extinction coefficients (Table 1, Supplementary Figs. S4 and S5). This suggests that, when pulled toward each other by the leucine zippers' interaction, the fragments can successfully restore the lipocalin fold. Interestingly, in two out of three cases (DiB1-split-Zip and DiB2-split-Zip) the apparent binding affinity of the proteins to the ligand slightly increased compared to the corresponding full-length proteins. This might result from slower dissociation of the chromophore due to steric hindrance caused by the leucine zippers. Another reason might be a conformational restriction of the residues of the formerly highly flexible loop (residues 109-113) 35 , which locks side chains in the preferable conformation for ligand binding.
The opposite effect, weaker binding seen in the case of the red-shifted DiB3-split-Zip:M739 complex, supports the previously suggested hypothesis of the alternative binding mode of the ligand in that complex (Muslinkina et al., manuscript submitted).
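The apparent affinities compared above come from titrations in which a fixed amount of fluorogen is mixed with varying protein concentrations (see Chromophore binding analysis). A minimal Python sketch of how an apparent K d could be extracted from such a curve is given here. It assumes a simple 1:1 binding model with ligand depletion, which is an assumption of this sketch and not necessarily the model used in the cited procedure 23 ; the function names and the synthetic numbers are hypothetical.

```python
import math


def complex_conc(p_total, l_total, kd):
    """1:1 complex concentration from total protein, total fluorogen and Kd.

    All three arguments share one concentration unit (e.g. micromolar).
    """
    s = p_total + l_total + kd
    return (s - math.sqrt(s * s - 4.0 * p_total * l_total)) / 2.0


def _sse(log_kd, protein, signal, l_total):
    """Sum of squared residuals at a trial log10(Kd).

    Amplitude and baseline are solved by linear least squares, so Kd is
    the only parameter that needs a nonlinear search.
    """
    bound = [complex_conc(p, l_total, 10.0 ** log_kd) for p in protein]
    n = len(bound)
    mx, my = sum(bound) / n, sum(signal) / n
    sxx = sum((x - mx) ** 2 for x in bound)
    sxy = sum((x - mx) * (y - my) for x, y in zip(bound, signal))
    slope = sxy / sxx
    offset = my - slope * mx
    return sum((slope * x + offset - y) ** 2 for x, y in zip(bound, signal))


def fit_kd(protein, signal, l_total, lo=-3.0, hi=3.0, step=0.05):
    """Apparent Kd: coarse scan over log10(Kd), then golden-section refinement."""
    def cost(v):
        return _sse(v, protein, signal, l_total)

    n_grid = int(round((hi - lo) / step)) + 1
    grid = [lo + i * step for i in range(n_grid)]
    best = min(grid, key=cost)
    a, b = best - step, best + step
    g = (math.sqrt(5.0) - 1.0) / 2.0
    c, d = b - g * (b - a), a + g * (b - a)
    fc, fd = cost(c), cost(d)
    for _ in range(60):
        if fc < fd:
            b, d, fd = d, c, fc
            c = b - g * (b - a)
            fc = cost(c)
        else:
            a, c, fc = c, d, fd
            d = a + g * (b - a)
            fd = cost(d)
    return 10.0 ** ((a + b) / 2.0)


# Synthetic, noise-free titration (hypothetical numbers, not data from the paper).
protein_um = [0.05, 0.1, 0.2, 0.5, 1.0, 2.0, 5.0, 10.0, 20.0]
fluorogen_um = 0.1
true_kd_um = 0.5
signal = [20.0 + 1000.0 * complex_conc(p, fluorogen_um, true_kd_um) / fluorogen_um
          for p in protein_um]
fitted_kd_um = fit_kd(protein_um, signal, fluorogen_um)
```

Searching over log10(K d ) keeps the scan well behaved across the nanomolar to micromolar range; with real, noisy data a dedicated fitting library and an error estimate would be preferable.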
The behavior of a split system in the absence of an additional "attracting force" like leucine zippers or other interacting proteins determines the range of its possible applications. If the assembly of the split protein is conditional (it fails to self-assemble spontaneously), it can be used for investigation of protein-protein interactions 27,[41][42][43] . On the other hand, in the case of spontaneous self-assembly, the split form of the labelling system allows for a reduction in the size of the labelling tag that needs to be added to a protein of interest. Hence, it minimizes the tag's influence on that protein 10,12,44 . Therefore, as a next step we checked the ability of the DiB N- and C-termini fragments to self-assemble. For this we deleted the leucine zippers as shown in Fig. 1C to obtain His-tagged N-termini fragments and untagged C-termini fragments. We co-expressed these new constructs (hereafter referred to as split) and performed immobilized metal ion affinity purification. The affinity tag was present only on the N-termini fragments. Nevertheless, both parts of the split system were co-purified, indicating that the assembly occurs spontaneously (Supplementary Figs. S6, S7).
Split system properties. Self-assembling split proteins retain their ability to bind and increase fluorescence of the fluorogen M739 (Table 1, Supplementary Figs. S4 and S5), although in all three cases we observed an affinity for M739 reduced by a factor of 2-3. This might be caused by an increased flexibility of the split system, which raises the entropic cost of ligand binding. In all but one case the spectral properties of both DiB-split-Zip and DiB-split systems remained similar to the properties of the full-length protein complex: emission and excitation maxima varied by no more than 5 nm. Only the DiB3-split excitation maximum was 12 nm shorter than that of DiB3 (Supplementary Fig. S4). The reason for this spectral shift has yet to be explored.
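To put the 2-3-fold loss of affinity on an energy scale, the change in binding free energy follows from ΔΔG = RT ln(K d,split / K d,full ). The short Python sketch below evaluates this relation; the temperature of 298.15 K is an assumption (the measurement temperature is not stated here) and the function name is hypothetical.

```python
import math

R_KCAL_PER_MOL_K = 1.987204e-3  # gas constant in kcal mol^-1 K^-1


def ddg_from_kd_ratio(fold_change, temperature_k=298.15):
    """Binding free energy penalty (kcal/mol) for a given fold increase in Kd."""
    return R_KCAL_PER_MOL_K * temperature_k * math.log(fold_change)


ddg_2fold = ddg_from_kd_ratio(2.0)  # about 0.41 kcal/mol
ddg_3fold = ddg_from_kd_ratio(3.0)  # about 0.65 kcal/mol
```

Under this assumption the observed reduction corresponds to a penalty of well under 1 kcal/mol, which is in line with the description of the effect as modest.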

Structural analysis of the DiB2-split protein.
To further confirm the recovery of the lipocalin fold by split DiBs, we crystallized the DiB2-split protein. Crystals diffracted to approximately 2 Å and contained three "split" molecules per asymmetric unit (Fig. 2A). The oligomeric state of the wtBlc has been previously studied and there is some evidence for its existence as a monomer 35 as well as a functional dimer 34 . However, to our knowledge, the possibility of trimer formation has never been investigated. Size-exclusion chromatography conducted during the routine purification of split proteins showed no signs of oligomerization of any kind (Supplementary Fig. S7). Additionally, we used the Protein Interfaces, Surfaces and Assemblies (PISA) server 45 to assess the biological significance of the observed interfaces between "split" molecules in the trimer. This analysis also suggested that none of the quaternary structures, except for the dimers formed by the N- and C-termini fragments, is stable in solution. Thus, we assumed that the observed trimer is solely a result of crystal packing.
The main difference between superimposed monomers from the asymmetric unit is observed in the termini of the two β-strands adjacent to the cleavage site (Fig. 2B). These residues are involved in crystal packing interactions in the DiB2-split crystal. This region, also known as the E/F loop, is capable of adopting multiple conformations even in the full length proteins based on previously obtained structures 35 (Muslinkina et al., manuscript submitted). The residues of the hinge loop and two adjacent β-strands are also responsible for the main difference between wtBlc and DiB2-split monomers (Fig. 2C). The hinge loops of proteins capable of domain swapping are believed to be in an energetically unfavorable conformation in the monomeric state in order to promote domain swapping 37 . The observed differences between structures might be partially caused by releasing the tension in the fold through polypeptide chain cleavage.
Despite these minor structural deviations, the overall lipocalin fold as well as specific intramolecular β-barrel stabilizing interactions, which became intermolecular interactions after the splitting, are well preserved in the DiB2-split structure ( Supplementary Fig. S8). This confirms that the split system is capable of spontaneous correct self-assembly to form the functional protein. This finding aligns with the observed photophysical properties.
In vivo evaluation of the proposed split system. Out of three DiB-split proteins which we characterized in vitro, the DiB2-based split revealed the most favorable properties including high expression levels, stability, and brightness. That is why it was chosen for further evaluation of the DiB-split system for imaging in living cells. For in vivo testing we first created two constructs: fusion proteins of the DiB2 fragments with the blue fluorescent protein TagBFP (TagBFP-splitN 1-109 and TagBFP-splitC 110-177 ). We assessed the behavior of these constructs in separate transfections. Cells transfected with the TagBFP-splitC 110-177 construct alone showed uniform distribution of the blue fluorescence signal throughout cytoplasm and nuclei of the cells (Supplementary Fig.  S9B). Separately expressed TagBFP-splitN 1-109 fusion protein, however, promoted generation of multiple aggregates inside cells ( Supplementary Fig. S9A).
Alternative split point. After inspection of the DiB2-split crystal structure we suggested that the aggregation of the separately expressed splitN 1-109 fragment might result from the disturbance of multiple key core interactions caused by removal of the next N-terminus β-strand (Supplementary Fig. S10). We hypothesized that shifting the split point one β-strand (amino acids 110-125) further toward the C terminus of the protein might resolve this problem. We created two new constructs, TagBFP-splitN 1-125 and the complementary TagBFP-splitC 126-177 , and tested their behavior. As before, expression of the C-terminal part (TagBFP-splitC 126-177 ) produced a uniformly distributed blue fluorescent signal (Supplementary Fig. S9D). We also observed a significant improvement in the behavior of the separately expressed TagBFP-splitN 1-125 fusion protein. Only cells with very high levels of expression showed some residual signs of aggregation (Supplementary Fig. S9C).
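The effect of the split point on tag size can be estimated from the residue ranges alone. The Python sketch below does this with the common rule-of-thumb average of about 110 Da per residue, which is an assumption of the sketch; exact masses depend on the actual sequences, tags, and linkers, and the function name is hypothetical.

```python
AVERAGE_RESIDUE_MASS_DA = 110.0  # rule-of-thumb average residue mass (assumption)


def fragment_mass_kda(first_residue, last_residue):
    """Approximate mass of a fragment spanning first..last residue (inclusive)."""
    n_residues = last_residue - first_residue + 1
    return n_residues * AVERAGE_RESIDUE_MASS_DA / 1000.0


# Residue ranges taken from the construct names in the text.
fragments = {
    "splitN 1-109": (1, 109),
    "splitC 110-177": (110, 177),
    "splitN 1-125": (1, 125),
    "splitC 126-177": (126, 177),
}
masses_kda = {name: fragment_mass_kda(*span) for name, span in fragments.items()}
```

With these assumptions the original split point gives fragments of roughly 12 and 7.5 kDa, while moving it to residue 125 gives roughly 13.8 and 5.7 kDa.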
Widefield imaging. Next, we tested the ability of the split system to function in vivo. We examined two pairs of available constructs which can form a full-length protein structure: (1) TagBFP-splitN 1-125 + TagBFP-splitC 126-177 and (2) TagBFP-splitN 1-125 + TagBFP-splitC 110-177 . Upon addition of the M739 chromophore, recovery of the DiB2-specific fluorescent signal in the green channel was observed only in the second pair (Supplementary Fig. S11). We speculate that this can be caused by somehow compromised integrity of the new splitN 1-125 fragment, for example because of the presence of an alternative conformation of the new C-terminus. Combination of two protein fragments with partial sequence overlap seems to allow for more efficient assembly and/or a longer half-life of the functional complex.
We further assessed the performance of DiB2-split self-assemblies in living cells by transient cotransfection of splitN 1-125 and splitC 110-177 fragments in fusion with either histone H2B or vimentin, and conventional  (Fig. 3A). In the green detection channel, the DiB2-split:M739 complex signal was visible predominantly in the nucleus, indicating effective attraction of the C-terminal part of the split to the nuclei and successful self-assembly of the split system (Fig. 3B). Similar results were obtained with the second set of fusion proteins (H2B-splitC 110-177 + TagBFP-splitN 1-125 , Fig. 3C,D). In the case of the vimentin fusion, the fluorescence signal from assembled DiB2-split colocalized with the signal of the TagBFP-labeled C-terminal part of the split (Fig. S12). Therefore, we conclude that a considerable fraction of the DiB2-split self-assembles spontaneously in living cells from its freely diffusing halves.
Super-resolution imaging. We further assessed the performance of the DiB2-split system in a single-molecule localization super-resolution imaging setup. Similarly to DiB2 (Supplementary video 1), DiB2-split can be used as a protein-PAINT tag: bursts of fluorescence from individual protein-ligand interactions are clearly detectable (Fig. 3E, Supplementary video 2). Super-resolution reconstruction (Fig. 3G) of labeled vimentin shows compatibility of the DiB2-split with this imaging technique and clear resolution improvement over the widefield image (Fig. 3F-J). DiB2-split exhibited higher single-molecule brightness than DiB2 (median photon counts per single-molecule event of 614 and 501, respectively, Fig. 3L), which ensures super-resolution in live cells with a stable number of localizations per frame (Fig. 3K) and higher localization precision (14 nm vs 17.8 nm for DiB2, Fig. 3M).
Details of vimentin structure as small as ~30 nm were resolved in live-cell protein-PAINT with either DiB2 or DiB2-split ( Supplementary Fig. S13) in short (~1 min) acquisitions.
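The link between photon counts and localization precision reported above can be illustrated with the standard shot-noise approximation, in which precision scales as the inverse square root of the number of detected photons. The Python sketch below applies only this scaling to the reported medians; it ignores background, pixelation, and differences in point-spread-function width, so it is an order-of-magnitude illustration under stated assumptions and not the authors' analysis. The function name is hypothetical.

```python
import math


def scaled_precision(reference_precision_nm, reference_photons, photons):
    """Shot-noise-limited precision scaling: sigma is proportional to 1/sqrt(N)."""
    return reference_precision_nm * math.sqrt(reference_photons / photons)


# Reported medians: DiB2 501 photons at 17.8 nm; DiB2-split 614 photons.
predicted_nm = scaled_precision(17.8, 501, 614)  # about 16.1 nm
```

The predicted value of about 16.1 nm lies between the two reported precisions, which suggests that the higher photon count accounts for part of the improvement to 14 nm and that other factors, such as background, may also contribute.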

Conclusions and future directions
In this study, inspired by the obtained domain-swapped crystal structure of the DiB3 protein in its apo state, we designed and characterized a novel DiB-split FAP system. The two fragments of the split proteins were able to spontaneously reassemble, fully restoring the fluorogen-activating and spectral properties of the "parental" full-length DiBs. Crystallization of one of these proteins, DiB2-split, further corroborates the preservation of the lipocalin fold by the system. The DiB2-split was tested in vivo and was found to give a bright and specific fluorescent signal indistinguishable from that of the full-length protein.
This DiB-split system presents a proof-of-principle demonstration of the potential of the lipocalin scaffold to create a split system. It is immediately applicable as a protein-PAINT 23 label of a smaller size for super-resolution imaging in living cells. The decrease in tag size provided by the split can diminish the influence of the tag on the protein of interest. In vitro data suggest near complete assembly of DiB2 from split fragments, making DiB-splits a feasible replacement for full-length DiBs or fluorescent proteins in cases where the size of the molecular tag matters. Moreover, because DiB-split fluorescence does not require post-assembly chromophore maturation, unlike self-assembling fluorescent proteins 10,12 , it can be used for immediate detection of different biological processes such as protein expression and early trafficking events, or as a faster reporter of protein solubility.
DiB-splits as well as the parental full-length DiBs have a lower signal-to-noise ratio in comparison with FAST family probes. However, FAST localization density decreased rapidly during data acquisition in the single-molecule localization microscopy regime, and successful super-resolution imaging using FAST required protocol modifications and the use of oxygen scavengers 46 . Neither system requires oxygen for its function, and both might be used in oxygen-deficient systems, as was previously shown with FAST 47 .
DiB-splits would benefit from further optimization of the location of the split point. The original N-terminus fragment of the system (splitN 1-109 ) as well as the elongated one (splitN 1-125 ) can be redesigned for better stability. Moreover, mutagenesis of the new N- and C-termini could increase binding affinities for the fluorogen. In addition, there is potential for the design of a variety of other fluorescent tools. For example, redesign of the intramolecular interface of the DiB-split proteins reported here to promote higher independence of the N- and C-termini fragments can result in a new FAP-based tool for detection of protein-protein interactions. Such a tool would complement the existing mEos3.2 48 and PAmCherry1-based 49 super-resolution imaging compatible BiFC labels, providing the additional benefit of fast and oxygen level independent measurements. Spatial proximity of the natural N- and C-termini of the protein makes DiB proteins a promising starting point for the design of DiB-based circularly permuted proteins. Since the discovered split point is close to the ligand binding site, such circularly permuted proteins represent a promising starting point for the design of DiB-based biosensors. Successful permutation might also allow for the creation of a new self-assembling split system with smaller parts analogous to the self-complementing split fluorescent GFP11 10 and sfCherry11 12 tags.

Molecular cloning.
Plasmids pMRBad-Z-CspGFP (Addgene plasmid #40730) and pET11a-Z-NspGFP (Addgene plasmid #40729) 50 were a gift from Brian McNaughton and were used to create Blc-split-Zip vectors. First, we amplified N-fragments (residues 11 to 109) and C-fragments (residues 110 to 177) of the DiB mutants 23 as well as leucine zipper peptides with adjacent upstream or downstream portions of the vector from the pMRBad-Z-CspGFP and pET11a-Z-NspGFP plasmids, respectively, and the upstream portion of the pET11a-Z-NspGFP plasmid including the His-tag. Second, we used overlap PCR to create DNA fragments containing leucine zipper peptides fused with N- or C-fragments of the DiB mutants flanked by upstream and downstream portions of the vector. These fragments were digested with BamHI and XbaI or NcoI and BsrGI restriction enzymes and ligated into the original vectors. These vectors were further used to create split fragments without leucine zipper peptides. For this we amplified N-fragments of the DiB mutants with the adjacent upstream portion of the vector, introducing a stop codon and a BamHI restriction site in place of the leucine zipper peptide coding sequence, and C-fragments of the DiB mutants, introducing a start codon and an NcoI restriction site in place of the leucine zipper peptide. The PCR products were again digested with BamHI and XbaI or NcoI and BsrGI restriction enzymes and ligated into the original vectors.
DiB2-split fusions with H2B, vimentin, and TagBFP were generated by Golden Gate Assembly 51-53 . The amino acid sequences of the resulting constructs are provided below. The linker sequences are underlined.
first purified using gravity flow columns with TALON metal affinity resin (Clontech) and further purified by size-exclusion chromatography on a HiLoad 16/600 Superdex 75 pg or Superdex 200 pg 10/300 GL column (GE Healthcare) pre-equilibrated with 50 mM sodium phosphate buffer, pH 6.0.

LIKTVETRDGQVINETSQHHDDLEGDPPVATGMASSPTPPRGVTVVNNFDCKRYLGTWYEIARFDHRFERGLEKVTATYSLRDDGGLNVINKGYNPDRGMWQQSEGKAYFTGAPTRAALKVSFFGPFYGGYNVIALDREYSG*
>splitC 110-177 -TagBFP
MGPFYGGYNVIALDREYRHALVCGPDRDYLWILSRTPTISDEVKQEMLAVATREGFDVSKFIWVQQPGSGDPPVATMSELIKENMHMKLYMEGTVDNHHFKCTSEGEGKPYEGTQTMRIKVVEGGPLPFAFDILATSFLYGSKTFINHTQGIPDFFKQSFPEGFTWERVTTYEDGGVLTATQDTSLQDGCLIYNVKIRGVNFTSNGPVMQKKTLGWEAFTE
Protein concentration calculation. Protein concentrations were estimated using the Bradford dye-binding method-based 54 colorimetric assay (Bio-Rad) and bovine serum albumin standard. Single point absorption measurements (595 nm) were performed using FlexStation 3 microplate reader (Molecular Devices). All measurements were performed in triplicate.
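A Bradford estimate of this kind typically relies on a linear standard curve built from the albumin standards, against which the mean of the triplicate sample readings is interpolated. The Python sketch below illustrates that calculation; the standard concentrations, absorbance values, and function names are hypothetical and do not come from the paper.

```python
from statistics import mean


def fit_standard_curve(concentrations, absorbances):
    """Least-squares line A595 = slope * concentration + intercept."""
    mx, my = mean(concentrations), mean(absorbances)
    sxx = sum((c - mx) ** 2 for c in concentrations)
    sxy = sum((c - mx) * (a - my) for c, a in zip(concentrations, absorbances))
    slope = sxy / sxx
    return slope, my - slope * mx


def concentration_from_a595(replicates, slope, intercept, dilution=1.0):
    """Interpolate the mean of replicate A595 readings on the standard curve."""
    return (mean(replicates) - intercept) / slope * dilution


# Hypothetical BSA standards (mg/ml) and their A595 readings.
bsa_mg_ml = [0.0, 0.25, 0.5, 1.0]
bsa_a595 = [0.30, 0.45, 0.60, 0.90]
slope, intercept = fit_standard_curve(bsa_mg_ml, bsa_a595)

# Hypothetical triplicate readings of one protein sample.
sample_mg_ml = concentration_from_a595([0.53, 0.54, 0.55], slope, intercept)
```

The Bradford response is only approximately linear, so in practice samples are diluted to fall within the range covered by the standards.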
Fluorescence spectra detection. A Horiba Jobin Yvon Fluoromax-3 fluorometer was used to record full fluorescence excitation and emission spectra for evaluation of the excitation/emission maxima.
Quantum yield measurements. Fluorescence quantum yield was measured relative to a known standard keeping all instrumental conditions identical. Previously characterized DiB:M739 complexes as well as free M739 chromophore were used as standards. Absorbance spectra were detected using double-beam Shimadzu UV-1800 UV/Vis spectrophotometer. Fluorescence spectra were measured using Horiba Jobin Yvon Fluoromax-3 fluorometer.
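The relative method described here is commonly evaluated with the expression Φ = Φ ref · (F / F ref ) · (A ref / A) · (n² / n ref ²), where F is the integrated fluorescence, A the absorbance at the excitation wavelength, and n the refractive index of the solvent. The Python sketch below implements this textbook relation; it is not taken from the paper, the function name is hypothetical, and the example numbers are invented for illustration.

```python
def relative_quantum_yield(phi_ref, f_sample, f_ref, a_sample, a_ref,
                           n_sample=1.333, n_ref=1.333):
    """Quantum yield of a sample relative to a standard of known yield.

    f_*: integrated fluorescence emission (same instrument settings)
    a_*: absorbance at the excitation wavelength
    n_*: solvent refractive index (equal values cancel out)
    """
    return (phi_ref * (f_sample / f_ref) * (a_ref / a_sample)
            * (n_sample ** 2 / n_ref ** 2))


# Hypothetical readings: sample twice as bright as the standard at equal absorbance.
phi = relative_quantum_yield(phi_ref=0.30, f_sample=2000.0, f_ref=1000.0,
                             a_sample=0.05, a_ref=0.05)
```

When the sample and the standard are measured in the same buffer, as assumed in the default arguments, the refractive index term equals one.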
Chromophore binding analysis. Titrations were performed and analyzed as previously described 23 using a FlexStation 3 microplate reader (Molecular Devices). In brief, a constant amount of the chromophore solution was added to protein solutions of different concentrations. The full fluorescence emission spectra were collected using an excitation wavelength close to the maximum of the protein-chromophore complex excitation spectrum. Fluorescence intensity at the emission maximum of the complex was extracted and used to determine apparent dissociation constants (K d ).
Crystallization, Data Collection
Diffraction data were collected at the Life Sciences Collaborative Access Team beamline 21-ID-G at the Advanced Photon Source, Argonne National Laboratory. The diffraction data were processed using the xia2 software suite 55 . The crystal structures were solved by molecular replacement with MOLREP 56 using the wtBlc structure (PDB ID 1QWD) as a search model. Model building and iterative refinement were performed with Coot 57 and REFMAC 58 .
Cell culture and transient transfection. HEK293 and HeLa Kyoto cells were grown in Dulbecco's modification of Eagle's medium (DMEM) (PanEco) supplemented with 50 U/ml penicillin and 50 µg/ml streptomycin (PanEco), 2 mM L-glutamine (PanEco) and 10% fetal bovine serum (HyClone, Thermo Scientific) at 37 °C and 5% CO 2 . For transient transfections FuGENE 6 reagent (Roche) was used. Immediately before imaging DMEM was replaced with HHBS media (Hanks Buffer supplemented with 20 mM Hepes).
Fluorescence microscopy. Widefield fluorescence microscopy was performed with the Leica DMI6000B inverted microscope, Zyla 5.5 sCMOS camera (Andor), CoolLED pE-300 light source, GFP and BFP filter sets. Single-molecule localization super-resolution imaging of living cells was performed with Nanoimager S (ONI). Imaging in HILO mode was performed with 1.1 kW cm −2 of 488 nm laser light intensity. Typical acquisitions were 10 000 frames taken at a frequency of 30 Hz.
Super-resolution image processing. Localizations during acquisition were detected using NimOS 1.6.1.9898 (ONI, UK). Super-resolution image reconstruction was performed using default values of the photon, precision and sigma filters in NimOS. Data analysis was performed using a custom-made Python script. Image resolution was determined using a decorrelation analysis plugin 59 .
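The custom script itself is not reproduced here. As a purely illustrative Python sketch of the kind of summary statistics reported in the Results (median photon count, median localization precision, and localizations per frame), the example below works on a list of localization records; the record keys, the function name, and the toy values are hypothetical and do not reflect the NimOS export format or the authors' code.

```python
from statistics import median


def summarize_localizations(localizations, n_frames):
    """Summary statistics for a single-molecule localization table.

    localizations: list of dicts with hypothetical keys
                   'frame', 'photons' and 'precision_nm'
    n_frames:      total number of acquired frames
    """
    return {
        "median_photons": median(loc["photons"] for loc in localizations),
        "median_precision_nm": median(loc["precision_nm"] for loc in localizations),
        "localizations_per_frame": len(localizations) / n_frames,
    }


# Toy table with four localizations over four frames (invented values).
toy_table = [
    {"frame": 0, "photons": 500, "precision_nm": 18.0},
    {"frame": 0, "photons": 700, "precision_nm": 12.0},
    {"frame": 1, "photons": 600, "precision_nm": 15.0},
    {"frame": 3, "photons": 650, "precision_nm": 14.0},
]
summary = summarize_localizations(toy_table, n_frames=4)
```

Medians are used because photon count and precision distributions in single-molecule data are usually skewed, which matches the median values quoted in the text.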

Data availability
The crystal structures reported in this paper have been deposited to the Protein Data Bank under accession numbers 6UKK and 6UKL. All other relevant data are included with the manuscript.