Modular Protein Ligation: A New Paradigm as a Reagent Platform for Pre-Clinical Drug Discovery

Drug discovery project teams spend significant resources generating numerous, yet unique, target constructs for the multiple platforms used to drive drug discovery programs, including functional assays, biophysical studies, structural biology, and biochemical high-throughput screening campaigns. To improve this process, we developed Modular Protein Ligation (MPL), a combinatorial reagent platform that uses Expressed Protein Ligation to site-specifically label proteins at the C-terminus with a variety of cysteine-lysine dipeptide conjugates. Historically, such proteins have been chemically labeled non-specifically through surface amino acids. To demonstrate the feasibility of this approach, we applied MPL to proteins of varying size in different target classes using different recombinant protein expression systems, and then evaluated the products in several different downstream assays. A key advantage of this paradigm is that one construct can generate multiple final products, significantly streamlining reagent generation for multiple early drug discovery project teams.

Inteins are the protein equivalent of introns in RNA. As internal sequences in a precursor protein, they catalyze a multi-step biochemical reaction that results in their excision from the precursor sequence and ligation of the two flanking extein sequences to form a new functional protein. Inteins were initially discovered in the early 1990s in lower organisms. Their potential for protein engineering was quickly realized, and many additional inteins were identified in the following years. Inteins have been exploited for protein engineering either as split inteins, in protein trans-splicing (PTS), or through expressed protein ligation (EPL), a technique that uses genetically engineered inteins to join two protein (peptide) fragments via a native peptide bond [1][2][3][4]. In this study we focused on the latter technique, EPL. Historically, EPL has been used to install various site-specific protein modifications to enable detailed analysis of signaling pathways, enzyme regulation, and protein-protein interactions 5,6. In addition, this technique has been used for segmental isotopic labeling for protein NMR 7,8 or as a cleavable purification tag 3,9. Genetically engineered versions of Sce VMA1 and Mxe GyrA are commonly used for C-terminal labeling of proteins of interest (POI) 10,11. The C-terminal Asn in these inteins has been mutated to Ala, blocking the final excision step. This allows the inteins to catalyze an N-S acyl shift at the junction between the C-terminus of the POI and the N-terminal cysteine residue of the intein (Fig. 1a) but progress no further. A normal peptide bond can reform unless an external thiol is added to form a more stable thioester. Upon addition of the thiol, the intein is released from the C-terminus of the POI, and this new thioester is susceptible to attack from an N-terminal cysteine linked to a respective functional moiety (termed R group) via an amide bond. The resulting thioester undergoes a spontaneous S-N acyl shift to form a stable peptide bond between the POI and the newly attached cysteine peptide/protein fragment (Fig. 1a) 12.
A standard model for reagent generation has been the creation of multiple constructs for different end uses, with the potential for redesign through successive rounds of trial and error. MPL has the potential to provide a combinatorial "one construct, multiple products" platform to support different screening/biophysics platforms, thus reducing timelines for hit identification and subsequent hit qualification. To lay the foundation for this labeling platform, we developed a library of Cys-Lys dipeptides chemically labeled on the ε-amino group of the lysine, together with Cys-peptide tags. Proteins of various sizes in different target classes were evaluated, with most of the protein targets expressed in E. coli. The C-terminal residue of these protein targets is critical for optimal ligation; often an alanine is introduced to improve ligation efficiencies. We also demonstrated the feasibility of recombinant expression of the intein fusion proteins in both insect and mammalian expression systems, including the secretion of target proteins into the growth media.

Results
As demonstrated in Table 1, most targets can be successfully labeled through expressed protein ligation, irrespective of class. However, the choice of intein can impact the ligation process, as observed with Ogg1 in Fig. 2. The inefficiencies in the EPL reaction observed with the flagHisOgg1VMA1-CBD construct were not overcome by scouting for optimal reaction conditions. However, switching the VMA intein to the GyrA intein improved ligation efficiencies to >95% (Fig. 2b). LC/MS confirmed the successful ligation of the labeled dipeptides, as well as the non-labeled control generated by classic DTT-mediated cleavage (Fig. 2c). Activity analysis of the final purified products showed no differences in catalysis between the EPL-generated Ogg1 and the control Ogg1 (Fig. 2d).
Hit discovery platforms are supported with chemical labels as well as specific peptide tags. Typically, peptide tags are engineered into the desired expression construct. EPL provides a path to generate these reagents in a timely manner when classic expression and purification are challenging. As demonstrated in Fig. 3, the Kelch domain of murine Keap1 was first expressed with a C-terminal lanthanide (Ln) peptide 13,14 as HisAviThr murine KEAP1(322-624)-Ln-peptide. Based on LC/MS analysis, the final purified murine KEAP1(322-624)-Ln-peptide was truncated and did not include the lanthanide peptide (Fig. 3a). Generation of the murine HisThrKeap1(322-624)GyrA-CBD construct enabled the production of intact murine Keap1(322-624)-C-Ln-peptide, as confirmed by LC/MS (Fig. 3b). This new construct also streamlined the generation of murine Keap1(322-624)-C-10His for SPR studies. Initial SPR assay development for the Kelch domain used traditional amine-coupling methods. This SPR assay format was sufficient for the initial hit-to-lead medicinal chemistry efforts, but it became limiting as the program transitioned to lead optimization: throughput decreased as the compounds gained binding affinity and were also optimized for slow dissociation rates. Regeneration conditions were unacceptable for Kelch protein stability, requiring a new surface for each SPR experiment. However, the successful generation of murine Keap1(322-624)-C-10His via EPL enabled SPR surface regeneration (Fig. 3c), resulting in a ~10-fold increase in assay throughput (Fig. 3d) and significant cost savings on sensor chips. Murine Keap1(322-624)GyrA-CBD, a single construct, was used to generate 4 different murine Keap1(322-624) proteins, each containing one of the following C-terminal modifications: biotin, lanthanide binding peptide, fluorescein, or a 10xHis affinity tag.
To demonstrate the utility of MPL in insect cells, we compared the intracellular baculoviral expression of Sirt1(183-664), referred to as mini-Sirt1, fused with either the VMA or the GyrA intein. Unlike Ogg1, the ligation of mini-Sirt1 with both inteins was highly efficient. Final product yields (Table 1) and purity (Fig. 4a) were equivalent with both inteins and the ligation peptides used. Activity analysis confirmed that C-terminal labeling by EPL did not impact the deacetylase activity of mini-Sirt1 compared to full-length Sirt1 under the given experimental conditions (Fig. 4b). SPR studies demonstrated that similar binding kinetics could be measured for 3 compounds with both the EPL biotin-labeled mini-Sirt1 and mini-Sirt1 biotin-labeled via the Avi-tag (Fig. 4c; data shown for one compound). In parallel, a fluorescein-labeled mini-Sirt1 was generated to evaluate on-target compound aggregation 15.
The versatility of MPL was further expanded with the mammalian secreted expression of sCD73, a hydrolase that forms a non-covalent dimer containing 4 intramolecular disulfide bonds per chain. The previously reported use of FcGyrA from baculoviral secretion to form Fc-small molecule conjugates informed our selection of GyrA for all secretion systems tested in this study 16. sCD73 was transiently over-expressed in HEK293 cells using a BacMam encoding a GyrA-CBD fusion protein. The GyrA-CBD did not negatively impact the ability of sCD73 to be secreted into the growth media. The EPL reaction for the intracellularly expressed proteins was typically done in the presence of 200 mM MESNA, but this proved detrimental to sCD73 (data not shown). Therefore, the MESNA concentration was reduced to 36 mM to generate biotin- and fluorescein-labeled sCD73. The labeled CD73 protein(s) migrated as the expected dimer by SEC using a Superdex 200 column, with no observed aggregate peak (Fig. 5a). The EPL fluorescein-labeled sCD73 was the superior final product for MST analysis when compared to the NHS-Red labeled sCD73, as evidenced by the improved quality of the binding isotherm and subsequent Kd determination (Fig. 5b,c). Furthermore, the fluorescein-labeled sCD73 was used to determine the dissociation constant (Kd) for an analogue of CD73's AMP substrate (AMP-CP; Fig. 5d). Again, a single sCD73 intein construct (Table 1) was successfully modified with five different ligands, enabling multiple qualification assays for target MOA with reagents that proved challenging to produce using traditional recombinant protein production methods.
The use of inteins and subsequent ligations can both positively and negatively affect final recombinant protein yields when compared to traditional tagging strategies. For example, without MPL we were unable to express and purify an intact, recombinant Kelch domain of murine Keap1 fused to the lanthanide peptide. Expression and purification of Ogg1 in bacteria and mini-Sirt1 using baculovirus gave similar protein yields when comparing MPL to more traditional recombinant protein production strategies. We obtained yields of 2 mg/g cell paste for FlagHis6-Ogg1 expressed in E. coli (data not shown) compared to yields of 2.2-2.6 mg/g cell paste using inteins (Table 1). Comparable yields were also seen with mini-Sirt1: 1-1.5 mg/g cell paste using inteins (Table 1) and 2 mg/g cell paste with no intein 17. Two examples of inteins negatively affecting final protein yields were our experience with full-length Sirt1 18 in bacteria and secreted CD73 in a mammalian system. The use of inteins decreased final protein yields by 10-fold in both cases. Overall, the MPL platform is broadly enabling and can be extended across most recombinant expression systems (bacteria, insect, and mammalian), and can even be engineered for secretion into the growth media, enabling biopharmaceutical and other higher-yield applications.

Discussion
In drug discovery the paradigm has been to generate multiple constructs in the evaluation of multiple hit ID and hit qualification platforms. There is significant trial and error inherent in this process. In addition to multiple construct designs, nonspecific chemical labeling to support these platforms has significant potential to impact protein yield and quality. In this study, we introduce MPL, a powerful, combinatorial approach to quickly and site-specifically functionalize protein targets for multiple downstream applications using a single construct. We illustrate the utility of this approach by using 2 inteins (GyrA and VMA) to introduce 7 different modifications to 11 different protein targets, resulting in 34 different functionalized proteins of interest. These functionalized proteins were then tested in 8 downstream applications including: biophysical assays (SPR, MST, and compound aggregation), structural biology (protein NMR), biochemical activity characterization, and hit identification (ELT). Compared to traditional approaches for recombinant protein generation, our intein-based MPL platform reduced the number of constructs generated by half, in addition to providing significant cost and time savings.
This MPL platform, in which one construct generates multiple final products to enable drug discovery efforts (Table 1), provides a new, broadly applicable paradigm for combinatorial reagent generation, with plenty of room for continued growth. Expansion of the functionalized dipeptide library presented here will greatly increase the number of downstream applications and assays that can be used with this approach. Furthermore, during the preparation of this manuscript, Dempsey et al. published a simple method for converting NHS-esters into thioesters, which can be used to stoichiometrically modify N-Cys containing proteins 19. This technique enables N-terminal labeling of targets sensitive to C-terminal modifications and can further expand the scope of MPL to include dual modifications at the N- and C-termini. In this scenario, using the 7 modifications explored in this manuscript, a single N-Cys/C-intein construct could yield 56 different protein reagents bearing modifications at either the N-terminus, the C-terminus, or a combination of both. Dual modification of a single protein, for example with a FRET pair, will enable additional downstream assays, specifically those tailored toward protein dynamics and conformational change.
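The text does not break down how the figure of 56 reagents is reached. One reading that reproduces it is sketched below: 7 products labeled only at the N-terminus, 7 labeled only at the C-terminus, and 42 dual-labeled products carrying two different modifications. This breakdown is an assumption rather than something stated in the manuscript, and the modification names are hypothetical placeholders. If the same label on both termini were also counted, the total would be 63.

```python
from itertools import permutations

# Hypothetical placeholder names; the manuscript reports 7 modifications
# but this sketch does not depend on which ones they are.
MODIFICATIONS = [f"mod{i}" for i in range(1, 8)]


def enumerate_reagents(mods):
    """Return (N-term label, C-term label) pairs for one N-Cys/C-intein construct.

    None marks an unmodified terminus. Assumption: dual-labeled products
    carry two different modifications (for example, a FRET pair).
    """
    n_only = [(m, None) for m in mods]
    c_only = [(None, m) for m in mods]
    dual = list(permutations(mods, 2))  # ordered pairs of distinct labels
    return n_only + c_only + dual


reagents = enumerate_reagents(MODIFICATIONS)
print(len(reagents))  # 7 + 7 + 42 = 56
```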

Recombinant Ogg1 expression, purification, and ligation(s). DNA encoding full length human Ogg1 with a tobacco etch virus (TEV) cleavage site on the N-term and a C-terminal Ala followed by the VMA intein and chitin binding domain (CBD) was synthesized at Genscript and inserted into pENTR1a vector. pDEST T7 flagHisTEVOgg1VMA-CBD was generated through a GATEWAY LR reaction of the pENTR1a TEVOgg1VMA-CBD with pDEST T7 flagHis6 and transformed into pRR692/BL21STAR(DE3). Freshly transformed cells were scaled in 2X YT Broth with Ampicillin (100 µg/ml) and Chloramphenicol (35 µg/ml) and induced overnight with 0.1 mM IPTG at 20 °C. FlagHisTEVOgg1VMA-CBD was released from E. coli cell paste by two passes through a Microfluidizer at 10,000 psi, and captured onto chitin resin (New England BioLabs) from clarified lysate supernatant. Chitin resin was thoroughly washed prior to the overnight room temperature incubation with 2 mM C

Keap1 expression, purification, and ligation(s).
The synthesized DNA of the lanthanide peptide 13,14 was inserted between the NotI and XhoI sites of pET15b His avi murine KEAP1(322-624) 21. pET15b His avi murine KEAP1(322-624) lanthanide was freshly transformed into BL21star(DE3) T5R, scaled at 37 °C in LB broth with 1% glucose, and induced for 20 h at 15 °C with 0.5 mM IPTG. HisAviThrombin(Thr)murine KEAP1(322-624) lanthanide was released from the E. coli cell paste by three passes through a Microfluidizer at 12,000 psi, and captured onto Ni NTA Superflow (Qiagen). Murine KEAP1(322-624) lanthanide was released from the Ni NTA Superflow by thrombin, then sized on Superdex 200. DNA encoding murine HisThr Keap1(322-624)GyrA-CBD with an alanine inserted at the C-term of Keap1(322-624) prior to the GyrA was synthesized at Genscript and inserted into pET24b. The pET24b murine HisThrKeap1(322-624)GyrA-CBD was freshly transformed into BL21star(DE3) T5R, scaled at 37 °C in LB broth with 1% glucose, and induced for 20 h at 16 °C with 0.5 mM IPTG. Murine HisThrKeap1(322-624)GyrA-CBD was released from the E. coli cell paste by two passes through a Microfluidizer at 12,000 psi, and captured onto chitin resin from clarified lysate supernatant. Washed chitin beads were incubated with 0.36 mM to 2 mM of the peptides CYIDTNNDGWYEGDELLA-amide (C-Ln-peptide) or CHHHHHHHHHHH (C-10His), respectively (21st Century Peptides), in 36 mM to 80 mM MESNA at pH 7.5 for 23 hours at room temperature. Murine HisThrKeap1(322-624)peptide in the chitin unbound fraction was dialyzed to reduce MESNA, digested with thrombin to remove the His-tag, and sized on Superdex 200 for the murine Keap1(322-624)C-Ln-peptide and Superdex 75 for murine Keap1(322-624)C-10His.

SPR for Keap1. SPR of Keap1 was performed in HBSN buffer with 0.005% P20, 1% DMSO, and 0.5 mM TCEP. An NTA chip (GE Healthcare) was conditioned with 0.5 M EDTA (60 s) followed by 10 mM NaOH (30 s).
The NTA surface was pre-equilibrated with 0.5 mM NiCl2 (15 s) and Keap1 was captured by injecting a 100 nM solution of murine Keap1(322-624)C-10His (30 s). Compound titrations were run using a 5-point, 3-fold dilution single-cycle kinetics method with a top concentration of 100 nM. Association and dissociation times were set to 60 and 500 s, respectively. Prior to titrating compounds, a series of buffer injections was carried out to serve as blanks. The flow rates for conditioning, Keap1 capture, and compound titrations were 30, 10, and 50 µL/min, respectively. To ensure an un-complexed target at the start of each titration, following each compound titration Keap1 was removed from the surface using EDTA, the surface was pre-treated with NiCl2, and Keap1 was re-captured as described above.

IDO1 expression, purification, cleavage and ligation. DNA encoding human IDO1 with a TEV cleavable N-term 6Hisflag tag and C-term alanine followed by GyrA-CBD was synthesized at Genscript and inserted into pET24b vector. The pET24b-HisflagTEVIDO1GyrA-CBD was freshly transformed into BL21(DE3)*, cell growth was carried out at 37 °C in E. coli production broth (LB Broth or Terrific Broth Complete) with carbon source (1% glucose/glycerol) and 50 µg/ml kanamycin, and induced for 20 h at 15 °C with 0.5 mM IPTG in broth with
The plate was loaded onto an Agilent RapidFire v. 3.4/Sciex 4000 Q-Trap RF-MS instrument for analysis [C18 SPE cartridge, 0.2% w/v formic acid in 100% water as the aqueous eluent and 1 mM ammonium acetate in 80% acetonitrile/20% water as the organic eluent]. For data analysis the extracted ion chromatogram areas of both the OAADPr product and D-OAADPr internal standard 18 were recorded. The ratio of OAADPr product to D-OAADPr internal standard was plotted at each assay time point to derive the relative reaction rate for each SIRT1 construct.

SPR for Sirt1.
Biotinylated Sirt1 samples were diluted to 45 µg/ml and captured onto neutravidin-coated chips, which were generated by amine coupling neutravidin to a CM5 chip (GE Healthcare). Compound binding was measured under the following conditions: HBS-N with 5 µM ZnCl2, 1 mM DTT, 0.005% P20, and 1% DMSO, using an 8-point dose response titration with a top concentration of 12.5 µM followed by 8 two-fold serial dilutions and one blank. DMSO solvent correction and double referencing were used to correct the data before fitting to a 1:1 kinetic model to obtain the binding (ka) and dissociation (kd) rates and equilibrium Kd.

RIPK1 expression, purification, and ligations. DNA encoding RIPK1(1-294)C34A.C127A.C233A.C240A 25 with an N-term flagHis tag followed by a TEV cleavage site and a C-term alanine followed by GyrA-CBD was synthesized at Genscript and inserted into pFASTBAC1. Plasmid DNA was transformed into DH10Bac cells, the recombinant Bacmid (white colony) was grown up overnight, and the recombinant Bacmid DNA was prepared using a Qiagen mini prep kit. SF9 cells were transfected with Bacmid DNA using Fugene transfection reagent to make Baculovirus. Baculoviral infected insect cells were generated and protein was expressed in SF9 cells.

DNA encoding Transferrin Receptor (TFR1) residues L122-F760 with a gp67 signal sequence followed by N-term 6hisTEV and C-term GyrA-CBD was synthesized at Genscript and inserted into pFASTBAC1. Plasmid DNA was transformed into DH10Bac cells, the recombinant Bacmid (white colony) was grown up overnight, and the recombinant Bacmid DNA was prepared using a Qiagen mini prep kit. SF9 cells were transfected with Bacmid DNA using Fugene transfection reagent to make Baculovirus. Baculoviral infected insect cells were generated and protein was expressed in SF9 cells. HisTEVTFR1(L122-F760)GyrA-CBD was captured from conditioned medium by cOmplete His-Tag resin (Roche), eluted with 250 mM imidazole after an extensive resin wash, followed by capture onto chitin resin.
The chitin beads were thoroughly washed prior to the addition of 1.4 mM C[K-PEG2-Biotin] in 57 mM MESNA at pH 7.5. Ligations progressed at room temperature for ~15 hr. Biotin-labeled TFR1 released from chitin resin was dialyzed prior to sizing on Superdex 200.

CD73 expression, purification, and ligation(s). DNA encoding soluble CD73 truncated after residue S552 26 and tagged on the C-term with flagAGyrA-CBD was synthesized at Genscript and inserted into pHTB-V1mcs3. Plasmid DNA was transformed into DH10Bac competent cells, the recombinant Bacmid (white colony) was grown up overnight, and the recombinant Bacmid DNA was prepared using a Qiagen mini prep kit. SF9 cells were transfected with Bacmid DNA using Fugene transfection reagent to make P0 Baculovirus. The virus was amplified, and protein was secreted by transduced HEK293 cells. CD73flagAGyrACBD was captured by chitin resin from the BacMam infected HEK293 conditioned medium treated with 20 µM zinc sulfate. The chitin beads were thoroughly washed prior to the addition of 0.9 mM labeled dipeptide (C[K-PEG2-Biotin] or C[K-fluorescein]; 21st Century Peptides) in 36 mM MESNA at pH 7.5. Ligations progressed at room temperature for ~24 hr. The sCD73flagACK-label released from chitin resin was dialyzed and concentrated prior to sizing on Superdex 200.
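As a worked illustration of the compound titration in the Keap1 SPR method (5-point, 3-fold dilution, top concentration 100 nM), the sketch below computes the concentration series. The helper name `dilution_series` is hypothetical, and the lowest-to-highest injection order is an assumption based on how single-cycle kinetics is commonly run. The methods text does not state the order.

```python
def dilution_series(top, fold, points):
    """Return serial-dilution concentrations, highest first.

    top    -- highest concentration in the series
    fold   -- dilution factor between adjacent points
    points -- number of concentrations in the series
    """
    return [top / fold**i for i in range(points)]


# Keap1 SPR titration as described in the methods: 5 points, 3-fold, top 100 nM.
keap1_nM = dilution_series(top=100.0, fold=3.0, points=5)

# Assumption: injections run from the lowest to the highest concentration.
for conc in reversed(keap1_nM):
    print(f"{conc:.2f} nM")
```

Under these assumptions the series spans roughly 1.2 nM to 100 nM. The same helper could be applied to the 2-fold Sirt1 titration by changing `top`, `fold`, and `points`.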